CSE 344 section 07 -- Some transaction-related topics
=====================================================

SQL TRANSACTIONS IN JDBC - THE BASICS

For homework 5, you will be using SQL transactions from Java code.
You've seen how to start and finish transactions from PostgreSQL:
  
  BEGIN TRANSACTION [ISOLATION LEVEL blah] starts a transaction.
  COMMIT completes a transaction.
  ROLLBACK forces Postgres to undo all transaction changes and 
    abort the transaction.

One way to do database transactions from a Java program is just to
run the SQL code for transactions from Java.  
Below is an example of this approach (see Section7.java for the 
complete code):

  static void sqlTxCode() throws SQLException {
    Statement stmt = conn.createStatement();
    stmt.executeUpdate("BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE");
    
    // [code that runs SQL queries agains the database...]
    
    stmt.executeUpdate("COMMIT");
  }

In homework 5 we've made this slightly easier for you, by adding 
PreparedStatements (commented-out) to the starter code for
BEGIN TRAN, COMMIT, and ROLLBACK.  You may need to use SQL transactions
for any application "transaction" that reads or writes your customer
database, as other instances of your application may be doing the same
thing.  (Emphasis on "may".  It is up to you to decide where DBMS transactions
are needed, and where they are not.)  

You may want to mark some transactions as "read-only" ones, to tell Postgres
that they will never *cause* conflicts with other transactions (though
they CAN still conflict with other transactions, that do do writes).
To add this hint, you can say:

  BEGIN TRANSACTION READ ONLY [ISOLATION LEVEL blah]...

You can add the SQL code for these as well to Query.java.

You can also use another approach, that uses JDBC method calls instead
of directly executing SQL strings.

  static void javaTxCode() throws SQLException {
    conn.setAutoCommit(false);
    
    // [code that runs SQL queries agains the database...]
    
    conn.commit();
    conn.setAutoCommit(true);
  }

The main difference between this style and the one with executing 
the transactional SQL strings is that here, we force all code to be
in a transaction following the `conn.setAutoCommit(false)' call.
Each commit() or rollback() thus implicitly starts a new transaction.
This is especially useful if you think you are going to forget
to use transactions --- BUT, it may also induce you to the opposite
problem, overuse of transactions.  Be cautious in either style.

================================
FORCING TRANSACTION INTERLEAVING

It's easy to force Postgres to interleave statements from two concurrent
transactions when you're entering the SQL statements directly into
the `psql' command line.  But this won't work when the SQL is run mechanically
from another program, because all but the largest transactions finish in a 
fraction of a second after they start - making it hard for you to get 
two transactions to even run concurrently.

To get around this problem, we suggest you do (and expect you to do, as 
we will do it too) the following: Add a pause in the middle of the
two transactions you are interested in, by reading a line from the 
standard input (you can just throw away the input line, what matters
is that the transaction is blocked in the Java program waiting for you
to hit Enter).  Here is a cut-and-paste ready code snippet to do this:

  Scanner console = new Scanner (System.in);
  console.nextLine();

Now, you can start the two concurrent transactions in two instances
of your program, let one or both halt at the pause, and then let them
proceed in the order that you want.  You can even have multiple pauses
if you need to allow for more complex interleavings.

=================================
WHAT TO DO IF A TRANSACTION FAILS

Here is a simplified and modified version of the 2-transaction schedule 
for Postgres you saw in lecture yesterday:

T1:
   begin transaction isolation level serializable;
   select * from r;
   update r set b=11 where a=1;

T2:
   begin transaction isolation level serializable;
   select * from r;
   update r set b=12 where a=1;

T1:
  commit;

T2:
  commit;

Notice here that the two transactions execute nearly identical SQL statements;
in particular, both write to rows of relation R with a=1, causing a conflict
between them.  If both transactions are started and their code is interleaved
according to the above schedule, one of the transactions will be aborted
due to the conflict.

Here is Java code for both of the two transactions, in 
Section7Conflicting.java.  The updateValue field varies
for which transaction it is, per Section7_T1.java and Section7_T2.java :

  void badTransaction() throws SQLException {
    Statement stmt = conn.createStatement();
    stmt.executeUpdate(
      "begin transaction isolation level serializable");
    
    ResultSet results = stmt.executeQuery("select * from R");
    results.close();
    
    PreparedStatement pstmt = conn.prepareStatement(
      "update r set b= ? where a=1");
    pstmt.setInt(1, updateValue);
    pstmt.executeUpdate();
    
    // pause for debugging
    Scanner stdin = new Scanner(System.in);
    stdin.nextLine();
    
    stmt.executeUpdate("commit");
  }

When you run these two transactions' SQL code directly in two different 
psql sessions, according to the above schedule, we should get an abort 
error message for one of them.  In Java, this Postgres error message
is propagated to the Java application as an exception from the Postgres
JDBC driver:

[window 1]
Running T1()


[window 2]
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367)
	at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321)
	at Section7Conflicting.badTransaction(Section7Conflicting.java:26)
	at Section7_T2.main(Section7_T2.java:18)

Recall that once a Postgres transaction fails, we must explicitly ROLLBACK 
the transaction before we can continue running SQL code in that psql session.  
We can do something similar in the Java code, by wrapping the Java code
in a try/catch block:

  void rolledBack() throws SQLException {
    try {
      // same code as before
    } catch (SQLException ex) {
      stmt.executeUpdate("rollback");
      System.out.println("Caught a Postgres error:");
      ex.printStackTrace();
    }
  }

Now we get the same error message as before, but at least we remember
to revert our changes ourselves, rather than forcing the database server
to do it automatically when our program disconnects from the database
on exit.

Is it enough to just give up and roll back, as we've done here?
Sometimes.  Other times we may wish to restart the transaction
from the beginning, hoping that this time, there will be no conflict.
We can do this by simply wrapping the above in a loop:

  void retryUntilSuccess() throws SQLException {
    int attemptCount = 1;
    outer: while (true) {
      System.out.println("Attempt " + attemptCount);
      try {
        badTransaction();
        break outer;
      } catch (SQLException ex) {
        conn.createStatement().executeUpdate("rollback");
        System.out.println("...failed with this Postgres error:");
        ex.printStackTrace();
        attemptCount++;
      }
    }
  }

Our final refinement would be to put a bound on how many times the
loop gets executed, so that one transaction is not stuck forever.
We can do this simply by changing the forever loop
to exit once the number of attempts reaches some maximum value.

========
PHANTOMS

Recall the phantom tuple problem:

T1:
  BEGIN TRANSACTION;
  SELECT COUNT(*) FROM R; -- returns 99

T2:
  BEGIN TRANSACTION;
  INSERT INTO R(a,b) VALUES(4, 44);
  COMMIT;

T1:
  SELECT COUNT(*) FROM R; -- returns 100!
  COMMIT;
  
In class, you discussed how Postgres handles phantoms.  In summary - 
Postgres does not handle them correctly in general.  In SERIALIZABLE
isolation, Postgres handles the exact case above correctly - transaction T1
does not count the new tuple introduced by T2 at any time.  And sometimes,
this is what we want.  

At other times, however, this is not so.  Suppose T1 does its own database
update, but whether the update occurs is controlled by the result of
the query susceptible to phantoms (which is now run only once).
Because the order in which T1 and T2 are scheduled now matters when T1 
decides whether or not to update, it would be better given the above schedule 
to roll back either T2 (to prevent the phantom being inserted) or T1 
(to prevent the phantom from going unnoticed) when T2 tries to commit.  
But Postgres does not do either, because the Postgres authors feel that the 
work required to detect that a new tuple is a phantom for some other 
transaction isn't worth it.
(In section: why might they think that?  What sort of work might be involved?)

In section: let's discuss how to work around this problem.
How do we get Postgres to block tuples that could end up being phantoms?
Can we solve this in general?  Is there an application-specific way?