CSE 344 section 07 -- Some transaction-related topics ===================================================== SQL TRANSACTIONS IN JDBC - THE BASICS For homework 5, you will be using SQL transactions from Java code. You've seen how to start and finish transactions from PostgreSQL: BEGIN TRANSACTION [ISOLATION LEVEL blah] starts a transaction. COMMIT completes a transaction. ROLLBACK forces Postgres to undo all transaction changes and abort the transaction. One way to do database transactions from a Java program is just to run the SQL code for transactions from Java. Below is an example of this approach (see Section7.java for the complete code): static void sqlTxCode() throws SQLException { Statement stmt = conn.createStatement(); stmt.executeUpdate("BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE"); // [code that runs SQL queries agains the database...] stmt.executeUpdate("COMMIT"); } In homework 5 we've made this slightly easier for you, by adding PreparedStatements (commented-out) to the starter code for BEGIN TRAN, COMMIT, and ROLLBACK. You may need to use SQL transactions for any application "transaction" that reads or writes your customer database, as other instances of your application may be doing the same thing. (Emphasis on "may". It is up to you to decide where DBMS transactions are needed, and where they are not.) You may want to mark some transactions as "read-only" ones, to tell Postgres that they will never *cause* conflicts with other transactions (though they CAN still conflict with other transactions, that do do writes). To add this hint, you can say: BEGIN TRANSACTION READ ONLY [ISOLATION LEVEL blah]... You can add the SQL code for these as well to Query.java. You can also use another approach, that uses JDBC method calls instead of directly executing SQL strings. static void javaTxCode() throws SQLException { conn.setAutoCommit(false); // [code that runs SQL queries agains the database...] conn.commit(); conn.setAutoCommit(true); } The main difference between this style and the one with executing the transactional SQL strings is that here, we force all code to be in a transaction following the `conn.setAutoCommit(false)' call. Each commit() or rollback() thus implicitly starts a new transaction. This is especially useful if you think you are going to forget to use transactions --- BUT, it may also induce you to the opposite problem, overuse of transactions. Be cautious in either style. ================================ FORCING TRANSACTION INTERLEAVING It's easy to force Postgres to interleave statements from two concurrent transactions when you're entering the SQL statements directly into the `psql' command line. But this won't work when the SQL is run mechanically from another program, because all but the largest transactions finish in a fraction of a second after they start - making it hard for you to get two transactions to even run concurrently. To get around this problem, we suggest you do (and expect you to do, as we will do it too) the following: Add a pause in the middle of the two transactions you are interested in, by reading a line from the standard input (you can just throw away the input line, what matters is that the transaction is blocked in the Java program waiting for you to hit Enter). Here is a cut-and-paste ready code snippet to do this: Scanner console = new Scanner (System.in); console.nextLine(); Now, you can start the two concurrent transactions in two instances of your program, let one or both halt at the pause, and then let them proceed in the order that you want. You can even have multiple pauses if you need to allow for more complex interleavings. ================================= WHAT TO DO IF A TRANSACTION FAILS Here is a simplified and modified version of the 2-transaction schedule for Postgres you saw in lecture yesterday: T1: begin transaction isolation level serializable; select * from r; update r set b=11 where a=1; T2: begin transaction isolation level serializable; select * from r; update r set b=12 where a=1; T1: commit; T2: commit; Notice here that the two transactions execute nearly identical SQL statements; in particular, both write to rows of relation R with a=1, causing a conflict between them. If both transactions are started and their code is interleaved according to the above schedule, one of the transactions will be aborted due to the conflict. Here is Java code for both of the two transactions, in Section7Conflicting.java. The updateValue field varies for which transaction it is, per Section7_T1.java and Section7_T2.java : void badTransaction() throws SQLException { Statement stmt = conn.createStatement(); stmt.executeUpdate( "begin transaction isolation level serializable"); ResultSet results = stmt.executeQuery("select * from R"); results.close(); PreparedStatement pstmt = conn.prepareStatement( "update r set b= ? where a=1"); pstmt.setInt(1, updateValue); pstmt.executeUpdate(); // pause for debugging Scanner stdin = new Scanner(System.in); stdin.nextLine(); stmt.executeUpdate("commit"); } When you run these two transactions' SQL code directly in two different psql sessions, according to the above schedule, we should get an abort error message for one of them. In Java, this Postgres error message is propagated to the Java application as an exception from the Postgres JDBC driver: [window 1] Running T1() [window 2] Exception in thread "main" org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2062) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1795) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:479) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:367) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:321) at Section7Conflicting.badTransaction(Section7Conflicting.java:26) at Section7_T2.main(Section7_T2.java:18) Recall that once a Postgres transaction fails, we must explicitly ROLLBACK the transaction before we can continue running SQL code in that psql session. We can do something similar in the Java code, by wrapping the Java code in a try/catch block: void rolledBack() throws SQLException { try { // same code as before } catch (SQLException ex) { stmt.executeUpdate("rollback"); System.out.println("Caught a Postgres error:"); ex.printStackTrace(); } } Now we get the same error message as before, but at least we remember to revert our changes ourselves, rather than forcing the database server to do it automatically when our program disconnects from the database on exit. Is it enough to just give up and roll back, as we've done here? Sometimes. Other times we may wish to restart the transaction from the beginning, hoping that this time, there will be no conflict. We can do this by simply wrapping the above in a loop: void retryUntilSuccess() throws SQLException { int attemptCount = 1; outer: while (true) { System.out.println("Attempt " + attemptCount); try { badTransaction(); break outer; } catch (SQLException ex) { conn.createStatement().executeUpdate("rollback"); System.out.println("...failed with this Postgres error:"); ex.printStackTrace(); attemptCount++; } } } Our final refinement would be to put a bound on how many times the loop gets executed, so that one transaction is not stuck forever. We can do this simply by changing the forever loop to exit once the number of attempts reaches some maximum value. ======== PHANTOMS Recall the phantom tuple problem: T1: BEGIN TRANSACTION; SELECT COUNT(*) FROM R; -- returns 99 T2: BEGIN TRANSACTION; INSERT INTO R(a,b) VALUES(4, 44); COMMIT; T1: SELECT COUNT(*) FROM R; -- returns 100! COMMIT; In class, you discussed how Postgres handles phantoms. In summary - Postgres does not handle them correctly in general. In SERIALIZABLE isolation, Postgres handles the exact case above correctly - transaction T1 does not count the new tuple introduced by T2 at any time. And sometimes, this is what we want. At other times, however, this is not so. Suppose T1 does its own database update, but whether the update occurs is controlled by the result of the query susceptible to phantoms (which is now run only once). Because the order in which T1 and T2 are scheduled now matters when T1 decides whether or not to update, it would be better given the above schedule to roll back either T2 (to prevent the phantom being inserted) or T1 (to prevent the phantom from going unnoticed) when T2 tries to commit. But Postgres does not do either, because the Postgres authors feel that the work required to detect that a new tuple is a phantom for some other transaction isn't worth it. (In section: why might they think that? What sort of work might be involved?) In section: let's discuss how to work around this problem. How do we get Postgres to block tuples that could end up being phantoms? Can we solve this in general? Is there an application-specific way?