================================================================ CSE 344 -- Spring 2011 Lecture 20: Transactions ================================================================ A. The Theory REVIEW: -- Transaction -- Serial / Serializable schedule -- ACID -- "Dirty read" = reading a value written by a transaction that has not committed yet. Isolation levels in SQL 1. SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED yes dirty reads 2. SET TRANSACTION ISOLATION LEVEL READ COMMITTED no dirty reads HOWEVER may read the same value twice and get different answers (how can this be possible ?) 3. SET TRANSACTION ISOLATION LEVEL READ REPEATABLE READS no dirty reads and repeated reads return the same value 4. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE also handles "phantoms" ----- "PHANTOM" = a tuple inserted in a relation during the execution of a transaction BEGIN TRANSACTION; SET TRANSACTION ISOLATION LEVEL REPEATABLE READS; select count(*) from R; -- returns 99 -- other transaction inserts a tuple select count(*) from R; -- returns 100 COMMIT; phantoms are very difficult to prevent ! Few systems attempt to do this, but they still call their transactions serializable. ================================================================ B. The Practice. Two approaches for handling concurrency control: 1. Locks: -- acquire lock before reading / writing; if unavailable, then wait -- easy to implement correctly and to guarantee serializability -- locks at the table level handle phantoms easily, but inefficiently -- deadlocks: possible, and is major concern sqlite = one lock for the entire database SQL Server, DB2 = one lock per records (and index entry, etc) 2. Multiversion Concurrency Control (MVCC) -- keeps multiple versions of a record -- each transaction sees a snapshot of the database -- need to detect and handle read/write and write/write conflicts postgres, Oracle = implement MVCC but only detect write/write conflicts ================================================================ In class: SQLITE = is very simple, and we will study in detail For more information: http://www.sqlite.org/atomiccommit.html Step 1: every transaction aquires a READ LOCK all these transactions may read happily Step 2: when one transaction wants to write aquire a RESERVED LOCK this lock may coexists with many READ LOCKs writer transaction may now write (others don't see the updates) reader transactions continue to read, new readers accepted Step 3: when writer transaction wants to commit try to aquire a EXCLUSIVE LOCK: no READ LOCKs may coexist 1. aquire a PENDING LOCK: coexists with READ LOCKs no new READ LOCKS are accepted during a PENDING LOCK wait for all read locks to be released 2. aquire an EXCLUSIVE LOCK: all updates are written permanently to the database and log create table r(a int, b int); insert into r values (1,10); insert into r values (2,20); insert into r values (3,30); -- Run the script below in three different windows T1: begin; select * from r; -- T1 has a READ LOCK T2: begin; select * from r; -- T1 has a READ LOCK T1: update r set b=11 where a=1; -- T1 has a RESERVED LOCK T2: update r set b=21 where a=2; -- T2 asked for a RESERVED LOCK: DENIED T3: begin; select * from r; commit; -- everything works fine, could obtain READ LOCK T1: commit; SQL error: database is locked -- T1 asked for PENDING LOCK -- GRANTED -- T1 asked for EXCLUSIVE LOCK -- DENIED T3': begin; select * from r; -- T3 asked for READ LOCK -- DENIED (because of T1) T2: commit; -- releases the last READ LOCK T1: commit; -- T1 asked for EXCLUSIVE LOCK -- GRANTED T3': select * from r; -- T3 asked for READ LOCK -- GRANTED commit; *** In class: is this schedule serializable ? If so, then what is the serialization order of the four transactions T1, T2, T3, T3' ? *** In class: does SQLITE handle phantoms correctly ? ================================================================ DEADLOCK When Transaction 1 waits for a lock held by Transaction 2 and Transaction 2 waits for a lock held by Transaction 1 Worse: TX1 waits for TX2; TX2 waits for TX3; ... TXn waits for TX1. Unavoidable, expensive to detect. On detection, abort a transaction. SQLITE avoids deadlocks by never blocking: instead it returns an error code and application must re-issue request. ================================================================ In class: POSTURES = MVCC Each transaction T sees a snapshot of the database as it was at the time t, when the transaction started. In class: (run script above in postgres, using only T1 and T2) create table r(a int, b int); insert into r values (1,10); insert into r values (2,20); T1: begin transaction isolation level serializable; select * from r; T2: begin transaction isolation level serializable; select * from r; T1: update r set b=11 where a=1; T2: update r set b=21 where a=2; T1: select * from r; T2: select * from r; T1: commit; T2: commit; *** In class: is this schedule serializable ? If so, then what is the serialization order of the transactions T1, T2 ? *** In class: does postgres handle phantoms correctly ? DETAILS More details: http://www.postgresql.org/docs/8.2/static/transaction-iso.html -- On READ: fetch the version of the record at time t -- On WRITE: fetch the version of the record at time t If this is the current version, then write it If it was updated by some T' then WAIT for T' to finish if T' rolls back, then update the record if T' commits, then ABORT T Note: -- default isolation level in postgres = READ COMMITTED -- either SET... SERIALIZABLE explicitly, or specify in BEGIN -- IMPORTANT NOTE: if you use SET ... then do it **after** BEGIN drop table r; create table r(a int, b int); insert into r values (1,10); T1: begin transaction isolation level serializable; update r set b=11 where a=1; T2 begin transaction isolation level serializable; update r set b=12 where a=1; -- WAIT T1: commit; -- T2 is aborted -- rollback; -- allows T2 to proceed Advantages: -- read-only transactions are always allowed to proceed without delay -- non-conflicting transactions are also allowed to proceed without delay