================================================================
CSE 344 -- Spring 2011
Lecture 20:   Transactions

================================================================
A.  The Theory

REVIEW:

  -- Transaction
  -- Serial / Serializable schedule
  -- ACID
  -- "Dirty read" = reading a value written by a transaction that has
                    not committed yet.

Isolation levels in SQL

1. SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

        yes dirty reads

2. SET TRANSACTION ISOLATION LEVEL READ COMMITTED

        no dirty reads HOWEVER may read the same value twice and get
        different answers (how can this be possible ?)

3. SET TRANSACTION ISOLATION LEVEL READ REPEATABLE READS

        no dirty reads and repeated reads return the same value

4. SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

        also handles "phantoms"

-----

"PHANTOM" = a tuple inserted in a relation during the execution of a
transaction

BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READS;
   select count(*) from R;   -- returns 99
                             -- other transaction inserts a tuple
   select count(*) from R;   -- returns 100
COMMIT;

phantoms are very difficult to prevent !  Few systems attempt to do
this, but they still call their transactions serializable.

================================================================
B. The Practice.

Two approaches for handling concurrency control:

1. Locks:
   -- acquire lock before reading / writing; if unavailable, then wait 
   -- easy to implement correctly and to guarantee serializability
   -- locks at the table level handle phantoms easily, but inefficiently
   -- deadlocks: possible, and is major concern

   sqlite = one lock for the entire database
   SQL Server, DB2 = one lock per records (and index entry, etc)


2. Multiversion Concurrency Control (MVCC) 

   -- keeps multiple versions of a record
   -- each transaction sees a snapshot of the database
   -- need to detect and handle read/write and write/write conflicts

   postgres, Oracle = implement MVCC but only detect write/write conflicts

================================================================
In class:

SQLITE = is very simple, and we will study in detail
For more information: http://www.sqlite.org/atomiccommit.html

Step 1: every transaction aquires a READ LOCK
        all these transactions may read happily

Step 2: when one transaction wants to write
        aquire a RESERVED LOCK
        this lock may coexists with many READ LOCKs
        writer transaction may now write (others don't see the updates)
        reader transactions continue to read, new readers accepted


Step 3: when writer transaction wants to commit
        try to aquire a EXCLUSIVE LOCK: no READ LOCKs may coexist
        1. aquire a PENDING LOCK: coexists with READ LOCKs
           no new READ LOCKS are accepted during a PENDING LOCK
           wait for all read locks to be released
        2. aquire an EXCLUSIVE LOCK: 
        all updates are written permanently to the database and log


create table r(a int, b int);
insert into r values (1,10);
insert into r values (2,20);
insert into r values (3,30);

-- Run the script below in three different windows

T1:
   begin;
   select * from r;
   -- T1 has a READ LOCK

T2:
   begin;
   select * from r;
   -- T1 has a READ LOCK

T1:
   update r set b=11 where a=1;
   -- T1 has a RESERVED LOCK

T2:
   update r set b=21 where a=2;
   -- T2 asked for a RESERVED LOCK:  DENIED

T3:
   begin;
   select * from r;
   commit;
   -- everything works fine, could obtain READ LOCK

T1:
   commit;
   SQL error: database is locked
   -- T1 asked for PENDING LOCK -- GRANTED
   -- T1 asked for EXCLUSIVE LOCK -- DENIED

T3':
   begin;
   select * from r;
   -- T3 asked for READ LOCK -- DENIED (because of T1)

T2:
   commit;
   -- releases the last READ LOCK

T1:
   commit;
   -- T1 asked for EXCLUSIVE LOCK -- GRANTED

T3':

   select * from r;
   -- T3 asked for READ LOCK -- GRANTED
   commit;

*** In class: is this schedule serializable ?  If so, then what is the
    serialization order of the four transactions T1, T2, T3, T3' ?

*** In class: does SQLITE handle phantoms correctly ?

================================================================
DEADLOCK

When Transaction 1 waits for a lock held by Transaction 2
and  Transaction 2 waits for a lock held by Transaction 1

Worse: TX1 waits for TX2; TX2 waits for TX3; ... TXn waits for TX1.

Unavoidable, expensive to detect.  On detection, abort a transaction.

SQLITE avoids deadlocks by never blocking: instead it returns an error
code and application must re-issue request.

================================================================

In class:

POSTURES = MVCC

Each transaction T sees a snapshot of the database as it was at the
time t, when the transaction started.

In class: (run script above in postgres, using only T1 and T2)

create table r(a int, b int);
insert into r values (1,10);
insert into r values (2,20);

T1:
   begin transaction isolation level serializable;
   select * from r;

T2:
   begin transaction isolation level serializable;
   select * from r;

T1:
   update r set b=11 where a=1;

T2:
   update r set b=21 where a=2;

T1:
  select * from r;

T2:
  select * from r;

T1:
  commit;

T2:
  commit;


*** In class: is this schedule serializable ?  If so, then what is the
    serialization order of the transactions T1, T2 ?

*** In class: does postgres handle phantoms correctly ?

DETAILS
More details: http://www.postgresql.org/docs/8.2/static/transaction-iso.html

  -- On READ: fetch the version of the record at time t

  -- On WRITE: fetch the version of the record at time t
       If this is the current version, then write it
       If it was updated by some T' then WAIT for T' to finish
               if T' rolls back, then update the record
               if T' commits, then ABORT T

Note: 
   -- default isolation level in postgres = READ COMMITTED
   -- either SET... SERIALIZABLE explicitly, or specify in BEGIN
   -- IMPORTANT NOTE: if you use SET ... then do it **after** BEGIN

drop table r;
create table r(a int, b int);
insert into r values (1,10);

T1:
   begin transaction isolation level serializable;
   update r set b=11 where a=1;

T2
   begin transaction isolation level serializable;
   update r set b=12 where a=1;
   -- WAIT

T1:
   commit;   -- T2 is aborted
   -- rollback;   -- allows T2 to proceed


Advantages:
   -- read-only transactions are always allowed to proceed without delay
   -- non-conflicting transactions are also allowed to proceed without delay