Analysis Pass

I wasn't happy with my explanation of the last two slides last night (Analysis Pass, pp. 34-35). To be honest, I'm not sure that what's written on the slides is exactly right. Moreover, it's not described at all in the textbook. So here's another explanation, in a somewhat different order, which should help you with the last three questions on the assignment.

In each cache slot descriptor, add an entry called "oldestLSN". The oldestLSN is only meaningful for dirty cache slots. It's the LSN of the oldest log record that might need to be redone (with respect to the copy of the page in the stable database). When a clean cache slot is updated, both the LSN and oldestLSN of the cache slot are set to the LSN of that update's log record. When a dirty cache slot is updated, only the LSN is updated, not the oldestLSN. When a page is flushed, of course the dirty bit is cleared. But it doesn't matter whether the LSN and oldestLSN entries are cleared, since they're irrelevant after the flush (until the cache slot is updated again). At this point, there are no Flush records in the picture (as opposed to the first bullet on p. 34).
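The bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the rules in the preceding paragraph, not code from the slides or the textbook; the class name CacheSlot and its fields and methods (page_id, dirty, lsn, oldest_lsn, apply_update, flush) are names made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheSlot:
    """Hypothetical cache slot descriptor; all names are illustrative."""
    page_id: str
    dirty: bool = False
    lsn: Optional[int] = None         # LSN of the latest update to the cached page
    oldest_lsn: Optional[int] = None  # only meaningful while the slot is dirty

    def apply_update(self, record_lsn: int) -> None:
        if not self.dirty:
            # Clean slot: this is the oldest update the stable copy is missing.
            self.oldest_lsn = record_lsn
            self.dirty = True
        # Clean or dirty, the slot's LSN always advances to the new record.
        self.lsn = record_lsn

    def flush(self) -> None:
        # Only the dirty bit matters after a flush. lsn and oldest_lsn are
        # left as they are, since they are irrelevant until the next update.
        self.dirty = False
```

For instance, updating a clean slot with LSN 10 and then with LSN 15 leaves lsn at 15 and oldest_lsn at 10. After a flush, the next update (say LSN 20) sets both to 20.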

When performing a fuzzy checkpoint, in addition to the other work that's described on p. 28, the checkpoint procedure also builds a Dirty Page Table, which consists of a list of all the addresses of all the dirty pages that are in cache, and for each such page, the oldestLSN of that page's cache slot.
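As a rough sketch of that step, the Dirty Page Table can be modeled as a mapping from page address to oldestLSN. The function name build_dirty_page_table and the tuple layout of the cache slots are assumptions made for this example, not anything defined on the slides.

```python
def build_dirty_page_table(cache_slots):
    """Return {page_id: oldest_lsn} for every dirty slot.

    cache_slots is assumed to be an iterable of
    (page_id, dirty, oldest_lsn) tuples, a layout chosen only for
    this illustration.
    """
    return {
        page_id: oldest_lsn
        for page_id, dirty, oldest_lsn in cache_slots
        if dirty  # clean slots are left out of the table
    }


# Example: P2 is clean, so only P1 and P3 appear in the table.
table = build_dirty_page_table([("P1", True, 10), ("P2", False, 7), ("P3", True, 12)])
```

The resulting table is what would be stored in the checkpoint record along with the other checkpoint information.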

During the forward redo scan that starts at the penultimate checkpoint, when the Restart procedure encounters an update record U for page P, it looks up P in the Dirty Page Table that was in the last checkpoint record. If P is in the Dirty Page Table and its oldestLSN is less than or equal to U's LSN, then the page is read from the stable database and the usual decision is made whether to redo the update, namely, only redo the update if U's LSN is greater than the page's LSN.
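Here is one way the test above might look in Python. The function and parameter names are hypothetical. Note two assumptions that go beyond what the paragraph states: the sketch skips an update without reading the page only when the update was logged before the last checkpoint record, and it falls back to the usual page-LSN comparison for update records logged after the last checkpoint, since the checkpoint's table cannot know about pages dirtied later.

```python
def should_redo(update_lsn, page_id, dirty_page_table,
                last_checkpoint_lsn, read_page_lsn):
    """Decide whether update record U (update_lsn, page_id) must be redone.

    dirty_page_table: {page_id: oldest_lsn} from the last checkpoint record.
    read_page_lsn:    callable that reads the page from the stable database
                      and returns the LSN stored on it (hypothetical helper).
    """
    if update_lsn < last_checkpoint_lsn:
        oldest_lsn = dirty_page_table.get(page_id)
        # Assumption: if the page was clean at the checkpoint, or U is older
        # than the page's oldestLSN, the stable copy already reflects U.
        if oldest_lsn is None or oldest_lsn > update_lsn:
            return False  # decided without reading the page
    # The usual decision: redo only if U is newer than the stable page.
    return update_lsn > read_page_lsn(page_id)
```

The benefit of the table shows up in the early return: for those records the page is never fetched from disk.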

This is all you need to know to answer the last three questions of Assignment 4 (i.e., (g) - (i)).

Now for some additional optimizations. After a page P is flushed (and the disk write has been acknowledged as successful), and before the flush procedure releases its read latch on P's cache slot, write a Flush(P) record to the log. This is used during recovery by making an analysis pass over the log before the redo pass. The analysis pass starts with an initial copy of the Dirty Page Table obtained from the last checkpoint record. Then, starting with the last checkpoint record, the log is scanned in the forward direction (toward the end). When an update record U for page P is encountered, the procedure looks up P in the Dirty Page Table. If P is not there, it is added with its oldestLSN set to U's LSN; otherwise nothing is done. If a Flush(P) record is encountered, then P's entry in the Dirty Page Table is deleted. Notice that every page in the resulting Dirty Page Table has to be read during recovery, since there's an update record after the penultimate checkpoint that might have to be applied to it. Therefore, Restart can use the Dirty Page Table as a prefetch list, to bring stable database pages into memory before it actually needs them and thereby speed up recovery. This is mentioned in the last bullet on p. 35.
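The analysis pass just described can be sketched as follows. The representation of log records as (kind, page_id, lsn) tuples and the name analysis_pass are assumptions made for the example; a real log has more record types, which this sketch simply ignores.

```python
def analysis_pass(checkpoint_dpt, log_records):
    """Rebuild the Dirty Page Table by scanning forward from the last checkpoint.

    checkpoint_dpt: {page_id: oldest_lsn} taken from the last checkpoint record.
    log_records:    iterable of (kind, page_id, lsn) tuples in log order, where
                    kind is "update" or "flush" (an illustrative encoding).
    """
    dpt = dict(checkpoint_dpt)  # work on a copy of the checkpoint's table
    for kind, page_id, lsn in log_records:
        if kind == "update":
            # Add the page only if it is absent; an existing oldestLSN stays.
            dpt.setdefault(page_id, lsn)
        elif kind == "flush":
            # Flush(P): the stable copy is now current, so drop P's entry.
            dpt.pop(page_id, None)
    return dpt


# Example: P1 is flushed at LSN 22 and dirtied again at LSN 24.
dpt = analysis_pass(
    {"P1": 10},
    [("update", "P2", 20), ("update", "P1", 21), ("flush", "P1", 22),
     ("update", "P2", 23), ("update", "P1", 24)],
)
prefetch_list = sorted(dpt)  # pages Restart could start reading early
```

In this example the pass ends with P2 at oldestLSN 20 and P1 at oldestLSN 24, and both pages land on the prefetch list.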

The above treatment of the Dirty Page Table and Flush records is not exactly how it's done in the published descriptions of the ARIES algorithms (in ACM TODS, March 1992 and subsequent papers). But the ideas are the same.