From: Cliff Schmidt (cliff_at_bea.com)
Date: Mon Feb 23 2004 - 17:55:10 PST
This paper discusses a file system design based on a log structure,
similar to the way a database system uses logs. Right down to its
use of checkpoints and roll-forward, I found Sprite to be very
similar to database logging, with one major exception: the log is
the entire stored structure, whereas a database system uses its log
mainly for crash-recovery purposes, since the data eventually gets
pushed to the appropriate page on disk. This also means that a
database system does not need a segment-cleaning mechanism, as
Sprite does.
The authors make a very clear point early in the paper that a log-
based file structure leverages the increasing CPU speed and memory
capacity since disk writes can be cached in memory before writing
many changes at once to the log. This saves disk accesses, and the
authors point out that access time (as distinct from disk transfer
bandwidth) is limited in how much it will improve in the future. It
was also important to note that this implies an assumption that
crashes are infrequent and that "it is acceptable to lose a few
seconds or minutes of work in each crash" since the buffering system
may cause something that would have been written immediately to disk
in a conventional file system to be still in main memory during a
crash. This made me wonder whether there were lessons that this
system could learn from single-level stores like Multics or Hydra.
One of the key ideas behind this type of system was mentioned in
section 3:
"the log-structured file system converts the many small
synchronous random writes of traditional file systems into large
asynchronous sequential transfers that can utilize nearly 100%
of the raw disk bandwidth."
This point is emphasized many times throughout the paper. What a
log-structured file system does for you is allow greater use of
the available disk bandwidth. Also, by avoiding synchronous
writes, you decouple the application's performance from the
disk's performance.
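To make the batching idea concrete, here is a minimal sketch (my own
hypothetical names, not Sprite's actual code) of buffering many small
writes in memory and flushing them to the log as one large sequential
transfer:

```python
# Sketch of log-style write batching (hypothetical, not Sprite's code).
# Small writes accumulate in an in-memory segment buffer; once the
# buffer reaches the segment size, everything is flushed with a single
# sequential append to the log, instead of one random write per block.

SEGMENT_SIZE = 512 * 1024  # assumed segment size, e.g. 512 KB

class SegmentBuffer:
    def __init__(self, log_file):
        self.log = log_file
        self.pending = []          # (inode_number, block_data) pairs
        self.pending_bytes = 0

    def write_block(self, inode_no, data):
        # A "write" just buffers data in memory -- it is asynchronous,
        # so the application never waits on the disk here.
        self.pending.append((inode_no, data))
        self.pending_bytes += len(data)
        if self.pending_bytes >= SEGMENT_SIZE:
            self.flush()

    def flush(self):
        # One large sequential transfer covering many logical writes.
        segment = b"".join(data for _, data in self.pending)
        self.log.write(segment)
        self.pending, self.pending_bytes = [], 0
```

Note that this is also where the crash-loss assumption quoted above
comes from: anything still sitting in `pending` at crash time is gone.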
Some of the other notes I made about this system were:
- inodes are the main data structure with the file attributes
- indirect blocks add a level of indirection by pointing to
  data blocks or to further indirect blocks.
- inode maps maintain the location of each inode and can be
found from a fixed checkpoint region on the disk.
- threading and copying are two methods of managing free space
within segments. Copying live data appeared to be a cross
between a disk defrag and a programming language's garbage
collection process.
- The main performance metric in this system is the "write
  cost": the average amount of time the disk is busy per byte
  of new data written.
- Higher performance is obtained by forcing the disk into a
  bimodal segment distribution "where most of the segments
  are nearly full, a few are empty or nearly empty". This
  also reminded me of database systems, where an important
  factor is how full to load each page.
- I found it very interesting to compare the "temporal
  locality" of log-structured file systems with the "logical
  locality" of traditional systems. Depending on your
  assumptions about how the files will be accessed, temporal
  locality may provide better performance.
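As I read it, the write-cost arithmetic also explains why the bimodal
distribution pays off. A sketch of that arithmetic (my reading of the
paper's model, not its code), assuming the cleaner reads whole
segments and writes the live fraction back out:

```python
# To reclaim a segment whose live-data fraction is u, the cleaner
# reads the whole segment (1), writes the live data back out (u), and
# the freed space (1 - u) is then filled with new data. Bytes moved
# per byte of new data:
#
#   write_cost = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u)

def write_cost(u):
    """Disk traffic per byte of new data, for utilization u in [0, 1)."""
    if u == 0.0:
        return 1.0  # an empty segment needs no read and no copy-out
    return 2.0 / (1.0 - u)

# Bimodal payoff: cleaning nearly-empty segments is cheap, while
# nearly-full segments are simply left alone.
for u in (0.0, 0.2, 0.5, 0.8):
    print(f"u = {u:.1f}  ->  write cost {write_cost(u):.1f}")
```

So cleaning a segment that is 80% live costs 10 bytes of disk traffic
per byte of new data, versus 2.5 at 20% live, which is why you want
the cleaner to find mostly-empty segments.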
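The chain from checkpoint region to inode map to inode to data can
also be sketched in miniature (a hypothetical dict-based model, not
Sprite's on-disk format):

```python
# Toy model of an LFS lookup path. The fixed checkpoint region locates
# the inode map; the inode map records each inode's current address in
# the log; the inode then points at data (or indirect) blocks, all of
# which live in the log itself.

log = {}  # log address -> block contents (stand-in for the on-disk log)

checkpoint_region = {"inode_map_addr": 100}
log[100] = {"inode_map": {7: 200}}   # inode 7 currently lives at addr 200
log[200] = {"inode": {"size": 5, "block_addrs": [300]}}
log[300] = b"hello"

def read_file(inode_no):
    imap = log[checkpoint_region["inode_map_addr"]]["inode_map"]
    inode = log[imap[inode_no]]["inode"]
    data = b"".join(log[a] for a in inode["block_addrs"])
    return data[: inode["size"]]

print(read_file(7))  # b'hello'
```

The indirection is what lets a rewrite simply append new versions of
the blocks and inode to the log and update the inode map, rather than
overwriting anything in place.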
This archive was generated by hypermail 2.1.6 : Mon Feb 23 2004 - 17:55:12 PST