From: David Coleman (dcoleman_at_cs.washington.edu)
Date: Sun Feb 22 2004 - 20:32:20 PST
The log-structured file system paper was an interesting look at a
non-traditional file system. I liked how it drew on the experience of
databases and other systems that use logs but applied that experience
in a unique fashion. My own file system experience is with optical
media. Rewritable optical media write speeds are significantly slower
than read speeds (generally by at least a factor of two) and seek times
are horrible, so any approach that speeds up writing would be
significant. Interestingly, write-once media is essentially a log-based
system in which segments are never reused.
Segment cleaning seems like it would incur significant overhead from
rewriting i-nodes whenever data blocks are relocated. One solution is
to add a level of indirection for all data block locations (or any
referenced address). The UDF (Universal Disk Format) file system does
this via a Virtual Allocation Table (VAT): when a data block is moved,
only the VAT entry is updated. This approach does incur a random-access
penalty, since an additional table lookup is needed to find the
physical block address, which hurts read performance. The ratio of
cleaning to reading would dictate the usefulness of this strategy.
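To make the indirection concrete, here is a minimal sketch of the idea
in C (the names and layout are mine, not UDF's actual on-disc
structures): file metadata stores virtual block numbers, only the VAT
maps them to physical sectors, and the cleaner can relocate a block by
patching a single table entry.

    /* Minimal sketch of VAT-style block indirection (hypothetical names,
     * not UDF's real on-disc structures). File metadata stores virtual
     * block numbers; only the VAT maps them to physical sectors. */
    #include <stdint.h>

    #define VAT_SIZE 4096

    static uint32_t vat[VAT_SIZE];   /* virtual block -> physical sector */

    /* Resolve a virtual block to its current physical sector. This is
     * the extra lookup (the random-access penalty) on every read. */
    uint32_t vat_resolve(uint32_t vblock)
    {
        return vat[vblock];
    }

    /* The cleaner relocates a block: after copying the data elsewhere,
     * it patches one VAT entry. The i-node that references vblock
     * never has to be rewritten. */
    void vat_relocate(uint32_t vblock, uint32_t new_psector)
    {
        vat[vblock] = new_psector;
    }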
I think a modification that might speed up segment cleaning would be to
record the following states in the segment summary/i-node map (sketched
below):
- deleted / truncated to zero length – data dead, no need to check i-node
- modified – unsure of data state, need to check i-node
- unchanged since i-node/data written – data live, no need to check i-node
Currently the system only tracks the deleted/truncated-to-zero-length
state.
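A rough illustration of the extra states, purely as a sketch of my
suggestion (the enum and function names are hypothetical, not anything
Sprite LFS actually records):

    /* Per-block state as it might appear in the segment summary /
     * i-node map under the proposal above (hypothetical). */
    #include <stdbool.h>

    enum block_state {
        BLOCK_DEAD,        /* deleted or truncated to zero length */
        BLOCK_MODIFIED,    /* file changed since the block was written */
        BLOCK_UNCHANGED    /* i-node/data untouched since written: live */
    };

    /* The cleaner only has to read the i-node in the uncertain case;
     * the other two states let it skip that check entirely. */
    bool cleaner_must_check_inode(enum block_state s)
    {
        return s == BLOCK_MODIFIED;
    }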
Recovering from system crashes is an important design consideration. I’m
not sure that log-based file systems need this more than traditional
file systems, given that a common optimization in both is to use larger
write caches and to reorder writes for performance. Checkpoints only
address a single type of failure: a system crash that leaves the file
system in an inconsistent state. The system described would be
significantly less forgiving of disk hardware failures, even
single-sector failures, and those are usually much less pleasant to
deal with. One thing the paper doesn’t specifically address (or I
missed it!) is how the system determines that the file system is in an
inconsistent state. Is there some sort of integrity
sector/block/descriptor indicating whether the file system is open or
closed?
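For reference, here is a hypothetical sketch of the kind of open/closed
flag I have in mind (UDF keeps something comparable in its Logical
Volume Integrity Descriptor; the struct and field names below are
invented for illustration):

    /* Hypothetical "open/closed" integrity flag. The flag is cleared
     * when the volume is mounted and set again on a clean unmount. */
    #include <stdbool.h>
    #include <stdint.h>

    struct superblock {
        uint32_t magic;
        bool     cleanly_unmounted;
        /* ... checkpoint pointers, etc. ... */
    };

    /* At mount time: if the flag is still clear, the volume was never
     * closed properly and recovery (e.g. roll-forward from the last
     * checkpoint) is needed. */
    bool needs_recovery(const struct superblock *sb)
    {
        return !sb->cleanly_unmounted;
    }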
Ironically, a general push today in optical media file systems, and an
optimization some implementations have used for years, is to cluster
meta-data together. This allows both significantly better read
performance and better error-recovery strategies. Generally, when
reading a directory, the first thing the operating system does is get
the attributes of and information about all the files in the directory,
which means reading every i-node. Clustering meta-data such as i-nodes
together therefore allows very good read performance when opening
directories. As any user will tell you, waiting 30 seconds to see the
contents of a directory in Windows Explorer is maddening.
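To illustrate why this matters: a plain POSIX directory listing ends up
touching one i-node per entry, so if those i-nodes are scattered across
the disc, every stat() can cost a seek. (Ordinary library calls;
nothing file-system specific is assumed.)

    /* List a directory the way Explorer-style browsers effectively do:
     * read each entry, then stat() it, which reads its i-node. */
    #include <dirent.h>
    #include <stdio.h>
    #include <sys/stat.h>

    void list_directory(const char *path)
    {
        DIR *dir = opendir(path);
        if (!dir)
            return;

        struct dirent *entry;
        while ((entry = readdir(dir)) != NULL) {
            char full[4096];
            struct stat st;
            snprintf(full, sizeof full, "%s/%s", path, entry->d_name);
            if (stat(full, &st) == 0)        /* one i-node read per entry */
                printf("%-32s %lld bytes\n", entry->d_name,
                       (long long)st.st_size);
        }
        closedir(dir);
    }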