From: Cem Paya (cemp_at_microsoft.com)
Date: Mon Feb 23 2004 - 17:52:45 PST
Review: Log-structured file system
CSE 551P, Cem Paya
In this paper Rosenblum and Ousterhout describe a new approach to
implementing a file system. Called LFS, for log-structured file system, its
main distinction from a traditional file system is how blocks on disk are
allocated to files. The requirements for LFS originate in three good
observations: CPU speeds are subject to Moore's law, which the authors
forecast will continue to hold in the short term (a safe bet when the paper
appeared in 1992, and their prediction proved correct); memory sizes
continue to expand, though not at the same rate as Moore's law (also true in
the dozen years since publication); but disk speeds remain the bottleneck
because of the physical movement involved. Disk I/O is constrained by how
fast the disk is spinning and how quickly the disk head can be positioned on
top of a particular track. Of these, the first is by far the easier to
improve, and current drives can push the 10K RPM envelope, but precisely
positioning the read head on top of a track still incurs latency on the
order of milliseconds.
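As a rough back-of-the-envelope check (my own illustrative numbers, not
figures from the paper), the gap is easy to quantify:

    # Toy arithmetic: why mechanical latency dominates a random disk access.
    rpm = 10000                               # spindle speed of a fast current drive
    avg_rotation_ms = 60.0 * 1000 / rpm / 2   # wait half a revolution on average: 3 ms
    avg_seek_ms = 8.0                         # assumed average seek time for such a drive
    print(avg_rotation_ms + avg_seek_ms)      # ~11 ms per random access, regardless of CPU speed

At tens of megabytes per second of sequential bandwidth, those milliseconds
are what LFS tries to avoid paying on every small write.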
LFS is optimized for the scenario where the access pattern of an application
is dominated by the cost of disk writes. The basic idea is to rely on large
amounts of memory to buffer consecutive write operations into a single group
and write them all at once into a linear, append-only structure: the log.
When a file is deleted or partially overwritten, these changes are not
written to the same place on disk where the file's current blocks exist.
Instead they are written into the next available position in the log. This
is very different from traditional UNIX FFS, where inodes keep track of
block allocations for files and subsequent write operations have to locate
those blocks on disk. LFS still requires a map of inode locations, since the
inodes themselves are written into the log contiguously with the data blocks
rather than at fixed positions. Reading a file involves chasing pointers to
locate its individual pieces; this is similar to standard file systems, and
indeed the read performance is comparable, or slightly worse because of
increased fragmentation.
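To make the write path concrete, here is a toy sketch of the scheme as I
understand it (my own simplified names and structures, not Sprite LFS's
actual on-disk layout):

    # Minimal sketch: buffer incoming writes in memory, flush them as one large
    # sequential segment at the tail of the log, and track the newest location
    # of each inode in an in-memory inode map.
    class ToyLFS:
        def __init__(self, segment_bytes=512 * 1024):
            self.log = []             # stands in for the on-disk log
            self.buffer = []          # pending (inode_no, block_no, data) writes
            self.buffered_bytes = 0
            self.segment_bytes = segment_bytes
            self.inode_map = {}       # inode_no -> log position of its newest inode

        def write(self, inode_no, block_no, data):
            # An overwrite never touches the old blocks; it only queues new ones.
            self.buffer.append((inode_no, block_no, data))
            self.buffered_bytes += len(data)
            if self.buffered_bytes >= self.segment_bytes:
                self.flush_segment()

        def flush_segment(self):
            # One sequential append: all buffered data blocks, then an updated
            # inode per file carrying the file's current block locations.
            new_locations = {}
            for inode_no, block_no, data in self.buffer:
                self.log.append(("data", inode_no, block_no, data))
                new_locations.setdefault(inode_no, {})[block_no] = len(self.log) - 1
            for inode_no, fresh in new_locations.items():
                blocks = {}
                if inode_no in self.inode_map:             # carry over older blocks
                    blocks = dict(self.log[self.inode_map[inode_no]][2])
                blocks.update(fresh)
                self.log.append(("inode", inode_no, blocks))
                self.inode_map[inode_no] = len(self.log) - 1
            self.buffer = []
            self.buffered_bytes = 0

        def read_block(self, inode_no, block_no):
            # Reads chase pointers: inode map -> inode -> data block.
            _, _, blocks = self.log[self.inode_map[inode_no]]
            return self.log[blocks[block_no]][3]

The only point of the sketch is that every write, including an overwrite,
becomes an append, and only the inode map knows which copy of an inode is
current.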
Because overwrites can render sections of the log stale, there is a process
in place for freeing up disk space when the number of free segments gets too
low. It employs two strategies. The first is very similar to the way a
copying garbage collector operates: conceptually there are two heaps, only
one of them in use; the collector identifies live objects, moves them to the
second heap, and swaps the heaps when done. For LFS, segments are
"compacted" by copying their live blocks forward to the head of the log,
leaving the old segments free. The second strategy is threading, which
avoids the cost of copying but leaves files fragmented, increasing the seek
time needed to gather the individual pieces of one large file.
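A toy sketch of the copying approach, continuing the example above (again my
own simplification; the actual Sprite LFS cleaner, as I read the paper,
selects segments with a cost-benefit policy rather than the purely greedy
choice used here):

    # Pick the emptiest segments and copy their live blocks to the current
    # append point of the log, so the victim segments become entirely free.
    def clean(segments, live_blocks, append_block, segments_needed):
        # segments: list of segment ids; live_blocks(seg) -> list of live blocks
        # Cleaning the emptiest segments first minimizes how much must be copied.
        victims = sorted(segments, key=lambda seg: len(live_blocks(seg)))[:segments_needed]
        for seg in victims:
            for block in live_blocks(seg):
                append_block(block)   # live data is rewritten at the head of the log
            # seg now holds no live data and can be handed back as a free segment
        return victims

Threading, by contrast, leaves the live blocks where they are and weaves new
writes through the holes, which is why it fragments large files.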
The paper also defines a metric called "write cost" for measuring the
efficiency of disk I/O relative to the raw bandwidth of the device.
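As I understand the derivation, with the copying cleaner each segment
cleaned must be read in full and its live fraction u rewritten before the
freed space can absorb new data, which gives the 2/(1-u) form used in the
paper. A quick worked example (my arithmetic, with illustrative
utilizations):

    # write cost = (segment read + live data rewritten + new data written) / new data
    #            = (1 + u + (1 - u)) / (1 - u) = 2 / (1 - u), for 0 < u < 1
    def write_cost(u):
        return 2.0 / (1.0 - u)

    for u in (0.2, 0.5, 0.8):
        print(u, write_cost(u))   # 2.5x, 4.0x, 10.0x the cost of writing new data alone

So keeping cleaned segments mostly empty matters enormously: at 80 percent
utilization the effective bandwidth is a tenth of the raw device bandwidth.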
Under synthetic loads, the prototype implementation, called Sprite LFS,
significantly outperforms the standard SunOS file system in micro-benchmarks
measuring small-file operations. There is one questionable prediction, about
linear scaling of performance in the small-file benchmark (creating 1 KB
files) with the number of processors. It is true that the standard file
system keeps the disk busy while using only a small fraction of its
available bandwidth, while Sprite LFS saturates the CPU with low disk
utilization. On the other hand, coordinating I/O between the processors
would likely reduce the effective bandwidth.