From: Gail Rahn (grahn_at_cs.washington.edu)
Date: Mon Feb 23 2004 - 17:46:33 PST
Review of "The Design and Implementation of a Log-Structured File System" by
Rosenblum and Ousterhout
This "Sprite LFS" paper describes a log-structured file system implemented
to dramatically improve file-access time for file systems with small files,
file systems generally found in office and engineering computing
environments.
In recent time, exponential increases in computer performance have been seen
in CPU performance and memory capacity, but not in disk-access times. File
caching in memory is prevalent, so there are fewer disk read requests. The
majority of disk traffic in operating systems has become disk write
operations. The Sprite LFS exploits this pattern by caching file writes in a
memory-based log and writing data in large sequential chunks to disk. This
method all but eliminates seek time, the most expensive part of disk writes.
The raw write speed of Sprite LFS is substantially greater than the standard
UNIX file system, faster by an order of magnitude, because disk seek time is
almost eliminated by the sequential-write structure of the log. Unix file
systems spread out file information around the disk. Even when the file data
is written sequentially, inodes and and directory entries for the file are
written to random points in the disk. This creates file-reads and write
operations that are dominated by expensive, unoptimizable disk seek
operations. File-read frequency can be mitigated by memory-level caching,
but file-write operations are virtually unoptimized in UNIX.
The Sprite LFS needs large extents of free space to operate efficiently, and
to create that space, Sprite LFS breaks the disk into large chunks called
"segments". A user-level process called a "segment cleaner" creates empty
segments by compressing and moving data from fragmented segments. The paper
describes iterations of cleaning algorithms ending in an age-based algorithm
that separates "hot segments" from "cold segments" based on how frequently
the data is overwritten. "Hot segments" are frequently overwritten and "cold
segments" are infrequently updated. The cleaner cleans cold segments at 75%
utilization but waits until hot segments reach 15% utilization before
cleaning. The idea here is that blocks in hot segments are overwritten more
frequently, so Sprite LFS should delay cleaning them until the segment
consists of mostly dead blocks. Cold segments are cleaned at a higher
utilization point, but because the data changes at a slower rate, the
cleaning is actually performed less frequently. Sprite LFS maintains a
"segment usage table" structure to record the number of live bytes in the
segment and the most recent modification time of any block in the segment.
This information is used by the cleaning algorithm to sort blocks by age.
Sprite LFS uses checkpoints and roll-forward features to handle crash
recovery. The file system has two checkpoint regions, special
fixed-positions on the disk, where the inode map and segment usage tables
are written, as well as a pointer to the last segment written. Sprite LFS
uses the most recent checkpoint region to initialize its data structures
after rebooting from a crash. It scans the log segments written after the
last checkpoint to . The presence of a directory operation log ensures that
all directory changes have a corresponding inode, ensuring that all
directories with log entries also have a corresponding inode structure.
In practice, the Sprite LFS is able to create and delete files with
greater-than-an-orde-rof-magnitude improvement over UNIX file systems. Read
improvement was also recorded, but that value was not as significant.
This idea for file system usage is intriguing. Disk space cheapens
exponentially, so as storage capacity increases on OSs, the burden of
segment cleaning declines for this type of file sysstem as more wide stripes
of free space are made available. Of course, the same premise could be used
to support more sequential writing of data and inode blocks on traditional
file systems.
-------------
Gail Rahn
gail_at_thefreddies.org
This archive was generated by hypermail 2.1.6 : Mon Feb 23 2004 - 17:46:44 PST