Review of "...Log-Structured File System" by Rosenblum and Ousterhout

From: Gail Rahn (grahn_at_cs.washington.edu)
Date: Mon Feb 23 2004 - 17:46:33 PST

  • Next message: Song Xue: "Review of "The Design and Implementation of a Log-Structured File System""

    Review of "The Design and Implementation of a Log-Structured File System" by
    Rosenblum and Ousterhout

    This "Sprite LFS" paper describes a log-structured file system implemented
    to dramatically improve file-access time for file systems with small files,
    file systems generally found in office and engineering computing
    environments.

    In recent time, exponential increases in computer performance have been seen
    in CPU performance and memory capacity, but not in disk-access times. File
    caching in memory is prevalent, so there are fewer disk read requests. The
    majority of disk traffic in operating systems has become disk write
    operations. The Sprite LFS exploits this pattern by caching file writes in a
    memory-based log and writing data in large sequential chunks to disk. This
    method all but eliminates seek time, the most expensive part of disk writes.

    The raw write speed of Sprite LFS is substantially greater than the standard
    UNIX file system, faster by an order of magnitude, because disk seek time is
    almost eliminated by the sequential-write structure of the log. Unix file
    systems spread out file information around the disk. Even when the file data
    is written sequentially, inodes and and directory entries for the file are
    written to random points in the disk. This creates file-reads and write
    operations that are dominated by expensive, unoptimizable disk seek
    operations. File-read frequency can be mitigated by memory-level caching,
    but file-write operations are virtually unoptimized in UNIX.

    The Sprite LFS needs large extents of free space to operate efficiently, and
    to create that space, Sprite LFS breaks the disk into large chunks called
    "segments". A user-level process called a "segment cleaner" creates empty
    segments by compressing and moving data from fragmented segments. The paper
    describes iterations of cleaning algorithms ending in an age-based algorithm
    that separates "hot segments" from "cold segments" based on how frequently
    the data is overwritten. "Hot segments" are frequently overwritten and "cold
    segments" are infrequently updated. The cleaner cleans cold segments at 75%
    utilization but waits until hot segments reach 15% utilization before
    cleaning. The idea here is that blocks in hot segments are overwritten more
    frequently, so Sprite LFS should delay cleaning them until the segment
    consists of mostly dead blocks. Cold segments are cleaned at a higher
    utilization point, but because the data changes at a slower rate, the
    cleaning is actually performed less frequently. Sprite LFS maintains a
    "segment usage table" structure to record the number of live bytes in the
    segment and the most recent modification time of any block in the segment.
    This information is used by the cleaning algorithm to sort blocks by age.

    Sprite LFS uses checkpoints and roll-forward features to handle crash
    recovery. The file system has two checkpoint regions, special
    fixed-positions on the disk, where the inode map and segment usage tables
    are written, as well as a pointer to the last segment written. Sprite LFS
    uses the most recent checkpoint region to initialize its data structures
    after rebooting from a crash. It scans the log segments written after the
    last checkpoint to . The presence of a directory operation log ensures that
    all directory changes have a corresponding inode, ensuring that all
    directories with log entries also have a corresponding inode structure.

    In practice, the Sprite LFS is able to create and delete files with
    greater-than-an-orde-rof-magnitude improvement over UNIX file systems. Read
    improvement was also recorded, but that value was not as significant.

    This idea for file system usage is intriguing. Disk space cheapens
    exponentially, so as storage capacity increases on OSs, the burden of
    segment cleaning declines for this type of file sysstem as more wide stripes
    of free space are made available. Of course, the same premise could be used
    to support more sequential writing of data and inode blocks on traditional
    file systems.

    -------------
    Gail Rahn
    gail_at_thefreddies.org


  • Next message: Song Xue: "Review of "The Design and Implementation of a Log-Structured File System""

    This archive was generated by hypermail 2.1.6 : Mon Feb 23 2004 - 17:46:44 PST