Review Rosenbloom and Ousterhout

From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Mon Feb 23 2004 - 14:00:57 PST

  • Next message: Greg Green: "The Design and Implementation of a Log-Structured File System"

    The Design and Implementation of a Log Structured File System - Rosenbloom
    and Ousterhout

    The authors introduce the log structured file system which optimizes writes
    by reducing seek times and allows quick recovery. The log-structured file
    system differs from file systems with logs in that all of the data
    permanently resides in the log and indexing information is added to allow
    random access. The most difficult piece of this design is to ensure large
    contiguous blocks of storage for writing called segments. The Sprite LFS
    allows 65-70% of the sequential write throughput and is faster than Unix FFS
    in all but one case.

    Current file systems store attributes separately from the file and store
    related files in different locations causing many head seeks to read and
    write files. They also write meta data synchronously causing more head seeks
    even if the data is written asynchronously.

     LFS uses similar structures to FFS but writes them into the log
    sequentially. Instead of placing inodes at fixed offsets, LFS builds inode
    maps which are compact enough to fit in RAM. The methods for managing free
    space are threading or copying. Threading allows fragmentation and copying
    is expensive. LFS breaks the disk up into large segments which are threaded
    and then long lived data is copied together reducing copying cost. The
    segments are garbage collected by a stop and copy using segment information
    and a versioning scheme to detect dead blocks.

    This produces all of the standard garbage collection questions: when to
    collect, how much to collect, how to pick what to collect and how to group
    the output. When and how much are not that critical and are handled with a
    low free segment threshold. The percentage of blocks written while garbage
    collecting is called the write cost and the utilization is the percentage of
    data retained while cleaning. To beat an optimized FFS, LFS must have a .5
    utilization. The key to making it perform is to have a bimodal distribution
    of full segments and nearly empty segments.

    They developed a cost-benefit policy in which they clean cold segments when
    they achieve 75% utilization and hot segments when they 15%. In simulation
    this gives them the write costs of an improved FFS up to about 80% of disk
    utilization. This is implemented using a table that tracks the oldest file
    block in a segment and the youngest.

    Recovery is handled with periodic checkpoints and then a roll forward
    through the log to check for correctness. Files that are completely written
    since checkpoint are saved and directory entries are repaired using a
    directory entry log.

    They implemented everything and compared it to SunOS on CPU bound
    workstations using a set of synthetic benchmarks that generated no cleaning
    costs. The CPU bound is disappointing but a reality from 1992. The
    estimates are that for the create and write segments of a small file test,
    LFS is 10 times faster and could be 4 to 6 times faster as CPUs increase in
    speed as SunOS is keeping the disk busy 85% of the time compared to 17% for
    LFS. LFS is producing temporal locality while FFS is producing logical (file
    system) locality. If they match FFS does well, if they don't LFS does
    significantly better.

    When cleaning costs are added back by studying their real disk workloads
    after a few months of startup they showed write costs of 1.2 to 1.6. This is
    in part because the simulation used many small files and real life deletes
    large files. Recovery time was measured to be bounded by about 132 seconds
    for 50 MB of recovered file.


  • Next message: Greg Green: "The Design and Implementation of a Log-Structured File System"

    This archive was generated by hypermail 2.1.6 : Mon Feb 23 2004 - 14:01:47 PST