Review: Mendel Rosenblum and John K. Ousterhout. The Design and Implementation of a Log-Structured File System.

From: Richard Jackson (richja_at_expedia.com)
Date: Mon Feb 23 2004 - 12:14:56 PST


    This 1992 paper by Rosenblum and Ousterhout discusses a file system
    scheme called a log-structured file system (LFS), where data is not
    stored in a traditionally structured fashion, but instead is stored
    by appending to a sequential on-disk log. This system differs from
    earlier log-based approaches in that the log is the only persistent
    copy of the data; there is no other place where the data lives on a
    permanent basis.
     
    The paper is divided into these general sections: 1) overview of disk
    and hardware limitations, 2) overview of log-structured file system, 3)
    segment free-space management and cleaning, 4) simulation results, 5)
    crash recovery, 6) Sprite LFS implementation and benchmarks.
     
    First, the paper describes current hardware trends and limitations.
    The authors note that processor speeds and memory sizes are growing
    at a near-exponential rate, while hard drive access times are
    improving only slowly; drive improvements have instead focused on
    cost and capacity. This section gives the premise for why an LFS is
    needed: for workloads dominated by small files, current systems use
    hard drives inefficiently, with the drives spending up to 90% of
    their time seeking and only 5-10% of the raw bandwidth actually
    used for writing. In an LFS, writes are buffered in memory and then
    written to the log sequentially in large chunks, allowing 65-75% of
    the disk's write bandwidth to be used. Another issue mentioned is
    that normal file systems perform many writes (especially metadata)
    synchronously, while the proposed LFS writes asynchronously in
    large batches for a large speedup.
     
    The next section explains how an LFS works. The basic structures
    (i.e., inodes and indirect blocks) are identical to those of a UNIX
    file system (i.e., UNIX FFS). The main difference is that all data
    is written to a sequential on-disk log. To support random-access
    reads, various index structures are written to the log as well.
    Unlike in UNIX, inodes are not written at fixed positions; their
    locations are kept in a data structure called an inode map, which
    is itself written to the log.
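
    To make the indirection concrete, here is a toy sketch of the read
    path (my own illustration, not code from the paper; all names are
    invented): the inode map gives the log address of an inode, and the
    inode gives the log addresses of the data blocks, so nothing needs
    to live at a fixed disk location.

        # Toy model: the "disk" is an append-only list, and an address
        # is just an index into that list.
        class Inode:
            def __init__(self, block_addrs):
                self.block_addrs = block_addrs  # log addresses of data blocks

        class ToyLFS:
            def __init__(self):
                self.log = []        # the append-only log
                self.inode_map = {}  # inode number -> log address of its inode

            def _append(self, item):
                self.log.append(item)
                return len(self.log) - 1        # address of the new block

            def write_file(self, inum, blocks):
                addrs = [self._append(b) for b in blocks]           # data first
                self.inode_map[inum] = self._append(Inode(addrs))   # then inode

            def read_file(self, inum):
                inode = self.log[self.inode_map[inum]]  # lookup via inode map
                return b"".join(self.log[a] for a in inode.block_addrs)

        fs = ToyLFS()
        fs.write_file(1, [b"hello ", b"log"])
        print(fs.read_file(1))   # b'hello log'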
     
    The section on segments and segment cleaning gives details on how
    to implement the log efficiently. The disk is divided into large
    fixed-size extents called segments (512KB or 1MB in Sprite LFS),
    and each segment is always written sequentially from beginning to
    end. Over time, segments accumulate dead blocks from deleted and
    overwritten files, and this unused space must be recovered. The
    paper considers two approaches: threading, where new data is
    written into the holes left by old data, and copying, where live
    data is compacted elsewhere. Sprite LFS combines the two: the log
    is threaded at segment granularity, and segments must be cleaned to
    provide good performance. The authors created a segment cleaner to
    do this, and it works by reading a number of segments, discarding
    the non-live data, and writing the live blocks back into a smaller
    number of clean segments.
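
    As a rough sketch of that copy-and-compact step (all names are
    mine, not the paper's): read whole segments, keep only the blocks
    that are still live, and rewrite them packed into a smaller number
    of clean segments, which frees the old segments for reuse.

        def clean(segments, is_live, write_segment, blocks_per_segment):
            # Gather every block that is still live across the segments read in.
            live = [blk for seg in segments for blk in seg if is_live(blk)]
            cleaned = []
            # Repack the survivors densely into new segments.
            for i in range(0, len(live), blocks_per_segment):
                cleaned.append(write_segment(live[i:i + blocks_per_segment]))
            return cleaned

        # e.g. two half-dead segments collapse into one:
        segs = [["a", "dead", "b", "dead"], ["dead", "c", "dead", "d"]]
        print(clean(segs, is_live=lambda b: b != "dead",
                    write_segment=lambda blks: blks, blocks_per_segment=4))
        # [['a', 'b', 'c', 'd']]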
     
    A simulator was created to measure the write cost of an LFS,
    compared against UNIX FFS, under two synthetic access patterns: 1)
    hot-and-cold, where 90% of the writes go to 10% of the files and
    the remaining 10% of writes go to the other 90% of the files, and
    2) uniform, where every file has an equal chance of being
    overwritten. At reasonable disk utilizations both patterns beat
    UNIX FFS. The surprising result was that, under the simple greedy
    cleaning policy, the hot-and-cold pattern actually performed worse
    than uniform, so locality by itself did not help. Based on this, a
    new policy called cost-benefit was developed, which cleans space in
    rarely-written (cold) segments at much higher utilizations than
    space in heavily-written (hot) segments, since free space in a cold
    segment stays free longer and is therefore more valuable.
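
    If I read the cost-benefit policy right, each segment is ranked by
    the free space that cleaning it would yield, weighted by the age of
    its data (how long that space is likely to stay free), divided by
    the cost of reading the segment and writing back its live part.
    With u the fraction of the segment that is still live:

        def cost_benefit(u, age):
            # benefit: free space generated (1 - u), weighted by age;
            # cost: read the whole segment (1) plus rewrite the live part (u).
            return (1.0 - u) * age / (1.0 + u)

        # A cold (old) segment is worth cleaning even while fairly full,
        # while a hot (young) segment is left until it is mostly empty:
        print(cost_benefit(u=0.75, age=50.0) > cost_benefit(u=0.30, age=1.0))  # True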
     
    Crash recovery was discussed, including both checkpoints and
    roll-forward. A checkpoint is a consistent snapshot of the file
    system state at a point in time, written at a fixed position on
    disk, and it is read after a crash to restore the system to that
    snapshot. Some recently written data can still be lost, so
    roll-forward is also needed: the portion of the log written after
    the checkpoint is scanned and replayed up to the latest complete
    operation. The difficulty is ensuring that operations are applied
    only when they are complete. For example, you would not want to
    apply a directory entry for a new file if the creation of its inode
    is not also present in the log.
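
    A hand-wavy sketch of that two-step recovery (the names below are
    mine, not the paper's interfaces): restore the in-memory state from
    the last checkpoint, then scan the log written after it, applying
    only operations that appear in the log in their entirety.

        def recover(read_checkpoint, log_records_after, apply_record):
            # Step 1: restore a consistent snapshot (inode map, segment usage, ...).
            state, checkpoint_pos = read_checkpoint()
            # Step 2: roll forward through the log written after the checkpoint.
            for record in log_records_after(checkpoint_pos):
                if record.is_complete(state):  # e.g. new inode present for a dir entry
                    apply_record(state, record)
                else:
                    break                      # partial tail from the crash; stop
            return state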
     
    The last section discusses Sprite LFS, the authors' implementation
    of an LFS. Interestingly, they said that users could not tell the
    difference between UNIX FFS and LFS in everyday use, so they
    resorted to micro-benchmarks to demonstrate their success. This
    seemed to indicate that the LFS was not nearly as useful in
    practice as predicted. In the micro-benchmarks, LFS was compared to
    the UNIX FFS in SunOS. The results show that LFS is up to 10 times
    as fast in some cases (many small files) and about the same in the
    worst cases. The only case where LFS is worse is reading a file
    sequentially after it was written randomly. One problem I had with
    this comparison is that they used 4K blocks for LFS and 8K blocks
    for UNIX FFS; I do not know whether this made any difference in the
    results. Cleaning overheads were also discussed, and the measured
    overhead was even lower than the simulator predicted. I wonder
    whether any noticeable blocking happens during cleaning, and how
    much the segment size affects this.
     
    Overall, I found this paper to be a very interesting approach to
    file system management. The LFS seems to have some great benefits,
    such as up to a 10x increase in small-write throughput, with only
    limited maintenance/cleaning overhead. A case where this system
    would fail to outperform UNIX FFS is when there is never idle time
    in which to perform the cleaning. However, such cases are probably
    rare, since most systems have significant non-peak time. I wonder
    whether any modern operating systems use this file system method;
    it seems to be a good idea.
     

