Review of LFS paper

From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Sun Feb 22 2004 - 18:59:39 PST

    The LFS paper discusses the design of a file system that makes writes
    to the disk very efficient in the face of many small writes.

    The authors begin by describing the motivation behind the work. As
    processors get faster and memories get larger, disk access becomes the
    bottleneck in the system. Among disk I/O, the effect of reads is
    mitigated by large in-memory caches, so writes to disk become the
    factor that bottlenecks performance. In addition, most file operations
    are on small files. Making large file access efficient has been well
    studied and is limited by the bandwidth of the memory and I/O
    subsystems. Thus, the focus of the LFS design is making writes of
    small files efficient while not worsening performance in other
    scenarios.

    The authors cite the problems with current file systems. Current file
    systems do too many disk I/Os for a single file creation because file
    metadata is scattered around the disk. For example, Unix FFS does 5
    accesses (2 for file attributes, 1 for file data, 1 for directory
    data, 1 for directory attributes). Consequently, less than 5% of disk
    bandwidth is used for new data; the rest of the time is spent in
    seeks. Moreover, even though file data is written asynchronously, file
    metadata is written synchronously, and network file systems introduce
    even more synchronous behavior. Such synchronous writes make
    application performance disk bound.

    The core idea in the LFS design is to buffer a sequence of file system
    changes in the file cache and then write all the changes to disk
    sequentially in a single disk write operation. The information written
    to the disk includes everything from data to metadata attributes.
    Effectively, for a workload that contains many small files, LFS
    converts many small synchronous random writes into large asynchronous
    sequential transfers that can utilize nearly 100% of raw disk
    bandwidth. This seems to be a powerful idea.
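
    To make the idea concrete, here is a minimal sketch in Python of
    buffering many small writes in memory and flushing them as one large
    sequential append. The names (LogStructuredWriter,
    disk_write_sequential) are mine, invented for illustration; this is
    not the paper's implementation.

        def disk_write_sequential(addr, data):
            # Stand-in for a single large sequential disk transfer.
            pass

        class LogStructuredWriter:
            def __init__(self, segment_size_bytes):
                self.segment_size = segment_size_bytes
                self.buffer = []          # pending (inode_no, block_no, data)
                self.buffered_bytes = 0
                self.log_tail = 0         # next free address in the log

            def write_block(self, inode_no, block_no, data):
                # Small writes only touch the in-memory buffer;
                # nothing hits the disk yet.
                self.buffer.append((inode_no, block_no, data))
                self.buffered_bytes += len(data)
                if self.buffered_bytes >= self.segment_size:
                    self.flush()

            def flush(self):
                # One large sequential transfer: data, inodes, and inode-map
                # updates would all be laid out contiguously at log_tail.
                segment = b"".join(data for _, _, data in self.buffer)
                disk_write_sequential(self.log_tail, segment)
                self.log_tail += len(segment)
                self.buffer.clear()
                self.buffered_bytes = 0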

    Two issues arise in such a design:
    1) How to retrieve information from the log
    2) How to manage free disk space so that large chunks are always
    available for writing new data

    The first problem is solved relatively simply. In Unix FFS, inodes
    live at fixed addresses on the disk (the location of a file's inode
    can be computed from its number), making them efficient to retrieve.
    In LFS, inodes are written to the log. To achieve comparable
    performance, LFS maintains an inode map that records the current
    location of each inode, and the inode map is cached in memory.
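
    A toy illustration of the inode map idea, assuming a simple in-memory
    dictionary keyed by inode number (the names are mine, not Sprite
    LFS's):

        class InodeMap:
            def __init__(self):
                self.locations = {}        # inode number -> current log address

            def update(self, inode_no, log_addr):
                # Called whenever a new copy of the inode is appended to the log.
                self.locations[inode_no] = log_addr

            def lookup(self, inode_no):
                # One in-memory lookup, then one disk read of the inode itself.
                return self.locations[inode_no]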

    The second problem, free space management, is more involved. As the
    log grows, the free space on the disk becomes fragmented by file
    deletions and overwrites. There are two choices to alleviate the
    fragmentation: threading and copying. LFS uses a combination of these
    two approaches by dividing the disk into large fixed-size segments. A
    segment is always written sequentially, and all the live data in a
    segment must be copied out before the segment can be rewritten. The
    segments themselves are threaded together by the system.

    The segment cleaning mechanism is a three-step process (sketched
    below):
    * read a number of segments into memory
    * identify the live data
    * write the live data back to a smaller number of clean segments
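
    The three steps could look roughly like the following sketch, where
    the liveness test and segment layout are simplified assumptions rather
    than the paper's actual data structures:

        def clean(segments_to_clean, is_live, append_to_clean_segment):
            for seg in segments_to_clean:        # step 1: read segments into memory
                blocks = read_segment(seg)
                live = [b for b in blocks if is_live(b)]   # step 2: identify live data
                for b in live:                   # step 3: rewrite live data compactly
                    append_to_clean_segment(b)
                mark_segment_free(seg)           # old segment can now be reused

        def read_segment(seg):
            # Stand-in for reading a whole segment sequentially from disk.
            return seg.get("blocks", [])

        def mark_segment_free(seg):
            # Stand-in for returning the segment to the free list.
            seg["free"] = True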

    For segment cleaning, 4 policy issues must be addressed:
    1) When to run the cleaner
    2) How many segments to clean
    3) Which segments to clean
    4) How to regroup the live blocks

    It turns out that system performance is not very sensitive to the
    first two factors but very sensitive to the last two. Experimental
    data shows that LFS should use a cost-benefit cleaning policy to
    select segments (a sketch follows the list below). Regrouping data for
    better locality, on the other hand, turned out to hurt performance.
    This counter-intuitive effect arises because regrouped segments hover
    just above the cleaning threshold for a long time, do not get cleaned,
    and so worsen performance. For cost-benefit segment selection, LFS
    maintains a segment usage table that records:
    * the number of live bytes in the segment
    * the most recent write time to any byte within the segment
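
    A sketch of that selection policy, using the paper's benefit-to-cost
    ratio ((1 - u) * age) / (1 + u), where u is the fraction of live bytes
    in the segment; the table-entry field names are invented for
    illustration:

        def choose_segments_to_clean(segment_usage_table, now, how_many):
            def score(entry):
                u = entry["live_bytes"] / entry["segment_size"]   # utilization
                age = now - entry["last_write_time"]
                return ((1.0 - u) * age) / (1.0 + u)
            # Clean the segments with the highest benefit-to-cost ratio first.
            ranked = sorted(segment_usage_table, key=score, reverse=True)
            return ranked[:how_many]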

    Crash recovery is much simpler and faster in LFS. In a traditional
    file system the whole disk has to be scanned for inconsistencies; in
    LFS the recent changes are near the end of the log. LFS uses
    checkpoints and roll-forward for recovery. Checkpointing happens at a
    certain interval to mark the state consistent, so during recovery it
    is sufficient to restore the state recorded at the last checkpoint.
    Roll-forward improves on this by scanning the log past the checkpoint
    for data and metadata changes and recovering a larger amount of them.
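
    A rough sketch of this recovery scheme, with an invented
    checkpoint/log-record format purely for illustration:

        def recover(read_last_checkpoint, read_log_records_after):
            checkpoint = read_last_checkpoint()    # consistent state at some point
            inode_map = dict(checkpoint["inode_map"])
            log_position = checkpoint["log_position"]
            # Roll-forward: re-apply inode updates found after the checkpoint
            # so that recently written data is not lost.
            for record in read_log_records_after(log_position):
                if record["type"] == "inode":
                    inode_map[record["inode_no"]] = record["log_addr"]
            return inode_map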

    Garbage collection in memory kept coming to mind while I was reading
    about segment cleaning. The authors draw the same comparison, likening
    segment cleaning in LFS to generational garbage collectors, though
    they point out that in-memory garbage collectors have the luxury of
    random access.

