Review: The Design and Implementation of a Log-Structured File System

From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Mon Feb 23 2004 - 16:47:36 PST

  • Next message: Prasanna Kumar Jayapal: "Review of The design & Implementation of Log-Structured File System (by Rosenblum & Ousterhout)"

     

    This paper describes a unique kind of file system that overcomes many of the barriers to performance in traditional systems through the novel use of logging. The paper describes how synchronous writes and disk fragmentation are a detriment to traditional file systems and how these problems are solved in the Sprite operating system.

     

    The paper is predicated on the assumptions that moving forward, caching will make disk reads very fast, and writes will become the bottleneck for disk-bound applications; therefore file systems need to be optimized for writes. There are two factors involved in determining write performance: disk latency (seek time) and bandwidth (data transfer time) where bandwidth should be maximized and latency should be minimized. The paper states that current systems to not meet the need because they are latency-bound; most of the disk-writing time is spent seeking and less than 10% of the disk bandwidth is ever utilized. The solution presented involves forcing all writes to the disk to execute asynchronously and sequentially across long, over empty extents on the disk called segments. Deletes and changes are recorded as manifests in this log. Rather than replacing the original item, the operation itself is recorded to the end of the log. This solution suggests that disk space utilization is traded off for performance. Two basic problems arise out of this framework requiring creative solutions, namely the issues of locating fresh data after it's been written and the issue of creating free segments.

     

    In traditional file systems data is found using inodes located at fixed locations on the disk. Inodes are also used in the Sprite system except now they may be anywhere on the disk (and in fact all changes to inode data always end up at the end of the log). To easily locate inodes a memory structure called an inode map is maintained. This way, when a file is referenced, we first look in the inode map for the inode, then look in the inode for the location of the data. We read the data into a memory cache to speed up future reads. It is interesting at this point to note that the algorithm trades off a significant amount of memory for performance.

     

    The issue of free-space management is a more complex one. The paper distinguishes between the possible alternatives to managing space. Threading involves leaving the exiting live data in place and writing around it. This has the effect of fragmenting the disk and goes against the ideal of writing to long, sequential extents. The second alternative is dubbed "copy". It involves copying only live data out of a segment, essentially compressing segments by removing unused data. This data can either be copied back into the segment, into other segments, or even onto another disk. Sprite takes the hybrid approach of threading segments and creating new segments by compressing old ones. Timestamps are kept on segments so that we can build a policy for replacing segments based upon age. Segment sizes are selected such that the cost of transfer dominates greatly over the cost of seeking data. Picking longer the segments results in less live data in any given segment and also a greater ratio between transfer and seek time (performance).

     

    Several policy issues were brought up including when to clean segments, how many to clean at a time, which to clean, and how to group blocks in segments. Another useful facet of log-structured file systems is that they can be tuned to greatly enhance the speed of crash recovery using check pointing and roll forwarding.

     

    Log-structured file systems seem like a great idea. Microsoft is planning on shipping WinFS as the next "file system" in Longhorn which is based on database indexing technology. It would be nice if some of the concepts of logging are also utilized to increase write performance.

     

    One thing that I didn't fully understand is how Sprite manages to avoid keeping a free map or bitmap. Once the disk is full, how does Sprite know where the next segment will be replaced? Does it simply scan the disk sequentially, deciding which segments to compress?

     


  • Next message: Prasanna Kumar Jayapal: "Review of The design & Implementation of Log-Structured File System (by Rosenblum & Ousterhout)"

    This archive was generated by hypermail 2.1.6 : Mon Feb 23 2004 - 16:47:42 PST