From: Richard Jackson (richja_at_expedia.com)
Date: Mon Feb 23 2004 - 12:14:56 PST
This 1992 paper by Rosenblum and Ousterhout discusses a file system
scheme called a log-structured file system (LFS), where data is not
stored in a traditionally structured fashion, but is instead stored by
appending to a sequential on-disk log. This system differs from earlier
systems that used logs in that the log is the only persistent copy of
the data; there is no other place where the data lives on a permanent basis.
The paper is divided into these general sections: 1) overview of disk
and hardware limitations, 2) overview of log-structured file system, 3)
segment free-space management and cleaning, 4) simulation results, 5)
crash recovery, 6) Sprite LFS implementation and benchmarks.
First, the paper talks about current hardware trends. The authors
claim that processor speeds and memory sizes are growing at a
near-exponential rate, while hard drive access times are improving only
slowly; drive improvements have instead focused on cost and capacity.
This section gives the premise for why an LFS is needed. The reason is
that current systems use hard drives inefficiently on small-file
workloads: the drives spend most of their time seeking and use only
5-10% of their raw bandwidth for writing new data. In an LFS, the log
is written sequentially and in large chunks, allowing 65-75% of the
disk's write bandwidth to be used. Another issue mentioned is that
conventional file systems write metadata synchronously, while the
proposed LFS buffers writes in memory and issues them asynchronously in
large batches, giving a large speedup. A minimal sketch of this
buffering idea appears below.
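To make the batching idea concrete, here is a rough sketch of how small writes could be accumulated in an in-memory segment buffer and then flushed to disk in one large sequential write. This is not the paper's actual code; the segment size constant, struct names, and write path are my own assumptions.

    /* Hypothetical sketch: buffer small writes, flush one big sequential write. */
    #include <stdio.h>
    #include <string.h>

    #define SEG_SIZE (1 << 20)          /* 1 MB segment buffer (assumed size) */

    struct segbuf {
        char   data[SEG_SIZE];
        size_t used;                    /* bytes currently buffered */
    };

    /* Append a small write; flush the whole segment when it fills up.
       Assumes len <= SEG_SIZE. */
    static void seg_append(struct segbuf *s, FILE *log,
                           const void *buf, size_t len)
    {
        if (s->used + len > SEG_SIZE) {          /* segment full: one big write */
            fwrite(s->data, 1, s->used, log);    /* sequential, no seeks */
            s->used = 0;
        }
        memcpy(s->data + s->used, buf, len);
        s->used += len;
    }

Many small, scattered writes thus turn into a single long sequential transfer, which is where the claimed bandwidth improvement comes from.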
The next section talks about how an LFS works. The basic structures
(e.g., inodes and indirect blocks) are identical to those of a UNIX file
system such as UNIX FFS. The main difference is that all data is written
to a sequential log. To support random-access reads, various indexing
structures are written to the log as well. Unlike UNIX FFS, inodes are
not written at fixed disk locations; their current locations are stored
in a data structure called the inode map, which is itself written to the
log. A rough sketch of such a map follows.
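As a simple illustration (the fixed-size table and names here are my own simplification, not Sprite LFS's actual layout), the inode map can be thought of as an array mapping inode numbers to the log addresses where those inodes were most recently written:

    /* Hypothetical sketch of an inode map: inode number -> log address. */
    #include <stdint.h>

    #define MAX_INODES 65536            /* assumed fixed limit for the sketch */

    typedef uint64_t log_addr_t;        /* byte offset into the on-disk log */

    struct inode_map {
        log_addr_t where[MAX_INODES];   /* 0 means "inode not allocated" */
    };

    /* When an inode is rewritten it goes to the tail of the log;
       only its map entry needs to change. */
    static void imap_update(struct inode_map *m, uint32_t ino, log_addr_t addr)
    {
        m->where[ino] = addr;
    }

    static log_addr_t imap_lookup(const struct inode_map *m, uint32_t ino)
    {
        return m->where[ino];
    }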
The section on segments and segment cleaning gives details on how to
manage the log's free space efficiently. Segments are large, fixed-size
extents of the disk, up to 1MB in Sprite LFS, and are the unit in which
the log is written. Each segment is written sequentially from start to
end, but the log as a whole winds through whatever clean segments are
available on the disk. As files are overwritten and deleted, old
segments accumulate dead data, and that space must be recovered. The
paper describes two approaches: threading, where the log simply skips
over live data, and copying, where live data is compacted elsewhere.
Sprite LFS combines the two: the log is threaded at segment granularity,
and a segment cleaner copies live data out of fragmented segments. The
cleaner works by reading a number of segments, identifying the live
blocks, and writing only those blocks back into a smaller number of
clean segments, after which the originals can be reused. A sketch of the
cleaning loop is below.
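Here is a minimal sketch of that cleaning loop. The helper functions (liveness check, log append, segment reuse) are assumptions of mine, not the paper's interfaces; the point is only the copy-live-data-forward structure.

    /* Hypothetical cleaner sketch: copy live blocks out of dirty segments
       so the originals can be marked clean and reused. */
    #include <stdbool.h>
    #include <stddef.h>

    struct block;      /* a data block plus enough identity to check liveness */
    struct segment {
        struct block **blocks;
        size_t         nblocks;
    };

    /* Assumed helpers (not from the paper): */
    bool block_is_live(const struct block *b);     /* still referenced by an inode? */
    void append_to_log(const struct block *b);     /* rewrite block at the log tail */
    void mark_segment_clean(struct segment *s);    /* segment may now be reused */

    static void clean_segments(struct segment *segs, size_t nsegs)
    {
        for (size_t i = 0; i < nsegs; i++) {
            for (size_t j = 0; j < segs[i].nblocks; j++) {
                const struct block *b = segs[i].blocks[j];
                if (block_is_live(b))
                    append_to_log(b);        /* live data is compacted forward */
            }
            mark_segment_clean(&segs[i]);    /* everything left behind was dead */
        }
    }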
A simulator was created to measure the performance of LFS cleaning
under two random-access write patterns: 1) hot-and-cold, where 10% of
the files receive 90% of the writes and the remaining 90% of the files
receive only 10%, and 2) uniform, where every file has an equal chance
of being overwritten. The simulated write costs compared favorably with
UNIX FFS at reasonable disk utilizations. Surprisingly, the hot-and-cold
results showed that locality by itself does not help: with a simple
greedy cleaning policy, the workload with locality actually performed
worse than the uniform one. Based on this, a new policy called
cost-benefit was developed to ensure that space in rarely-written (cold)
segments is cleaned before space in heavily-written (hot) segments; the
selection rule is sketched below.
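As I read the paper, the policy boils down to a single ratio: clean the segment with the highest benefit-to-cost value, (1 - u) * age / (1 + u), where u is the fraction of the segment still live and age is the time since its most recent modification. A small sketch (the struct and function names are mine):

    /* Sketch of cost-benefit segment selection.
       u   = fraction of the segment still live (0.0 .. 1.0)
       age = time since the segment's most recent modification */
    #include <stddef.h>

    struct seg_info {
        double u;
        double age;
    };

    static double cost_benefit(const struct seg_info *s)
    {
        /* free space generated, weighted by the age of the data, divided by
           the cost of reading the segment and rewriting its live part */
        return ((1.0 - s->u) * s->age) / (1.0 + s->u);
    }

    /* Pick the segment whose cleaning pays off the most. */
    static size_t pick_victim(const struct seg_info *segs, size_t n)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (cost_benefit(&segs[i]) > cost_benefit(&segs[best]))
                best = i;
        return best;
    }

The age term is what makes cold segments attractive even when they still hold a fair amount of live data: their free space, once recovered, stays free for a long time.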
Crash recovery was discussed next, covering both checkpoints and
roll-forward. A checkpoint is a consistent snapshot of the file system
at a point in time; after a crash, the system reads the most recent
checkpoint to restore itself to that snapshot. Because recent writes
would otherwise be lost, roll-forward is also needed: the log written
after the checkpoint is scanned and its updates are re-applied, bringing
the system as far forward as possible. The difficulty is ensuring that
only complete operations are applied. For example, you would not want to
recover a directory entry for a new file whose inode creation never made
it into the log.
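A rough sketch of that recovery sequence is below. All of the helper functions and record types here are my own assumptions about what such a recovery path would need, not Sprite LFS's actual interfaces; the point is the checkpoint-then-roll-forward order and the completeness check.

    /* Hypothetical recovery sketch: restore the last checkpoint, then
       roll forward through log records written after it, skipping any
       record that is incomplete. */
    #include <stdbool.h>
    #include <stddef.h>

    struct checkpoint;                     /* snapshot of inode map, segment usage, etc. */
    struct log_record;                     /* one update appended to the log */

    /* Assumed helpers (not from the paper): */
    struct checkpoint *read_latest_checkpoint(void);
    struct log_record *next_record_after(const struct checkpoint *cp,
                                         const struct log_record *prev);
    bool record_is_complete(const struct log_record *r);   /* e.g. the inode for a
                                                              new dir entry is present */
    void apply_record(const struct log_record *r);

    static void recover(void)
    {
        struct checkpoint *cp = read_latest_checkpoint();
        struct log_record *r  = next_record_after(cp, NULL);
        while (r != NULL) {
            if (record_is_complete(r))     /* only fully-logged operations */
                apply_record(r);
            r = next_record_after(cp, r);
        }
    }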
The last section discusses the Sprite LFS implementation of an LFS.
Interestingly, the authors say that users could not tell the difference
between UNIX FFS and LFS in everyday use, so they resorted to
micro-benchmarks to demonstrate their success. To me this suggests that
the LFS was not nearly as useful in practice as predicted. In the
micro-benchmarks, LFS was compared against a UNIX FFS. The results show
that LFS is up to 10 times as fast in some cases and about the same in
the worst cases. The only case where LFS is slower is reading a file
sequentially after it was written randomly. One problem I had with this
comparison is that they used 4K blocks for LFS and 8K blocks for UNIX
FFS; I do not know whether this affected the results. Cleaning overheads
were also discussed, and they turned out to be even lower than the
simulator predicted. I wonder whether any noticeable blocking happens
during cleaning, and how much the size of the segments being cleaned
affects this.
Overall, I found this paper to be a very interesting approach to file
system management. The LFS seems to have some great benefits, such as up
to a 10x increase in throughput for small-file workloads, with only
limited maintenance/cleaning overhead. One case where this system would
fail to outperform UNIX FFS is when there is never idle time in which to
perform the cleaning. However, such cases are probably rare, since most
systems have significant non-peak time. I wonder whether any modern
operating systems use this file system approach; it seems to be a good
idea.