From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Sun Feb 22 2004 - 14:33:07 PST
This paper describes the design and implementation of Sprite LFS, a
log-structured file system. The primary motivation for the system is
the fact that disk access speed has not and will not improve anywhere
near as fast as processor speed, memory size, and disk size. Disk
access becomes the main bottleneck for disk-bound applications.
A log-structured file system improves disk performance by reducing disk
seeks as much as possible. The random access nature of traditional file
systems cause a huge proportion of disk bandwidth to be wasted on seeks.
In Sprite LFS, data is written to disk sequentially, thereby removing
most seeks.
In reality, writing modified file blocks at the end of a log, as opposed
to blasting over the original file block, does not significantly change
the organization of the file system. Blocks are addressed by block
identifiers which can address blocks anywhere on the disk, so it is not
absolutely necessary to maintain locality among blocks of a file. In
practice, a traditional file system is not great at maintaining such
locality. Intermixed deletions and expansions of files cause
fragmentation of file blocks.
However, not all data can be written at the end of a log. At some
point, the file system has to record the locations of modified file
blocks, and this data must be recorded in a location that can be found
quickly. Sprite LFS does this with checkpoint regions, which locate
blocks of the filesystem's inode map. The checkpoint region is cached
in memory, and written to a fixed location on disk periodically (every
30 seconds in the production version).
The further difficulty is creating long segments of free space to which
the log can be written. Sprite LFS does this using a garbage collection
routine which reads all data from a segment and writes back only live
data. The method of selecting segments to be "cleaned" by the garbage
collector affects the performance of the overall system. Several such
methods are described by the paper, and a cost-benefit approach was
found to be the best.
The performance of the Sprite LFS system was compared against a SunOS
filesystem. The Sprite LFS performed about ten times better when
writing and deleting small files. Performance was similar in the two
systems for small file reads, as well as for most large file operations
in general. Performance in the Sprite LFS system was significantly
worse for sequentially reading a large file which had been written
randomly.
In the performance tests on the Sprite LFS system, the CPU was more of a
limiting factor than on the SunOS system. On the SunOS system, disk
bandwidth was the primary bottleneck. Therefore, if processor speed
improves more than disk bandwidth in the future, the difference in
performance between the Sprite LFS system and the SunOS system can be
expected to increase.
This paper presents some pretty cool ideas. The fact that so much disk
time is wasted in seeks, and that current filesystems do not address
this fact, seems ridiculous.
The Sprite LFS system optimizes for disk writes, since writes are more
of a bottleneck than reads during periods of high activity (read
performance can be improved through caches in memory). The paper also
alludes to the idea of doing cleanup during long periods of idle time
(such as at night). I would be curious to see how worthwhile it would
be to have a process run at such idle times and restructure the files in
order to optimize for later reading. For example, files could be
defragmented, and the locality of files in the same directory, or
similar in age or frequency of use, can be improved.
This archive was generated by hypermail 2.1.6 : Sun Feb 22 2004 - 14:33:11 PST