From: Sellakumaran (ksella_at_hotmail.com)
Date: Sun Feb 22 2004 - 19:52:40 PST
This paper presents a new technique for disk storage management: the
log-structured file system. It starts with the problems faced by existing
systems such as Unix FFS, proposes a log-based solution, discusses the
various components of that solution, and compares a prototype with Unix FFS.
The paper focuses only on improving write performance and concludes that a
log-structured file system can use disks an order of magnitude more
efficiently than existing file systems.
One of the big issues limiting the scalability of current computer systems
is the disproportionate growth of processor and hard disk capabilities.
While processing power has grown dramatically, disk speeds have improved
only slowly. So even though CPU speed is very high, disk access (read and
write) has been a limiting factor in applications and computer systems
taking advantage of this power. Both disk reads and writes are issues. At
the same time, main memory has also been growing dramatically. Given this,
the log-structured file system presented in this paper focuses primarily on
disk writes, leaving disk reads to be absorbed by larger main-memory
caches.
The discussion of the problems with current file systems includes some
interesting details. For example, in Unix FFS, creating a new file takes at
least 5 I/Os, each preceded by a seek. When writing small files in such a
system, less than 5% of the disk's potential bandwidth is used for new data;
the rest of the time is spent seeking. This is a big problem area, and
tackling it could give up to an order of magnitude of performance
improvement. The log-structured file system described here (Sprite LFS)
tries to do that. The paper first presents the design considerations and
defines the problem it tries to tackle: disk writes for small files.
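
The seek-dominated behaviour described above can be checked with some
back-of-the-envelope arithmetic. This is only a rough sketch: the seek time,
bandwidth and file size below are assumed, illustrative values that do not
come from the paper, and the function name is hypothetical. The model also
ignores the metadata bytes moved by each I/O.

```python
def bandwidth_utilization(data_bytes, num_ios, seek_s, bandwidth_bytes_per_s):
    """Fraction of elapsed disk time spent transferring new data.

    Simplified model: every I/O pays one seek, and only the new data
    counts as useful transfer time.
    """
    transfer_s = data_bytes / bandwidth_bytes_per_s
    total_s = num_ios * seek_s + transfer_s
    return transfer_s / total_s


# Assumed, illustrative disk parameters (NOT taken from the paper).
SEEK_S = 0.015                 # 15 ms per seek plus rotational delay
BANDWIDTH = 1_000_000          # 1 MB/s sequential transfer rate

# One 1 KB file created with 5 separate I/Os, each preceded by a seek.
small_file = bandwidth_utilization(1024, 5, SEEK_S, BANDWIDTH)

# The same 1 KB of data for 1000 files, batched into one sequential write.
batched = bandwidth_utilization(1024 * 1000, 1, SEEK_S, BANDWIDTH)

print(f"5 seeks per small file: {small_file:.1%} of bandwidth used")
print(f"one large sequential write: {batched:.1%} of bandwidth used")
```

Under these assumptions the small-file case stays well below 5% utilization,
while the batched write uses nearly all of the available bandwidth.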
The fundamental idea of a log-structured file system is to improve write
performance by buffering a sequence of file system changes in the file cache
and then writing all the changes to disk sequentially in a single write
operation. The data is stored permanently in the log, and there are no
other structures on disk. There are two key issues in such systems: a)
locating a file and reading it, and b) managing free space on disk (so that
large extents of free space are always available). The other main
consideration is the ability of the system to recover after a crash.
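
The buffer-then-append idea can be sketched in a few lines. This is a toy
model, not Sprite LFS itself: the class `TinyLog` and its methods are
hypothetical names, a Python list stands in for the disk, and the
`inode_map` dictionary is only a loose analogue of the paper's inode map.

```python
class TinyLog:
    """Toy append-only log; a list stands in for the disk."""

    def __init__(self):
        self.disk = []        # the log: (file_id, data) records in write order
        self.buffer = []      # pending changes held in the file cache
        self.inode_map = {}   # file_id -> position of its newest record on disk
        self.disk_writes = 0  # number of sequential write operations issued

    def write(self, file_id, data):
        # Changes are only buffered; nothing touches the disk yet.
        self.buffer.append((file_id, data))

    def flush(self):
        # Write every buffered change in one sequential append.
        if not self.buffer:
            return
        start = len(self.disk)
        self.disk.extend(self.buffer)
        for offset, (file_id, _) in enumerate(self.buffer):
            self.inode_map[file_id] = start + offset  # later records win
        self.buffer.clear()
        self.disk_writes += 1

    def read(self, file_id):
        # Newest buffered version first, then the location in the log.
        for buffered_id, data in reversed(self.buffer):
            if buffered_id == file_id:
                return data
        return self.disk[self.inode_map[file_id]][1]


log = TinyLog()
log.write("a", "v1")
log.write("b", "v1")
log.write("a", "v2")   # overwrites "a"; the old copy becomes dead space
log.flush()            # three changes, one disk write
```

Note that the overwritten copy of "a" is still sitting in the log after the
flush. That dead space is exactly what the free-space management has to
reclaim.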
Sprite LFS has the following data structures: inode, inode map, indirect
block, segment summary, segment usage table, superblock, checkpoint region,
and directory change log. Sprite LFS uses a combination of threading and
copying to manage free space. The disk is divided into large fixed-size
extents called segments (each made up of blocks). A segment cannot be
rewritten until its live data has been copied out. This process is called
segment cleaning: read one or more segments into memory, identify the live
data, and write it back to a smaller number of clean segments, which leaves
the original segments available for new writes. The segment summary block
is used in this process to identify which blocks are live. The next
question is which cleaning policies to use.
This is where the authors introduce some interesting terms and factors such
as age sort, write cost, and hot and cold access patterns. They use
simulation to evaluate these policies and conclude that hot and cold
segments must be treated differently by the cleaner: free space in a cold
segment is more valuable than free space in a hot one. This leads them to
the cost-benefit policy, which in turn requires the data structure called
the segment usage table.
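
To make the policy discussion concrete, here is a small sketch comparing a
greedy cleaner with a cost-benefit cleaner. The two formulas are the ones I
recall from the paper: write cost = 2 / (1 - u) and benefit/cost =
(1 - u) * age / (1 + u), where u is the fraction of a segment that is still
live. The `Segment` class, the function names and the sample numbers are
hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    utilization: float  # u: fraction of the segment that is still live
    age: float          # age of the data in the segment (arbitrary units)


def write_cost(u):
    """Disk traffic per unit of new data when cleaning at utilization u."""
    if u == 0:
        return 1.0           # an empty segment need not be read at all
    if u >= 1:
        return float("inf")  # a full segment yields no free space
    return 2.0 / (1.0 - u)   # read 1, write back u, gain 1 - u


def benefit_to_cost(seg):
    """Free space gained, weighted by age, per unit of cleaning work."""
    u = seg.utilization
    return (1.0 - u) * seg.age / (1.0 + u)


def pick_greedy(segments):
    # Greedy policy: always clean the least-utilized segment.
    return min(segments, key=lambda s: s.utilization)


def pick_cost_benefit(segments):
    # Cost-benefit policy: clean the segment with the best ratio.
    return max(segments, key=benefit_to_cost)


# Assumed example: a young, emptier hot segment and an old, fuller cold one.
segments = [
    Segment("hot", utilization=0.60, age=1.0),
    Segment("cold", utilization=0.75, age=20.0),
]

print("greedy picks:", pick_greedy(segments).name)
print("cost-benefit picks:", pick_cost_benefit(segments).name)
```

With these sample values the greedy policy cleans the hot segment, while the
cost-benefit policy prefers the cold one even though it is fuller. Its free
space is likely to stay free longer, which is the sense in which free space
in a cold segment is more valuable.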
Crash recovery is handled in two ways: checkpoints and roll-forward. The
checkpoint frequency is another important decision: a proper trade-off has
to be made between the overhead on normal operation and faster recovery.
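
The interplay of the two mechanisms can be sketched as follows. This is a
simplified illustration, not the Sprite LFS recovery code: the `recover`
function, the dictionary layout of the checkpoint and the record format are
all assumptions made for the example.

```python
def recover(checkpoint, log):
    """Rebuild the inode map after a crash.

    checkpoint: {"log_position": int, "inode_map": dict} saved at the last
                checkpoint (hypothetical layout).
    log:        list of (file_id, block_address) records in write order.
    """
    # Step 1: start from the consistent state saved in the checkpoint.
    inode_map = dict(checkpoint["inode_map"])

    # Step 2: roll forward through records written after the checkpoint.
    for file_id, address in log[checkpoint["log_position"]:]:
        inode_map[file_id] = address

    return inode_map


# Assumed example: the checkpoint was taken after the first two records.
log = [("a", 0), ("b", 1), ("a", 2), ("c", 3)]
checkpoint = {"log_position": 2, "inode_map": {"a": 0, "b": 1}}

recovered = recover(checkpoint, log)
print(recovered)
```

A more frequent checkpoint shortens the tail of the log that has to be
scanned in step 2, at the price of more checkpoint writes during normal
operation.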
The benchmarks were run using small programs on SunOS and Sprite LFS.
Small-file performance in Sprite LFS was very good compared to SunOS. The
interesting point was that disk saturation in Sprite LFS was only 17% while
the CPU was at 100%. So Sprite LFS will be able to take advantage of
increased CPU power, whereas SunOS, with 85% disk saturation, has much less
room to do so.
The system borrows many ideas from previous work on storage management
systems and database systems. Overall, I think the authors clearly stated
their goals (write performance, small files, better performance than Unix
FFS), described the system, and explained the results well enough to show
that they did achieve those goals. It would be interesting to know how this
system measures up on today's hardware.