From: Steve Arnold (stevearn_at_microsoft.com)
Date: Sun Feb 22 2004 - 22:05:47 PST
The authors of this paper suggest a different way of storing data on a
disk. Instead of the linked index structures that file systems
traditionally use, they store everything in a log, much like a database
uses a log to record its updates. File systems usually have poor
performance on small writes, and the authors try to remedy this by
caching writes in memory before committing them to disk. They also argue
that current systems take too many disk accesses to do simple
operations. All the buffered writes then get written out sequentially in
one large transfer to improve performance.
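To make the idea concrete, here is a rough sketch of the write path as I
understand it (my own toy code, not theirs; the segment size, file name,
and function names are all made up):

#include <stdio.h>
#include <string.h>

#define SEG_SIZE 4096                  /* assumed segment size */

static char segbuf[SEG_SIZE];          /* in-memory segment buffer */
static size_t used = 0;                /* bytes buffered so far */

/* Append the buffered data to the end of the log in one sequential write. */
static void flush_segment(FILE *log)
{
    if (used == 0)
        return;
    fwrite(segbuf, 1, used, log);      /* one large sequential write */
    fflush(log);
    used = 0;
}

/* Buffer a small write; only flush when the segment fills up.
 * (Assumes len <= SEG_SIZE; a real system would split larger writes.) */
static void lfs_write(FILE *log, const void *data, size_t len)
{
    if (used + len > SEG_SIZE)
        flush_segment(log);
    memcpy(segbuf + used, data, len);
    used += len;
}

int main(void)
{
    FILE *log = fopen("log.bin", "ab");
    if (!log)
        return 1;
    for (int i = 0; i < 100; i++)      /* lots of small updates... */
        lfs_write(log, "small update\n", 13);
    flush_segment(log);                /* ...become a few big disk writes */
    fclose(log);
    return 0;
}

The point is that many small logical writes turn into one big sequential
disk write, which is the kind of access disks are good at.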
The authors try to show that there are two kinds of data: that which is
long-lived (and infrequently accessed) and that which is hot
(short-lived and frequently accessed). After doing some trial runs, they
find that they need to treat the two cases separately. The interesting
case is the long-lived one. Why? Because it must be cleaned up after.
The log can end up with fragmented free space, and in order to recapture
this space they have to do some housecleaning (segment cleaning). This
can be costly, although it is not something they do all the time.
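Roughly, the cleaning step looks something like this (again my own
simplified sketch, where I assume a segment is just a fixed array of
blocks flagged live or dead; none of these names come from the paper):

#include <stdbool.h>
#include <stdio.h>

#define BLOCKS_PER_SEG 8
#define BLOCK_SIZE 512

struct block {
    bool live;                       /* still referenced by some file? */
    char data[BLOCK_SIZE];
};

struct segment {
    struct block blocks[BLOCKS_PER_SEG];
    int nlive;                       /* number of live blocks */
};

/* Copy the live blocks out of 'victim' into 'head' (the segment currently
 * being filled), then hand the whole victim segment back as free space. */
static void clean_segment(struct segment *victim, struct segment *head)
{
    for (int i = 0; i < BLOCKS_PER_SEG; i++) {
        if (!victim->blocks[i].live)
            continue;                /* dead block: its space comes back for free */
        if (head->nlive == BLOCKS_PER_SEG)
            return;                  /* head is full; a real cleaner starts a new one */
        head->blocks[head->nlive++] = victim->blocks[i];
        victim->blocks[i].live = false;
        /* a real cleaner would also update the inode map to the new location */
    }
    victim->nlive = 0;               /* the entire segment can now be rewritten */
}

int main(void)
{
    struct segment victim = { 0 }, head = { 0 };
    victim.blocks[1].live = victim.blocks[5].live = true;
    victim.nlive = 2;                /* 2 live blocks, 6 dead: fragmented free space */
    clean_segment(&victim, &head);
    printf("victim now has %d live blocks, head has %d\n",
           victim.nlive, head.nlive);
    return 0;
}

The cost comes from having to read the victim segments and write their
live blocks back out before the space can be reused.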
It seemed to me that their initial results were only slightly better
than a traditional file system. However, they do suggest that there are
several things that could be tweaked, such as when to run the segment
cleaner, how many segments get cleaned at once, which segments to clean,
and how live blocks are grouped. I'd like to know if they took this any
further.
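On the "which segments to clean" question, my understanding is that they
end up with a cost-benefit ratio that favors segments with lots of free
space and old (cold) live data. Something like the following, where the
struct, the numbers, and the units are my own illustration:

#include <stdio.h>

struct seg_info {
    double utilization;  /* fraction of the segment still live, 0.0 - 1.0 */
    double age;          /* age of the youngest live block, arbitrary units */
};

/* Benefit-to-cost ratio: free space gained, weighted by how cold the data
 * is, divided by the cost of reading the segment and rewriting its live
 * blocks. */
static double benefit_cost(const struct seg_info *s)
{
    return ((1.0 - s->utilization) * s->age) / (1.0 + s->utilization);
}

int main(void)
{
    struct seg_info segs[] = {
        { 0.90, 100.0 },  /* nearly full but very cold */
        { 0.30, 2.0 },    /* mostly empty but hot */
    };
    int best = 0;
    for (int i = 1; i < 2; i++)
        if (benefit_cost(&segs[i]) > benefit_cost(&segs[best]))
            best = i;
    printf("clean segment %d first (ratio %.2f)\n",
           best, benefit_cost(&segs[best]));
    return 0;
}

With these numbers the nearly-full but cold segment wins, which seems to
match their argument that free space in cold segments is worth more
because it stays free longer.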
It seems to me that this would work well for special applications that
rely on doing a lot of writes (e.g., database systems). For systems that
do mostly reading, is it worth the overhead?