From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Mon Feb 23 2004 - 14:00:57 PST
The Design and Implementation of a Log-Structured File System - Rosenblum
and Ousterhout
The authors introduce the log-structured file system, which speeds up writes
by reducing seek time and allows quick crash recovery. The log-structured file
system differs from file systems with logs in that all of the data
permanently resides in the log, with indexing information added to allow
random access. The most difficult piece of the design is ensuring a supply of
large contiguous regions of free space for writing, called segments. Sprite
LFS uses 65-70% of the disk's sequential write bandwidth and is faster than
Unix FFS in all but one benchmark case.
Current file systems store file attributes (inodes) separately from file
contents and spread related files across the disk, causing many head seeks to
read and write files. They also write metadata synchronously, adding still
more seeks even when the data itself is written asynchronously.
LFS uses structures similar to FFS's but writes them into the log
sequentially. Instead of placing inodes at fixed offsets, LFS maintains an
inode map, compact enough to be cached in memory, that records each inode's
current location in the log. The two basic methods for managing free space
are threading and copying: threading leads to fragmentation, while copying is
expensive. LFS breaks the disk into large segments that are threaded, and
long-lived data is copied together, reducing copying cost. Segments are
garbage collected stop-and-copy style, using per-segment summary information
and a versioning scheme to detect dead blocks.
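To make the dead-block test concrete, here is a rough sketch of how a cleaner
could combine a segment's summary block with the inode map; the structures
and names below are my own simplification, not the actual Sprite LFS code.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SummaryEntry:              # one per block in the segment being cleaned
    inum: int                    # inode number of the owning file
    block_no: int                # logical block number within that file
    version: int                 # file version when this copy was written
    disk_addr: int               # where this copy lives in the log

@dataclass
class ImapEntry:                 # inode map entry: current state of one file
    version: int                 # bumped when the file is deleted or truncated
    block_addrs: Dict[int, int]  # logical block number -> current disk address

def live_entries(summary: List[SummaryEntry],
                 inode_map: Dict[int, ImapEntry]) -> List[SummaryEntry]:
    live = []
    for e in summary:
        f = inode_map.get(e.inum)
        if f is None or f.version != e.version:
            continue                             # file deleted or truncated: dead
        if f.block_addrs.get(e.block_no) == e.disk_addr:
            live.append(e)                       # file still points at this copy
    return live

The version comparison is what lets the cleaner discard blocks belonging to
deleted or truncated files without examining those files' inodes at all.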
Segment cleaning raises all of the standard garbage collection questions: when
to collect, how much to collect, how to pick what to collect, and how to group
the output. When and how much turn out not to be critical and are handled with
a low-water mark on free segments. The write cost is the ratio of total bytes
moved to and from the disk to bytes of new data written, and utilization is
the fraction of live data in the segments being cleaned. To beat an improved
FFS, LFS must keep the utilization of cleaned segments below about 0.5. The
key to making it perform is a bimodal distribution of segments: mostly full
ones and nearly empty ones, with the cleaner working on the nearly empty ones.
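If I remember the paper's derivation right, cleaning N segments whose live
fraction is u reads N segments, writes back Nu segments of live data, and
yields N(1-u) segments of space for new data, giving a steady-state

  \text{write cost} = \frac{N + Nu + N(1-u)}{N(1-u)} = \frac{2}{1-u}

which crosses the improved-FFS level of roughly 4 at u = 0.5 and the
then-current FFS level of roughly 10 at u = 0.8.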
They developed a cost-benefit policy in which cold segments are cleaned once
their utilization drops to about 75% but hot segments only once it drops to
about 15%. In simulation this gives them the write cost of an improved FFS up
to about 80% disk utilization. The policy is implemented with a segment usage
table that records each segment's count of live bytes and the modification
time of its youngest block.
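In other words, the cleaner ranks segments by the paper's benefit-to-cost
ratio (1-u)*age/(1+u), where u is the segment's live fraction and age is the
modification time of its youngest block, and cleans the best candidates
first. The sketch below is my own toy code, not Sprite's:

from dataclasses import dataclass
from typing import List

@dataclass
class SegmentUsage:          # one row of the segment usage table
    seg_no: int
    live_fraction: float     # u: fraction of the segment still holding live bytes
    age: float               # modification time of the segment's youngest block

def benefit_to_cost(seg: SegmentUsage) -> float:
    # Cost of cleaning: read the whole segment (1) plus rewrite its live data (u).
    # Benefit: the free space reclaimed (1 - u), weighted by how long that space
    # is likely to stay free, estimated by the segment's age.
    u = seg.live_fraction
    return (1.0 - u) * seg.age / (1.0 + u)

def pick_segments_to_clean(table: List[SegmentUsage], count: int) -> List[SegmentUsage]:
    return sorted(table, key=benefit_to_cost, reverse=True)[:count]

Weighting reclaimed space by age is what makes cold, mostly empty segments
more attractive than hot ones that will keep emptying on their own if left
alone.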
Recovery is handled with periodic checkpoints followed by a roll-forward
through the log written since the last checkpoint. Files whose inodes reached
the log after the checkpoint are recovered, and directory entries and inode
reference counts are repaired using a directory operation log.
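A rough sketch of the roll-forward step as I understand it, with structures
and names of my own invention rather than Sprite's:

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Segment:
    # (inode number, version, disk address) for every inode written in this segment.
    new_inodes: List[Tuple[int, int, int]] = field(default_factory=list)
    # (operation, inode number, name) entries from the directory operation log.
    dir_ops: List[Tuple[str, int, str]] = field(default_factory=list)

def roll_forward(checkpoint_imap: Dict[int, Tuple[int, int]],
                 segments: List[Segment]):
    imap = dict(checkpoint_imap)           # start from the checkpointed inode map
    link_counts: Dict[int, int] = {}
    for seg in segments:                   # scan segments in the order written
        for inum, version, addr in seg.new_inodes:
            imap[inum] = (version, addr)   # adopt the newer inode; the data blocks
                                           # it points at become reachable again
        # Data blocks whose inode never reached the log are simply ignored.
        for op, inum, _name in seg.dir_ops:
            delta = 1 if op in ("create", "link") else -1 if op == "unlink" else 0
            link_counts[inum] = link_counts.get(inum, 0) + delta
    return imap, link_counts               # enough to repair directory entries
                                           # and inode reference counts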
They implemented everything and compared it to SunOS on CPU-bound
workstations, using a set of synthetic benchmarks that generated no cleaning
costs. That the benchmarks are CPU bound is disappointing but a 1992 reality.
For the create and delete phases of a small-file benchmark, LFS is almost 10
times faster, and the create phase could gain another factor of 4 to 6 as
CPUs get faster, since during it SunOS keeps the disk busy 85% of the time
compared to 17% for LFS. LFS produces temporal locality on disk while FFS
produces logical (file system) locality. When the two match, FFS does well;
when they don't, LFS does significantly better.
When cleaning costs are added back by measuring their real disk workloads
over several months of use, the write costs were only 1.2 to 1.6. This is in
part because the simulation used many small files while real workloads delete
whole large files, leaving nearly empty segments. Recovery time was measured
to be bounded by about 132 seconds for 50 MB of recovered file data.