Logging File System Paper Review

From: Reid Wilkes (reidwilkes_at_hotmail.com)
Date: Sat Feb 21 2004 - 18:45:44 PST

  • Next message: Honghai Liu: "Review on Log-structured File System"

    With many of the papers we have read this quarter, and this one in particular, I have wondered quite a bit how much the ideas described in the paper have actually been used "in the wild" in commercial systems. Although I know very little about file systems overall, I have a suspicion that the ideas in this paper are not used in any major commercial systems, which leads me to wonder: why not? The ideas seem quite sound and the performance results are certainly impressive. It is interesting to consider whether any lack of adoption might simply be because it seems counterintuitive not to keep all the data for a given file together in one area of the disk; the temporal data layout that logging file systems provide just doesn't feel right.

    So what was the paper about? This paper was about logging file systems. On first reading the title, I imagined the system proposed would be something very much like Windows NT's NTFS file system, where the file system maintains what is essentially a transactional log in addition to the data on the disk. However, I was surprised to see that in this particular logging file system, the data itself is part of the log; there is no separate storage of data outside the log. The basic idea behind this is increased performance for writes and for recovery.

    The authors claim from the start that this file system architecture at best provides no improvement in read performance. However, they also point out that as system memory sizes continue to increase, more and more reads can be satisfied from memory and never have to go to disk. Writes, on the other hand, will always need disk accesses so that the changed data reaches non-volatile storage. Therefore the authors argue that disk write performance matters more than disk read performance. In addition, recovery time can be quite slow in more "conventional" file systems. Because recovery in this logging file system involves reading only the very recently written data at the end of the log and perhaps making a few adjustments to some tables, recovery can be much faster.

    The main technical challenge in implementing the logging file system is that the growing log will eventually consume the entire disk. The solution, of course, is to wrap the log around, but then it becomes important to overwrite only those areas of the old log that may safely be reused. Sprite LFS uses the concept of segments: the disk is divided into fixed-size segments, and a segment is either completely writable or not writable at all. As the log grows sequentially across the disk, if it comes upon a segment that is not writable, it simply skips over that segment and continues writing. The other piece required to make this work is the cleaner. The cleaner is essentially a garbage collector for the disk: it identifies data in segments that is no longer part of "live" files and reclaims those segments for the log, and it also shuffles the truly live data around to maintain a good ratio of free segments to used ones. (A toy sketch of this segment and cleaner bookkeeping follows at the end of this review.)

    In summary, this system seemed to me to be clearly the way file systems should be implemented. Given that processor speeds continue to grow at such a high rate while disk speeds lag considerably, it seems inevitable that we will have to build smarter software like this to lessen the impact of the performance discrepancy.
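    To make the segment and cleaner bookkeeping concrete, below is a tiny toy model in Python. This is my own sketch, not Sprite LFS code; the names (ToyLFS, write_block, clean_segment, block_map) and the segment sizes are invented for illustration. It only shows the two mechanics described above: the log wrapping around the disk while skipping segments that are not writable, and the cleaner copying still-live blocks forward so that a whole segment can be handed back to the log.

        SEGMENT_SIZE = 4   # blocks per segment (unrealistically small, for illustration)
        NUM_SEGMENTS = 8   # segments on the toy "disk"

        class ToyLFS:
            def __init__(self):
                self.segments = [[] for _ in range(NUM_SEGMENTS)]  # each holds (block_id, data)
                self.writable = [True] * NUM_SEGMENTS              # clean segments the log may use
                self.head = 0                                      # segment currently being filled
                self.writable[0] = False
                self.block_map = {}                                # block_id -> (segment, slot)

            def _advance_head(self):
                # The log grows sequentially, wrapping around the disk and
                # simply skipping any segment that is not writable.
                for _ in range(NUM_SEGMENTS):
                    self.head = (self.head + 1) % NUM_SEGMENTS
                    if self.writable[self.head]:
                        self.writable[self.head] = False
                        self.segments[self.head] = []
                        return
                raise RuntimeError("log is full; the cleaner must reclaim segments first")

            def write_block(self, block_id, data):
                # Every write, whether new data or an update, is appended at the
                # log head; any older copy of the block simply becomes dead data.
                if len(self.segments[self.head]) == SEGMENT_SIZE:
                    self._advance_head()
                self.segments[self.head].append((block_id, data))
                self.block_map[block_id] = (self.head, len(self.segments[self.head]) - 1)

            def clean_segment(self, seg):
                # Cleaner: copy the still-live blocks of a segment to the log head,
                # then hand the whole segment back to the log as writable space.
                if seg == self.head:
                    return
                for slot, (block_id, data) in enumerate(self.segments[seg]):
                    if self.block_map.get(block_id) == (seg, slot):  # still live?
                        self.write_block(block_id, data)
                self.segments[seg] = []
                self.writable[seg] = True

        fs = ToyLFS()
        for i in range(10):
            fs.write_block(i % 3, "version %d" % i)  # rewrite the same few blocks repeatedly
        fs.clean_segment(0)   # segment 0 now holds only dead copies and is reclaimed

    In the real system, the policy for choosing which segments to clean and for grouping long-lived data together is where much of the paper's cleverness lies; the sketch above ignores all of that and just shows the mechanism.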


