Lecture: file system Q&A
preparation
administrivia
recap
- high-level structure: syscalls - FS - disk
- real-world I/O stack is complex, e.g., Linux
- syscalls (POSIX)
- semantics are tricky
- example: update a new file
- why not just write directly to the file?
- a crash in the middle can leave an incomplete new file
- does this guarantee that users will see either the old or the new file?
- how many inodes are being changed in this case?
- POSIX is underspecified
- the behavior may vary across file systems
- safe fix: insert fsync(fd) before close
- maybe even fsync the directory
- if you are interested, learn more about crash consistency
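The update pattern discussed above (write a new file, fsync it, then rename it over the old one) can be sketched as follows. This is a minimal sketch, not a definitive recipe: the helper name `atomic_update` and the `.tmp` suffix are hypothetical, it assumes a POSIX system where a directory can be opened and fsync'ed, and, as noted above, the exact guarantees still vary across file systems.

```python
import os

def atomic_update(path: str, data: bytes) -> None:
    """Replace `path` so that a crash leaves either the old or the new contents."""
    tmp = path + ".tmp"  # hypothetical temp-file name, same directory as `path`
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                  # os.write may write fewer bytes than asked
            n = os.write(fd, view)
            view = view[n:]
        os.fsync(fd)                 # the "safe fix": flush data before close
    finally:
        os.close(fd)
    os.replace(tmp, path)            # rename(2) on POSIX: swaps the directory entry
    # "maybe even fsync the directory": persist the rename itself
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the first fsync, some file systems may persist the rename before the data, which is how a crash can expose an empty or partial "new" file.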
- disk
- HDD (magnetism), SSD (NAND flash), 3D XPoint (mechanism not yet clear)
- Flash Translation Layer (FTL)
- emulate the block device interface
- logical (virtual) block to physical block: how does this compare to virtual memory?
- widely used in SSDs
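To make the logical-to-physical mapping concrete, here is a toy model of an FTL. It is only a sketch under simplifying assumptions: the class `ToyFTL` and its fields are hypothetical, pages are Python objects rather than flash, and garbage collection and wear leveling are left out. The point is that an overwrite of a logical block goes to a fresh physical page and only the mapping changes, much like remapping a virtual page to a new physical frame.

```python
class ToyFTL:
    """Toy flash translation layer: block-device interface over out-of-place writes."""

    def __init__(self, num_physical: int):
        self.flash = [None] * num_physical     # physical pages
        self.l2p = {}                          # logical block -> physical page
        self.free = list(range(num_physical))  # pages never written so far
        self.stale = set()                     # old copies awaiting erase

    def write(self, lba: int, data: bytes) -> None:
        if not self.free:
            raise RuntimeError("no free pages; a real FTL would garbage-collect here")
        ppn = self.free.pop(0)                 # always write to a fresh page
        self.flash[ppn] = data
        old = self.l2p.get(lba)
        if old is not None:
            self.stale.add(old)                # previous copy is now garbage
        self.l2p[lba] = ppn                    # remap the logical block

    def read(self, lba: int):
        ppn = self.l2p.get(lba)
        return None if ppn is None else self.flash[ppn]
```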
- file systems
- hierarchical on-disk data structure
- superblock
- free bitmaps
- inodes (direct & indirect blocks)
- files & directories
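As an illustration of direct and indirect blocks, the sketch below maps a byte offset in a file to the inode slot that holds its block pointer. The layout is an assumption chosen for the example (4 KiB blocks, 4-byte block pointers, 12 direct pointers, one singly indirect block); real file systems use different constants and often add double and triple indirection.

```python
BLOCK_SIZE = 4096                    # assumed block size in bytes
PTR_SIZE = 4                         # assumed size of one block pointer
NDIRECT = 12                         # assumed number of direct pointers in the inode
NINDIRECT = BLOCK_SIZE // PTR_SIZE   # pointers that fit in one indirect block

MAX_FILE_SIZE = (NDIRECT + NINDIRECT) * BLOCK_SIZE

def locate(offset: int):
    """Return ("direct", i) or ("indirect", i) for the block holding `offset`."""
    bn = offset // BLOCK_SIZE        # logical block number within the file
    if bn < NDIRECT:
        return ("direct", bn)        # pointer stored in the inode itself
    bn -= NDIRECT
    if bn < NINDIRECT:
        return ("indirect", bn)      # pointer stored inside the indirect block
    raise ValueError("offset beyond maximum file size")
```

With these assumed constants a file tops out at (12 + 1024) blocks of 4 KiB each, which is why larger files need further levels of indirection.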
- crash safety
- journaling
- ideas
- step 1: write a “todo” list before destructive updates
- step 2: replay the “todo” list after a crash
- can you use an “undo” list instead?
- examples: ext3/ext4 (Linux), NTFS (Windows), HFS+ (macOS)
- example: LevelDB, a key-value store; not a file system, but shares many ideas
- downsides
- write twice (once in the journal and once for the actual data)
- performance (log)
- how about running LevelDB on top of ext4?
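The two journaling steps can be sketched with a toy in-memory "disk". Everything here is hypothetical (the class `ToyJournalDisk`, the `crash_after` knob used to simulate a crash), and it models a redo ("todo") log only. Real journals such as ext4's also deal with write ordering, checksums, and log wrap-around.

```python
class ToyJournalDisk:
    """Toy redo journal: log the updates, commit, then apply them in place."""

    def __init__(self, nblocks: int):
        self.blocks = [b""] * nblocks  # home locations of the data
        self.journal = []              # logged (block_no, data) pairs
        self.committed = False         # stands in for an on-disk commit record

    def begin(self, updates):
        """Step 1: write the "todo" list, then the commit record."""
        self.journal = list(updates)
        self.committed = True

    def apply(self, crash_after=None):
        """Install logged updates in place; optionally 'crash' partway through."""
        for i, (bn, data) in enumerate(self.journal):
            if crash_after is not None and i >= crash_after:
                return                 # simulated crash: journal is left intact
            self.blocks[bn] = data     # the second write of the same data
        self.journal, self.committed = [], False

    def recover(self):
        """Step 2: replay a committed journal; discard an uncommitted one."""
        if self.committed:
            self.apply()               # replay is idempotent, so redoing is safe
        else:
            self.journal = []
```

The sketch also shows the "write twice" downside: every block goes into the journal first and to its home location second.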
- copy-on-write (COW)
- don’t do destructive updates; reduce updates to one single write
- example: log-structured file systems
- examples: Btrfs (Linux), ReFS (Windows), APFS (macOS)
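A toy illustration of the copy-on-write idea: nothing is overwritten, and an update becomes visible through one final write of the root pointer. The class `ToyCowStore` is hypothetical and deliberately simplistic (a flat name table instead of a tree, no space reclamation). It is not how Btrfs, ReFS, or APFS are actually laid out.

```python
class ToyCowStore:
    """Toy copy-on-write store: append new blocks, then swap the root pointer."""

    def __init__(self):
        self.blocks = []              # append-only; existing blocks never change
        self.root = self._alloc({})   # root points at a name -> block-id table

    def _alloc(self, value) -> int:
        self.blocks.append(value)
        return len(self.blocks) - 1

    def write(self, name: str, data: bytes) -> None:
        data_id = self._alloc(data)             # new data goes to a new block
        table = dict(self.blocks[self.root])    # copy the table, don't modify it
        table[name] = data_id
        new_root = self._alloc(table)
        self.root = new_root                    # the single "commit" write

    def read(self, name: str, root=None) -> bytes:
        table = self.blocks[self.root if root is None else root]
        return self.blocks[table[name]]
```

A crash before the root update leaves the old root, and therefore the old version, fully intact. Keeping an old root around also gives a snapshot for free.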
- other approaches for crash safety
- best-effort repair: sync metadata changes + fsck (garbage collection)
- introduce redundancy: replication, checksums
- soft updates
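For the checksum form of redundancy, a minimal sketch: store a CRC32 next to each block and verify it on read. The helper names `seal` and `unseal` and the 4-byte big-endian prefix are assumptions made for this example. A checksum only detects corruption; repairing it needs a second copy, which is where replication comes in.

```python
import zlib

def seal(data: bytes) -> bytes:
    """Prefix the payload with its CRC32 (4 bytes, big-endian)."""
    return zlib.crc32(data).to_bytes(4, "big") + data

def unseal(block: bytes) -> bytes:
    """Verify the stored CRC32 and return the payload, or raise on mismatch."""
    stored, data = int.from_bytes(block[:4], "big"), block[4:]
    if zlib.crc32(data) != stored:
        raise ValueError("checksum mismatch: block is corrupted")
    return data
```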