Lecture: Apple File System

This June, Apple announced a new file system, the Apple File System (APFS), for its future operating systems. Although the source code is not available at this point, let’s guess how it is implemented using what you have learned about file systems from 451.

preparation

read the Apple File System Guide and see if you can guess the answers when you read the “Implementation” part of the “Frequently Asked Questions” page
try to find more APFS resources (e.g., the presentation at WWDC 2016)

administrivia

lab X
- see feedback & feel free to talk to us
- lab 6 takes time: allocate your time carefully
- demo
- no late days; see the grading and late day policy
do Exercise: big files yourself; go to section next Thursday if you have any questions

recap

high-level structure: syscalls - FS - disk
- real-world I/O stack is complex - Linux
syscalls (POSIX)
- semantics are tricky
- example: update a file with new data
  - why not just write directly to the file? a crash in the middle can leave an incomplete new file
  - does this guarantee that users will see either the old or the new file?
  - how many inodes are being changed in this case?
- POSIX is underspecified
  - the behavior may vary across file systems
  - safe fix: insert fsync(fd) before close
  - maybe even fsync the directory
- if you are interested, learn more about crash consistency

int fd = open("file.tmp", ...);
write(fd, newdata, newdatasize);
close(fd);
rename("file.tmp", "file");
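
The safe fix from the recap (fsync before close, and maybe fsync the directory) can be sketched as follows. This is a minimal sketch assuming a POSIX system; the helper name replace_file and its parameters are hypothetical and not part of the lecture. Whether the directory fsync is needed depends on the file system.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `data` by writing `tmp` first.
 * `dir` is the directory that contains both names.
 * Returns 0 on success, -1 on error. */
static int replace_file(const char *dir, const char *tmp, const char *path,
                        const void *data, size_t size)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than asked; loop until done. */
    const char *p = data;
    size_t left = size;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Flush the new contents before the rename can expose them. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Switch the name from the old file to the new one. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }

    /* Optionally flush the directory so the rename itself is durable. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

A call such as replace_file(".", "file.tmp", "file", newdata, newdatasize) corresponds to the four-line snippet above with the two fsync calls added.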

disk
- HDD (magnetism), SSD (NAND flash), 3D XPoint (unclear yet)
- Flash Translation Layer (FTL)
  - emulate the block device interface
  - logical (virtual) block to physical block: how does this compare to virtual memory?
  - widely used in SSDs
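
To make the FTL mapping above concrete, here is a toy sketch in C. All names and sizes (struct ftl, ftl_write, FTL_PAGE_SIZE, and so on) are made up for illustration. It assumes the simplest possible policy: every write goes to the next unused flash page, and the table is updated to point there, much like a page table maps virtual pages to physical frames. A real FTL also does garbage collection and wear leveling, which this sketch omits.

```c
#include <stdint.h>
#include <string.h>

#define FTL_NUM_LOGICAL   4    /* blocks exposed to the file system */
#define FTL_NUM_PHYSICAL  8    /* flash pages actually available */
#define FTL_PAGE_SIZE     16
#define FTL_UNMAPPED      (-1)

struct ftl {
    int map[FTL_NUM_LOGICAL];      /* logical block -> physical page */
    int valid[FTL_NUM_PHYSICAL];   /* 1 if the page holds live data */
    int next_free;                 /* next never-written page */
    uint8_t flash[FTL_NUM_PHYSICAL][FTL_PAGE_SIZE];
};

static void ftl_init(struct ftl *f)
{
    memset(f, 0, sizeof *f);
    for (int i = 0; i < FTL_NUM_LOGICAL; i++)
        f->map[i] = FTL_UNMAPPED;
}

/* Write out of place: NAND pages cannot be rewritten without an erase,
 * so the new data goes to a fresh page and the mapping is redirected.
 * Returns -1 when no fresh page is left (no garbage collection here). */
static int ftl_write(struct ftl *f, int lba, const uint8_t *data)
{
    if (lba < 0 || lba >= FTL_NUM_LOGICAL || f->next_free >= FTL_NUM_PHYSICAL)
        return -1;
    int page = f->next_free++;
    memcpy(f->flash[page], data, FTL_PAGE_SIZE);
    f->valid[page] = 1;
    if (f->map[lba] != FTL_UNMAPPED)
        f->valid[f->map[lba]] = 0;   /* the old copy becomes garbage */
    f->map[lba] = page;
    return 0;
}

/* Read through the mapping; fails for blocks that were never written. */
static int ftl_read(const struct ftl *f, int lba, uint8_t *out)
{
    if (lba < 0 || lba >= FTL_NUM_LOGICAL || f->map[lba] == FTL_UNMAPPED)
        return -1;
    memcpy(out, f->flash[f->map[lba]], FTL_PAGE_SIZE);
    return 0;
}
```

Note that overwriting the same logical block twice consumes two physical pages, which is why the device behaves like a small log-structured store underneath the block interface.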
file systems
- hierarchical on-disk data structure
  - superblock
  - free bitmaps
  - inodes (direct & indirect blocks)
  - files & directories
- crash safety
  - journaling
    - ideas
      - step 1: write a “todo” list before destructive updates
      - step 2: replay the “todo list”
      - can you use an “undo” list instead?
    - examples: ext3/ext4 (Linux), NTFS (Windows), HFS+ (macOS)
    - example: LevelDB, key-value store - not a file system, but shares many ideas
    - downsides
      - write twice (once in the journal and once for the actual data)
      - performance (log)
      - how about running LevelDB on top of ext4?
  - copy-on-write (COW)
    - don’t do destructive updates; reduce updates to one single write
    - example: log-structured file systems
      - conceptually, the entire file system is a log
      - over-simplified example: see Figure 2
      - pros and cons?
      - see the original paper: The Design and Implementation of a Log-Structured File System, from SOSP 1991
      - also see this LWN article: Log-structured file systems: There’s one in every SSD
    - examples: Btrfs (Linux), ReFS (Windows), APFS (macOS)
  - other approaches
    - best effort repair; sync metadata change + fsck (garbage collection)
    - introduce redundancy: replication, checksums
    - soft updates
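
The two journaling steps listed under crash safety (write a "todo" list, then replay it) can be sketched with an in-memory stand-in for the disk. This is a toy redo log under stated assumptions: the struct layout, the names log_write, log_commit, and log_replay, and the single commit flag are all invented for illustration and do not describe ext3/ext4, NTFS, or HFS+.

```c
#include <string.h>

#define NBLOCKS   8
#define BLK_SIZE  8
#define LOG_MAX   4

struct log_entry {
    int  blockno;             /* home location this entry targets */
    char data[BLK_SIZE];      /* new contents for that block */
};

struct disk {
    char blocks[NBLOCKS][BLK_SIZE];   /* home locations */
    struct log_entry log[LOG_MAX];    /* journal area */
    int  log_len;                     /* entries written so far */
    int  committed;                   /* commit record: 1 once the log is complete */
};

/* Step 1: record the intended write in the journal only.
 * The home location is not touched yet. */
static int log_write(struct disk *d, int blockno, const char *data)
{
    if (d->committed || d->log_len >= LOG_MAX ||
        blockno < 0 || blockno >= NBLOCKS)
        return -1;
    d->log[d->log_len].blockno = blockno;
    memcpy(d->log[d->log_len].data, data, BLK_SIZE);
    d->log_len++;
    return 0;
}

/* The commit record is the one write that decides the transaction's fate. */
static void log_commit(struct disk *d)
{
    d->committed = 1;
}

/* Step 2: replay. Run after commit, and again during recovery after a crash.
 * Replaying twice is harmless because each entry carries the full new block. */
static void log_replay(struct disk *d)
{
    if (d->committed) {
        for (int i = 0; i < d->log_len; i++)
            memcpy(d->blocks[d->log[i].blockno], d->log[i].data, BLK_SIZE);
    }
    /* Uncommitted entries are simply dropped; either way the log is empty now. */
    d->log_len = 0;
    d->committed = 0;
}
```

The sketch also shows the "write twice" downside: every block is copied once into the log and once more to its home location.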

APFS

APFS notes