Lecture: crash recovery
preparation
crash safety
  - problem: multi-step operations
    
      - need to update multiple blocks
        
          - example: file create
            
              - allocate a new block
              - add directory entry (dirent) to parent directory
              - write file inode
 
          - which operations should happen first?
          - what happens if the operations don’t complete?

              - directory entry points to uninitialized inode - reliability & security
              - block allocated but not used - waste
              - compare to memory bugs: dangling pointers and leaks
 
 
      - how to tell if multi-block updates are incomplete?
      - recovery: need to either continue with or undo incomplete updates
 
  - assumptions
    
      - the system can go down at any arbitrary point
      - fail-stop: the disk may miss writes

          - no arbitrary writes, no disk corruption
          - power failures, hardware failures, kernel panics
 
      - single sector/block write is often atomic (by hw)
      - ordering: can two writes be reordered? - more on this later
 
  - goal: crash safety
    
      - the file system must be “usable” after reboot and recovery
      - invariant: maintain consistent file system state
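
The file-create problem above can be made concrete with a toy model. This is only a sketch under stated assumptions: the `disk` struct, the three flags, and the names `create_with_crash` and `check` are hypothetical and do not come from xv6. It also assumes the new inode refers to the newly allocated block. The model runs the first few writes of a chosen order, "crashes", and classifies the state left on disk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one file, three on-disk facts. */
enum step { ALLOC_BLOCK, ADD_DIRENT, WRITE_INODE };

struct disk {
    bool block_allocated;   /* free bitmap marks the block as in use */
    bool dirent_present;    /* parent directory names the inode */
    bool inode_initialized; /* inode is valid and (by assumption) points to the block */
};

enum verdict { CONSISTENT, LEAK, DANGLING };

/* Perform one of the three block writes. */
static void apply(struct disk *d, enum step s)
{
    switch (s) {
    case ALLOC_BLOCK: d->block_allocated = true;   break;
    case ADD_DIRENT:  d->dirent_present = true;    break;
    case WRITE_INODE: d->inode_initialized = true; break;
    }
}

/* Run the first `completed` writes of `order`, then "crash". */
struct disk create_with_crash(const enum step order[3], int completed)
{
    struct disk d = { false, false, false };
    assert(completed >= 0 && completed <= 3);
    for (int i = 0; i < completed; i++)
        apply(&d, order[i]);
    return d;
}

/* Classify the state found after reboot. */
enum verdict check(const struct disk *d)
{
    /* directory entry names an uninitialized inode */
    if (d->dirent_present && !d->inode_initialized)
        return DANGLING;
    /* inode refers to a block the bitmap still considers free */
    if (d->inode_initialized && !d->block_allocated)
        return DANGLING;
    /* resources consumed but unreachable from the directory */
    if ((d->block_allocated || d->inode_initialized) && !d->dirent_present)
        return LEAK;
    return CONSISTENT;
}
```

In this model, the order block, inode, dirent leaves at worst a leak at any crash point, while writing the dirent first can expose an uninitialized inode. Ordering alone therefore limits the damage but does not make the update atomic.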
 
logging/journaling
  - basic idea: “transaction”
    
      - log everything you intend to do before making any destructive changes
      - mark “done” in the log
      - do the changes
      - delete the log
      - if crash

          - “done” in log: redo writes from the log
          - no “done” in log: simply discard the log
 
      - how about a crash while replaying the log?
      - ensure the disk won’t reorder these steps - need to flush the disk in between
      - xv6 example: begin_op/log_write/end_op (log.c)
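
The four transaction steps and the recovery rule can be sketched against an in-memory "disk". This is a minimal illustration, not xv6's log.c: the `disk` layout, `tx_run`, `recover`, and the `crash_after` parameter are all hypothetical. The "done" mark is modeled as a single counter write, relying on the assumption above that one block write is atomic. On a real disk a flush would be needed at each point marked in the comments.

```c
#include <assert.h>
#include <string.h>

#define NBLOCKS 8   /* toy disk size (assumption) */
#define LOGSIZE 4   /* max blocks per transaction (assumption) */

struct logent { int blockno; int data; };

struct disk {
    int blocks[NBLOCKS];         /* home locations */
    struct logent log[LOGSIZE];  /* log area */
    int committed;               /* "done" mark: number of valid log entries, 0 = none */
};

/* Copy logged entries to their home locations. Each entry stores the full new
 * value of a block, so repeating the copy gives the same result (idempotent). */
static void install(struct disk *d)
{
    assert(d->committed >= 0 && d->committed <= LOGSIZE);
    for (int i = 0; i < d->committed; i++)
        d->blocks[d->log[i].blockno] = d->log[i].data;
}

/* Run at reboot. */
void recover(struct disk *d)
{
    if (d->committed > 0)
        install(d);       /* "done" in log: redo writes from the log */
    d->committed = 0;     /* either way, the log is now discarded */
}

/* Run a transaction of n writes, stopping ("crashing") once crash_after steps
 * have completed. Returns 1 if all four steps ran, 0 on a simulated crash,
 * -1 on invalid input. */
int tx_run(struct disk *d, const struct logent *w, int n, int crash_after)
{
    if (n < 1 || n > LOGSIZE)
        return -1;
    for (int i = 0; i < n; i++)
        if (w[i].blockno < 0 || w[i].blockno >= NBLOCKS)
            return -1;

    if (crash_after == 0) return 0;
    memcpy(d->log, w, (size_t)n * sizeof *w);  /* 1. log the intended writes */
    /* real disk: flush here, so the log is durable before the "done" mark */
    if (crash_after == 1) return 0;
    d->committed = n;                          /* 2. mark "done" (one atomic write) */
    /* real disk: flush here, before touching home locations */
    if (crash_after == 2) return 0;
    install(d);                                /* 3. do the changes */
    /* real disk: flush here, before deleting the log */
    if (crash_after == 3) return 0;
    d->committed = 0;                          /* 4. delete the log */
    return 1;
}
```

A crash before step 2 leaves the home blocks untouched and `recover` simply drops the log. A crash after step 2 is repaired by replaying it. Because replay only rewrites full block contents, a crash during `recover` itself is handled by running `recover` again on the next boot.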