File Systems

File System Basics

software layer overview

a map from name to bytes
abstraction for storage devices but with much richer features

read and write named data
create and delete name/data pairs
specify and enforce permissions on named data

how do we build such a system on top of storage devices?

where do we place data on disk?
how do we track information about data?
how do we name and organize data?

Where do we place data on disk?

need to track usage of hundreds of millions of sectors/blocks
- can track groups of several sectors/blocks instead of single sectors/blocks
- use efficient data structures: bitmap (one bit per tracked unit encodes free or in use)
- but... where should this bitmap live? if on disk, how do we know where on disk?
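
A minimal sketch of the bitmap idea, assuming one bit per block (1 = in use, 0 = free) and a first-fit allocation policy. The class name `BlockBitmap` and its methods are made up for illustration; a real file system would also have to persist the bitmap at a known on-disk location, which this sketch ignores.

```python
class BlockBitmap:
    """Free-space bitmap: one bit per block, 1 = in use, 0 = free."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        # Eight blocks are tracked per byte, rounded up.
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark_used(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def mark_free(self, block: int) -> None:
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self) -> int:
        """Return the first free block and mark it used (first-fit)."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.mark_used(block)
                return block
        raise OSError("no free blocks")
```

For scale: with 4 KiB blocks, a 1 TiB disk has 2^28 (about 268 million) blocks, so the bitmap itself takes 2^28 bits = 32 MiB.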

How do we track information about data?

what information do we need to know?
- size, owner, permission, location of data on disk
- all of this is metadata; the structure that holds it is also called an inode, file header, or file record
where should metadata live? if on disk, how do we know where on disk?

How do we name and organize data?

one option: generate a random string/bytes as the name for each piece of data, with no explicit organization
another option: use a user-defined name and a path that encodes organizational info

file: user-defined name for the data
directory: a way to group and organize files

implemented as a file whose data lists the files inside the directory
since a directory is just a file, directories can be nested
a directory entry consists of a file's name and the location of its metadata
how do we locate the metadata for a directory itself?

path: a /-separated list of all directories leading to the file

/: root directory, metadata lives in a known location
to locate /home/tom/foo.txt

start at the metadata for root, find its data, locate directory entry for "home"
use the metadata for "home" to find its data, locate directory entry for "tom"
use the metadata for "tom" to find its data, locate directory entry for "foo.txt"
use the metadata for "foo.txt" to find its data
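
The lookup steps above can be sketched as a loop over path components. This is an in-memory toy, not an on-disk format: the `Inode` class, the `inodes` table, and `ROOT_INODE = 0` are assumptions made for illustration (the notes only say that the root's metadata lives in a known location).

```python
from dataclasses import dataclass, field
from typing import Dict

ROOT_INODE = 0  # assumption: root metadata sits at a fixed, known index


@dataclass
class Inode:
    is_dir: bool
    # Directory data: maps each entry name to the inode number of its metadata.
    entries: Dict[str, int] = field(default_factory=dict)
    # File data (unused for directories).
    data: bytes = b""


def resolve(inodes: Dict[int, Inode], path: str) -> Inode:
    """Walk from the root, one directory entry per path component."""
    current = inodes[ROOT_INODE]
    for name in path.strip("/").split("/"):
        if not name:  # handles "/" and repeated slashes
            continue
        if not current.is_dir:
            raise NotADirectoryError(name)
        if name not in current.entries:
            raise FileNotFoundError(name)
        current = inodes[current.entries[name]]
    return current


# Toy tree for /home/tom/foo.txt
inodes = {
    0: Inode(is_dir=True, entries={"home": 1}),
    1: Inode(is_dir=True, entries={"tom": 2}),
    2: Inode(is_dir=True, entries={"foo.txt": 3}),
    3: Inode(is_dir=False, data=b"hello"),
}
foo = resolve(inodes, "/home/tom/foo.txt")
```

Each loop iteration corresponds to one step in the list: read the current directory's metadata, read its data, and find the entry for the next name.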

Filesys Deep Dive: Data Layout

so far we have only said that metadata tracks where data is on disk, but how is the data laid out on disk?
note that we want to support both small and large files
basic techniques

contiguous

allocate consecutive blocks on disk
metadata tracks the starting block number and the number of blocks
how do you locate which block contains the ith byte of data?
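
One way to answer the question for contiguous allocation: it is a single division, with no disk reads needed beyond the metadata. The 4096-byte block size and the function name are assumptions for this sketch.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def contiguous_block_for(start_block: int, num_blocks: int, i: int) -> int:
    """Return the disk block holding byte i of a contiguously allocated file."""
    index = i // BLOCK_SIZE  # which block of the file byte i falls in
    if i < 0 or index >= num_blocks:
        raise IndexError("byte offset outside the allocated blocks")
    return start_block + index
```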

linked

allocate any blocks on disk, each block stores data + a pointer to the next block
metadata tracks the block number of the first data block
how do you locate which block contains the ith byte of data?
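
For linked allocation the answer requires following the chain, so reaching byte i costs one block read per block skipped. In this sketch the disk is modeled as a dictionary from block number to next block number; the 4096-byte block size, the 4-byte pointer size, and all names are assumptions for illustration.

```python
from typing import Dict, Optional

BLOCK_SIZE = 4096     # assumed block size in bytes
POINTER_SIZE = 4      # assumed size of the next-block pointer
PAYLOAD = BLOCK_SIZE - POINTER_SIZE  # data bytes that fit in each block


def linked_block_for(
    next_pointer: Dict[int, Optional[int]], first_block: int, i: int
) -> int:
    """Return the disk block holding byte i by walking the chain from the start."""
    if i < 0:
        raise IndexError("negative byte offset")
    block = first_block
    for _ in range(i // PAYLOAD):  # one hop (one disk read) per block skipped
        nxt = next_pointer[block]
        if nxt is None:
            raise IndexError("byte offset past end of file")
        block = nxt
    return block
```

Note that each block holds slightly less than a full block of data because the pointer takes space, which is why the division uses `PAYLOAD` rather than `BLOCK_SIZE`.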

indexed

use a block to store an array of allocated blocks
metadata tracks the block number of the index block
how do you locate which block contains the ith byte of data?
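
For indexed allocation the lookup is one array access once the index block has been read. The sketch below models the index block as a Python list; the block size, pointer size, and names are assumptions for illustration.

```python
from typing import List

BLOCK_SIZE = 4096    # assumed block size in bytes
POINTER_SIZE = 4     # assumed size of one block number in the index block

# With a single index block, the file can have at most this many data blocks.
MAX_INDEXED_BLOCKS = BLOCK_SIZE // POINTER_SIZE


def indexed_block_for(index_block: List[int], i: int) -> int:
    """Return the disk block holding byte i using the index block's array."""
    slot = i // BLOCK_SIZE
    if i < 0 or slot >= len(index_block):
        raise IndexError("byte offset outside the allocated blocks")
    return index_block[slot]
```

Under these assumed sizes a single index block covers 1024 data blocks, so the file is capped at 4 MiB, which hints at why the combined techniques below exist.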

combined techniques

extents

one extent tracks a contiguous section of blocks
track multiple extents via an array and/or linked approach
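
A sketch of byte lookup over an array of extents, assuming each extent is a `(start_block, length_in_blocks)` pair stored in file order. The block size and names are assumptions for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed block size in bytes


def extent_block_for(extents: List[Tuple[int, int]], i: int) -> int:
    """Return the disk block holding byte i, given extents in file order."""
    if i < 0:
        raise IndexError("negative byte offset")
    remaining = i // BLOCK_SIZE  # file-relative block index still to skip
    for start, length in extents:
        if remaining < length:
            return start + remaining  # contiguous within an extent
        remaining -= length
    raise IndexError("byte offset past end of file")
```

A large file laid out in a few long runs needs only a few extents, while a small file needs just one, which is how extents serve both cases.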

multilevel indexed pointers

track an array of block pointers: some point directly to data, some to an index block, some to a doubly indexed block

Filesystem Design: Fast File System (FFS)

designed in the 1980s for spinning disks
Linux ext2 (1993-2001) and ext3 (2001-2006) use this design
Data Layout: Multilevel Index
- inode stores 15 pointers to track data location
- first 12 are pointers to data blocks
- pointer 13 is a pointer to an indirect block
- pointer 14 is a pointer to a double indirect block
- pointer 15 is a pointer to a triple indirect block
- what are the maximum and minimum numbers of blocks to read to locate a data block?
- what if we do a write at 0, and a write at a large offset (sparse file)?
- any limitation for file size with this layout?
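
A sketch for working through the questions above, assuming 4 KiB blocks and 4-byte block pointers (so 1024 pointers per indirect block), and assuming the inode is already in memory with nothing else cached. These sizes and names are assumptions; actual file systems may impose lower limits through other fields.

```python
BLOCK_SIZE = 4096                   # assumed block size in bytes
POINTER_SIZE = 4                    # assumed size of one block pointer
PTRS = BLOCK_SIZE // POINTER_SIZE   # pointers per indirect block (1024)
NUM_DIRECT = 12                     # direct pointers in the inode


def indirection_level(i: int) -> int:
    """Return 0 for direct, 1/2/3 for single/double/triple indirect."""
    block_index = i // BLOCK_SIZE
    if block_index < NUM_DIRECT:
        return 0
    block_index -= NUM_DIRECT
    for level in (1, 2, 3):
        if block_index < PTRS ** level:
            return level
        block_index -= PTRS ** level
    raise ValueError("offset exceeds the maximum file size")


def blocks_read(i: int) -> int:
    """Disk reads to reach byte i: one per indirect level plus the data block."""
    return indirection_level(i) + 1


# Largest file this pointer layout can address.
MAX_FILE_BLOCKS = NUM_DIRECT + PTRS + PTRS ** 2 + PTRS ** 3
MAX_FILE_SIZE = MAX_FILE_BLOCKS * BLOCK_SIZE
```

Under these assumptions the first 48 KiB of a file costs one read per block, the worst case is four reads, and the layout tops out at a little over 4 TiB. For a sparse file, only the index blocks on the path to the written offset need to exist; the pointers in between can stay unallocated.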
Locality Heuristics
- when allocating blocks, a simple approach is to take the first free blocks found in the bitmap
- if we do this, what do we know about our neighboring blocks?
- what is the performance for accessing neighboring blocks?
- what should the neighboring blocks store if we want users to enjoy good performance?
- Block Group Placement
- Free Space Reserve

Filesystem Design: New Technology File System (NTFS)

The Windows filesystem (1993-now)
Data Layout: Master File Table (MFT) & Extents
- MFT: table of 1KB records; most of the time one record holds one file's metadata
  - record tracks data location, but if data is small enough, store data directly in the record
  - if data doesn't fit, record stores an array of extents for data
  - if the array of extents is full, allocate another record and link to it
  - if the attribute list gets too large, it can be stored in an extent as well
  - all things combined:
Locality
- caches a small section of the bitmap for allocation
- neighboring blocks are ones written close in time
- reserve space for MFT