Persistent Storage: Hard Drive & Solid State Drive

persistent/nonvolatile, retains data after power down
- Hard Drive (HDD) / Spinning Disk
  - large capacity at low cost, block level access, not byte addressable
  - physical motion needed to read and write, so access latency is in milliseconds
- Solid State Drive (SSD)
file system abstraction is built on top of storage devices

head: flies just above the platter surface (about 3nm), reads data from and writes data to disk sectors
sector: unit of reads and writes on disk, 512 bytes, contains error correcting code
track: concentric ring of sectors, length varies across the disk, outer tracks have more sectors
- tracks are separated by unused guard regions to reduce the likelihood of corrupting a neighboring track
- only the outer half of the radius is used, since most sectors are in the outer half

total time = seek time + rotation time + transfer time
seek time: time to move disk arm over the desired track (1-20ms)
rotation time: time for the desired sector to rotate under the disk head (based on RPM, 4-15ms)
- eg. 7200 RPM = 120 RPS = 0.12 rotations per ms = 8.3ms per rotation
- reasonable to assume it takes half a rotation on average to reach the desired sector, so 8.3/2 = ~4.2ms
transfer time: time to transfer data onto/off the disk (based on disk bandwidth, often < 4us per sector)
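
The formula above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and the 4ms seek time is an assumed value picked from the 1-20ms range, not a measurement.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def access_time_ms(seek_ms: float, rpm: float, transfer_us: float) -> float:
    """total time = seek time + rotation time + transfer time."""
    # Assumption: on average the desired sector is half a rotation away.
    avg_rotation_ms = rotation_ms(rpm) / 2
    return seek_ms + avg_rotation_ms + transfer_us / 1000.0


full = rotation_ms(7200)
# seek_ms=4.0 is an assumed example value; transfer_us=4.0 is the per-sector bound from the notes.
total = access_time_ms(seek_ms=4.0, rpm=7200, transfer_us=4.0)
print(f"{full:.2f}ms per rotation, {total:.2f}ms per random sector access")
```

With these numbers the transfer time is negligible: seek and rotation account for almost all of the access time.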

since seek time is large, OS can reorder I/O requests to minimize seek time

disk scheduling = deciding on the order in which I/O requests are served

goal: minimize latency per request

Shortest Seek Time First (SSTF):
- serve the request with the shortest seek time from the current head position
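- any problem with this? yes, starvation: requests far from the head can wait indefinitely while closer requests keep arriving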
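
A minimal sketch of the SSTF ordering, assuming seek time grows with track distance so that the closest track stands in for the shortest seek. The function name and the track numbers are hypothetical.

```python
def sstf_order(head: int, requests: list[int]) -> list[int]:
    """Return the order in which SSTF would serve the pending track requests."""
    pending = list(requests)
    order = []
    while pending:
        # Assumption: seek time is proportional to track distance.
        nearest = min(pending, key=lambda track: abs(track - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest  # the head is now parked over the track it just served
    return order


print(sstf_order(50, [10, 55, 48, 90, 52]))
```

In this run track 10 is served last, after the head has drifted all the way out to track 90.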
SCAN, CSCAN, RCSCAN:
- act like an elevator: on the way up it stops at every requested floor, and likewise on the way down
- SCAN
  - disk arm moves from inner to outer track, serves all requests in between
  - then moves from outer to inner, serves all requests in between
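- CSCAN
  - circular SCAN: serves requests in one direction only, then returns to the start without serving requests on the way back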
- RCSCAN
  - rotation aware CSCAN, rotation time is nontrivial
  - might be faster to seek to a different track if that seek takes less time than the rotation delay
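
The sweep orders can be sketched as below. This assumes track numbers increase from inner to outer, that the arm is currently moving outward, and the usual textbook meaning of CSCAN (serve in one direction only, then wrap around). Rotation awareness (RCSCAN) is not modeled, and the function names are hypothetical.

```python
def scan_order(head: int, requests: list[int]) -> list[int]:
    """SCAN: sweep outward serving requests, then reverse and sweep inward."""
    outward = sorted(t for t in requests if t >= head)
    inward = sorted((t for t in requests if t < head), reverse=True)
    return outward + inward


def cscan_order(head: int, requests: list[int]) -> list[int]:
    """CSCAN: sweep outward serving requests, jump back, then sweep outward again."""
    outward = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return outward + wrapped


pending = [10, 55, 48, 90, 52]
print(scan_order(50, pending))
print(cscan_order(50, pending))
```

The two orders differ only in how the tracks behind the head are visited: SCAN serves them on the way back, CSCAN serves them in the same direction as the first sweep.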

SSD units:
- page: unit of read and write, 2-4 KB, not VM pages!
- block: unit of erasure, 1-8 MB, spans hundreds of pages
- to overwrite a page within a block, the entire block has to be erased first
operations:
- read (a page): can read any page, fast (~10us) for both sequential and random access
- erase (a block): erase a block by setting all bits in the block to 1, slow (a few ms)
- program (a page): write data to a page in an erased block by setting selected bits to 0, ~100us
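
The erase-before-program rule can be illustrated with a toy model. This is a sketch under loud assumptions: `FlashBlock` is a hypothetical class, and each page is a single 8-bit integer rather than 2-4 KB, so the bit patterns stay readable.

```python
class FlashBlock:
    """Toy erase block: programming can only flip bits from 1 to 0."""

    def __init__(self, pages: int = 4, bits_per_page: int = 8):
        self.mask = (1 << bits_per_page) - 1
        self.pages = [self.mask] * pages   # erased state: all bits are 1
        self.erased = [True] * pages
        self.erase_count = 0

    def erase(self) -> None:
        """Erase works on the whole block, never on a single page."""
        self.pages = [self.mask] * len(self.pages)
        self.erased = [True] * len(self.pages)
        self.erase_count += 1

    def program(self, page: int, data: int) -> None:
        if not self.erased[page]:
            raise ValueError("page already programmed; erase the whole block first")
        # AND with the erased pattern: bits can be cleared to 0 but never set back to 1.
        self.pages[page] &= data & self.mask
        self.erased[page] = False

    def read(self, page: int) -> int:
        return self.pages[page]


block = FlashBlock()
block.program(0, 0b10100101)
print(bin(block.read(0)))
try:
    block.program(0, 0b11111111)   # overwriting in place is rejected
except ValueError as err:
    print("error:", err)
block.erase()                      # wipes every page in the block
print(bin(block.read(0)))
```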

sequential access is still faster than random access, but the gap is much smaller than on a hard drive
metric: I/O Operations Per Second (IOPS)

only meaningful together with latency: if you batch a lot of I/O requests, you can have high IOPS but also high latency
conversely, if you have to complete requests within a certain amount of time, you may get relatively low IOPS
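
The tradeoff can be made concrete with a deliberately simple cost model. Everything here is an assumption for illustration: a batch costs a fixed overhead plus a per-request cost, every request in a batch completes when the batch does, and the 100us and 10us figures are invented, not taken from any device.

```python
def batch_stats(batch_size: int, overhead_us: float = 100.0,
                per_request_us: float = 10.0) -> tuple[float, float]:
    """Return (iops, latency_us) for one batch under the assumed cost model."""
    # Assumed model: fixed per-batch overhead plus a linear per-request cost.
    batch_time_us = overhead_us + batch_size * per_request_us
    iops = batch_size / (batch_time_us / 1_000_000)
    # Every request in the batch waits for the whole batch to finish.
    return iops, batch_time_us


for n in (1, 8, 64):
    iops, latency_us = batch_stats(n)
    print(f"batch={n:3d}  IOPS={iops:9.0f}  latency={latency_us:6.0f}us")
```

Under this model larger batches amortize the overhead and raise IOPS, while every request in the batch sees a longer completion time.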

a block becomes unusable after a certain number (roughly 10K-100K) of program/erase operations
repeated writes to the same page are bad for endurance
wear leveling: try to spread writes across the blocks as evenly as possible

Flash Translation Layer
- uses logical blocks/pages to communicate with its client (OS)
- maps logical blocks/pages to physical blocks/pages
- makes SSD internal management easier (wear leveling, garbage collection, see OSTEP: SSD)
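
A minimal sketch of the mapping idea, not how any real SSD firmware works. `ToyFTL` and its fields are hypothetical, garbage collection and erasure are left out, and "fewest pages programmed so far" stands in for a real wear-leveling policy based on erase counts.

```python
class ToyFTL:
    """Maps logical pages to physical (block, page) slots, always writing out of place."""

    def __init__(self, num_blocks: int = 4, pages_per_block: int = 4):
        self.pages_per_block = pages_per_block
        self.next_free = [0] * num_blocks  # next unprogrammed page in each block
        self.mapping = {}                  # logical page -> (block, page)
        self.flash = {}                    # (block, page) -> data

    def write(self, logical_page: int, data: str) -> None:
        candidates = [b for b, used in enumerate(self.next_free)
                      if used < self.pages_per_block]
        if not candidates:
            raise RuntimeError("no free pages; a real FTL would garbage collect here")
        # Assumed wear-leveling stand-in: pick the block with the fewest programmed pages.
        block = min(candidates, key=lambda b: self.next_free[b])
        page = self.next_free[block]
        self.next_free[block] += 1
        self.flash[(block, page)] = data
        # Remap the logical page; the old physical page (if any) is now stale.
        self.mapping[logical_page] = (block, page)

    def read(self, logical_page: int) -> str:
        return self.flash[self.mapping[logical_page]]


ftl = ToyFTL()
for version in ("a", "b", "c"):
    ftl.write(7, version)          # the client keeps rewriting logical page 7
print(ftl.read(7), ftl.mapping[7], ftl.next_free)
```

The client only ever sees logical page 7, while the three writes land on three different physical blocks, which is the kind of spreading wear leveling aims for.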