Lecture 1: Intro; Fault Models — Whiteboard Descriptions

These are text descriptions of the whiteboard PDF from the lecture on March 30, 2026. See also the whiteboard PDF and the notes.

These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.

What is a Distributed System?

Multiple machines
- machines are faulty
- concurrency!
Connected by a network
- the network is faulty
Leslie Lamport

Why?

Harness the power of multiple machines
- Horizontal scaling
Redundancy / replication
- Fault tolerance
Placing data near users

How hard is it to build a Distributed System?

Hard to maintain "coherence"
- replication makes updates hard
Multiple machines working — one could fail
- partial failure
Concurrency
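
As a rough illustration of why replication makes updates hard, the sketch below applies the same two updates to two replicas in different orders. The `Replica` class and the `set`/`add` operations are hypothetical names invented for this example; they are not from the lecture.

```python
class Replica:
    """Hypothetical replica holding a single integer value."""

    def __init__(self):
        self.value = 0

    def apply(self, op):
        # Each op is a (kind, argument) pair; only "set" and "add" are modeled.
        kind, arg = op
        if kind == "set":
            self.value = arg
        elif kind == "add":
            self.value += arg
        else:
            raise ValueError(f"unknown operation: {kind}")


# Two concurrent updates issued by different clients.
ops = [("set", 10), ("add", 5)]

a, b = Replica(), Replica()

# Replica a receives the updates in one order...
for op in ops:
    a.apply(op)

# ...while replica b receives them in the opposite order.
for op in reversed(ops):
    b.apply(op)

print(a.value, b.value)  # the two replicas no longer agree
```

Both replicas saw every update, yet they end in different states. Keeping them coherent requires agreeing on an order or making the updates commute.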

Fault Model

A fault model is the list of failures we plan to tolerate, and to tolerate automatically.

What failures are possible?

Power goes out — machines crash
Network faults:
- reordering (standard fault model)
- dropped (standard fault model)
- duplicate (standard fault model)
- delay (standard fault model)
- corruption
- message injection
- unplug the cable — drop
Machine crashes (standard fault model)

Items marked as "standard fault model" are the failures included in the standard fault model used throughout this course.
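
The network faults in the standard fault model can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions: `FaultyNetwork`, its parameters, and the tick-based clock are hypothetical names made up for illustration. It models only dropped, duplicated, delayed, and reordered messages. It never corrupts or injects messages, and it does not model machine crashes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FaultyNetwork:
    """Hypothetical network that may drop, duplicate, delay, and reorder messages."""

    drop_prob: float = 0.2   # chance a sent message is lost (assumed value)
    dup_prob: float = 0.2    # chance a sent message is delivered twice (assumed value)
    max_delay: int = 3       # maximum extra ticks a message may wait (assumed value)
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    in_flight: list = field(default_factory=list)  # (deliver_at_tick, message) pairs
    now: int = 0

    def send(self, message):
        # Drop: the message silently disappears.
        if self.rng.random() < self.drop_prob:
            return
        # Duplicate: the same message is queued twice.
        copies = 2 if self.rng.random() < self.dup_prob else 1
        for _ in range(copies):
            # Delay: each copy becomes deliverable at its own later tick.
            delay = self.rng.randint(0, self.max_delay)
            self.in_flight.append((self.now + delay, message))

    def tick(self):
        """Advance the clock one step and return the messages that are now due."""
        self.now += 1
        due = [m for t, m in self.in_flight if t <= self.now]
        self.in_flight = [(t, m) for t, m in self.in_flight if t > self.now]
        # Reorder: messages due at the same time arrive in arbitrary order.
        self.rng.shuffle(due)
        return due


net = FaultyNetwork()
for i in range(5):
    net.send(f"msg-{i}")

received = []
while net.in_flight:
    received.extend(net.tick())

print(received)  # some messages may be missing, repeated, or out of order
```

Every delivered message is one that was actually sent, unmodified. That property is what excluding corruption and message injection from the fault model buys.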