Reading Reflection 2
Due: Friday, April 17 at 5:00 PM | 50 points
In Reading Reflection 1, you engaged with the skeptics. Now see what happened when real engineers actually tried formal methods at scale.
Theme
Week 3 is called "The Evidence." The skeptics had their say; now it is time to look at what happened when real organizations invested in formal methods. Each option below describes a concrete case where formal reasoning changed how engineers work. Your job is to evaluate the evidence honestly: is it convincing, or is it a special case?
Some readings may be behind publisher paywalls. As a UW student, you have free access through the UW Libraries proxy. See the Resources page for setup instructions.
Options (pick one)
Option A: Brooker and Desai 2025, "Systems Correctness Practices at AWS"
- Read it: queue.acm.org
- Time: ~25 min
- Stance: Cheerleader
The definitive 2025 update on formal methods at the world's largest cloud provider. Covers a decade of evolution from TLA+ through the P language, Cedar in Dafny, and Firecracker in Kani. Not one heroic case study but a broad portfolio across dozens of services.
Option B: chiplog 2025, "How Intel Avoids the FDIV Bug"
- Read it: chiplog.io
- Time: ~12 min
- Stance: Cheerleader
A $500 million recall catalyzed an entire industry's adoption of formal verification. Intel recruited practically every prominent FM researcher and built tools that became industry standards. The clearest cost-benefit narrative in the library.
Option C: Dill and Rushby 1996, "Acceptance of Formal Methods: Lessons from Hardware Design"
- Read it: csl.sri.com
- Time: ~12 min
- Stance: Nuanced (strategic framework)
Two pioneers of hardware verification explain why formal methods succeeded in chip design: not because engineers became purists, but because the tools became cost-effective. Their four principles for successful FM adoption shaped every adoption story since, including AWS twenty years later.
Reflection
After reading your chosen option, write a short reflection (~500 words) with three components:
- Summary and main argument (~5 sentences). What is the author's central claim? What evidence or reasoning do they offer?
- What resonated (~5 sentences). What connected to your own experience as a working engineer? What rang true?
- What you questioned (~5 sentences). Where did you disagree, find gaps, or want to push back?
Submit via Gradescope by Friday at 5:00 PM. A grace period extends through Monday at 8:00 AM with no penalty.
Grading
Each component is worth up to 10 points, scored on a three-level scale:
| Score | Meaning |
|---|---|
| 3 | Surface-level: restates without engagement |
| 7 | Solid engagement with the ideas |
| 10 | Connects to your experience or course content |
Total: 30 points (individual) + 20 points (group discussion summary, completed in class the following Tuesday).
Enrichment
These are optional readings that go deeper into the themes raised by the three options above. You are not required to read any of them, but if one of the options left you wanting more, start here.
Newcombe et al. 2015, "How Amazon Web Services Uses Formal Methods"
The original landmark that started the AWS formal methods story. If you read Brooker/Desai 2025, this is where it all began: TLA+ finding 10 critical bugs across 7 production services. If you read Dill/Rushby, this paper validates their predictions from 20 years earlier. The AWS team independently rediscovered the same principles.
Paulson 2025, "Revisiting an Early Critique of Formal Verification"
The creator of Isabelle/HOL, Fellow of the Royal Society, revisits the famous DeMillo critique from Reading Reflection 1 enrichment and concludes that 45 years of progress have answered the skeptics. If you read the DeMillo enrichment piece in the previous reflection, this is the rebuttal.
Datadog 2024, "Formal Modeling and Simulation for Reliable Distributed Systems"
A non-AWS success story: Datadog combined TLA+ with lightweight simulation and chaos testing to catch a real message-loss bug. If you read Brooker/Desai and wondered whether FM only works at AWS scale, this is the counterexample. If you read Dill/Rushby, Datadog exemplifies their principle of targeting FM at the hardest problems.
Sadowski et al. 2018, "Lessons from Building Static Analysis Tools at Google"
Google's first static analysis tool failed because engineers ignored it. Their second tool succeeded by prioritizing developer experience. The lesson: even good analysis tools die if deployment ignores how people actually work. A cautionary complement to the success stories.
Distefano and O'Hearn 2019, "Scaling Static Analyses at Facebook"
Facebook's Infer tool applies separation logic at 100 million lines of code, running on every code diff in CI. Over 100,000 issues fixed before shipping. If the AWS story is about design-level FM, this is about code-level FM at a scale comparable to hardware verification.