Reading Reflection 3

Due: Friday, May 1 at 5:00 PM | 50 points

Reading Reflection 2 showed formal methods working. Now hear what even the believers admit the methods cannot do.

Theme

Week 5 is called "Honest Limits" because the best critique of any tool comes from the people who use it most. Each option below is written by someone who believes in formal methods and still says: here is where it breaks down. Your job is to decide whether these are fixable growing pains or inherent constraints.

Some readings may be behind publisher paywalls. As a UW student, you have free access through the UW Libraries proxy. See the Resources page for setup instructions.

Options (pick one)

Option A: Brooker 2022, "Formal Methods Only Solve Half My Problems"

A co-author of the landmark AWS formal methods paper honestly admits that FM leaves critical questions unanswered: latency, cost, hardware sizing, overload behavior. The cleanest statement of what proofs leave untouched, from someone who cannot be dismissed as uninformed.

Option B: Zhou 2024, "Mariposa: the Butterfly Effect in SMT-based Program Verification"

Rename a variable and your proof breaks. Two Z3 commits totaling 30 lines caused 67% of stability regressions across six major verification projects. If 2% of your CI tests randomly failed, you would consider the pipeline broken. This is that, for proofs.

Option C: Parnas 2010, "Really Rethinking Formal Methods"

The inventor of information hiding and modular software design calls formal methods "a complete failure" in adoption, then offers a constructive path forward. He agrees math is essential for software. He just argues the FM community has been promoting the wrong formalisms.

Reflection

After reading your chosen option, write a short reflection (~500 words) with three components:

  1. Summary and main argument (~5 sentences). What is the author's central claim? What evidence or reasoning do they offer?

  2. What resonated (~5 sentences). What connected to your own experience as a working engineer? What rang true?

  3. What you questioned (~5 sentences). Where did you disagree, find gaps, or want to push back?

Submit via Gradescope by Friday at 5:00 PM. You may submit up to Monday at 8:00 AM with no penalty.

Grading

Each component is worth up to 10 points on a ternary scale:

  Score  Meaning
  3      Surface-level: restates without engagement
  7      Solid engagement with the ideas
  10     Connects to your experience or course content

Total: 30 points (individual) + 20 points (group discussion summary, completed in class the following Tuesday).

Enrichment

These are optional readings that go deeper into the themes raised by the three options above. You are not required to read any of them, but if one of the options left you wanting more, start here.

Kirejczyk 2020, "The Problem with Formal Verification"

medium.com

A team invested in formal verification from the start for Ethereum smart contracts, hired a PhD specialist, integrated proofs into CI, and abandoned it after four months. If Brooker describes what FM cannot address, Kirejczyk shows what happens when you try FM on problems it should handle and it still falls apart. The clearest "we tried it and it did not pay off" postmortem in the library.

Feist et al. 2024, "Why Fuzzing over Formal Verification?"

blog.trailofbits.com

An offensive security shop demonstrates that the two most-publicized bugs found by formal verification in major DeFi protocols were reproducible by a fuzzer in minutes. Writing good invariants is 80% of the work; the tool checking them is secondary. If Zhou shows FM tools are brittle, Feist shows a simpler tool that avoids the brittleness entirely.
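To make Feist's point concrete, here is a toy sketch (not from the reading; all names and the ledger model are invented for illustration): the invariant "transfers conserve total supply" is the intellectually hard part, while the checker is a ten-line random fuzzer.

```python
import random

def transfer(balances, src, dst, amount):
    # Toy ledger: move `amount` from src to dst if src has the funds.
    if balances.get(src, 0) >= amount:
        balances[src] -= amount
        balances[dst] = balances.get(dst, 0) + amount

def total_supply(balances):
    return sum(balances.values())

def fuzz_conservation(trials=10_000, seed=0):
    """Check the invariant 'transfers conserve total supply' on random inputs."""
    rng = random.Random(seed)
    accounts = ["a", "b", "c"]
    balances = {acct: 100 for acct in accounts}
    expected = total_supply(balances)
    for _ in range(trials):
        transfer(balances, rng.choice(accounts), rng.choice(accounts),
                 rng.randint(0, 50))
        # The invariant is the real specification work; the loop is trivial.
        assert total_supply(balances) == expected, "invariant violated"
    return True
```

Stating the conservation invariant is exactly the work a formal verifier would also demand of you; once it is written down, a fuzzer or a prover are just two ways of checking it.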

Winterer et al. 2020, "Validating SMT Solvers via Semantic Fusion"

doi.org

Z3 and CVC4 have had 29 confirmed soundness bugs where solvers returned incorrect sat/unsat answers. Zhou shows instability (inconsistent answers); Winterer shows unsoundness (wrong answers). Together: the solver you trust is neither stable nor always correct.

Naur 1985, "Programming as Theory Building"

doi.org

A Turing Award winner argues that the most important knowledge about a program lives in the programmer's head and resists formalization. If Brooker says proofs miss performance, and Zhou says the tools are fragile, Naur says the deeper problem is that what matters most about code is not capturable in any formal artifact.

Keles 2024, "Test, Don't (Just) Verify"

alperenkeles.com

A modern synthesis arguing that testing and verification are complementary, not alternatives. If Brooker and Parnas leave you wondering what a constructive path forward looks like, Keles offers one: use both, with clear understanding of what each provides.