Where Does It Break?
Three readings on the limits of formal methods. Share what you read, argue about which limits matter, and post your group's takeaway.
The Readings
For Reading Reflection 3, you chose one of three readings on the limits of formal methods:
- Wayne, Why Don't People Use Formal Methods? (2019)
- Hughes, Experiences with QuickCheck: Testing the Hard Stuff and Staying Sane (2016)
- Zhou et al., The Butterfly Effect in SMT-Based Program Verification (2024)
Tonight we will discuss these readings in small groups. Try to form a group whose members read at least two different papers.
Discussion Protocol
Four rounds, ten minutes each.
Share (10 min)
Each person: which reading you chose, what the core claim is, and your gut reaction.
Some starting points if helpful:
- Did your reading describe a limit you have personally hit at work?
- Was there a number, claim, or moment in the reading that you keep coming back to?
Debate (10 min)
The three readings locate the limit in different places: Wayne in economics and culture, Hughes in the cost gap between random testing and exhaustive checking, and Zhou et al. in the solver itself. Wayne and Hughes both assume the solver works as advertised; Zhou et al. say it doesn't.
Some starting points if helpful:
- Which of these three limits would stop you from using formal methods in your work?
- All three readings come close to "the spec is the hard part." If that's true, what work is left for solvers and testing tools?
- Does the instability Zhou et al. describe undermine what Wayne and Hughes are arguing, or is it a separate problem?
If you reach easy consensus: what would the author of one of the other readings say to your group right now?
Converge + Post (10 min)
As a group, decide: what was the sharpest thing you argued about? Where did you land? What's one question you couldn't settle? What do you want the other groups to know?
Post your group's summary to both Ed and Gradescope.
Report-Out (10 min)
Groups share what they found. What couldn't your group settle? Did anyone change their mind? Looking back at the question we closed with two weeks ago, has this week's evidence moved your answer?