Skip to main content
  (Week 7)

Reading Reflection 4

Due: Friday, May 15 at 5:00 PM | 50 points

You have heard the doubts, seen the evidence, and identified the limits. Now: where does formal reasoning go from here?

Theme

The theme for this reflection is "The Future Is Unwritten." One author argues AI will make formal verification go mainstream by collapsing the cost of proof. One argues that AI coding tools make experienced developers more valuable, not less, because the bottleneck is now verification. One asks who verifies the verifiers and proposes an architecture for trustworthy proof systems. The authors disagree about almost everything except that the ground is shifting. Your job is to decide where you stand.

Some readings may be behind publisher paywalls. As a UW student, you have free access through the UW Libraries proxy. See the Resources page for setup instructions.

Options (pick one)

Option A: Kleppmann 2025, AI Will Make Formal Verification Go Mainstream

Kleppmann argues that AI will make formal verification go mainstream. Proof checkers provide deterministic feedback: a wrong proof is simply rejected, which means hallucination is irrelevant. Meanwhile, AI-generated code creates new demand for machine-checkable correctness. Kleppmann coins "vericoding" for the practice of having AI produce both code and its correctness proof. The bottleneck shifts to writing the specification. If you have written Z3 constraints this quarter, you have practiced exactly the skill Kleppmann says will matter most.

Option B: Thomas 2026, AI Coding's Hidden Verification Premium

Thomas reports that experienced developers using AI coding tools believed they were 20% faster. The METR randomized trial showed a 19% net slowdown. The gap between perceived and actual productivity is the heart of his argument: AI-generated code looks right, passes tests, and accumulates technical debt that only experienced engineers can detect. His conclusion: AI generates code faster than anyone can verify it, and the developers who can tell good code from bad are worth more than ever.

Option C: de Moura 2026, Who Watches the Provers?

The creator of Z3 explains, in public, that Z3 has no independent checking mechanism and still produces incorrect results. His response is architectural: separate proof search (untrusted, arbitrarily complex) from proof checking (trusted, under 5,000 lines of code), then build multiple independent checkers. When AI found seven kernel bugs in Rocq, Lean's architecture survived because an exploit that works against one kernel fails against the others. If you have used Z3 this quarter and trusted its output, de Moura forces a question: what are you actually trusting, and is the architecture built to deserve that trust?

Enrichment

These are optional readings that extend the futures presented by the three options above. You do not need to read any of them, but if one of the options left you wanting more, start here.

de Moura 2026, When AI Writes the World's Software, Who Verifies It?

De Moura's longer essay is the full vision behind Kleppmann's prediction and the "Who Watches the Provers?" kernel-trust argument. His case: mathematical proof, not testing, is the right response to AI-generated code, because proofs cover all inputs by construction and cannot be gamed by an adversarial system. He reports that a general-purpose AI converted zlib to Lean with a machine-checked proof, and includes a pointed critique of push-button SMT solvers as poor feedback systems for AI. If you read Kleppmann or de Moura's shorter post and wanted the detailed case, this is it.

Brugger et al. 2025, Toward Practical Deductive Verification: Insights from a Qualitative Survey

Thirty practitioners of deductive verification, interviewed about what works and what does not. The barriers are specific: specification engineering is sometimes harder than proof, SMT automation removes user control, and proofs break when code changes. The only enrichment on this page that engages with the future of formal methods independent of AI. If the required options felt too focused on one technology trend, Brugger provides the broader view.

Murray 2026, Formal Verification in the Age of AI

Murray's "formal verification triangle" says classical verification tools could achieve automation, scalability, or precision, but only two at a time. He argues AI dissolves this tradeoff by handling reasoning tasks that decision procedures never could: inventing lemmas, repairing proofs, decomposing problems. Short and accessible. If you want a framework for understanding why AI might change the verification landscape and what the new architecture looks like, start here.

Buzzard 2025, Formal or Not Formal: That Is the Question in AI for Theorem Proving

Buzzard has spent years teaching Lean what elliptic curves are. He identifies a prior problem for the AI-and-verification optimists: the formal systems do not know enough mathematics to state most modern research theorems. Worse, unlike proofs, definitions cannot be machine-checked for correctness. A mangled axiom compiles fine. If you read de Moura and want a specific challenge to the optimistic vision from inside the Lean community, Buzzard provides it.


See Reading Reflections for submission format and grading.