Reading Library

76 readings on formal methods, solver-aided programming, and software correctness. Browse by topic or follow the links that catch your eye.

Some readings may be behind publisher paywalls. As a UW student, you have free access through the UW Libraries proxy. See the Resources page for setup instructions.

Should We Bother?

Formal methods promise correctness, but critics have been asking hard questions for decades: Do proofs actually work in practice? Can the economics ever justify the cost? These readings argue both sides of the fundamental value question. Start here if you want to understand why reasonable people disagree about whether this whole enterprise is worth it.

It's the Specification, Stupid!

Manfred Broy, Harald Ruess, Natarajan Shankar (2025)

Software failure costs exceed $1 trillion annually, and LLM-generated code will make things worse because LLMs cannot reliably self-check. Three senior FM researchers argue the field has it backwards: instead of verifying code after the fact, invest in rigorous specifications of core infrastructure and generate correct implementations from them. They estimate that 100-120 person-years could produce such specifications for the entire global software supply stack.

cheerleader specification recent

Formal Methods: Just Good Engineering Practice?

Marc Brooker (2024)

Not using formal methods on large-scale systems wastes time and money. After a decade of TLA+ at AWS, a Distinguished Engineer argues FM reduces rework by catching design errors before implementation and reduces the cost of change by getting interfaces right before Hyrum's Law locks them in. FM not only helps build systems faster but helps build faster systems -- one AWS team used TLA+ to verify an aggressive optimization they would not have shipped otherwise.

cheerleader economics recent short

You Already Know Formal Methods

Joey Dodds (2021)

Every software engineer already builds mental models of languages, programs, and dependencies, and reasons informally about their correctness -- formalizing those models is the essence of formal methods. A Galois researcher argues you can start with a blank document and fifteen minutes, no tools required. The lowest-barrier on-ramp to formal thinking in the library.

cheerleader developer-experience short

There Was No Formal Methods Winter

Hillel Wayne (2021)

There was no formal methods winter because there was never a summer -- unlike AI, FM never attracted significant industry investment in the first place. Wayne introduces the "power-to-cost ratio" framework: testing, types, and safer languages historically offered better ratios than formal proofs, and Z3 (2006) was FM's closest analog to AlexNet but its impact was indirect and slow. A realistic economic lens for understanding why the tools you learn in CARS are not yet mainstream.

nuanced economics short

Towards Making Formal Methods Normal: Meeting Developers Where They Are

Alastair Reid, Luke Church, Shaked Flur, Sarah de Haas, Maritza Johnson, Ben Laurie (2020)

Less than 1% of developers use formal methods, and the fix is not making better correctness arguments but making tools that pay for themselves every week. A Google Research team proposes building on testing as a bridge: the same specification should serve as both a test harness and a verification harness, letting developers slide from fuzzing to proving without switching tools.

cheerleader developer-experience

The Business Case for Formal Methods

Hillel Wayne (2019)

An hour of TLA+ modeling found a 35-step bug in DynamoDB and compressed a four-month schedule by two months; two engineer-days of specification saved a startup $300K/year. Wayne assembles the strongest ROI case for formal methods with dollar figures, schedule estimates, and a live demo of catching a concurrency bug in under six minutes.

cheerleader economics short

Why Don't People Use Formal Methods?

Hillel Wayne (2019)

Code verification is technically hard (IronFleet produced four verified lines per day); design verification is culturally hard (programmers distrust non-code artifacts). Wayne maps the full landscape of adoption barriers and concludes that 90-99% correct is dramatically cheaper than 100%, making full verification economically irrational for most software. The best single-essay overview of why formal methods remain niche after fifty years of research.

nuanced economics philosophy

Software Quality: Better in Practice Than in Theory (Economics of Formality)

John D. Cook (2018)

Formal methods adoption is fundamentally an economic question: verification is expensive, and for most software, code reviews and tests provide adequate quality at lower cost. Cook maps out a continuum of formality and argues the optimal position depends on the probability and consequences of errors -- a calculation that is shifting as both tools improve and stakes increase.

nuanced economics short

Who Builds a House Without Drawing Blueprints?

Leslie Lamport (2015)

No engineer would build a house without blueprints, yet most programmers jump straight to code without any precise specification. The creator of TLA+ and LaTeX argues that specification is a professional responsibility -- not because every line needs a proof, but because thinking precisely about a design before coding catches bugs that testing never will.

cheerleader specification short

The Wooden Firehouse

Andrew Myers (2015)

The software we rely on to fight security fires is built from the same flammable material as everything else. Myers's vivid allegory about a town made entirely of wood distills the trusted computing base problem into a single memorable image: even the firehouse is wooden.

cheerleader security short

One Week of Bugs

Dan Luu (2014)

Serious bugs from Microsoft, Google, and Apple are not rare exceptions -- they are a daily occurrence for any attentive user. Luu catalogs every software bug he hit in a single ordinary week, building a visceral case that current quality practices are systemically inadequate. The post never mentions formal methods, which is exactly the point.

critic developer-experience

When Will Software Verification Matter?

John Regehr (2012)

Full verification increases development cost roughly tenfold, and the market reveals how little it values correctness: almost nobody disables compiler optimizations despite known miscompilation bugs. Regehr predicts verified "islands" will appear first in crypto libraries and hypervisors -- a prediction that proved remarkably accurate within five years.

critic economics short

The Future of Computing: Logic or Biology

Leslie Lamport (2006)

Computing faces a fork: treat software as a logical artifact you can prove things about, or as a biological system you can only observe empirically. The creator of TLA+ and LaTeX argues that the biological metaphor is an intellectual surrender that treats bugs as inevitable features rather than preventable errors. In the AI era -- where generated code is more emergent and less understood than ever -- his question has only gotten sharper.

cheerleader philosophy short

Worse is Better is Worse

Nickieben Bourbaki (Richard P. Gabriel) (2000)

Nine years after arguing that worse is better, Gabriel wrote a pseudonymous rebuttal to his own famous essay. The dichotomy was never between right-thing design and worse-is-better design -- it was between right-thing design and no design at all. The most dangerous legacy of the original essay, he now argues, is teaching young engineers that it is acceptable to aim low.

cheerleader philosophy short

How Did Software Get So Reliable Without Proof?

C.A.R. Hoare (1996)

The inventor of Hoare Logic asks the most uncomfortable question a verification pioneer can ask: why did the software catastrophe never arrive, even though almost nobody proves their programs correct? Hoare's honest answer -- that engineering practices absorbed the insights of formal methods without using the proofs themselves -- became the mainstream view of how theory transfers to practice.

nuanced philosophy classic

The Rise of Worse is Better

Richard P. Gabriel (1991)

Simpler, imperfect systems beat correct, complex ones through viral adoption -- Unix and C won not by being better but by being simpler and good enough. Gabriel's three-page essay names the tension every engineer lives with and asks whether correctness is a losing strategy. He was so unsettled by his own argument that he later wrote a pseudonymous rebuttal.

nuanced philosophy classic short

What Happened When They Tried It

Theory is one thing. These readings are about what actually happened when engineers at AWS, Intel, Google, Facebook, and others put formal methods into production. Some are triumphant success stories. Others reveal surprising gaps between the pitch and the reality. Either way, the data is real.

Lessons Learned from Incorporating Formal Methods in Huawei Cloud Reliability

Claudia Cauli, Timo Lang, Shuo Chen, Sebti Mouelhi, Xin Jin, Subhajit Bandopadhyay, Xusheng Chen, Yazhi Feng, Haoze Song, Linhua Tang, Zhenli Sheng, Ananth Shrinivas Srinath (2026)

The first major FM experience report from a Chinese technology company. Huawei's formal methods team found seven bugs across three production cloud systems -- including atomicity violations requiring specific multi-node timing -- using P, TLA+, and Gobra. The paper provides unusually detailed effort data (person-months, team composition, timelines) and compares lightweight modeling against heavyweight code verification side by side.

nuanced distributed recent

Systems Correctness Practices at AWS

Marc Brooker, Ankush Desai (2025)

A decade after AWS first reported using TLA+, the toolkit has expanded to include the P language, Cedar in Dafny, Firecracker verification in Kani, and RSA proofs in HOL Light. The most comprehensive survey of formal methods in industrial practice available, covering dozens of services at the world's largest cloud provider.

cheerleader distributed recent

How Intel Makes Sure the FDIV Bug Never Happens Again

Subbu (2025)

A $500 million recall turned Intel into the world's largest consumer of formal verification. After the 1994 FDIV disaster, Intel recruited practically every prominent researcher in model checking and automated theorem proving, built tools that became industry standards, and trained 85 engineers to be formally literate. The clearest case study of how a catastrophic bug transforms an entire industry's engineering practices.

cheerleader hardware recent

Reality Check on Formal Methods in Industry: A Study of Verum Dezyne

Michele Chiari, Matteo Camilli, Marcello M. Bersani, Rutger van Beusekom, Damian A. Tamburri (2025)

Interviews with practitioners at ASML, Thermo-Fisher, and Philips reveal that formal methods can be integrated into industrial workflows without major disruption -- scalability is addressed through ordinary modularization, and the learning curve is manageable when tool syntax resembles familiar programming languages. The biggest remaining challenge: some behaviors are too complex to express in any modeling language.

nuanced developer-experience recent

Formal Modeling and Simulation for Reliable Distributed Systems

Arun Parthiban, Sesh Nalla, Cecilia Wat-Kim (2024)

After a global outage in March 2023, Datadog adopted TLA+ to redesign their message queuing service, and the model checker found a subtle bug where messages could be permanently lost. They complemented formal modeling with lightweight SimPy simulations for performance and chaos testing for implementation fidelity. One of the most detailed publicly available accounts of the complete FM lifecycle outside AWS.

cheerleader distributed recent

What the Proof Implies

seL4 Foundation (2024)

Proving that a kernel implementation matches its spec gives you buffer overflow freedom, null dereference freedom, memory leak freedom, and code injection immunity -- for free. The seL4 team explains exactly what their proof guarantees (and what assumptions it rests on), providing the clearest snapshot of what full functional correctness buys you in practice.

cheerleader security short

Scaling Static Analyses at Facebook

Dino Distefano, Manuel Fahndrich, Francesco Logozzo, Peter W. O'Hearn (2019)

Separation logic, one of the deepest ideas in program analysis, runs on every code change at Facebook and has led to over 100,000 bug fixes. When Facebook switched the same analysis from a nightly batch job to code-review comments, the fix rate jumped from near zero to over 70%. Google chose simple analyses with few false positives; Facebook chose deep analyses that find more bugs -- the contrast reveals that there is no single right answer for deploying formal reasoning at scale.

cheerleader developer-experience

Lessons from Building Static Analysis Tools at Google

Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, Ciera Jaspan (2018)

Google failed three times to deploy static analysis before succeeding, and the fix was not better analysis but better workflow integration. The hard-won lesson: developers perceived 74% of compile-time warnings as real problems versus 21% of the same warnings on already-committed code. Where you show the result matters more than what you find.

cheerleader developer-experience

Automated Reasoning and Amazon s2n

Colm MacCarthaigh (2017)

Formal verification of cryptographic code runs on every pull request in CI/CD, proving correctness for all possible inputs -- not as a research exercise, but on the TLS library that handles 100% of Amazon S3's SSL traffic. The s2n project showed that SMT-based verification could be as routine as running unit tests.

cheerleader security sat-smt short

How Amazon Web Services Uses Formal Methods

Chris Newcombe, Tim Rath, Fan Zhang, Bogdan Munteanu, Marc Brooker, Michael Deardeuff (2015)

Engineers at AWS adopted TLA+ and found 10+ critical bugs across S3, DynamoDB, and EBS that had survived design reviews, code reviews, and extensive testing. The landmark industrial adoption paper, with over 85,000 downloads, demonstrated that formal methods could deliver ROI even in a fast-shipping engineering culture -- and that engineers could learn TLA+ and get useful results in 2-3 weeks.

cheerleader distributed

A Decade of Software Model Checking with SLAM

Thomas Ball, Vladimir Levin, Sriram K. Rajamani (2011)

Every Windows device driver submission gets automatically verified by a tool built on Z3. SLAM introduced counterexample-guided abstraction refinement (CEGAR), turning software model checking from a research idea into something that runs routinely on millions of lines of production C code at Microsoft.

cheerleader sat-smt

Formal Methods: Practice and Experience

Jim Woodcock, Peter Gorm Larsen, Juan Bicarregui, John Fitzgerald (2009)

Across 62 industrial projects spanning transport, finance, and defense, 92% reported quality improvements from formal methods -- yet almost none could put a dollar figure on the savings. The most comprehensive empirical survey of FM in practice documents both real successes (the Paris Metro, Tokeneer's zero defects) and the persistent unsolved problem: making the business case.

cheerleader economics

Acceptance of Formal Methods: Lessons from Hardware Design

David L. Dill, John Rushby (1996)

Formal methods won in hardware not because chip designers became correctness purists but because a single bug cost Intel $400 million. Dill and Rushby distill four principles from hardware's experience: automate aggressively, use verification to find bugs rather than prove perfection, target the hardest problems, and work on real designs. Their playbook predicted how AWS would adopt formal methods twenty years later.

nuanced hardware economics short

Where It Breaks Down

Even the strongest advocates of formal methods will tell you there are things verification cannot touch. These readings are honest about the limits: solver instability, the specification problem, properties that resist formalization, and the gap between what a proof covers and what can actually go wrong. Read these before you get too comfortable.

What Works (and Doesn't) Selling Formal Methods

Mike Dodds (2025)

Most FM projects simply do not pencil out for clients. A Galois researcher shares hard-won lessons from years of trying to sell formal verification: clients have priced in bugs, proof guarantees get distorted as they propagate up the org chart, and cheaper techniques like testing should be exhausted first. The most candid insider account of why FM adoption stalls despite technical maturity.

critic economics recent

On the Impact of Formal Verification on Software Development

Eric Mugnier, Yuanyuan Zhou, Ranjit Jhala, Michael Coblenz (2025)

Fourteen experienced Dafny users describe proof brittleness as "soul crushing" and proof debugging as harder than regular debugging because the verifier's internal state is never visible. But they also report something unexpected: verification enables fearless optimization, letting them make aggressive performance changes they would never attempt without formal guarantees. The first rigorous user study of what auto-active verification actually feels like in practice.

nuanced developer-experience sat-smt recent

Mariposa: the Butterfly Effect in SMT-based Program Verification

Yi Zhou (2024)

Rename a variable and your proof breaks. Across six major verification projects, 2-5% of SMT queries are unstable, and two tiny Z3 commits caused 67% of stability regressions. If 2% of your CI tests randomly failed, you would consider the pipeline broken.

critic sat-smt recent

Mariposa: Measuring SMT Instability in Automated Program Verification

Yi Zhou, Jay Bosamiya, Yoshiki Takashima, Jessica Li, Marijn Heule, Bryan Parno (2023)

Rename a variable and your proof breaks. Across six major verification projects and over three million mutated queries, 2-5% of SMT queries are unstable, and two tiny Z3 commits totaling about 30 lines caused 67% of stability regressions. If 2% of your CI tests randomly failed, you would consider the pipeline broken -- but this is the state of automated program verification.

critic sat-smt

Formal Methods Only Solve Half My Problems

Marc Brooker (2022)

Safety and liveness proofs are only half the design questions for distributed systems -- TLA+ cannot tell you what latency your system will have, how much hardware you need, or how it behaves under overload. A co-author of the canonical AWS/TLA+ paper honestly scopes what formal methods leave on the table and calls for hybrid tools combining specifications with quantitative simulation.

nuanced distributed short

The Verification Gap

Matt Weisberger (2022)

Almost no mainstream language has formally defined semantics, so there is always a gap between what is verified and what actually runs. Weisberger names this the "verification gap" and surveys three strategies for bridging it: extract verified code from Coq (CompCert), parse C into Isabelle (seL4), or generate tests from TLA+ specs (MongoDB). No approach fully eliminates the distance between proof and production.

nuanced specification short

Breaking Aave Upgradeability

Josselin Feist (2020)

Five security firms reviewed Aave's lending pool contracts and one used formal verification, yet a critical vulnerability let anyone self-destruct the protocol. The proofs covered individual function behaviors while the real attack surface lived in multi-contract interactions and upgradeability patterns. A simple static analysis tool could have found the bug in seconds -- the most concrete demonstration that "verified" does not mean "secure."

critic security short

The Problem with Formal Verification

Marek Kirejczyk (2020)

A team invested in formal verification from day one for Ethereum smart contracts: hired a PhD specialist, integrated proofs into CI, used the same tools as MakerDAO. After four months of growing verification times and proofs that broke on trivial code changes, they abandoned it. The clearest "we tried it and it failed" postmortem in the library.

critic developer-experience short

Validating SMT Solvers via Semantic Fusion

Dominik Winterer, Chengyu Zhang, Zhendong Su (2020)

In four months of automated testing, this tool found 29 soundness bugs in Z3 and CVC4 where the solver returned the wrong sat/unsat answer -- silently invalidating any downstream verification. If the very tool your proof depends on can be wrong, what does your proof actually guarantee? A direct challenge to anyone who treats solver output as ground truth.

critic sat-smt

An Empirical Study on the Correctness of Formally Verified Distributed Systems

Pedro Fonseca, Kaiyuan Zhang, Xi Wang, Arvind Krishnamurthy (2017)

Zero protocol bugs in verified systems versus protocol bugs in every unverified system -- but 16 bugs total, all at the boundary between verified and unverified code. One bug caused the verifier to silently report "VERIFIED" for any program when Z3 crashed. The first empirical study of formally verified distributed systems is simultaneously the strongest evidence for verification and a sobering catalog of how it fails.

nuanced sat-smt distributed

Formal Verification: The Gap Between Perfect Code and Reality

Ray Wang (2017)

An MIT student entered a formal verification course believing it was the future of software and left a skeptic after experiencing 10x proof-to-code ratios, months of solver-wrangling, and verification tools that silently reported success on crash. The KRACK attack on WPA2 -- exploiting the gap between two individually-verified components -- drives the point home.

critic developer-experience

This World of Ours

James Mickens (2014)

If Mossad is after you, no amount of cryptography will help; if script kiddies are after you, basic hygiene suffices. The funniest reading in the CARS library makes a serious point beneath the absurdist humor: formal security guarantees operate within threat models, and threat models are themselves informal assumptions about the world.

critic security short

Really Rethinking 'Formal Methods'

David Lorge Parnas (2010)

After 40 years of formal methods research, a pioneer of software engineering calls them a complete failure in practice -- not because mathematics is unnecessary, but because the community has been promoting the wrong formalisms. Parnas argues that FM models are often more complex than the code and that "correctness proof" is a misleading borrowing from mathematics that engineers should abandon.

critic philosophy specification

Security in the Real World: How to Evaluate Security

Bruce Schneier (1999)

Proving a cryptographic algorithm correct means nothing when the system fails through weak random-number generation, broken key management, or a user who writes the password on a sticky note. Schneier catalogs twenty ways real security systems fail despite correct mathematics, making the case that attackers target assumptions, not logic.

critic security

The Alternatives

Formal verification is not the only game in town. Testing, fuzzing, chaos engineering, property-based testing, and production observability all compete for the same budget and engineering attention. These readings make the case for alternative approaches and ask when each one wins.

Test, Don't (Just) Verify

Alperen Keles (2025)

Autoformalization is part of the trusted computing base and cannot itself be verified. A formal methods PhD student systematically catalogs where verification falls short -- proof assistants use encodings orders of magnitude slower than production code, failed proofs give no signal, and many properties lack formal models. His proposed workflow: verify a reference implementation, then differentially test the production system against it.

nuanced developer-experience recent

Why Fuzzing over Formal Verification?

Josselin Feist, Tarun Bansal, Gustavo Grieco (2024)

The two most-publicized bugs found by formal verification in major DeFi protocols were both rediscovered by a fuzzer in minutes on a commodity laptop. Trail of Bits argues that writing good invariants is 80% of the work and the checking tool is secondary -- and fuzzers are dramatically easier, faster, and more accessible than provers. FM should only be considered after extensive fuzzing, not before.

critic security recent short

I Test in Prod

Charity Majors (2019)

Modern distributed systems have emergent behaviors that no pre-production test suite can replicate, so every deploy is an experiment whether you admit it or not. The CTO of Honeycomb argues that resilience comes from surviving many errors, not eliminating all errors, and never once mentions formal methods. A sharp challenge to the premise of a verification course.

critic developer-experience short

Chaos Engineering

Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal (2016)

Distributed systems exhibit emergent behavior that no amount of pre-production analysis can predict, so Netflix builds confidence by injecting real failures in production and watching what happens. Chaos engineering is the empirical counterpart to formal verification: it builds confidence through controlled experiments rather than mathematical proof.

critic distributed

Experiences with QuickCheck: Testing the Hard Stuff and Staying Sane

John Hughes (2016)

Write a formal model, generate random tests automatically, and shrink failures to minimal reproducible cases -- property-based testing bridges the gap between testing and formal specification. Hughes describes finding bugs in Volvo automotive software, Ericsson databases, and Erlang concurrency code that had escaped conventional testing -- and reports that, surprisingly often, the error turns out to be in the model rather than the code.

cheerleader developer-experience
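The generate-and-shrink loop that Hughes describes can be sketched in a few lines of plain Python. This is a toy stand-in for QuickCheck, not its actual implementation; the `buggy_dedup` function, the property, and the greedy shrinker are all illustrative inventions:

```python
import random

def check_property(prop, gen, shrink, runs=200, seed=0):
    """Minimal property-based tester: generate random inputs, and on the
    first failure greedily shrink to a smaller failing input."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)
        if not prop(x):
            changed = True
            while changed:  # keep taking the first shrink that still fails
                changed = False
                for smaller in shrink(x):
                    if not prop(smaller):
                        x = smaller
                        changed = True
                        break
            return x  # a locally minimal counterexample
    return None  # no counterexample found

# Function under test: a (deliberately buggy) dedup that should
# preserve first-occurrence order but sorts instead.
def buggy_dedup(xs):
    return sorted(set(xs))

def prop_preserves_order(xs):
    expected = list(dict.fromkeys(xs))  # correct order-preserving dedup
    return buggy_dedup(xs) == expected

def gen_list(rng):
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 10))]

def shrink_list(xs):
    # Candidate smaller inputs: drop one element at a time.
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]

counterexample = check_property(prop_preserves_order, gen_list, shrink_list)
print(counterexample)
```

With the fixed seed, the search finds a failing input within a few runs and shrinks it down to a two-element descending pair -- the smallest list on which sorting visibly breaks first-occurrence order. Real QuickCheck adds typed generators, configurable shrinking strategies, and replayable seeds, but the core loop is exactly this.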

The Verification of a Distributed System

Caitie McCaffrey (2016)

Composing provably correct components does not yield a provably correct system. McCaffrey surveys the full verification landscape for distributed systems -- from TLA+ and Coq through QuickCheck and Jepsen to chaos engineering -- and argues that no single technique is sufficient. The best strategy layers formal methods for critical components with testing and fault injection for system-level behavior.

nuanced distributed

Testing v. Informal Reasoning

Dan Luu (2014)

The same cognitive biases that cause developers to write bugs also cause them to omit tests for those bugs. Luu marshals evidence that informal reasoning about correctness is systematically unreliable -- even for experienced engineers -- and that testing alone cannot close the gap.

critic short

How Does Formal Verification Affect Software Testing?

John Regehr (2012)

Verification can only replace testing when four demanding conditions all hold simultaneously -- and even then, specifications themselves need a testing-like process to gain confidence. Regehr predicts cost reduction, not defect reduction, will be formal methods' killer app, and envisions a future of patchwork verification where different layers get different levels of assurance.

nuanced short

AI Changes Everything

Large language models can now generate code faster than humans can review it. That changes the calculus for formal methods in ways nobody fully understands yet. These readings explore whether AI makes verification more necessary, more feasible, or both, and what skills matter when the bottleneck shifts from writing code to confirming it does the right thing.

The Final Bottleneck

Armin Ronacher (2026)

AI has made code generation fast and cheap, but code review, understanding, and accountability remain fundamentally human. The creator of Flask draws a parallel to textile industrialization: eliminating one bottleneck always reveals the next constraint downstream. If machines write the code and machines review the code, the question becomes what gives humans justified confidence to approve.

nuanced economics recent short

Coding After Coders: The End of Computer Programming as We Know It

Clive Thompson (2026)

Most professional developers now spend their days talking to AI in plain English rather than writing code. Based on interviews with 70+ engineers at Google, Amazon, Microsoft, and Apple, Thompson documents the strange new rituals -- scolding AI agents, writing prompt files, discovering that emotional language affects output -- and the deep unease about losing the ability to understand your own codebase.

nuanced philosophy recent

Verifiability is the Limit

Alperen Keles (2025)

The bottleneck for AI-assisted programming is not code generation but verification -- the ability of a human to confirm that generated code matches intent. LLMs crush UI tasks because verification is just looking at the screen, but struggle with backend code where checking correctness requires real engineering effort. If verification is the constraint, then better verification tools raise the ceiling on what AI can deliver.

nuanced developer-experience recent short

The Coming Need for Formal Specification

Ben Congdon (2025)

AI generates tests as easily as it generates code, so testing alone cannot serve as the quality gate for AI-written software. A Databricks staff engineer proposes a layered verification stack -- natural-language specs at the top, TLA+ models at component boundaries, formal proofs for critical code -- and argues that the ability to write formal specifications is becoming the scarcest and most valuable engineering skill.

cheerleader specification recent short

Prediction: AI Will Make Formal Verification Go Mainstream

Martin Kleppmann (2025)

The author of Designing Data-Intensive Applications predicts three forces will bring formal verification into the mainstream: AI reduces the cost of writing proofs, AI-generated code demands verification, and proof checkers give LLMs reliable feedback. He calls this "vericoding" in contrast to "vibecoding."

cheerleader economics recent short

Verina: Benchmarking Verifiable Code Generation

Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel, Kaiyu Yang, Dawn Song (2025)

The best LLM achieves 72.6% on code generation but only 4.9% on proof generation. Across 189 curated Lean problems, even iterative refinement with compiler feedback raises proof success to just 20% -- meaning 80% of proofs still fail after 64 attempts. The hardest empirical reality check for anyone who thinks AI will automate formal verification soon.

nuanced sat-smt recent

Limitations on Formal Verification for AI Safety

Andrew Dickson (2024)

Proposals for "provably safe AI" dramatically overestimate what formal verification can deliver, because proofs provide guarantees about symbol systems, not about the physical world. Formalizing "safety" is itself an unsolved specification problem, and world models for biology, social systems, and cognition are far beyond current capabilities. A technically grounded pushback against the AI safety community's enthusiasm for verification as a silver bullet.

critic philosophy recent

Formal Methods + AI: Where Does Galois Fit In?

Taisa Kushner (2024)

Neural networks excel in exactly the domains where formal specifications are hardest to write -- vision, language, biology. Galois catalogs why verifying ML models is fundamentally harder than verifying traditional software and where the field's best tools fall short, making the case that the specification skills CARS teaches are the missing ingredient for AI safety.

cheerleader specification recent

The Intellectual History

The debate over program correctness goes back to the 1960s. These are the foundational papers and essays that shaped how computer scientists think about proof, complexity, trust, and the limits of formal reasoning. They are older, sometimes denser, and always worth reading carefully.

Revisiting an Early Critique of Formal Verification

Lawrence C. Paulson (2025)

The creator of Isabelle/HOL takes on the most famous attack on program verification, 45 years later. DeMillo et al. argued that formal proofs would be infeasibly long and that only social consensus establishes truth -- but proof assistants now formalize major theorems routinely, and a Fields medalist turned to machine checking after peer review missed his own error for 13 years.

cheerleader philosophy recent

Social Processes, Program Verification and All That

Andrea Asperti, Herman Geuvers, Raja Natarajan (2009)

Thirty years after DeMillo, Lipton, and Perlis declared program verification dead, three proof-assistant researchers revisit every argument and find that most have been overtaken by tools the original authors could not have imagined. Compilers once made hand assembly obsolete; proof assistants may do the same for hand proof. The catch: formalization still costs roughly one week per page of textbook math.

nuanced philosophy

Why Most Published Research Findings Are False

John P. A. Ioannidis (2005)

When studies are underpowered, effect sizes are small, and analytical flexibility is high, the majority of published findings are likely wrong. The most-accessed paper in PLoS history catalyzed the replication crisis and offers a sharp indirect argument for formal verification: a proved property is either true or the proof is wrong -- there is no p-value to game. But beware specification shopping, the verification analog of p-hacking.

nuanced philosophy
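The arithmetic behind Ioannidis's argument can be sketched in a few lines. This is a minimal illustration of his positive-predictive-value formula, PPV = (1-beta)R / (R(1-beta) + alpha); the specific numbers plugged in below are illustrative assumptions, not figures from the paper's tables.

```python
# Positive predictive value of a claimed finding, after Ioannidis (2005):
# the probability that a statistically significant result is actually true.
#   R     = pre-study odds that a probed relationship is true
#   alpha = significance threshold (false-positive rate)
#   power = 1 - beta (probability of detecting a true effect)

def ppv(R: float, alpha: float = 0.05, power: float = 0.8) -> float:
    true_positives = power * R       # true relationships correctly flagged
    false_positives = alpha * 1.0    # false relationships (normalized to 1) flagged anyway
    return true_positives / (true_positives + false_positives)

# Well-powered confirmatory study of a plausible hypothesis: most positives are real.
print(round(ppv(R=1.0), 3))             # 0.941
# Underpowered exploration of long-shot hypotheses: most positives are false.
print(round(ppv(R=0.1, power=0.2), 3))  # 0.286
```

The second case is the paper's headline scenario: low prior odds plus low power push the majority of "significant" findings below the coin-flip line.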

Why Speculate?

Michael Crichton (2002)

Read a newspaper article about your area of expertise, notice every error, then turn the page and believe the next article about something you know nothing about. Crichton's Gell-Mann Amnesia effect is a parable for software: developers encounter bugs in code they trusted, then go right back to trusting their intuition on the next module. A case for replacing speculation with proof, without ever mentioning formal methods.

nuanced philosophy

On the Cruelty of Really Teaching Computing Science

Edsger W. Dijkstra (1988)

Computing is a radical novelty that cannot be understood by analogy to familiar machines, and programming is applied formal mathematics -- full stop. Dijkstra's most confrontational essay dismisses software engineering as a doomed discipline and insists that proof and program must be constructed together. Whether you find it clarifying or infuriating says a lot about where you stand on what CARS is trying to do.

cheerleader philosophy classic

Program Verification: The Very Idea

James H. Fetzer (1988)

Algorithms are abstract mathematical objects that can be verified deductively, but programs run on physical machines whose behavior can only be established inductively -- so program verification is a category error. Fetzer's philosophical attack provoked one of the fiercest controversies in CACM history. The core question remains live: when Z3 proves a property, is it proving something about the real system or just a model of it?

critic philosophy classic

No Silver Bullet -- Essence and Accidents of Software Engineering

Frederick P. Brooks Jr. (1986)

No single technology will ever produce a ten-fold improvement in software productivity, because the hard part is essential complexity -- specifying and designing the conceptual construct -- not the accidental complexity of encoding it. Brooks assessed formal verification in 1986 and concluded the specification problem was the real bottleneck. CARS students will recognize this firsthand: writing the Z3 spec is the hard part.

nuanced philosophy classic

Programming as Theory Building

Peter Naur (1985)

The most important product of programming is not code or documentation but a theory -- a mental model of how the program maps to the world. When the original team leaves and that theory is lost, the program effectively dies, no matter how well it runs. Naur's framework asks a question formal verification cannot dodge: if the richest design knowledge resists formalization, what exactly does a proof capture?

nuanced philosophy classic

Reflections on Trusting Trust

Ken Thompson (1984)

A compiler backdoor can reproduce itself even after the malicious source code is deleted. In three elegant pages, Thompson demonstrates that no amount of source-level verification can guarantee a program is trustworthy, because the toolchain itself is a potential attack vector. In the era of SolarWinds and xz-utils, his conclusion hits harder than ever: trust in software is ultimately trust in people.

critic security classic short

Social Processes and Proofs of Theorems and Programs

Richard A. DeMillo, Richard J. Lipton, Alan J. Perlis (1979)

Mathematical proofs gain credibility through social processes -- reading, debating, simplifying, reusing -- that program verifications cannot participate in. Co-authored by the first Turing Award winner, this is the most famous attack on program verification ever published. Nearly fifty years later, the formal methods community is still writing rebuttals.

critic philosophy classic

The Humble Programmer

Edsger W. Dijkstra (1972)

Testing shows the presence of bugs, never their absence. Dijkstra's 1972 Turing Award lecture launched the structured programming movement and made the single most-cited case for building proofs and programs hand in hand. The intellectual origin story for everything CARS teaches.

cheerleader philosophy classic

The Architecture of Complexity

Herbert A. Simon (1962)

Hierarchical, modular systems survive interruptions that destroy monolithic ones. Simon's watchmaker parable -- one of the most retold thought experiments in engineering -- explains why decomposition is not just convenient but evolutionarily inevitable. The same principle underlies how SAT/SMT solvers exploit structure through theory decomposition.

cheerleader philosophy classic

Getting Your Hands Dirty

These readings are about the practical experience of using formal methods tools day to day. What does it feel like to write a TLA+ spec? How do you integrate a SAT solver into a real workflow? What do practitioners wish they had known before starting? If you want to know what the work actually looks like, start here.

The Pragmatic Magic of Semi-Formal Methods

Vidhi Katkoria, Justin Moore (2025)

You do not need a PhD to borrow ideas from formal methods. Antithesis walks through a concrete Rust example where defining entities, assumptions, and obligations in the type system catches an authorization bug that code review alone would miss. The cost is 2-3x more work than standard development, but you get safer onboarding, clearer reasoning, and bugs caught at compile time.

cheerleader developer-experience recent
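The "obligations in the type system" pattern from the Antithesis post can be sketched outside Rust too. Below is a minimal Python transcription of the idea, with all names (`AuthorizedUser`, `authorize`, `delete_account`) invented for illustration: a witness type whose only sanctioned constructor path is the authorization check, so a type checker such as mypy rejects code that skips the check.

```python
# Hypothetical sketch of the entities/assumptions/obligations pattern.
# The obligation "caller must be authorized" lives in the function signature.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class User:
    name: str
    is_admin: bool

@dataclass(frozen=True)
class AuthorizedUser:
    """Witness type: by convention, only authorize() constructs one."""
    name: str

def authorize(user: User) -> Optional[AuthorizedUser]:
    # The single place the obligation is discharged.
    return AuthorizedUser(user.name) if user.is_admin else None

def delete_account(actor: AuthorizedUser, target: User) -> str:
    # Demands proof of authorization: passing a bare User here is a
    # type error that a checker flags before the code ever runs.
    return f"{actor.name} deleted {target.name}"

alice = User("alice", is_admin=True)
mallory = User("mallory", is_admin=False)

assert authorize(mallory) is None       # the witness cannot be obtained
admin = authorize(alice)
if admin is not None:
    print(delete_account(admin, mallory))  # alice deleted mallory
```

Python enforces this only as convention plus static checking, whereas Rust's visibility rules make the witness genuinely unforgeable; the shape of the argument is the same.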

Using Formal Methods at Work

Hillel Wayne (2019)

Six concrete entry points for introducing formal specifications at work, ranked from low-friction to high-ambition: document, showcase, find, understand, vet, build. Wayne frames specs as orders of magnitude smaller than code, not code themselves, and not tests -- they catch design errors that tests miss. The most actionable adoption playbook in the library.

cheerleader developer-experience short

Modern SAT Solvers: Fast, Neat and Underused (Part 1 of N)

Martin Horenovsky (2018)

A huge class of NP-complete problems can be encoded as SAT and solved with off-the-shelf solvers, yet most programmers have never tried. Horenovsky walks through encoding Sudoku as a SAT problem in two hours -- slower than a hand-tuned solver, but the development-time savings are the real win. The go-to introduction to SAT for working programmers.

cheerleader sat-smt
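The encoding pattern Horenovsky describes can be shown at toy scale. This sketch shrinks 9x9 Sudoku to a 2x2 Latin square so a brute-force loop can stand in for the solver; a real workflow would emit the same clauses in DIMACS form and hand them to an off-the-shelf solver.

```python
# One boolean variable per (row, column, value) triple; constraints become clauses.
from itertools import product

N = 2
def var(r: int, c: int, v: int) -> int:
    """Index of the variable 'cell (r, c) holds value v'."""
    return r * N * N + c * N + v

clauses: list = []  # a clause is a list of literals; ~x denotes the negation of x
for r, c in product(range(N), repeat=2):
    clauses.append([var(r, c, v) for v in range(N)])   # each cell has some value
    clauses.append([~var(r, c, 0), ~var(r, c, 1)])     # ...but not two values
for v in range(N):
    for i in range(N):
        clauses.append([var(i, c, v) for c in range(N)])  # value v appears in row i
        clauses.append([var(r, i, v) for r in range(N)])  # value v appears in column i
clauses.append([var(0, 0, 0)])  # the puzzle's given: top-left cell is 0

def satisfies(assignment) -> bool:
    return all(any(assignment[lit] if lit >= 0 else not assignment[~lit]
                   for lit in clause) for clause in clauses)

# Brute force over 2^(N^3) assignments stands in for the SAT solver here.
model = next(a for a in product([False, True], repeat=N**3) if satisfies(a))
grid = [[next(v for v in range(N) if model[var(r, c, v)]) for c in range(N)]
        for r in range(N)]
print(grid)  # [[0, 1], [1, 0]]
```

The payoff Horenovsky emphasizes is visible even here: the constraints read almost like the puzzle's rules, and all search cleverness is delegated to the solver.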

In-Depth: Static Code Analysis

John Carmack (2011)

The creator of Doom and Quake calls static analysis the most important quality improvement he has adopted, arguing it is professionally irresponsible not to use these tools. The mindset shift matters more than the bugs caught: code that cooperates with automated analysis is also better code for humans.

cheerleader developer-experience short

The Power of Ten: Rules for Developing Safety-Critical Code

Gerard J. Holzmann (2006)

Ten coding rules that became JPL's standard for Mars rover flight software, written by the inventor of the SPIN model checker. The profound tradeoff: ban recursion, bound every loop, restrict dynamic allocation -- deliberately limit the language so that static analysis becomes tractable. Verification is not just about having good tools; it is about writing code that tools can analyze.

cheerleader specification short
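The "bound every loop" rule translates to any language. A minimal sketch of the pattern, using a hypothetical queue-draining task (Holzmann's rules target C flight software; the bound of 64 below is an invented example of a limit justified offline):

```python
MAX_ITERS = 64  # static bound, assumed to be established by prior analysis

def drain_queue(queue: list) -> list:
    """Process a queue with an explicit iteration bound instead of `while queue:`."""
    processed = []
    for _ in range(MAX_ITERS):  # a checker can prove this loop terminates
        if not queue:
            break
        processed.append(queue.pop(0))
    if queue:
        # Bound violated: fail loudly rather than loop unboundedly.
        raise AssertionError("loop bound exceeded: input larger than analyzed bound")
    return processed

print(drain_queue([1, 2, 3]))  # [1, 2, 3]
```

The point of the rule is exactly what the blurb says: the unbounded `while` version may behave identically in practice, but the bounded form is the one a static analyzer can reason about.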