Practice
A retrospective across L01–L09, then the project demos.
Where we've been
L01. Two implementations of unsigned division by 2, handed to Z3 with the question of whether they agreed on all four billion 32-bit inputs. The solver returned a single number that broke them apart. The same query shape (encode, assert the negation, check) carried from verifying to debugging to synthesizing the fix.
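from z3 import BitVec, Solver
x = BitVec('x', 32)
s = Solver()
# bvudiv2 and bvudiv2_a are the two implementations from L01.
s.add(bvudiv2(x) != bvudiv2_a(x))  # assert the negation of "they agree"
s.check()  # sat: some input separates them
s.model()  # x = 2147483648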
L02. Inside the SAT solver. When BCP hits a conflict, CDCL builds an implication graph, reads a learned clause off a cut, and jumps back to the level where the conflict actually originated. One conflict, analyzed once, can rule out an exponential slice of the search space.
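The propagation half of that loop can be sketched in plain Python. This is a minimal illustration, assuming clauses are lists of signed integers (DIMACS style); the name `unit_propagate` and the three-clause instance are hypothetical, and the sketch stops at detecting the conflict rather than building the implication graph or learning a clause.

```python
def unit_propagate(clauses, assignment):
    """Apply the unit rule until nothing changes.

    clauses: list of lists of nonzero ints (positive literal = variable true).
    assignment: dict var -> bool, extended in place.
    Returns the first falsified clause (a conflict), or None.
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == want:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return clause  # every literal is false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0  # forced by the unit rule
                changed = True
    return None


# Hypothetical instance: deciding x1 = True forces x2 and x3,
# which together falsify the last clause.
clauses = [[-1, 2], [-1, 3], [-2, -3]]
assignment = {1: True}
conflict = unit_propagate(clauses, assignment)
print(conflict, assignment)
```

On this instance, resolving the conflict clause against the two clauses that forced x2 and x3 would yield the unit clause (¬x1), which is the kind of learned clause the analysis step reads off the graph.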
flowchart LR
accTitle: CDCL main loop
accDescr: A cycle through decide, propagate, conflict, analyze and learn, then backtrack and restart.
D[decide] --> P[propagate]
P --> Q{conflict?}
Q -->|no| D
Q -->|yes| A[analyze + learn]
A --> B[backtrack]
B --> D
L03. EUF and congruence closure. Assert f³(a) = a and f⁵(a) = a, and function congruence ripples those two equalities into five merges until every subterm lands in one class. The disequality f(a) ≠ a has nowhere left to live.
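The merge cascade can be replayed with a small union-find. This is a naive sketch, assuming each term fⁱ(a) is represented by its exponent i and congruence is re-checked by scanning all pairs to a fixpoint; production congruence closure uses use-lists instead, and the helper names here are illustrative.

```python
# Terms f^i(a) for i = 0..5 are represented by the exponent i.
N = 6
parent = list(range(N))


def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


def union(i, j):
    """Merge the classes of i and j; return True if they were distinct."""
    ri, rj = find(i), find(j)
    if ri == rj:
        return False
    parent[rj] = ri
    return True


def assert_equal(i, j):
    """Assert f^i(a) = f^j(a), then propagate congruence to a fixpoint.

    Returns the number of merges performed.
    """
    merges = int(union(i, j))
    changed = True
    while changed:
        changed = False
        for s in range(N - 1):
            for t in range(s + 1, N - 1):
                # Congruence: s ~ t implies f(s) ~ f(t), i.e. s+1 ~ t+1.
                if find(s) == find(t) and union(s + 1, t + 1):
                    merges += 1
                    changed = True
    return merges


def classes():
    groups = {}
    for i in range(N):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


m1 = assert_equal(0, 3)  # a = f^3(a)
after_first = classes()
m2 = assert_equal(0, 5)  # a = f^5(a)
after_second = classes()
diseq_violated = find(1) == find(0)  # f(a) != a cannot hold
print(m1, after_first)
print(m2, after_second, diseq_violated)
```

The first assertion costs three merges (one asserted, two by congruence) and the second costs two more, which accounts for the five merges in total.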
Start with six singletons:
graph LR
accTitle: Initial state — six singleton classes
accDescr: Each of the six subterms a, f of a, f squared of a, f cubed of a, f to the fourth of a, and f to the fifth of a sits alone in its own class.
subgraph c1 [ ]
n0["a"]
end
subgraph c2 [ ]
n1["f(a)"]
end
subgraph c3 [ ]
n2["f²(a)"]
end
subgraph c4 [ ]
n3["f³(a)"]
end
subgraph c5 [ ]
n4["f⁴(a)"]
end
subgraph c6 [ ]
n5["f⁵(a)"]
end
n0 ~~~ n1 ~~~ n2 ~~~ n3 ~~~ n4 ~~~ n5
classDef cc fill:#fffde7,stroke:#f57c00
class c1,c2,c3,c4,c5,c6 cc
Assert a = f³(a) and merge:
graph LR
accTitle: After merging a and f cubed of a
accDescr: a and f cubed of a are in the same class. The other four terms remain singletons.
subgraph c1 [ ]
n0["a"]
n3["f³(a)"]
end
subgraph c2 [ ]
n1["f(a)"]
end
subgraph c3 [ ]
n2["f²(a)"]
end
subgraph c5 [ ]
n4["f⁴(a)"]
end
subgraph c6 [ ]
n5["f⁵(a)"]
end
n0 ~~~ n1 ~~~ n2 ~~~ n4 ~~~ n5
classDef cc fill:#fffde7,stroke:#f57c00
class c1,c2,c3,c5,c6 cc
Congruence: since a and f³(a) share a class, f(a) and f⁴(a) must too, and f²(a) and f⁵(a) must too:
graph LR
accTitle: After two congruence ripples
accDescr: Three classes. a sits with f cubed of a. f of a sits with f to the fourth of a. f squared of a sits with f to the fifth of a.
subgraph c1 [ ]
n0["a"]
n3["f³(a)"]
end
subgraph c2 [ ]
n1["f(a)"]
n4["f⁴(a)"]
end
subgraph c3 [ ]
n2["f²(a)"]
n5["f⁵(a)"]
end
n0 ~~~ n1 ~~~ n2
classDef cc fill:#fffde7,stroke:#f57c00
class c1,c2,c3 cc
Assert a = f⁵(a) and chase the cascade to fixpoint:
graph LR
accTitle: Final state, one collapsed class
accDescr: All six subterms a, f of a, f squared of a, f cubed of a, f to the fourth of a, and f to the fifth of a are in a single congruence class.
subgraph c1 [ ]
n0["a"]
n1["f(a)"]
n2["f²(a)"]
n3["f³(a)"]
n4["f⁴(a)"]
n5["f⁵(a)"]
end
classDef cc fill:#fffde7,stroke:#f57c00
class c1 cc
L04. LRA, LIA, bit-vectors, arrays. One engineering problem per theory and four ways the wrong theory gives a correct answer to a different question: integers collapse a blending ratio to pure beans, the LP relaxation invents a phantom solution at (24/11, 10/11), Int proves a JDK overflow bug away, and bit-blasting memory builds a four-gigabyte boolean formula. Picking the theory is itself a reduction decision.
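The Int-versus-bit-vector mismatch can be seen without a solver. The sketch below assumes the JDK bug in question is the well-known binary search midpoint computation `(low + high) / 2`; that identification, the helper names, and the sample indices are assumptions for illustration, not taken from the lecture.

```python
def to_int32(n):
    """Wrap an unbounded Python int to a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def mid_unbounded(low, high):
    # What an Int (mathematical integer) encoding models: no wraparound.
    return (low + high) // 2


def mid_int32(low, high):
    # What 32-bit machine arithmetic computes: the sum wraps first,
    # then the division truncates toward zero.
    s = to_int32(low + high)
    return s // 2 if s >= 0 else -((-s) // 2)


# Both indices fit in a signed 32-bit int, but their sum does not.
low, high = 1_500_000_000, 2_000_000_000
print(mid_unbounded(low, high))  # stays between low and high
print(mid_int32(low, high))      # negative: an out-of-bounds index
```

Under the Int encoding, low ≤ mid ≤ high is valid for every input, so the solver reports the property proved; only a 32-bit bit-vector encoding can produce the wrapped sum as a counterexample.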
L05. Nelson–Oppen. LRA and EUF swap three equalities across a shared boundary and reach UNSAT without either solver ever reading the other's terms. Purify into pure conjunctions, share constants, and theories pass notes through equality.
sequenceDiagram
accTitle: Nelson-Oppen propagation chain on the L05 worked example
accDescr: Two solvers exchange three equalities and conclude unsat. Σ_R sends x equals y to Σ_=, which derives u equals v by congruence. Σ_R then sends w equals z, contradicting f of w not equal f of z.
autonumber
participant R as Σ_R (LRA)
participant E as Σ_= (EUF)
R->>E: x = y
E->>R: u = v
R->>E: w = z
Note over R,E: UNSAT
L06. DPLL(T). Replace each theory atom with a fresh boolean variable, let CDCL enumerate candidate models of the connectives, then ask the theory whether each one is real. When the theory rejects a model, the loop learns a clause and tries again. Quantifiers feed the theory's atom-level work through E-matching on ForAll axioms.
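The lazy loop can be sketched end to end in plain Python. This is an illustration under loud assumptions: brute-force enumeration stands in for CDCL, the theory is equality over four constants with no function symbols, and the formula and all names are hypothetical. The learned clause is the naive blocking clause ¬μP; real solvers learn a smaller explanation.

```python
from itertools import product

# T2B: each theory atom (an equality between constants) gets a boolean name.
atoms = {"p1": ("x", "y"), "p2": ("y", "z"), "p3": ("x", "z"), "p4": ("y", "w")}
names = list(atoms)

# Boolean skeleton in CNF: p1 AND (p2 OR p4) AND NOT p3.
# A literal is (name, polarity).
cnf = [[("p1", True)], [("p2", True), ("p4", True)], [("p3", False)]]


def bool_solve(clauses):
    """Stand-in for CDCL: brute-force search for a propositional model."""
    for values in product([True, False], repeat=len(names)):
        model = dict(zip(names, values))
        if all(any(model[n] == pol for n, pol in c) for c in clauses):
            return model
    return None


def t_solve(model):
    """Equality check: union the true atoms, then test the false ones."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            v = parent[v]
        return v

    for name, (a, b) in atoms.items():
        if model[name]:
            parent[find(a)] = find(b)
    return all(find(a) != find(b)
               for name, (a, b) in atoms.items() if not model[name])


def dpll_t(clauses):
    clauses = list(clauses)
    conflicts = 0
    while True:
        model = bool_solve(clauses)
        if model is None:
            return "UNSAT", None, conflicts
        if t_solve(model):
            return "SAT", model, conflicts
        # Learn the blocking clause NOT(model) and loop.
        clauses.append([(n, not v) for n, v in model.items()])
        conflicts += 1


status, model, conflicts = dpll_t(cnf)
print(status, model, conflicts)
```

On this formula the theory rejects two propositional models (both set x = y and y = z while denying x = z) before accepting a third. Quantifiers and E-matching are outside the scope of the sketch.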
graph TD
accTitle: DPLL(T) main loop
accDescr: T2B builds the boolean abstraction. CDCL either returns UNSAT or a propositional model. The model is refined back to the theory layer by B2T and passed to T-solve. T-solve either accepts the model or returns a theory conflict, which triggers learning a clause and looping back.
Start(["T-formula φ"])
T2B["φP ← T2B(φ)"]
CDCL{"CDCL(φP)"}
UNSAT(["return UNSAT"])
B2T["μT ← B2T(μP)"]
Tsolve{"T-solve(μT)"}
SAT(["return SAT"])
Learn["φP ← φP ∧ ¬μP"]
Start --> T2B
T2B --> CDCL
CDCL -- "UNSAT" --> UNSAT
CDCL -- "model μP" --> B2T
B2T --> Tsolve
Tsolve -- "SAT" --> SAT
Tsolve -- "UNSAT" --> Learn
Learn --> CDCL
L07. Symbolic execution and BMC. An interpreter over symbolic state forks at every branch and accumulates guards into a path condition. Each leaf of the tree becomes one Z3 query, and the path conditions partition the inputs into the regions that reach each return.
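The forking and the partition claim can be checked on a three-way sign function. This is a minimal sketch, assuming the program is given as a small branch tree and that a search over a handful of sample inputs stands in for the per-leaf Z3 query; the tree encoding and names are hypothetical.

```python
# Program under test, as a tiny branch tree:
#   if x > 0: return 1
#   elif x < 0: return -1
#   else: return 0
# A node is either ("ret", value) or ("if", label, guard, then, else).
SIGN = ("if", "x0 > 0", lambda x: x > 0,
        ("ret", 1),
        ("if", "x0 < 0", lambda x: x < 0,
         ("ret", -1),
         ("ret", 0)))


def explore(node, path=()):
    """Fork at every branch; yield (path_condition, return_value) per leaf.

    A path condition is a tuple of (label, predicate) guards.
    """
    if node[0] == "ret":
        yield path, node[1]
        return
    _, label, guard, then_node, else_node = node
    yield from explore(then_node, path + ((label, guard),))
    negated = (f"not({label})", lambda x, g=guard: not g(x))
    yield from explore(else_node, path + (negated,))


def holds(path, x):
    return all(pred(x) for _, pred in path)


leaves = list(explore(SIGN))
summary = [(" and ".join(lbl for lbl, _ in path), ret) for path, ret in leaves]

# Stand-in for the per-leaf solver query: search a small sample of inputs.
SAMPLE = range(-5, 6)
witnesses = [next((x for x in SAMPLE if holds(path, x)), None)
             for path, _ in leaves]

# Partition check: every sampled input satisfies exactly one path condition.
partition_ok = all(sum(holds(path, x) for path, _ in leaves) == 1
                   for x in SAMPLE)
print(summary, witnesses, partition_ok)
```

A witness of `None` would mark an infeasible leaf; here all three leaves are reachable.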
graph TD
accTitle: Symbolic execution tree for the sign function
accDescr: The tree forks twice. The root has sigma mapping x to x0 with path condition true. The first branch on x0 greater than 0 reaches a feasible leaf returning 1. The else branch forks on x0 less than 0, reaching a feasible leaf returning -1 and a feasible leaf returning 0.
root["σ = {x ↦ x₀}<br/>φ = ⊤"]
root -->|x₀ > 0| pos["r ↦ 1<br/>φ = x₀ > 0"]
root -->|x₀ ≤ 0| nz["φ = x₀ ≤ 0"]
nz -->|x₀ < 0| neg["r ↦ -1<br/>φ = x₀ ≤ 0 ∧ x₀ < 0"]
nz -->|x₀ ≥ 0| zero["r ↦ 0<br/>φ = x₀ ≤ 0 ∧ x₀ ≥ 0"]
classDef feasible fill:#d9f5c5,stroke:#5a8a3a
class pos,neg,zero feasible
L08. Hoare logic and weakest preconditions. One invariant per loop annotates the program, loop-cut compiles it into a loop-free IVL, and backward WP yields a small batch of Z3 queries covering every input. Where L07's BMC stopped at depth k, one predicate per loop carried the proof to every iteration count.
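The three obligations a loop invariant generates can be sketched with a semantic WP, where predicates are Python functions over states. Everything here is an assumption for illustration: the summation program is hypothetical, the obligations are written out by hand rather than produced by loop-cut, and a search over a small box of states stands in for Z3, so a `None` result means only that no counterexample exists in the box.

```python
from itertools import product


def wp_assign(var, expr, post):
    """wp(var := expr, post): evaluate post in the updated state."""
    return lambda s: post({**s, var: expr(s)})


def wp_seq(stmts, post):
    """Backward WP over a list of (var, expr) assignments."""
    for var, expr in reversed(stmts):
        post = wp_assign(var, expr, post)
    return post


# Annotated program (hypothetical example):
#   {n >= 0}  i := 0; s := 0;
#   while i < n  invariant 0 <= i <= n and 2*s == i*(i-1):
#       s := s + i; i := i + 1
#   {2*s == n*(n-1)}
pre = lambda st: st["n"] >= 0
inv = lambda st: (0 <= st["i"] <= st["n"]
                  and 2 * st["s"] == st["i"] * (st["i"] - 1))
guard = lambda st: st["i"] < st["n"]
post = lambda st: 2 * st["s"] == st["n"] * (st["n"] - 1)
init = [("i", lambda st: 0), ("s", lambda st: 0)]
body = [("s", lambda st: st["s"] + st["i"]), ("i", lambda st: st["i"] + 1)]

# One verification condition per obligation: (hypothesis, conclusion).
vcs = {
    "init": (pre, wp_seq(init, inv)),
    "preserve": (lambda st: inv(st) and guard(st), wp_seq(body, inv)),
    "exit": (lambda st: inv(st) and not guard(st), post),
}


def check(hyp, concl, bound=6):
    """Stand-in for Z3: search a small box of states for a counterexample."""
    for n, i, s in product(range(-bound, bound + 1), repeat=3):
        st = {"n": n, "i": i, "s": s}
        if hyp(st) and not concl(st):
            return st  # counterexample
    return None  # no counterexample in the box


results = {name: check(h, c) for name, (h, c) in vcs.items()}

# Dropping the sum conjunct leaves an invariant too weak for the exit VC.
weak = lambda st: 0 <= st["i"] <= st["n"]
weak_exit = check(lambda st: weak(st) and not guard(st), post)
print(results, weak_exit)
```

None of the three conditions mentions an iteration count, which is why discharging them covers every number of loop iterations at once.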
graph TD
accTitle: VC Gen pipeline from annotated source to Z3
accDescr: Annotated source program is compiled by loop-cut into an IVL with no loops. The IVL is walked backward by the WP rules to produce one verification condition per Hoare obligation. Z3 checks each one.
src["annotated source<br/>mini IMP + invariant"]
ivl["IVL<br/>loop-free"]
vc["verification conditions<br/>one per obligation"]
z3["Z3<br/>VALID or counterexample"]
src -->|loop-cut| ivl
ivl -->|backward WP| vc
vc -->|check| z3
L09. PEC (2009) and Rosette (today). PEC proves a compiler optimization correct once and for all by walking the FIND and REPLACE CFGs in lockstep, marking sync points at the metavariables, and dispatching each path-pair to the solver as one obligation. Rosette pulls verify, solve, and synthesize into the language: write #lang rosette, and the engine students built by hand in L07 and L08 comes for free.
What we asked the solver
Every week's lecture sat in one of two shapes. L01 through L06 encoded a constraint and asked Z3 for a satisfying assignment, where the model was the answer the program wanted. L07 through L09 inverted the question, asserting the negation of a property and reading a satisfying assignment as a counterexample. The engine never changed.
| Cluster | What we asked Z3 | The model meant |
|---|---|---|
| L01–L06 | Find a satisfying assignment for this encoding. | The values the program needed. |
| L07–L09 | Find a satisfying assignment for the negated property. | An input that broke the program. |
Theory names these two shapes and pushes one quantifier past the second.
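Both shapes in the table reduce to one search routine. The sketch below assumes an exhaustive scan over 8-bit values in place of Z3; `find_model`, the linear constraint, and the pair of shift functions are hypothetical stand-ins chosen to echo the L01 example at a smaller width.

```python
MASK = 0xFF  # 8-bit domain, small enough to enumerate


def find_model(constraint, width=8):
    """Stand-in for Z3: first width-bit value satisfying constraint, or None."""
    for x in range(1 << width):
        if constraint(x):
            return x
    return None  # unsatisfiable over the whole domain


# Shape 1: the model is the answer. Solve 3*x + 1 == 10 (mod 256).
answer = find_model(lambda x: (3 * x + 1) & MASK == 10)


# Shape 2: the model is a counterexample to "these two functions agree".
def lshr1(x):
    return x >> 1  # logical shift: top bit becomes 0


def ashr1(x):
    return ((x >> 1) | (x & 0x80)) & MASK  # arithmetic shift: top bit kept


cex = find_model(lambda x: lshr1(x) != ashr1(x))

# Shape 2 again, for a property that holds: no model means it is verified.
proved = find_model(lambda x: ((x << 1) & MASK) >> 1 != (x & 0x7F)) is None
print(answer, cex, proved)
```

The routine never learns which shape it is serving; only the reading of the result changes.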
Readings
Four reading reflections paced the term. Each slot offered three options around a single question, and groups posted discussion summaries to Ed under the Readings category.
The Case Against? (R1). Dodds, Gabriel, and Hoare. The room grounded these arguments in lived workplace experience from week one. Counter-examples to "worse is better" came in from medical devices, cryptography, and financial trading; tech-debt stories arrived from across the cohort's workplaces; several students independently raised whether AI changes the cost curve.
The Evidence Is In (R2). Dodds, Helwer, and Lopes. Multiple groups arrived independently at specification as the harder problem than proof. Several students went past the readings: one wrote up a hands-on attempt to verify a real system from work; another argued that production legacy code becomes its own implicit specification; a third pushed back on Dodds with examples that fail his own criteria.
Honest Limits (R3). Wayne, Hughes, and Zhou. Zhou's instability findings drove the discussion: a 2.6 percent query-flip rate from a variable rename, on the same solver students had been using all term. Several groups closed their summaries with variants of "how do we ask the right question of the solver?"
The Future Is Unwritten (R4). Kleppmann, Thomas, and de Moura. Multiple Thomas readers cited METR's nineteen-percent slowdown number alongside concrete examples from their own teams. Synthesis across the four rounds emerged unprompted: one student wove together papers from across the quarter into a single argument; another linked Gabriel from R1 forward to the AI debate. A separate concern came up across the discussions: the junior-to-senior pipeline and what happens to verification expertise when fewer juniors are hired.
Multiple students reported sharing the readings with coworkers, and several wrote synthesis essays connecting all four rounds. Several of these reflections changed how we'll teach the readings next time.
Project demos
Demos run after the retrospective. Walk-ups are welcome at the break.