Practice

A retrospective across L01–L09, then the project demos.

Where we've been

L01. Two implementations of unsigned division by 2, handed to Z3 with the question of whether they agreed on all four billion 32-bit inputs. The solver returned a single number that broke them apart. The same query shape (encode, assert the negation, check) carried from verifying to debugging to synthesizing the fix.

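from z3 import *

# bvudiv2 and bvudiv2_a are the two implementations from L01.
x = BitVec('x', 32)                  # one symbolic 32-bit input
s = Solver()
s.add(bvudiv2(x) != bvudiv2_a(x))    # negation of "they agree on every input"
s.check()    # sat: some input separates them
s.model()    # x = 2147483648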
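The counterexample can be replayed outside the solver. The sketch below is an assumption-laden illustration, not the L01 code: the bodies of `udiv2_logical` and `udiv2_arithmetic` are hypothetical stand-ins (a logical versus an arithmetic right shift), chosen only because they disagree in the way the model suggests.

```python
MASK = 0xFFFFFFFF  # keep every value inside 32 bits

def udiv2_logical(x: int) -> int:
    """Hypothetical stand-in: unsigned divide by 2 via a logical right shift."""
    return (x & MASK) >> 1

def udiv2_arithmetic(x: int) -> int:
    """Hypothetical stand-in: arithmetic right shift, which copies the top bit."""
    x &= MASK
    top = x & 0x80000000
    return ((x >> 1) | top) & MASK

cex = 2147483648  # the model value reported by the solver, i.e. 2**31
a, b = udiv2_logical(cex), udiv2_arithmetic(cex)
print(hex(a), hex(b), a != b)
```

Under these assumptions any input with the top bit set separates the two functions, so the solver was free to return a different witness than 2147483648.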

L02. Inside the SAT solver. When BCP hits a conflict, CDCL builds an implication graph, reads a learned clause off a cut, and backjumps to the second-highest decision level in that clause, skipping the decisions that played no part in the conflict. One conflict, analyzed once, can rule out an exponential slice of the search space.

flowchart LR
  accTitle: CDCL main loop
  accDescr: A cycle through decide, propagate, conflict, analyze and learn, then backtrack and restart.
  D[decide] --> P[propagate]
  P --> Q{conflict?}
  Q -->|no| D
  Q -->|yes| A[analyze + learn]
  A --> B[backtrack]
  B --> D
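The propagate step of the loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real solver: clauses are lists of DIMACS-style integers, the function name `propagate` is hypothetical, and the scan is quadratic where production solvers use watched literals.

```python
from typing import Dict, List, Optional, Tuple

Clause = List[int]  # 3 means x3 is true, -3 means x3 is false

def propagate(clauses: List[Clause], assignment: Dict[int, bool]
              ) -> Tuple[Dict[int, bool], Optional[Clause]]:
    """Run unit propagation to fixpoint.

    Returns the extended assignment plus the first falsified clause,
    or None in the second slot when no conflict was found.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return assignment, clause        # every literal is false
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0   # forced by a unit clause
                changed = True
    return assignment, None

clauses = [[-1, 2], [-1, 3], [-2, -3]]
after_decision, conflict = propagate(clauses, {1: True})
print(after_decision, conflict)
```

Deciding x1 forces x2 and x3, which falsifies the third clause. Resolving that clause against the two reasons yields the unit clause "not x1", which is the kind of clause the analyze step would learn.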

L03. EUF and congruence closure. Assert f³(a) = a and f⁵(a) = a, and function congruence ripples those two equalities into five merges until every subterm lands in one class. The disequality f(a) ≠ a has nowhere left to live.

Start with six singletons:

graph LR
  accTitle: Initial state — six singleton classes
  accDescr: Each of the six subterms a, f of a, f squared of a, f cubed of a, f to the fourth of a, and f to the fifth of a sits alone in its own class.
  subgraph c1 [ ]
    n0["a"]
  end
  subgraph c2 [ ]
    n1["f(a)"]
  end
  subgraph c3 [ ]
    n2["f²(a)"]
  end
  subgraph c4 [ ]
    n3["f³(a)"]
  end
  subgraph c5 [ ]
    n4["f⁴(a)"]
  end
  subgraph c6 [ ]
    n5["f⁵(a)"]
  end
  n0 ~~~ n1 ~~~ n2 ~~~ n3 ~~~ n4 ~~~ n5
  classDef cc fill:#fffde7,stroke:#f57c00
  class c1,c2,c3,c4,c5,c6 cc

Assert a = f³(a) and merge:

graph LR
  accTitle: After merging a and f cubed of a
  accDescr: a and f cubed of a are in the same class. The other four terms remain singletons.
  subgraph c1 [ ]
    n0["a"]
    n3["f³(a)"]
  end
  subgraph c2 [ ]
    n1["f(a)"]
  end
  subgraph c3 [ ]
    n2["f²(a)"]
  end
  subgraph c5 [ ]
    n4["f⁴(a)"]
  end
  subgraph c6 [ ]
    n5["f⁵(a)"]
  end
  n0 ~~~ n1 ~~~ n2 ~~~ n4 ~~~ n5
  classDef cc fill:#fffde7,stroke:#f57c00
  class c1,c2,c3,c5,c6 cc

Congruence: since a and f³(a) share a class, f(a) and f⁴(a) must too, and f²(a) and f⁵(a) must too:

graph LR
  accTitle: After two congruence ripples
  accDescr: Three classes. a sits with f cubed of a. f of a sits with f to the fourth of a. f squared of a sits with f to the fifth of a.
  subgraph c1 [ ]
    n0["a"]
    n3["f³(a)"]
  end
  subgraph c2 [ ]
    n1["f(a)"]
    n4["f⁴(a)"]
  end
  subgraph c3 [ ]
    n2["f²(a)"]
    n5["f⁵(a)"]
  end
  n0 ~~~ n1 ~~~ n2
  classDef cc fill:#fffde7,stroke:#f57c00
  class c1,c2,c3 cc

Assert a = f⁵(a) and chase the cascade to fixpoint:

graph LR
  accTitle: Final state, one collapsed class
  accDescr: All six subterms a, f of a, f squared of a, f cubed of a, f to the fourth of a, and f to the fifth of a are in a single congruence class.
  subgraph c1 [ ]
    n0["a"]
    n1["f(a)"]
    n2["f²(a)"]
    n3["f³(a)"]
    n4["f⁴(a)"]
    n5["f⁵(a)"]
  end
  classDef cc fill:#fffde7,stroke:#f57c00
  class c1 cc
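The four snapshots above can be replayed with a small union-find. This is a minimal sketch rather than the lecture's implementation: index k stands for the subterm f^k(a), and the names `find`, `merge`, `classes`, and `stats` are illustrative.

```python
N = 6  # subterms a, f(a), ..., f^5(a); index k stands for f^k(a)

parent = list(range(N))
stats = {"merges": 0}

def find(k: int) -> int:
    while parent[k] != k:
        parent[k] = parent[parent[k]]  # path halving
        k = parent[k]
    return k

def merge(i: int, j: int) -> None:
    """Union the classes of f^i(a) and f^j(a), then propagate congruence."""
    pending = [(i, j)]
    while pending:
        x, y = pending.pop()
        rx, ry = find(x), find(y)
        if rx == ry:
            continue
        parent[rx] = ry
        stats["merges"] += 1
        # Congruence: equal arguments force equal applications of f.
        # Only p, q <= 4 matter, because f^6(a) is not a subterm.
        for p in range(N - 1):
            for q in range(p + 1, N - 1):
                if find(p) == find(q) and find(p + 1) != find(q + 1):
                    pending.append((p + 1, q + 1))

def classes():
    groups = {}
    for k in range(N):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())

merge(0, 3)              # assert a = f^3(a)
after_first = classes()  # three classes of two terms each
merge(0, 5)              # assert a = f^5(a)
after_second = classes() # one class holding all six terms
print(after_first, after_second, stats["merges"])
```

Once `find(1) == find(0)`, the disequality f(a) ≠ a contradicts the merged class, which is the UNSAT answer.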

L04. LRA, LIA, bit-vectors, arrays. One engineering problem per theory and four ways the wrong theory gives a correct answer to a different question: integers collapse a blending ratio to pure beans, the LP relaxation invents a phantom solution at (24/11, 10/11), Int proves a JDK overflow bug away, and bit-blasting memory builds a four-gigabyte boolean formula. Picking the theory is itself a reduction decision.

Figure: LIA's integer search. Solutions lie on a line and are found by bounding the box. The plot shows integer solutions to 3i - 5j = 2, spaced along a line by the vector (5, 3), with iteration boxes drawn at N=4 (smaller, no solutions inside) and N=5 (larger, three solutions inside).
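The spacing of the solutions can be checked by brute force. The sketch below assumes a symmetric box lo <= i, j <= hi, which is a convention chosen here for illustration and may differ from how the boxes in the figure are drawn; `solutions_in_box` is a hypothetical helper name.

```python
def solutions_in_box(lo: int, hi: int):
    """All integer pairs (i, j) with lo <= i, j <= hi and 3*i - 5*j == 2."""
    return [(i, j)
            for i in range(lo, hi + 1)
            for j in range(lo, hi + 1)
            if 3 * i - 5 * j == 2]

sols = solutions_in_box(-10, 10)

# Differences between consecutive solutions, ordered by i.
steps = {(i2 - i1, j2 - j1) for (i1, j1), (i2, j2) in zip(sols, sols[1:])}
print(sols, steps)
```

Every consecutive pair differs by the same vector, so growing the box can only reveal further points of the one lattice line.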

L05. Nelson–Oppen. LRA and EUF swap three equalities across a shared boundary and reach UNSAT without either solver ever reading the other's terms. Purify into pure conjunctions, share constants, and theories pass notes through equality.

sequenceDiagram
  accTitle: Nelson-Oppen propagation chain on the L05 worked example
  accDescr: Two solvers exchange three equalities and conclude unsat. Σ_R sends x equals y to Σ_=, which derives u equals v by congruence. Σ_R then sends w equals z, contradicting f of w not equal f of z.
  autonumber
  participant R as Σ_R (LRA)
  participant E as Σ_= (EUF)
  R->>E: x = y
  E->>R: u = v
  R->>E: w = z
  Note over R,E: UNSAT

L06. DPLL(T). Replace each theory atom with a fresh boolean variable, let CDCL enumerate candidate models of the connectives, then ask the theory whether each one is real. When the theory rejects a model, the loop learns a clause and tries again. Quantified axioms enter through E-matching: ForAll bodies are instantiated against the ground terms already present, and the instances join the atoms the theory checks.

graph TD
  accTitle: DPLL(T) main loop
  accDescr: T2B builds the boolean abstraction. CDCL either returns UNSAT or a propositional model. The model is refined back to the theory layer by B2T and passed to T-solve. T-solve either accepts the model or returns a theory conflict, which triggers learning a clause and looping back.
  Start(["T-formula φ"])
  T2B["φP ← T2B(φ)"]
  CDCL{"CDCL(φP)"}
  UNSAT(["return UNSAT"])
  B2T["μT ← B2T(μP)"]
  Tsolve{"T-solve(μT)"}
  SAT(["return SAT"])
  Learn["φP ← φP ∧ ¬μP"]
  Start --> T2B
  T2B --> CDCL
  CDCL -- "UNSAT" --> UNSAT
  CDCL -- "model μP" --> B2T
  B2T --> Tsolve
  Tsolve -- "SAT" --> SAT
  Tsolve -- "UNSAT" --> Learn
  Learn --> CDCL
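The loop in the diagram can be sketched end to end on a toy theory. Everything here is a stand-in chosen for illustration: `bool_model` enumerates assignments by brute force where a real system would run CDCL, `t_solve` decides bounds on a single integer x with an interval, and the formula itself is a hypothetical example.

```python
from itertools import product

# Theory atoms over one integer x, indexed from 1 (hypothetical example).
ATOMS = {1: ("<", 0), 2: (">", 10), 3: (">", 5), 4: ("<", 8)}
# Boolean abstraction of (x < 0 or x > 10) and x > 5 and x < 8.
CLAUSES = [[1, 2], [3], [4]]

def bool_model(clauses):
    """Brute-force stand-in for CDCL: any model of the clauses, or None."""
    n = len(ATOMS)
    for bits in product([False, True], repeat=n):
        mu = {v: bits[v - 1] for v in range(1, n + 1)}
        if all(any(mu[abs(l)] == (l > 0) for l in c) for c in clauses):
            return mu
    return None

def t_solve(mu):
    """Check the conjunction of literals in mu over the integers."""
    lo, hi = float("-inf"), float("inf")
    for v, value in mu.items():
        op, c = ATOMS[v]
        if op == "<" and value:
            hi = min(hi, c - 1)   # x < c
        elif op == "<":
            lo = max(lo, c)       # not (x < c), so x >= c
        elif value:
            lo = max(lo, c + 1)   # x > c
        else:
            hi = min(hi, c)       # not (x > c), so x <= c
    return lo <= hi

def dpll_t(clauses):
    clauses = [list(c) for c in clauses]
    rounds = 0
    while True:
        mu = bool_model(clauses)
        if mu is None:
            return "UNSAT", rounds
        if t_solve(mu):
            return "SAT", rounds
        # Theory conflict: conjoin the negation of this model and retry.
        clauses.append([-v if val else v for v, val in mu.items()])
        rounds += 1

print(dpll_t(CLAUSES))
```

Blocking the whole model is the naive learning step from the diagram. Practical solvers instead learn a small subset of the literals that already conflicts in the theory, which prunes many propositional models at once.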

L07. Symbolic execution and BMC. An interpreter over symbolic state forks at every branch and accumulates guards into a path condition. Each leaf of the tree becomes one Z3 query, and the path conditions partition the inputs into the regions that reach each return.

graph TD
  accTitle: Symbolic execution tree for the sign function
  accDescr: The tree forks twice. The root has sigma mapping x to x0 with path condition true. The first branch on x0 greater than 0 reaches a feasible leaf returning 1. The else branch forks on x0 less than 0, reaching a feasible leaf returning -1 and a feasible leaf returning 0.
  root["σ = {x ↦ x₀}
φ = ⊤"]
  root -->|x₀ > 0| pos["r ↦ 1
φ = x₀ > 0"]
  root -->|x₀ ≤ 0| nz["φ = x₀ ≤ 0"]
  nz -->|x₀ < 0| neg["r ↦ -1
φ = x₀ ≤ 0 ∧ x₀ < 0"]
  nz -->|x₀ ≥ 0| zero["r ↦ 0
φ = x₀ ≤ 0 ∧ x₀ ≥ 0"]
  classDef feasible fill:#d9f5c5,stroke:#5a8a3a
  class pos,neg,zero feasible
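The tree above can be reproduced by a tiny explorer. This is a sketch under stated assumptions: the program is a hand-written tuple tree rather than parsed source, guards compare x0 against a constant only, and `feasible` is an interval check standing in for the Z3 query each leaf would issue.

```python
import operator

NEGATE = {">": "<=", "<=": ">", "<": ">=", ">=": "<"}
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# sign(x): if x > 0 return 1, else if x < 0 return -1, else return 0
SIGN = ("if", (">", 0), ("ret", 1),
        ("if", ("<", 0), ("ret", -1), ("ret", 0)))

def explore(node, path=()):
    """Fork at every branch, accumulating guards into the path condition."""
    if node[0] == "ret":
        return [(list(path), node[1])]
    _, (op, c), then_branch, else_branch = node
    return (explore(then_branch, path + ((op, c),)) +
            explore(else_branch, path + ((NEGATE[op], c),)))

def feasible(path):
    """Interval check over the integers, standing in for a solver call."""
    lo, hi = float("-inf"), float("inf")
    for op, c in path:
        if op == ">":
            lo = max(lo, c + 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "<":
            hi = min(hi, c - 1)
        else:  # "<="
            hi = min(hi, c)
    return lo <= hi

def holds(path, x):
    """Evaluate a path condition on a concrete input."""
    return all(OPS[op](x, c) for op, c in path)

leaves = explore(SIGN)
for pc, result in leaves:
    print(pc, "returns", result, "feasible:", feasible(pc))
```

Each concrete input satisfies exactly one of the three path conditions, which is the partition property the paragraph describes.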

L08. Hoare logic and weakest preconditions. One invariant per loop annotates the program, loop-cut compiles it into a loop-free IVL, and backward WP yields a small batch of Z3 queries covering every input. Where L07's BMC stopped at depth k, one predicate per loop carried the proof to every iteration count.

graph TD
  accTitle: VC Gen pipeline from annotated source to Z3
  accDescr: Annotated source program is compiled by loop-cut into an IVL with no loops. The IVL is walked backward by the WP rules to produce one verification condition per Hoare obligation. Z3 checks each one.
  src["annotated source
mini IMP + invariant"]
  ivl["IVL
loop-free"]
  vc["verification conditions
one per obligation"]
  z3["Z3
VALID or counterexample"]
  src -->|loop-cut| ivl
  ivl -->|backward WP| vc
  vc -->|check| z3
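The pipeline can be sketched on one small loop. The assumptions are substantial and deliberate: predicates are Python functions on states rather than formulas, `DOMAIN` is a bounded stand-in for all integers, validity is checked by enumeration where the lecture used Z3, and the obligations are folded into a single predicate instead of one query each. The program and the names `wp`, `program`, and `valid` are hypothetical.

```python
DOMAIN = range(-4, 5)  # bounded stand-in for "all integers"

def wp(stmt, Q):
    """Weakest precondition of a loop-free statement, as a predicate on states."""
    kind = stmt[0]
    if kind == "assign":
        _, var, expr = stmt
        return lambda s: Q({**s, var: expr(s)})
    if kind == "assume":
        return lambda s: (not stmt[1](s)) or Q(s)
    if kind == "assert":
        return lambda s: stmt[1](s) and Q(s)
    if kind == "havoc":
        return lambda s: all(Q({**s, stmt[1]: v}) for v in DOMAIN)
    if kind == "choice":
        left, right = wp(stmt[1], Q), wp(stmt[2], Q)
        return lambda s: left(s) and right(s)
    if kind == "seq":
        for sub in reversed(stmt[1]):   # walk the statements backward
            Q = wp(sub, Q)
        return Q
    raise ValueError(kind)

def program(inv):
    """Loop-cut form of: i := 0; while i < n: i := i + 1; assert i == n."""
    return ("seq", [
        ("assume", lambda s: s["n"] >= 0),           # precondition
        ("assign", "i", lambda s: 0),
        ("assert", inv),                             # invariant holds on entry
        ("havoc", "i"),
        ("assume", inv),                             # an arbitrary iteration
        ("choice",
            ("seq", [("assume", lambda s: s["i"] < s["n"]),
                     ("assign", "i", lambda s: s["i"] + 1),
                     ("assert", inv),                # invariant is preserved
                     ("assume", lambda s: False)]),  # cut the back edge
            ("assume", lambda s: s["i"] >= s["n"])), # loop exit
        ("assert", lambda s: s["i"] == s["n"]),      # postcondition
    ])

def valid(inv):
    vc = wp(program(inv), lambda s: True)
    return all(vc({"n": n, "i": i}) for n in DOMAIN for i in DOMAIN)

print(valid(lambda s: 0 <= s["i"] <= s["n"]))  # invariant strong enough
print(valid(lambda s: 0 <= s["i"]))            # too weak to imply the postcondition
```

The weaker invariant fails at the exit obligation: with n = 0 and a havocked i = 1, the invariant and the exit guard both hold while i == n does not.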

L09. PEC (2009) and Rosette (today). PEC proves a compiler optimization correct once and for all by walking the FIND and REPLACE CFGs in lockstep, marking sync points at the metavariables, and dispatching each path-pair to the solver as one obligation. Rosette pulls verify, solve, and synthesize into the language: start a file with #lang rosette, and the engine students built by hand in L07 and L08 comes for free.

Figure: PEC walks two CFGs in lockstep, syncing at statement metavariables. The slide, titled 'Find Synchronization Points', has two panels. The left panel lists three bullets: traverse in lockstep, stop at statement metavariables, prune infeasible paths. The right panel shows two control-flow graphs side by side, connected by dashed red lines marking sync points at the entry, two interior locations around a loop body, and the exit.

What we asked the solver

Each week's lecture took one of two shapes. L01 through L06 encoded a constraint and asked Z3 for a satisfying assignment, where the model was the answer the program wanted. L07 through L09 inverted the question, asserting the negation of a property and reading a satisfying assignment as a counterexample. The engine never changed.

Cluster	What we asked Z3	The model meant
L01–L06	Find a satisfying assignment for this encoding.	The values the program needed.
L07–L09	Find a satisfying assignment for the negated property.	An input that broke the program.

Theory names these two shapes and pushes one quantifier past the second.

Readings

Four reading reflections paced the term. Each slot offered three options around a single question, and groups posted discussion summaries to Ed under the Readings category.

The Case Against? (R1). Dodds, Gabriel, and Hoare. The room grounded these arguments in lived workplace experience from week one. Counter-examples to "worse is better" came in from medical devices, cryptography, and financial trading; tech-debt stories arrived from across the cohort's workplaces; several students independently raised whether AI changes the cost curve.

The Evidence Is In (R2). Dodds, Helwer, and Lopes. Multiple groups arrived independently at specification as the harder problem than proof. Several students went past the readings: one wrote up a hands-on attempt to verify a real system from work; another argued that production legacy code becomes its own implicit specification; a third pushed back on Dodds with examples that fail his own criteria.

Honest Limits (R3). Wayne, Hughes, and Zhou. Zhou's instability findings drove the discussion: a 2.6 percent query-flip rate from a variable rename, on the same solver students had been using all term. Several groups closed their summaries with variants of "how do we ask the right question of the solver?"

The Future Is Unwritten (R4). Kleppmann, Thomas, and de Moura. Multiple Thomas readers cited METR's nineteen-percent slowdown number alongside concrete examples from their own teams. Synthesis across the four rounds emerged unprompted: one student wove together papers from across the quarter into a single argument; another linked Gabriel from R1 forward to the AI debate. A separate concern came up across the discussions: the junior-to-senior pipeline and what happens to verification expertise when fewer juniors are hired.

Multiple students reported sharing the readings with coworkers, and several wrote synthesis essays connecting all four rounds. Several of these reflections changed how we'll teach the readings next time.

Project demos

Demos run after the retrospective. Walk-ups are welcome at the break.