Theory
For nine weeks we have used the solver to encode problems and to verify properties. The same engine can also find inputs that make programs succeed (angelic execution) and programs that satisfy a spec on every input (synthesis).
Flip the oracle
The course has used the solver in two shapes. In L01–L06 we encoded problems and asked Z3 for a satisfying assignment, the model that mapped each symbolic variable to a value the constraints accepted. SAT found a package installation that fit the dependencies. SMT found a sudoku solution. The encoding was the input to the solver, and the model was the output.
In L07–L09 we used the solver to refute. We negated a property, asserted the negation, and asked Z3 to find a model. A sat answer was a counterexample. An unsat answer was the verification: no input violated the property. Symbolic execution and weakest preconditions both compile to that negate-and-check shape.
The course has called the counterexample-hunting use verification. The satisfying-input use has its own name: angelic execution.
| | Demonic verification | Angelic execution |
|---|---|---|
| Question | Can you break my code? | Can you make my code succeed? |
| sat means | A counterexample. | A witness. |
| unsat means | The property holds. | No input works. |
| When the solver runs | Ahead of time, offline. | Inside the program, at runtime. |
| Role in the workflow | Adversary. | Ally. |
L09 Theory introduced solve as a Rosette REPL primitive, calling it on bvudiv2 to find x = 0 as a witness where two implementations agree. L10 puts solve inside the body of a running program. Unknowns become define-symbolic placeholders at the points where the program needs values, and a solve call fills them in so the program's assertions hold.
Solver queries
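Demonic verification queries Z3 with the disjunction over paths and assertions. Writing $pc_i$ for the path condition of path $i$ and $a_i$ for the assertion on that path:
$$\exists x.\ \bigvee_i \big(pc_i(x) \land \lnot a_i(x)\big)$$
If unsat, no path reaches a failing assertion, and the program is verified.
Angelic execution queries Z3 for a model of the conjunction:
$$\exists x.\ \bigwedge_i \big(pc_i(x) \Rightarrow a_i(x)\big)$$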
A model is a choice of symbolic inputs such that for every reachable path the path's assertion holds. The model is the answer the program needs at runtime.
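The two queries can be tried out on a toy program. The sketch below is only an illustration under stated assumptions: the two-path program, the names `PATHS`, `find_model`, `demonic`, and `angelic`, and the bounded `DOMAIN` are all hypothetical, and a linear scan over a small range stands in for Z3.

```python
from typing import Callable, Iterable, Optional

Pred = Callable[[int], bool]

# Hypothetical two-path program, written as (path condition, assertion) pairs:
#   if x > 5: assert x % 2 == 0
#   else:     assert x >= 0
PATHS: list[tuple[Pred, Pred]] = [
    (lambda x: x > 5, lambda x: x % 2 == 0),
    (lambda x: x <= 5, lambda x: x >= 0),
]


def find_model(query: Pred, domain: Iterable[int]) -> Optional[int]:
    """Stand-in for the solver: first x satisfying query, or None for unsat."""
    return next((x for x in domain if query(x)), None)


def demonic(x: int) -> bool:
    # Some path is taken and its assertion fails.
    return any(pc(x) and not a(x) for pc, a in PATHS)


def angelic(x: int) -> bool:
    # Every path that is taken has its assertion hold.
    return all((not pc(x)) or a(x) for pc, a in PATHS)


DOMAIN = range(-3, 10)  # assumed bounded stand-in for the integers
counterexample = find_model(demonic, DOMAIN)  # -3 takes the else branch and fails
witness = find_model(angelic, DOMAIN)         # 0 takes the else branch and passes
```

For any fixed x the two formulas are negations of each other. What changes between the two uses is which answer the caller is hoping for.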
Asking the right question
Suppose we want a constant c such that c * x = x + x for every integer x. This setup is from Bornholt's 2018 synthesis tutorial. We give Rosette both c and x as symbolic, write the constraint, and call solve:
#lang rosette
(define-symbolic x c integer?)
(solve
 (assert (= (* c x) (+ x x))))
Rosette returns:
(model
[x 0]
[c 0])
The math checks out: 0 * 0 = 0 + 0. We asked for any pair (c, x) where the equation holds. We wanted a single c that works for every x.
If you get the reduction wrong, the solver gives you a correct answer to the wrong question. The reduction was wrong here by one quantifier alternation: we wrote $\exists c.\ \exists x$ but we wanted $\exists c.\ \forall x$.
Promoting x to universal
Rosette's synthesize query separates the two roles: a #:forall variable becomes the adversary the answer has to survive, and the remaining symbolic variables stay as degrees of freedom the solver picks.
(synthesize
 #:forall (list x)
 #:guarantee (assert (= (* c x) (+ x x))))
Now c is the only thing the solver picks. The constraint must hold for every value of x. Rosette returns:
(model
[c 2])
c = 2 is the constant we wanted. The identity was discovered from a single algebraic constraint.
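The difference between the two queries can be checked by brute force on a small range. This is a sketch only: `VALUES` is an assumed bounded stand-in for the integers, and plain enumeration replaces the solver, so it shows the shape of the two questions and not how Rosette answers them.

```python
from itertools import product

VALUES = range(-4, 5)  # assumed bounded stand-in for the integers


def spec(c: int, x: int) -> bool:
    return c * x == x + x


# solve: exists c, exists x. Any pair will do, so x = 0 lets every c through.
pairs = [(c, x) for c, x in product(VALUES, VALUES) if spec(c, x)]

# synthesize: exists c, forall x. The constant has to survive every x.
constants = [c for c in VALUES if all(spec(c, x) for x in VALUES)]

print(len(pairs), constants)  # 17 [2]
```

Seventeen pairs satisfy the existential query on this range, including (0, 0). Only c = 2 survives once x is universal.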
The synthesis problem
Synthesis is angelic execution with a universal wrapped around the input:
$$\exists e.\ \forall x.\ \mathit{spec}(x, e(x))$$
Find a program (or a constant, or a sketch fill) e such that for every input x, the spec holds.
Drop the $\forall x$ and what remains is angelic execution: $\exists e.\ \mathit{spec}(x, e(x))$. The $\exists e$ is handled by the sketch: holes become symbolic variables, and a model for those variables decodes back to a program. L09 saw that with (bvlshr x (int32 1)).
Synthesis is one quantifier alternation past plain solve. Discharging the $\forall$ cannot be done by enumeration.
CEGIS
CEGIS stands for counterexample-guided inductive synthesis. It handles the $\forall$ by alternating a synthesize call (find a candidate that satisfies the spec on the current input set X) with a verify call (find an input where the candidate fails). Each counterexample joins X. The loop ends when the verifier finds no counterexample, or when the synthesizer finds no candidate. An UNSAT from the synthesizer means the sketch cannot express a program that meets the spec. The standard response is to widen the sketch. This is the algorithm Rosette's synthesize runs under the hood.
The algorithm
def cegis(sketch, spec):
    X = []
    while True:
        # synthesize: find a hole-fill e such that
        # spec(x, sketch_fill(e, x)) holds for every x in X
        e = synth(sketch, spec, X)
        if e is None:
            return UNSAT
        # verify: find an x where the candidate program fails spec
        x_cex = verify(sketch, spec, e)
        if x_cex is None:
            return e
        X.append(x_cex)
The two calls are angelic execution against X and demonic verification on the full spec.
flowchart TD
accTitle: CEGIS refinement loop
accDescr: A synthesizer proposes a candidate that satisfies the spec on the current input set. A verifier looks for a counterexample. If no counterexample exists, the candidate is returned. If a counterexample is found, it joins the input set and the synthesizer runs again. If no candidate satisfies the spec on the current input set, CEGIS returns UNSAT.
X[Input set X]
SYN["Synthesize:
find e so spec holds on every x in X"]
VER["Verify:
find x where sketch[e] fails spec"]
DONE[Return e]
UNSAT[Return UNSAT]
X --> SYN
SYN -->|no e exists| UNSAT
SYN -->|candidate e| VER
VER -->|no counterexample| DONE
VER -->|counterexample x| X
Walking the loop
CEGIS on Bornholt's example finishes in two rounds:
| Round | X | Candidate | Verify finds |
|---|---|---|---|
| 1 | {} | c = 0 (vacuously satisfies) | x = 1, spec wants 2 |
| 2 | {1} | c = 2 (works for x = 1) | no counterexample |
Round 1 starts with X = {}. The synthesizer is unconstrained, so c = 0 works. The verifier finds x = 1: the candidate returns 0, the spec wants 2. Round 2 synthesizes against X = {1}. Only c = 2 satisfies c * 1 = 1 + 1. The verifier finds no counterexample, and the loop ends.
Two rounds, four solver calls, one quantifier alternation handled. The runnable hand-trace lives at cegis-by-hand-mul2.rkt in the companion demos.
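The same two rounds can be replayed without Rosette. The sketch below is not the companion demo: `CANDIDATES`, `INPUTS`, `synth`, `verify`, and the `trace` list are assumed names, both domains are small and bounded, and first-match enumeration stands in for the solver. The enumeration orders are chosen so the trace matches the table. A real solver is free to return a different candidate or counterexample.

```python
from typing import Optional

CANDIDATES = [0, 1, 2, 3, -1, -2, -3]  # assumed search space for the hole c
INPUTS = [0, 1, -1, 2, -2, 3, -3]      # assumed bounded input domain


def spec(c: int, x: int) -> bool:
    return c * x == x + x


def synth(X: list[int]) -> Optional[int]:
    """Angelic step: first c that meets the spec on every x in X."""
    return next((c for c in CANDIDATES if all(spec(c, x) for x in X)), None)


def verify(c: int) -> Optional[int]:
    """Demonic step: first input on which the candidate c breaks the spec."""
    return next((x for x in INPUTS if not spec(c, x)), None)


def cegis():
    X, trace = [], []
    while True:
        c = synth(X)
        if c is None:
            return None, trace  # the sketch cannot express a solution
        cex = verify(c)
        trace.append((list(X), c, cex))
        if cex is None:
            return c, trace
        X.append(cex)


result, trace = cegis()
print(result)  # 2
print(trace)   # [([], 0, 1), ([1], 2, None)]
```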
Same loop, larger sketch
Here is the sign function sketch with five holes:
(require rosette/lib/synthax) ; provides the (??) hole form
(define (sign x)
  (cond [(< x (??)) (??)]
        [(> x (??)) (??)]
        [else (??)]))
(synthesize
 #:forall (list x)
 #:guarantee (assert (= (sign x) (sgn x))))
Each hole picks an integer. With the range {-1, 0, 1}, the search space is $3^5 = 243$ candidate programs. CEGIS on the sign sketch finishes in three rounds:
| Round | X | Candidate | Verify finds |
|---|---|---|---|
| 1 | {} | always 0 | x = 5, spec wants 1 |
| 2 | {5} | positive 1, else 0 | x = -3, spec wants -1 |
| 3 | {5, -3} | negative -1, positive 1, else 0 | no counterexample |
Rosette returns:
(define (sign x)
  (cond ((< x 0) -1)
        ((> x 0) 1)
        (else 0)))
Three rounds, six solver calls, about one second on the wall clock. Two counterexamples were enough for the synthesizer to land on the correct sign function out of 243 candidates.
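The five-hole sketch is small enough to run the loop by brute force. The following is a sketch under assumptions, not a reproduction of Rosette's run: the hole tuple `(t1, r1, t2, r2, r3)`, the bounded `INPUTS` range, and the first-match `synth` and `verify` helpers are hypothetical. Because enumeration order differs from a solver's choices, the candidates and counterexamples it visits need not match the table above.

```python
from itertools import product

# One hole fill is (t1, r1, t2, r2, r3), each drawn from {-1, 0, 1}.
CANDIDATES = list(product((-1, 0, 1), repeat=5))  # 3**5 = 243 fills
INPUTS = range(-5, 6)  # assumed bounded stand-in for the integers


def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def sign(h, x: int) -> int:
    """The sketch with its five holes filled by h."""
    t1, r1, t2, r2, r3 = h
    if x < t1:
        return r1
    if x > t2:
        return r2
    return r3


def synth(X):
    """First fill that agrees with sgn on every collected input."""
    return next((h for h in CANDIDATES
                 if all(sign(h, x) == sgn(x) for x in X)), None)


def verify(h):
    """First input on which the filled sketch disagrees with sgn."""
    return next((x for x in INPUTS if sign(h, x) != sgn(x)), None)


def cegis():
    X = []
    while True:
        h = synth(X)
        if h is None:
            return None, X
        cex = verify(h)
        if cex is None:
            return h, X
        X.append(cex)


fill, counterexamples = cegis()
print(fill)  # (0, -1, 0, 1, 0)
```

The returned fill decodes to the same program Rosette printed: threshold 0 with result -1, threshold 0 with result 1, and 0 otherwise.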
Why it converges in practice
CEGIS is worst-case exponential but converges fast on the problems people actually have. Solar-Lezama (2013) named this the bounded-observation hypothesis: for a well-shaped sketch, a small concrete-input set suffices to eliminate every wrong candidate. Each counterexample peels a region of the search space away.
This is the L02 loop again. CDCL extracts a clause from each conflict and feeds it into the next assignment search. CEGIS extracts an input from each counterexample and feeds it into the next program search. Both algorithms make an intractable search tractable by accumulating failure constraints.
Each counterexample replaces an unbounded $\forall$ with one more concrete constraint. That is how the solver discharges a quantifier it cannot enumerate.
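The peeling can be counted for the sign sketch from earlier in this lecture. The sketch below assumes the same hypothetical hole encoding `(t1, r1, t2, r2, r3)` over {-1, 0, 1} and counts, by enumeration, how many of the 243 fills still agree with the spec as the counterexamples 5 and -3 are added.

```python
from itertools import product


def sgn(x: int) -> int:
    return (x > 0) - (x < 0)


def sign(h, x: int) -> int:
    """The five-hole sign sketch, filled by h = (t1, r1, t2, r2, r3)."""
    t1, r1, t2, r2, r3 = h
    if x < t1:
        return r1
    if x > t2:
        return r2
    return r3


def survivors(X):
    """All hole fills that agree with sgn on every input in X."""
    return [h for h in product((-1, 0, 1), repeat=5)
            if all(sign(h, x) == sgn(x) for x in X)]


counts = [len(survivors(X)) for X in ([], [5], [5, -3])]
print(counts)  # [243, 81, 27]
```

Each counterexample fixes one hole and cuts the space by a factor of three. The 27 fills that remain after {5, -3} still include the correct one, and the loop stops as soon as the candidate drawn from them passes verification.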
Across the field
Every lecture in this course encoded a different domain into solver queries: propositional formulas (L01), theories of integers and bitvectors and arrays (L03–L06), program states under symbolic execution and weakest preconditions (L07–L08), compiler rewrites (L09), and synthesis sketches (L10). The solver was Z3 throughout.
The same shape holds across the broader landscape: Dafny and Verus push verification reductions into the host language, Lean and Rocq treat the solver as a subroutine they recertify, FlashFill and Lakeroad use synthesis at deployment time, and the LLM+solver hybrids shipping now add a learned proposer to the loop. The reduction from domain to query is still what the engineer writes.