(Week 10)

Theory

For nine weeks we have encoded problems for the solver and used it to verify properties. The same engine can also find inputs that make programs succeed (angelic execution) and find programs that satisfy a spec on every input (synthesis).

Flip the oracle

The course has used the solver in two shapes. In L01–L06 we encoded problems and asked Z3 for a satisfying assignment, the model that mapped each symbolic variable to a value the constraints accepted. SAT found a package installation that fit the dependencies. SMT found a sudoku solution. The encoding was the input to the solver, and the model was the output.

In L07–L09 we used the solver to refute. We negated a property, asserted the negation, and asked Z3 to find a model. A sat answer was a counterexample. An unsat answer was the verification: no input violated the property. Symbolic execution and weakest preconditions both compile to that negate-and-check shape.

So far the course has called the counterexample-hunting use verification. The satisfying-input use has a name too: angelic execution.

| | Demonic verification | Angelic execution |
|---|---|---|
| Question | Can you break my code? | Can you make my code succeed? |
| `sat` means | A counterexample. | A witness. |
| `unsat` means | The property holds. | No input works. |
| When the solver runs | Ahead of time, offline. | Inside the program, at runtime. |
| Role in the workflow | Adversary. | Ally. |

L09 Theory introduced solve as a Rosette REPL primitive, calling it on bvudiv2 to find x = 0 as a witness where two implementations agree. L10 puts solve inside the body of a running program. Unknowns become define-symbolic placeholders at the points where the program needs values, and a solve call fills them in so the program's assertions hold.

Solver queries

Demonic verification queries Z3 with a disjunction over program paths, where $PC_{i}$ is the path condition of path $i$ and $A_{i}$ is the assertion on that path:

$$\bigvee_{i} (PC_{i} \land \neg A_{i})$$

If unsat, no path reaches a failing assertion, and the program is verified.

Angelic execution queries Z3 for a model of the conjunction:

$$\bigwedge_{i} (PC_{i} \to A_{i})$$

A model is a choice of symbolic inputs such that for every reachable path the path's assertion holds. The model is the answer the program needs at runtime.
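To make the two query shapes concrete, here is a minimal Python sketch that replaces Z3 with brute-force search over a small integer range. The toy program, its two paths, and the names `PATHS`, `DOMAIN`, `demonic`, and `angelic` are illustrative assumptions, not course code. A real solver reasons symbolically instead of enumerating.

```python
# Hypothetical toy program with one symbolic input x:
#     if x >= 0: assert x + 1 != 5
#     else:      assert -x < 3
# Each path is a pair (path condition PC_i, assertion A_i).
PATHS = [
    (lambda x: x >= 0, lambda x: x + 1 != 5),
    (lambda x: x < 0, lambda x: -x < 3),
]

DOMAIN = range(-8, 9)  # assumed finite stand-in for the solver's search space


def demonic(paths, domain):
    """Return an x satisfying OR_i (PC_i and not A_i), or None for unsat."""
    for x in domain:
        if any(pc(x) and not a(x) for pc, a in paths):
            return x  # counterexample
    return None  # no path reaches a failing assertion


def angelic(paths, domain):
    """Return an x satisfying AND_i (PC_i implies A_i), or None for unsat."""
    for x in domain:
        if all((not pc(x)) or a(x) for pc, a in paths):
            return x  # witness
    return None  # no input works


print("counterexample:", demonic(PATHS, DOMAIN))
print("witness:", angelic(PATHS, DOMAIN))
```

On this toy program both queries are sat: the demonic search reports x = -8, which breaks the second assertion, and the angelic search reports x = -2, which passes every assertion on the path it takes.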

Asking the right question

Suppose we want a constant c such that c * x = x + x for every integer x. This setup is from Bornholt's 2018 synthesis tutorial. We give Rosette both c and x as symbolic, write the constraint, and call solve:

#lang rosette

(define-symbolic x c integer?)

(solve
 (assert (= (* c x) (+ x x))))

Rosette returns:

(model
 [x 0]
 [c 0])

The math checks out: 0 * 0 = 0 + 0. We asked for any pair (c, x) where the equation holds. We wanted a single c that works for every x.

If you get the reduction wrong, the solver gives you a correct answer to the wrong question. The reduction was wrong here by one quantifier alternation: we wrote $\exists c . \exists x . c \cdot x = x + x$ but we wanted $\exists c . \forall x . c \cdot x = x + x$.

Promoting `x` to universal

Rosette's synthesize query separates the two roles: a #:forall variable becomes the adversary the answer has to survive, and the remaining symbolic variables stay as degrees of freedom the solver picks.

(synthesize
 #:forall    (list x)
 #:guarantee (assert (= (* c x) (+ x x))))

Now c is the only thing the solver picks. The constraint must hold for every value of x. Rosette returns:

(model
 [c 2])

c = 2 is the constant we wanted. The identity $2 x = x + x$ was discovered from a single algebraic constraint.
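The gap between the two quantifier shapes can be checked by brute force. The sketch below is an illustration in Python, not Rosette. It assumes a small finite range in place of the integers, and the names `spec`, `exists_exists`, and `exists_forall` are hypothetical.

```python
DOMAIN = range(-8, 9)  # assumed finite stand-in for the integers


def spec(c, x):
    """The constraint from the example: c * x = x + x."""
    return c * x == x + x


# exists c. exists x: keep every c that has at least one agreeing x.
exists_exists = [c for c in DOMAIN if any(spec(c, x) for x in DOMAIN)]

# exists c. forall x: keep only the c that agrees on every x.
exists_forall = [c for c in DOMAIN if all(spec(c, x) for x in DOMAIN)]

print("accepted by the exists/exists query:", exists_exists)
print("accepted by the exists/forall query:", exists_forall)
```

Every c in the range passes the first query, because x = 0 is always a witness. Only c = 2 passes the second.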

The synthesis problem

Synthesis is angelic execution with a universal wrapped around the input:

$$\exists e . \forall x . φ(x, P[e](x))$$

Find a program (or a constant, or a sketch fill) e such that for every input x, the spec $φ$ holds.

Drop the $\forall x$ and what remains is angelic execution: $\exists e . φ(x, P[e](x))$. The $\exists e$ is handled by the sketch: holes become symbolic variables, and a model for those variables decodes back to a program. L09 saw that with (bvlshr x (int32 1)).

Synthesis is one quantifier alternation past plain solve. The $\forall x$ cannot be discharged by enumeration, because $x$ ranges over an unbounded domain.

CEGIS

CEGIS stands for counterexample-guided inductive synthesis. It handles the $\forall x$ by alternating a synthesize call (find a candidate that satisfies the spec on the current input set X) with a verify call (find an input where the candidate fails). Each counterexample joins X. The loop ends when the verifier finds no counterexample, or when the synthesizer finds no candidate. An UNSAT from the synthesizer means the sketch cannot express a program that meets the spec. The standard response is to widen the sketch. This is the algorithm Rosette's synthesize runs under the hood.

The algorithm

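UNSAT = None  # sentinel: the sketch cannot express a program that meets the spec


def cegis(sketch, spec, synth, verify):
    """Counterexample-guided inductive synthesis.

    synth(sketch, spec, X) returns a hole-fill e, or None if none exists.
    verify(sketch, spec, e) returns a counterexample input, or None.
    """
    X = []  # counterexample inputs collected so far
    while True:
        # synthesize: find a hole-fill e such that
        # spec(x, sketch_fill(e, x)) holds for every x in X
        e = synth(sketch, spec, X)
        if e is None:
            return UNSAT

        # verify: find an x where the candidate program fails spec
        x_cex = verify(sketch, spec, e)
        if x_cex is None:
            return e  # no counterexample: e meets the spec on every input

        X.append(x_cex)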

The two calls are angelic execution against X and demonic verification on the full spec.

flowchart TD
    accTitle: CEGIS refinement loop
    accDescr: A synthesizer proposes a candidate that satisfies the spec on the current input set. A verifier looks for a counterexample. If no counterexample exists, the candidate is returned. If a counterexample is found, it joins the input set and the synthesizer runs again. If no candidate satisfies the spec on the current input set, CEGIS returns UNSAT.

    X[Input set X]
    SYN["Synthesize:
find e so spec holds on every x in X"]
    VER["Verify:
find x where sketch[e] fails spec"]
    DONE[Return e]
    UNSAT[Return UNSAT]

    X --> SYN
    SYN -->|no e exists| UNSAT
    SYN -->|candidate e| VER
    VER -->|no counterexample| DONE
    VER -->|counterexample x| X

Walking the loop

CEGIS on Bornholt's example finishes in two rounds:

| Round | X | Candidate | Verify finds |
|---|---|---|---|
| 1 | `{}` | `c = 0` (vacuously satisfies) | `x = 1`, spec wants `2` |
| 2 | `{1}` | `c = 2` (works for `x = 1`) | no counterexample |

Round 1 starts with X = {}. With no inputs to satisfy, the synthesizer is unconstrained, so any c is acceptable and it returns c = 0. The verifier finds x = 1: the candidate returns 0, the spec wants 2. Round 2 synthesizes against X = {1}. Only c = 2 satisfies c * 1 = 1 + 1. The verifier finds no counterexample, and the loop ends.

Two rounds, four solver calls, one quantifier alternation handled. The runnable hand-trace lives at cegis-by-hand-mul2.rkt in the companion demos.
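The two rounds can also be replayed in a few lines of Python. This is a sketch under stated assumptions, not the companion demo: brute-force search stands in for the solver, the search range is capped at ±8, and the hypothetical helper `by_magnitude` fixes a search order (0, 1, -1, 2, ...) chosen so that the trace matches the table. A real solver may propose different candidates and counterexamples.

```python
def by_magnitude(limit):
    """Hypothetical search order: 0, 1, -1, 2, -2, ... up to +/-limit."""
    order = [0]
    for k in range(1, limit + 1):
        order += [k, -k]
    return order


CANDIDATES = by_magnitude(8)  # values the hole c may take (assumed bound)
INPUTS = by_magnitude(8)      # inputs the verifier may try (assumed bound)


def spec(c, x):
    return c * x == x + x


def synth(X):
    """Angelic step: first c that meets the spec on every x in X."""
    for c in CANDIDATES:
        if all(spec(c, x) for x in X):
            return c
    return None


def verify(c):
    """Demonic step: first x on which candidate c violates the spec."""
    for x in INPUTS:
        if not spec(c, x):
            return x
    return None


def cegis():
    X, trace = [], []
    while True:
        c = synth(X)
        if c is None:
            return None, trace  # sketch cannot meet the spec
        cex = verify(c)
        trace.append((list(X), c, cex))
        if cex is None:
            return c, trace
        X.append(cex)


result, trace = cegis()
for round_no, (X, c, cex) in enumerate(trace, start=1):
    print(f"round {round_no}: X={X} candidate c={c} counterexample={cex}")
```

With this search order the run proposes c = 0, receives x = 1 as the counterexample, then proposes c = 2 and stops.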

Same loop, larger sketch

Here is the sign function sketch with five holes:

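(require rosette/lib/synthax) ; provides the (??) hole form

; Five integer holes: two thresholds and three return values.
(define (sign x)
  (cond [(< x (??)) (??)]
        [(> x (??)) (??)]
        [else (??)]))

; x is the symbolic integer declared earlier; sgn serves as the spec.
(synthesize
 #:forall    (list x)
 #:guarantee (assert (= (sign x) (sgn x))))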

Each hole picks an integer. With the range {-1, 0, 1}, the search space is $3^{5} = 243$ candidate programs. CEGIS on the sign sketch finishes in three rounds:

| Round | X | Candidate | Verify finds |
|---|---|---|---|
| 1 | `{}` | always `0` | `x = 5`, spec wants `1` |
| 2 | `{5}` | positive `1`, else `0` | `x = -3`, spec wants `-1` |
| 3 | `{5, -3}` | negative `-1`, positive `1`, else `0` | no counterexample |

Substituting the model Rosette returns into the holes gives:

(define (sign x)
  (cond ((< x 0) -1)
        ((> x 0) 1)
        (else 0)))

Three rounds, six solver calls, about one second on the wall clock. Two counterexamples were enough for the synthesizer to land on the correct sign function out of 243 candidates.
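The same loop can be run by brute force over all 243 hole fills. The Python sketch below is an illustration, not what Rosette does internally. It assumes hole values in {-1, 0, 1} and a verifier limited to inputs in -8..8, and the names `sketch`, `synth`, `verify`, and `cegis` are hypothetical. The enumeration order differs from the solver's, so the candidates and counterexamples it visits need not match the table.

```python
from itertools import product

HOLE_VALUES = (-1, 0, 1)  # assumed range for each of the five holes
INPUTS = range(-8, 9)     # assumed finite input domain for the verifier


def sketch(holes, x):
    """The sign sketch with its five holes filled in."""
    t_neg, r_neg, t_pos, r_pos, r_else = holes
    if x < t_neg:
        return r_neg
    if x > t_pos:
        return r_pos
    return r_else


def sgn(x):
    """Reference spec: -1, 0, or 1 by the sign of x."""
    return (x > 0) - (x < 0)


def synth(X):
    """First hole fill that agrees with sgn on every input in X."""
    for holes in product(HOLE_VALUES, repeat=5):
        if all(sketch(holes, x) == sgn(x) for x in X):
            return holes
    return None


def verify(holes):
    """First input on which the filled sketch disagrees with sgn."""
    for x in INPUTS:
        if sketch(holes, x) != sgn(x):
            return x
    return None


def cegis():
    X = []
    while True:
        holes = synth(X)
        if holes is None:
            return None, X
        cex = verify(holes)
        if cex is None:
            return holes, X
        X.append(cex)


holes, X = cegis()
print("hole fill:", holes)
print("counterexamples used:", X)
```

The loop must terminate because each counterexample rules out at least the candidate that produced it. On this domain the only fill that survives verification is (0, -1, 0, 1, 0), which reads back as the synthesized sign function.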

Why it converges in practice

CEGIS is worst-case exponential but converges fast on the problems people actually have. Solar-Lezama (2013) named this the bounded-observation hypothesis: for a well-shaped sketch, a small concrete-input set suffices to eliminate every wrong candidate. Each counterexample peels a region of the search space away.

This is the L02 loop again. CDCL extracts a clause from each conflict and feeds it into the next assignment search. CEGIS extracts an input from each counterexample and feeds it into the next program search. Both algorithms make an intractable search tractable by accumulating failure constraints.

Each counterexample replaces an unbounded $\forall x$ with one more concrete constraint. That is how the solver discharges a quantifier it cannot enumerate.
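The pruning effect can be measured on the sign sketch by counting how many hole fills stay consistent as inputs accumulate. This Python sketch is an illustration only. It assumes hole values in {-1, 0, 1}, and the helper `survivors` and the particular input sequence are hypothetical choices.

```python
from itertools import product

HOLE_VALUES = (-1, 0, 1)  # assumed range for each hole


def sketch(holes, x):
    t_neg, r_neg, t_pos, r_pos, r_else = holes
    if x < t_neg:
        return r_neg
    if x > t_pos:
        return r_pos
    return r_else


def sgn(x):
    return (x > 0) - (x < 0)


def survivors(X):
    """Hole fills that agree with sgn on every input in X."""
    return [h for h in product(HOLE_VALUES, repeat=5)
            if all(sketch(h, x) == sgn(x) for x in X)]


inputs = [5, -3, 0, -1, 1]  # hypothetical counterexample sequence
counts = [len(survivors(inputs[:k])) for k in range(len(inputs) + 1)]
print(counts)  # → [243, 81, 27, 4, 2, 1]
```

Each added input shrinks the set of surviving fills, and five inputs leave exactly one. Under these assumptions the inputs 5 and -3 alone leave 27 consistent fills, so a synthesizer that stops after them has picked a correct fill among the survivors and had it confirmed by the verifier.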

Across the field

Every lecture in this course encoded a different domain into solver queries: propositional formulas (L01), theories of integers and bitvectors and arrays (L03–L06), program states under symbolic execution and weakest preconditions (L07–L08), compiler rewrites (L09), and synthesis sketches (L10). The solver was Z3 throughout.

The same shape holds across the broader landscape: Dafny and Verus push verification reductions into the host language, Lean and Rocq treat the solver as a subroutine they recertify, FlashFill and Lakeroad use synthesis at deployment time, and the LLM+solver hybrids shipping now add a learned proposer to the loop. The reduction from domain to query is still what the engineer writes.