(Week 9)

Theory

Across L07 and L08 we hand-built verification engines from scratch. Rosette is a programming language with that engine built into #lang, plus one capability the engines did not have: program synthesis.

More for free

L07's symbolic-execution engine and L08's WP engine both followed the same recipe: parse a function's AST, walk it, accumulate symbolic state, encode the result as an SMT formula, dispatch the formula to Z3, and interpret the answer. The two engines together were close to 800 lines of Python. The engineer wrote the function being verified and the spec it had to satisfy. The engine glued those to Z3.

Rosette puts that glue into a programming language. A file with #lang rosette at the top is a Racket program with three new constructs: a way to declare symbolic values, a way to assert and assume claims about them, and a small family of queries that ask the solver questions. The AST walking, SMT encoding, and model extraction that we hand-wrote in L07 and L08 are all handled behind the language.

| Hand-written in L07 / L08 | Built into Rosette |
|---|---|
| ~465 lines of Python: AST walker, IVL primitives, VC Gen | `#lang rosette` |
| `assume`, `havoc`, `assert` as marker functions | Same names, built-in |
| Dispatched VCs to Z3 by hand | `(verify ...)`, `(solve ...)` |
| Parsed counterexample models | `(evaluate ...)` |
| (no synthesis) | `(synthesize ...)` |

Rosette also adds a capability we never built by hand: program synthesis.

verify asks "is there an input where the spec fails?"
solve asks the dual: "is there an input where the spec holds?"
synthesize goes further: "is there a program, drawn from a sketch with holes, that makes the spec hold for every input?"

The solver writes the code.

flowchart TD
    accTitle: Rosette mediates between user code and Z3
    accDescr: User code in Racket calls a query (verify, solve, or synthesize). Rosette handles symbolic execution, state merging, and SMT encoding, then sends an SMT formula to Z3. Z3 returns a model or unsat, which Rosette translates back into an answer for the user.
    A["Your code (Racket)"]
    B["Rosette
SE + state merging + SMT encoding"]
    C["Z3
model or unsat"]
    A -->|verify, solve, synthesize| B
    B -->|SMT formula| C
    C -->|model or unsat| B
    B -->|counterexample, input, or program| A

Reading Racket

Rosette is built on Racket, a modern Lisp. Every Lisp program is built from s-expressions: either an atom (a number, string, or symbol) or a parenthesized list of s-expressions separated by spaces. Apart from a few shorthands, such as a leading ' to quote a list, there is no other syntax.
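To make the two-case grammar concrete, here is a minimal s-expression reader sketched in Python, the language of the L07/L08 engines. It is a hypothetical illustration, not Racket's reader: it handles only parentheses, integers, and bare symbols, with no strings, quotes, or comments.

```python
# Hypothetical toy reader: parentheses, integers, and symbols only.
def tokenize(src):
    # Pad parentheses with spaces so split() separates them from atoms.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    # Consume one s-expression from the front of the token list.
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return lst
    if tok == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(tok)  # numeric atom
    except ValueError:
        return tok       # any other atom is a symbol, kept as a str


def read(src):
    return parse(tokenize(src))


print(read("(define (f x) (+ x 1))"))  # ['define', ['f', 'x'], ['+', 'x', 1]]
```

The whole program comes back as nested lists, which is the point of the next section.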

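A parenthesized list is usually a function call: the first element is the function, and the rest are its arguments. A few heads, such as define, are special forms with their own evaluation rules, but they are written the same way: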

(+ 1 2)             ; 1 + 2
(equal? a b)        ; a == b
(define x 42)       ; x = 42
(define (f x) ...)  ; def f(x): ...

Operators are ordinary functions. There is no special syntax for + or ==, and no operator precedence to remember. Parentheses make grouping explicit.

A few naming conventions: predicates end in ? (boolean?, null?), and mutators end in ! (vector-set!). The last expression in a function body is the return value.

For a fuller introduction to the host language, see the Racket Guide. For the solver-aided constructs Rosette adds, see the Rosette Guide.

Why Racket?

Every Racket program is already its own AST. (+ 1 2) is a three-element list: the symbol +, the number 1, and the number 2. The value the Racket reader produces and the value the evaluator runs on are the same value, and any other program can inspect it as data.

In L07 and L08, our Python engines began with ast.parse(): take a string of source, lex it, build a tree of ast.Module / ast.FunctionDef / ast.Call nodes, then walk that tree. We needed a parser's worth of machinery just to reach the data structure we wanted to operate on.

Python source:  x + 1
Python AST:     BinOp(Name('x'), Add(), Constant(1))

Racket source:  (+ x 1)
Racket AST:     (+ x 1)
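For comparison, Python's standard ast module shows the tree our engines had to work with. This sketch assumes CPython 3.8 or later; because the exact ast.dump text varies between versions, it checks node types rather than printing the dump. The racket_form list is a hand-written stand-in for what the Racket reader produces.

```python
import ast

# Parse "x + 1" as a single expression; the root is an ast.Expression.
tree = ast.parse("x + 1", mode="eval")
node = tree.body

# BinOp stores its operands and operator in named fields: left, op, right.
assert isinstance(node, ast.BinOp)
assert isinstance(node.left, ast.Name) and node.left.id == "x"
assert isinstance(node.op, ast.Add)
assert isinstance(node.right, ast.Constant) and node.right.value == 1

# The Racket form (+ x 1), read as data, is just a nested list.
racket_form = ["+", "x", 1]
print(type(node).__name__, racket_form)  # BinOp ['+', 'x', 1]
```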

Racket skips the parsing step. The source (verify (assert (equal? a b))) is a list whose head is verify and whose tail is the assertion. Rosette can walk that list directly, symbolically evaluating the assertion, accumulating path conditions, and emitting SMT, because the program is already in the shape a meta-program wants to work with. The phrase for this is homoiconicity: code and data have the same representation.

This is why Rosette is built on Racket. One tedious part of L07 and L08 was bridging from source code to a symbolic representation. In Racket, the source is the data structure.

Symbolic values

A symbolic value is a placeholder for a concrete value of the same type. The solver decides what concrete value it stands for.

The L07 and L08 engines built symbolic values by calling Z3 directly:

# inside the engine
from z3 import Bool, Solver, Or, Not

b = Bool('b')                # b is a Z3 boolean expression
s = Solver()
s.add(Not(Or(b, Not(b))))    # is (b or not b) ever false?
s.check()                    # unsat

In Rosette, symbolic values and the verify query are language constructs:

> (define-symbolic b boolean?)
> b
b
> (verify (assert (or b (not b))))
(unsat)

The REPL prints b as b because there is no concrete value yet. Symbolic values are ordinary first-class values in Racket: they bind, pass, store, and combine like any other value. The expression (or b (not b)) is itself a symbolic expression over b.

verify asks whether an assertion can fail for any concrete value of the symbolic constants. unsat means no failing assignment exists, so the assertion is a tautology.

Verifying `bvudiv2`

The running example is unsigned 32-bit division by 2, the same bvudiv2 from L01 Practice, with two implementations:

#lang rosette

(define int32? (bitvector 32))
(define (int32 i) (bv i int32?))

(define (bvudiv2 x)
  (bvudiv x (int32 2)))   ; x / 2

(define (bvudiv2-a x)
  (bvashr x (int32 1)))   ; x >> 1 (arithmetic)

bvudiv2 uses Rosette's built-in unsigned division. bvudiv2-a replaces the divide with an arithmetic shift. In L01 we wrote the equivalence check against Z3 by hand. The Rosette version does it in one query.

Verify

(define-symbolic x int32?)

(define cex
  (verify
   (assert (equal? (bvudiv2 x) (bvudiv2-a x)))))

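verify asks whether there is an input where the assertion fails. It returns a model containing a counterexample if one exists, or (unsat) if not, and (evaluate x cex) reads the value of x out of that model. One counterexample is x = 0x80000000, the most negative 32-bit signed integer (INT_MIN). Arithmetic shift right copies the sign bit into the top bit, while unsigned division by 2 shifts in a zero, so the two implementations disagree on INT_MIN and, for the same reason, on every input whose sign bit is set. Any of those inputs is a valid answer, so the exact counterexample Z3 reports may differ.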
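The disagreement at 0x80000000 can be checked by hand. Below is a small Python model of the two implementations, assuming 32-bit values stored as unsigned Python ints; the function names mirror the Racket ones but are hypothetical stand-ins, not Rosette calls.

```python
MASK32 = 0xFFFFFFFF

def bvudiv2(x):
    # Unsigned division by 2; x is a value in [0, 2**32).
    return (x // 2) & MASK32

def bvudiv2_a(x):
    # Arithmetic shift right by 1: reinterpret x as signed, shift, re-mask.
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> 1) & MASK32

x = 0x80000000
print(hex(bvudiv2(x)), hex(bvudiv2_a(x)))  # 0x40000000 0xc0000000
```

The two results differ only in bit 31, which the arithmetic shift fills with a copy of the sign bit.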

Solve

(define guess
  (solve
   (assert (equal? (bvudiv2 x) (bvudiv2-a x)))))

solve flips the verify question: instead of looking for an input that fails the assertion, it looks for one that satisfies it. Rosette returns a model such as x = 0: the two implementations do agree on some inputs, in fact on every input whose sign bit is clear.

Assume

The arithmetic-shift implementation is wrong on negative inputs and correct everywhere else. Capture that restriction with assume:

(verify
 (begin
   (assume (bvsge x (int32 0)))
   (assert (equal? (bvudiv2 x) (bvudiv2-a x)))))

assume adds a precondition to the verify query. The solver now searches only for counterexamples that also satisfy x >= 0, read as a signed comparison because bvsge is signed. Rosette returns unsat: under that precondition, the two implementations agree on every input that satisfies it. The implicit assumption (that we only care about non-negative x) has to be in the formula for the solver to honor it.
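At 32 bits there are 2^32 inputs, which is why the question goes to Z3. At a reduced width, though, all three queries can be answered by brute force. Here is a sketch in Python, assuming an 8-bit analogue of the two implementations; the width, helper names, and exhaustive search are illustration choices, not how Rosette answers these queries.

```python
W = 8                      # hypothetical reduced width
MASK = (1 << W) - 1
SIGN = 1 << (W - 1)

def to_signed(v):
    return v - (1 << W) if v & SIGN else v

def udiv2(x):              # unsigned x / 2
    return x >> 1

def ashr1(x):              # arithmetic x >> 1
    return (to_signed(x) >> 1) & MASK

def spec(x):               # the asserted equivalence
    return udiv2(x) == ashr1(x)

inputs = range(1 << W)

# verify: every input where the assertion fails
cex = [x for x in inputs if not spec(x)]
# solve: the first input where the assertion holds
witness = next(x for x in inputs if spec(x))
# verify under assume (x >= 0 as a signed value): remaining counterexamples
cex_assumed = [x for x in inputs if to_signed(x) >= 0 and not spec(x)]

print(len(cex), hex(cex[0]), witness, cex_assumed)  # 128 0x80 0 []
```

Every input with the sign bit set is a counterexample, and under the precondition none remain, matching Rosette's unsat.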

Synthesizing the fix

Until now in CARS the solver has answered questions about a program. Verify and solve both took an assertion and asked the solver about it: verify for a counterexample, solve for a witness. The L07 and L08 engines asked variations of those. The user supplied the program, and the solver supplied the answer.

Synthesis flips the roles. The user supplies a specification (an assertion that must hold) and a sketch with holes. The solver supplies the program: it finds expressions to fill the holes so the assertion holds for every input.

| Construct | User supplies | Solver returns |
|---|---|---|
| `verify` | complete program + assertion | counterexample or `unsat` |
| `solve` | complete program + assertion | satisfying input or `unsat` |
| `synthesize` | program sketch with holes + assertion | filled-in program or `unsat` |

This is the first time in CARS the solver writes code.

bvudiv2-a was wrong because it used the wrong shift. Instead of trying each shift by hand, we sketch the structure and let Rosette pick:

(require rosette/lib/synthax)

(define (bvudiv2-b x)
  ((choose bvashr bvlshr bvshl) x (?? int32?)))

(define sol
  (synthesize
   #:forall (list x)
   #:guarantee (assert (equal? (bvudiv2 x) (bvudiv2-b x)))))

(generate-forms sol)

The synthesis query has three new pieces.

?? is a hole. The solver fills it with an expression. (?? int32?) asks for a 32-bit integer constant.
choose is a finite menu. The solver picks one item from the list. (choose bvashr bvlshr bvshl) says "one of these operators."
#:forall and #:guarantee state the spec. #:forall (list x) quantifies over every 32-bit value of x, and #:guarantee is the assertion that must hold for each one.

Together the query asks: is there a way to fill the holes so the assertion holds for every input? In quantifier form, the query is $\exists e.\, \forall x.\, \mathit{assertion}(e, x)$, with $e$ the hole-filling and $x$ the input.
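Instantiated for the bvudiv2-b sketch, with $\mathit{op}$ standing for the choose hole and $c$ for the ?? hole (both names introduced here for illustration), the query becomes:

$$\exists\, \mathit{op} \in \{\mathtt{bvashr}, \mathtt{bvlshr}, \mathtt{bvshl}\}.\ \exists\, c \in \mathrm{BV}_{32}.\ \forall x \in \mathrm{BV}_{32}.\ \mathtt{bvudiv}(x, 2) = \mathit{op}(x, c)$$

verify checks only the inner $\forall$ for a fixed program, and solve only an $\exists$ over inputs; the $\exists\forall$ alternation is what sets synthesis apart.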

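Rosette fills the choose hole with bvlshr and the ?? hole with the constant 1, and generate-forms prints the completed definition, whose body is equivalent to (bvlshr x (int32 1)). That is logical shift right by 1, the correct way to halve an unsigned integer.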

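The synthesized code comes with a proof of correctness on every 32-bit input. Rosette did not sample inputs: it solved the $\exists\forall$ query with a loop of SMT calls that proposes hole-fillings and checks each proposal for counterexamples, returning a program only when a proposal has none.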
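A standard way to discharge an ∃∀ query is counterexample-guided inductive synthesis (CEGIS): guess hole values that work on a small set of example inputs, verify the guess on all inputs, and add any counterexample to the set. Below is a minimal CEGIS loop for the bvudiv2-b sketch in Python, assuming an 8-bit width and brute-force enumeration in place of SMT calls for both the guess and verify steps. All names are hypothetical.

```python
W = 8                                   # hypothetical reduced width
MASK = (1 << W) - 1

def to_signed(v):
    return v - (1 << W) if v >> (W - 1) else v

# The menu from (choose bvashr bvlshr bvshl). Shift amounts >= W give
# 0 (logical shifts) or all sign bits (arithmetic), as in SMT-LIB.
OPS = {
    "bvashr": lambda x, c: (to_signed(x) >> c) & MASK,
    "bvlshr": lambda x, c: x >> c,
    "bvshl":  lambda x, c: (x << c) & MASK,
}

def spec(x):
    return x // 2                       # unsigned division by 2

# Every way to fill the two holes: an operator and a W-bit constant.
CANDIDATES = [(op, c) for op in OPS for c in range(1 << W)]

def cegis():
    examples = [0]                      # start from a single input
    while True:
        # Guess: first hole-filling that is correct on all examples so far.
        guess = next((cand for cand in CANDIDATES
                      if all(OPS[cand[0]](x, cand[1]) == spec(x) for x in examples)),
                     None)
        if guess is None:
            return None                 # no filling meets the spec (unsat)
        op, c = guess
        # Verify: search all inputs for one where the guess is wrong.
        cex = next((x for x in range(1 << W) if OPS[op](x, c) != spec(x)), None)
        if cex is None:
            return guess                # correct on every input
        examples.append(cex)

print(cegis())  # ('bvlshr', 1)
```

The loop needs only three examples (0, 1, and 128) before its guess survives verification; Rosette plays the same game with Z3 answering each step symbolically.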

Rosette as a language host

So far Rosette has been a verifier and synthesizer for individual Racket functions. Most code we care about isn't Racket: production verifiers reason about eBPF bytecode, EPICS dataflow programs, hardware ISAs, and many other languages.

Rosette extends to these by acting as a host language. The language you want to reason about, the guest language, gets defined by an interpreter written as a Rosette function. The interpreter is the bridge: it takes a program in the guest language plus inputs, and tells Rosette what that program means. Rosette then runs the interpreter symbolically, and verify and synthesize work on guest-language programs through it.

Rosette is a meta-DSL: a language for building solver-aided DSLs. Building a new verifier from scratch took hundreds of lines and a custom symbolic walker in L07 and L08. Through Rosette, the same capability comes from an interpreter, which is much smaller.

bvlang: a tiny ISA

bvlang is a tiny register-based language for 32-bit bitvector code, similar in spirit to a small assembly language. Here is an ARM add of two registers:

add x2, x0, x1     ; x2 = x0 + x1

The same operation in eBPF takes two instructions, because eBPF ALU instructions overwrite their destination register (dst op= src):

r2 = r0
r2 += r1

Both compute r0 + r1 into a destination register. bvlang uses the three-operand shape of the ARM version, with register names written as small integers and the instruction wrapped in parens. The general form is (out op in ...):

(2 bvadd 0 1)      ; r2 = r0 + r1

Here out is the destination register, op is the operation, and the rest are source registers. The first registers hold the function's arguments. By convention, each subsequent instruction writes to a new register, numbered in order. The last register written holds the return value.

A complete bvlang program declares its arguments and lists its instructions. The def macro bundles a name, the registers that hold the function's arguments, and the instruction body. Here is a 3-instruction program that computes the sum of squares of its two arguments:

(def sum-sq (0 1)        ; argument registers: r0, r1
  (2 bvmul 0 0)          ; r2 = r0 * r0
  (3 bvmul 1 1)          ; r3 = r1 * r1
  (4 bvadd 2 3))         ; r4 = r2 + r3, the result

bvlang strips assembly down to its essentials: integer-indexed registers and a small set of bitvector operations. Operations include bitwise (bvand, bvor, bvxor), shifts (bvshl, bvashr), arithmetic (bvadd, bvsub, bvmul, bvneg), and comparison (bvsge, bvult, ...). These are the bitvector primitives we've been using, with one difference: a bvlang comparison writes a 32-bit 1 or 0 into its destination register instead of returning a boolean.

This program is data in Rosette. The values bvadd, bvand, and the rest are just symbols at this point. Rosette doesn't yet know that bvadd should compute a sum. We give them meaning by writing an interpreter.

The `bvlang` interpreter

An interpreter is a function that takes a bvlang program and a list of inputs, runs the program, and returns the result. It is the bridge from "symbols on the page" to "computation Rosette can reason about."

To run sum-sq on inputs 3 and 4 by hand, we keep a register file (a small array indexed by integer) and execute each instruction in order:

| Step | r0 | r1 | r2 | r3 | r4 |
|---|---|---|---|---|---|
| Initial | 3 | 4 | | | |
| `(2 bvmul 0 0)` | 3 | 4 | 9 | | |
| `(3 bvmul 1 1)` | 3 | 4 | 9 | 16 | |
| `(4 bvadd 2 3)` | 3 | 4 | 9 | 16 | 25 |

The result is r4 = 25, the value of the last register written.

An interpreter does this for any bvlang program. To preview the Rosette version, here is the same procedure in Python:

def interpret(prog, inputs):
    # Register file: inputs first, then one slot per instruction.
    registers = list(inputs) + [None] * len(prog)

    for (out, opcode, *ins) in prog:
        op = lookup(opcode)                 # map opcode symbol to a function
        args = [registers[i] for i in ins]  # read source registers
        registers[out] = op(*args)          # apply and store result

    return registers[-1]                    # value of the last register

The Racket version follows the same shape:

(define (interpret prog inputs)
  (make-registers prog inputs)
  (for ([stmt prog])
    (match stmt
      [(list out opcode in ...)
       (define op (lookup opcode))
       (define args (map load in))
       (store out (apply op args))]))
  (load (last)))

Both versions allocate the register file, then for each instruction destructure it into output register, opcode, and inputs, look up the operation, read the source registers, apply the operation, and store the result. At the end, both return the value of the last register.

The helpers (make-registers, lookup, load, store, last) are short utility functions over the register file. Full source is in the companion bvlang.rkt file.

The same function works two ways. On concrete inputs, it runs the trace above. On symbolic inputs, Rosette accumulates the SMT formula describing what sum-sq computes for every input. That second mode is how verify and synthesize work on bvlang programs.
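The two modes can be imitated in Python by handing one interpreter two different operation tables: one that computes 32-bit values, and one that builds expression text as a crude stand-in for the symbolic terms Rosette constructs. The tables, the list-of-tuples program format, and the names are assumptions for illustration, not the contents of bvlang.rkt.

```python
MASK32 = 0xFFFFFFFF

def interpret(prog, inputs, ops):
    # Register file: argument registers first, then one slot per instruction.
    registers = list(inputs) + [None] * len(prog)
    for out, opcode, *ins in prog:
        args = [registers[i] for i in ins]   # read source registers
        registers[out] = ops[opcode](*args)  # apply and store
    return registers[-1]                     # last register holds the result

# Hypothetical concrete semantics: wrap-around 32-bit arithmetic.
CONCRETE = {
    "bvadd": lambda a, b: (a + b) & MASK32,
    "bvmul": lambda a, b: (a * b) & MASK32,
}
# Hypothetical "symbolic" semantics: build the expression instead.
SYMBOLIC = {
    "bvadd": lambda a, b: f"(bvadd {a} {b})",
    "bvmul": lambda a, b: f"(bvmul {a} {b})",
}

# sum-sq as a list of (out, opcode, in...) tuples.
SUM_SQ = [
    (2, "bvmul", 0, 0),
    (3, "bvmul", 1, 1),
    (4, "bvadd", 2, 3),
]

print(interpret(SUM_SQ, [3, 4], CONCRETE))      # 25
print(interpret(SUM_SQ, ["x", "y"], SYMBOLIC))  # (bvadd (bvmul x x) (bvmul y y))
```

Rosette needs no second table: its lifted bvadd and bvmul build terms automatically whenever their arguments are symbolic.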

The verifier on top

To verify a bvlang program, we compare it against a Racket spec. Here is the spec for max of two 32-bit signed integers:

(define (bvmax0 x y)
  (if (bvsge x y) x y))

And a bvlang attempt at implementing it, using the branchless-max trick:

(def bvmax1 (0 1)        ; argument registers: r0, r1
  (2 bvsge 0 1)          ; r2 = (r0 >= r1) ? 1 : 0
  (3 bvneg 2)            ; r3 = -r2 (all 1s if r0 >= r1, else 0)
  (4 bvxor 0 2)          ; r4 = r0 XOR r2
  (5 bvand 3 4)          ; r5 = r3 AND r4
  (6 bvxor 1 5))         ; r6 = r1 XOR r5, the intended max

The ver wrapper hooks the interpreter into Rosette's verify query:

(define (ver impl spec)
  (define-symbolic* in int32? #:length (procedure-arity spec))
  (define cex
    (verify
     (assert (equal? (interpret (prog-body impl) in)
                     (apply spec in)))))
  (if (sat? cex)
      (evaluate in cex)
      cex))

It does three things:

define-symbolic* creates a fresh list of symbolic 32-bit integers, one for each argument of the spec.
verify asks Z3 whether impl and spec can produce different results on any inputs.
evaluate extracts the concrete input values from Z3's counterexample, if one exists.

In about 20 lines of Rosette, we have a custom verifier that does symbolic execution for any bvlang program. L08's analogous verifier for mini IMP was about 400 lines of Python.

Running it:

> (ver bvmax1 bvmax0)
(list (bv #x00010000 32) (bv #x00000000 32))

ver returns Z3's counterexample as a list of two 32-bit values: x = 0x10000 (decimal 65536) and y = 0. On these inputs the spec bvmax0 returns 65536, but bvmax1 returns 65537. The bug is in the instruction that writes r4: it XORs r0 with r2 (the comparison bit, 0 or 1) instead of with r1 (the second argument).
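The counterexample is easy to confirm by hand. Here is bvmax1 transcribed instruction by instruction into straight-line Python, assuming 32-bit two's-complement registers and a bvsge that writes 1 or 0, as the comments in bvmax1 describe. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def signed32(v):
    return v - (1 << 32) if v & 0x80000000 else v

def bvmax0(x, y):                                 # the spec
    return x if signed32(x) >= signed32(y) else y

def bvmax1(x, y):                                 # the buggy program, register by register
    r2 = 1 if signed32(x) >= signed32(y) else 0   # (2 bvsge 0 1)
    r3 = (-r2) & MASK32                           # (3 bvneg 2)
    r4 = x ^ r2                                   # (4 bvxor 0 2): the bug
    r5 = r3 & r4                                  # (5 bvand 3 4)
    return y ^ r5                                 # (6 bvxor 1 5)

x, y = 0x10000, 0
print(hex(bvmax0(x, y)), hex(bvmax1(x, y)))       # 0x10000 0x10001
```

With r2 = 1 the mask r3 is all ones, so the result is y XOR x XOR 1: the stray low bit is the comparison bit leaking through.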

The verifier found a real bug without us tracing the program by hand, and the next section asks Rosette to find the fix.

The synthesizer on top

The synthesizer asks Rosette to find an implementation that matches the spec. Instead of writing bvmax1 by hand, we give Rosette a sketch: a program template with the number of instructions fixed but the operations and operands left as holes for the solver to fill.

Spelled out, a 5-instruction sketch looks like this:

(def bvmax2 (0 1)
  (2 (choose bvsge bvneg bvxor bvand) ?? ??)
  (3 (choose bvsge bvneg bvxor bvand) ?? ??)
  (4 (choose bvsge bvneg bvxor bvand) ?? ??)
  (5 (choose bvsge bvneg bvxor bvand) ?? ??)
  (6 (choose bvsge bvneg bvxor bvand) ?? ??))

Each line declares one instruction slot. The first hole in each slot is the operator, picked from the list. The two ?? holes are source registers, each picked from a register already written. The output registers stay sequential, as before.

The #:sketch macro is shorthand for the same pattern:

(def bvmax2 (0 1)                       ; argument registers: r0, r1
  #:sketch 5 (bvsge bvneg bvxor bvand)) ; 5 slots, ops from this list
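To see why this search goes to a solver rather than to enumeration, count the candidates. Assuming, as described above, that each slot picks one of the 4 operators and that both operand holes range over the registers already written (r0 through r(k-1) for the slot writing rk), the size of the sketch is a short product. The function name is hypothetical.

```python
from math import prod

N_OPS = 4          # bvsge, bvneg, bvxor, bvand
FIRST_OUT = 2      # r0 and r1 hold the arguments

def sketch_size(n_slots):
    # The slot writing register k chooses an operator and 2 operands,
    # each operand one of the k registers r0..r(k-1).
    return prod(N_OPS * k * k for k in range(FIRST_OUT, FIRST_OUT + n_slots))

print(sketch_size(5))  # 530841600
```

Each of those roughly 5 × 10^8 programs would then need checking against all 2^64 pairs of 32-bit inputs; the solver reasons about the holes symbolically instead of trying candidates one at a time.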

The syn wrapper hooks the interpreter into Rosette's synthesize query:

(define (syn impl spec)
  (define-symbolic* in int32? #:length (procedure-arity spec))
  (define sol
    (synthesize
     #:forall in
     #:guarantee (assert (equal? (interpret (prog-body impl) in)
                                 (apply spec in)))))
  (if (sat? sol)
      (evaluate impl sol)
      sol))

It does three things:

define-symbolic* creates a fresh list of symbolic 32-bit integers, one for each argument of the spec.
synthesize with #:forall in #:guarantee ... asks Z3 to find hole-fills such that the interpreter on impl and the spec agree on every input.
evaluate substitutes the solver's hole-fills back into impl, returning a filled-in bvlang program.

In about 10 more lines of Rosette, we have a synthesizer for any bvlang program. Unlike verify, synthesis was not part of L07 or L08.

Running it:

> (syn bvmax2 bvmax0)
(prog 'bvmax2 '(0 1)
      '((2 bvsge 0 1)
        (3 bvneg 2)
        (4 bvxor 0 1)
        (5 bvand 3 4)
        (6 bvxor 1 5)))

The synthesized program is the same shape as bvmax1, but the instruction writing r4 is now (4 bvxor 0 1) instead of (4 bvxor 0 2). The solver searched the space of 5-instruction programs over the allowed ops and found one that matches bvmax0 on every pair of 32-bit inputs.
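As a sanity check at a smaller scale, here is the synthesized program transcribed into Python at an 8-bit width and compared with the max spec on all 65,536 input pairs. The width is an assumption for speed; the solver's guarantee is for 32 bits.

```python
W = 8                                             # hypothetical reduced width
MASK = (1 << W) - 1

def to_signed(v):
    return v - (1 << W) if v & (1 << (W - 1)) else v

def bvmax0(x, y):                                 # the spec, signed comparison
    return x if to_signed(x) >= to_signed(y) else y

def bvmax2(x, y):                                 # the synthesized program
    r2 = 1 if to_signed(x) >= to_signed(y) else 0 # (2 bvsge 0 1)
    r3 = (-r2) & MASK                             # (3 bvneg 2): all ones or zero
    r4 = x ^ y                                    # (4 bvxor 0 1)
    r5 = r3 & r4                                  # (5 bvand 3 4)
    return y ^ r5                                 # (6 bvxor 1 5)

mismatches = [(x, y) for x in range(1 << W) for y in range(1 << W)
              if bvmax2(x, y) != bvmax0(x, y)]
print(len(mismatches))  # 0
```

When r0 >= r1 the mask keeps x XOR y and the final XOR with y leaves x; otherwise the mask is zero and the result is y.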

The solver wrote the program. We supplied only a sketch and a spec.

We can also ask whether a shorter program exists. Change the sketch to 4 instructions instead of 5:

(def bvmax-4 (0 1)
  #:sketch 4 (bvsge bvneg bvxor bvand))

Running synthesis:

> (syn bvmax-4 bvmax0)
(unsat)

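Rosette returns (unsat): no 4-instruction branchless max exists in this op set. Shorter programs are ruled out as well, since any of them could be padded to four instructions by ANDing its last register with itself. The 5-instruction version we just synthesized is therefore the shortest possible over these operations.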

For the price of an interpreter, Rosette gives you a verifier and synthesizer for every program in the guest language.

A bitwidth gotcha

Rosette's integer? type can be approximated by a finite-precision bitvector. By default the precision is unbounded (current-bitwidth is #f). Finite precision is much faster for synthesis but quietly changes what the solver believes about your integers:

> (current-bitwidth)
#f
> (define-symbolic x integer?)
> (solve (assert (= x 64)))
(model [x 64])

> (current-bitwidth 5)
> (solve (assert (= x 64)))
(model [x 0])

The second model is x = 0 because 64 mod 32 = 0, and under 5-bit precision, 64 and 0 are the same value. The solver is not buggy. The integers it is reasoning about are not the integers we have in mind.
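The wraparound is plain modular arithmetic. Here is a small Python sketch, assuming the finite-precision semantics described above, where an integer is reinterpreted as a k-bit two's-complement value; the function name is hypothetical.

```python
def wrap(n, k):
    # Hypothetical helper: reduce n modulo 2**k, then read it as signed k-bit.
    n &= (1 << k) - 1
    return n - (1 << k) if n >= 1 << (k - 1) else n

print(wrap(64, 5))               # 0, since 64 = 2 * 32
print(wrap(15, 5), wrap(16, 5))  # 15 -16: the 5-bit range is [-16, 15]
```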

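bvlang.rkt sets (current-bitwidth 4) for the speed: synthesis is much faster under finite precision. The bitvector registers are unaffected, since they are already finite, but any integer the solver reasons about, such as a register-index hole in a sketch, is encoded as a 4-bit two's-complement value in the range -8 to 7. The trade-off is invisible until something does not match expectation.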

Reasoning precision is one of several Rosette gotchas. Others include unbounded loops over symbolic values (which run forever) and unsafe features like hash tables (which Rosette does not lift). The Rosette Guide covers each in detail.

Applications

Rosette has been used to build dozens of solver-aided tools, in domains from medical devices to operating-system kernels to FPGA compilers. Three examples:

CNTS: the Clinical Neutron Therapy System at UW Medical Center, in continuous service for over 30 years. The CNTS engineers wanted to verify their EPICS dataflow code, which controls the radiotherapy machine. A Rosette verifier for EPICS scaled to the 90,000-line codebase and found two safety-critical bugs in a pre-release version. One was in the EPICS runtime itself: the controller had been relying on a misimplementation.
Jitterbug: a verifier for the eBPF JIT compilers in the Linux kernel. The kernel runs eBPF bytecode (used for tracing, security policies, and network filters) through per-architecture JIT compilers. Jitterbug uses the lifted-interpreter pattern from above: a Rosette interpreter for BPF, per-target interpreters for x86, ARM, and RISC-V, and a per-target compiler. It found 16 previously unknown bugs across five JIT implementations.
Lakeroad: a synthesizer for FPGA technology mapping, built at UW. The guest language is a small algebra of FPGA primitives (lookup tables, DSP blocks, embedded multipliers). Lakeroad uses Rosette's synthesize to find implementations of high-level hardware operations from sketches over those primitives. It has produced higher-quality mappings than vendor tools for many operations.

The shape these share: a Rosette interpreter for some guest language plus a few wrappers around verify or synthesize. Your project could follow this same shape. The starting point is a tiny Rosette interpreter for a domain you care about.

Then and now

Building a solver-aided tool used to require the infrastructure: a custom symbolic compiler, a hand-rolled SMT encoder, bespoke quantifier triggering. PEC in 2009 was that infrastructure built from scratch. Rosette pulls the infrastructure into a #lang line. What you write is the interpreter for your domain and the query you want to ask.