Practice
Hands-on with ForAll: a queue library, a wrapper to verify, and a cliff where Z3 stops cooperating.
A queue library and its spec
The running example is a queue library you call but don't have source for. The header exposes five operations:
empty # the empty queue
push(q, x) # add x at the back; returns the new queue
pop(q) # remove the front element; returns the new queue
peek(q) # read the front element
size(q) # current length
The spec from the docs translates into six axioms, organized as three pairs.
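Counting. The empty queue has size zero, and each push adds one:
size(empty) = 0
size(push(q, x)) = size(q) + 1
FIFO peek. peek returns the oldest element. Base case followed by recursive case:
peek(push(empty, x)) = x
peek(push(push(q, x), y)) = peek(push(q, x))
FIFO pop. pop removes the oldest element. Same shape:
pop(push(empty, x)) = empty
pop(push(push(q, x), y)) = push(pop(push(q, x)), y)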
The signature is uninterpreted. Q is a sort with no fixed meaning. push, pop, peek, and size are uninterpreted functions. The axioms above are the only facts Z3 has about any of them.
from z3 import (DeclareSort, Const, Function, IntSort, Int, Ints,
                ForAll, Solver, Not)
Q = DeclareSort('Q')
empty = Const('empty', Q)
push = Function('push', Q, IntSort(), Q)
pop = Function('pop', Q, Q)
peek = Function('peek', Q, IntSort())
size = Function('size', Q, IntSort())
q = Const('q', Q)
x, y = Ints('x_ y_')
queue_spec = [
    size(empty) == 0,
    ForAll([q, x], size(push(q, x)) == size(q) + 1),
    ForAll([x], peek(push(empty, x)) == x),
    ForAll([q, x, y], peek(push(push(q, x), y)) == peek(push(q, x))),
    ForAll([x], pop(push(empty, x)) == empty),
    ForAll([q, x, y], pop(push(push(q, x), y)) == push(pop(push(q, x)), y)),
]
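To see that the six axioms describe an ordinary FIFO queue, here is a minimal sketch of one model, assuming Python tuples as queues with the front at index 0. The names EMPTY and check_queue_spec are illustrative and belong to neither the library nor Z3, and the check samples a few values rather than proving anything.

```python
from itertools import product

# One hypothetical model of the spec: a queue is a tuple, front at index 0.
EMPTY = ()

def push(q, x):
    return q + (x,)

def pop(q):
    return q[1:]

def peek(q):
    return q[0]

def size(q):
    return len(q)

def check_queue_spec(queues, values):
    """Spot-check the six axioms on sample queues and values (not a proof)."""
    assert size(EMPTY) == 0
    for x in values:
        assert peek(push(EMPTY, x)) == x
        assert pop(push(EMPTY, x)) == EMPTY
    for q, x in product(queues, values):
        assert size(push(q, x)) == size(q) + 1
    for q, x, y in product(queues, values, values):
        assert peek(push(push(q, x), y)) == peek(push(q, x))
        assert pop(push(push(q, x), y)) == push(pop(push(q, x)), y)
    return True

samples = [EMPTY, (1,), (1, 2), (3, 1, 2)]
print(check_queue_spec(samples, [0, 5, 7]))  # True
```

The spec says nothing about pop(empty) or peek(empty), so the sketch never calls them. Z3 is equally free to pick any value for those terms.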
The wrapper under review
You wrote a one-line wrapper that rotates the head element to the back of the queue:
def cycle(q):
    return push(pop(q), peek(q))
Take the front element with peek, drop it from the front with pop, push it onto the back. The size is unchanged. The front advances by one:
[a, b, c] --cycle--> [b, c, a]
Both observations follow by inspection. The next section confirms them with Z3, mechanically and for every choice of a, b, c.
The basic invariants
The two invariants from inspection: cycle preserves the queue's size, and the new front is the element that was second from the front. Each query below asserts the negation of one and asks Z3 for a counterexample:
def cycle(q):
    return push(pop(q), peek(q))
a, b, c = Ints('a b c')
q3 = push(push(push(empty, a), b), c)
# Property 1: cycle preserves size.
s = Solver()
for ax in queue_spec:
    s.add(ax)
s.add(Not(size(cycle(q3)) == 3))
print(s.check()) # unsat
# Property 2: the new front is the second-pushed element.
s = Solver()
for ax in queue_spec:
    s.add(ax)
s.add(Not(peek(cycle(q3)) == b))
print(s.check()) # unsat
Both queries return unsat in 1 ms. Each proof needs two axiom pairs to cooperate: pop with size for the first, pop with peek for the second.
The size proof:
- The size axiom fires on the outer push: size(cycle(q3)) = size(pop(q3)) + 1.
- The recursive pop axiom fires twice on pop(q3), unwinding to the base case: pop(q3) = push(push(empty, b), c).
- Two more size-axiom firings reduce that to size(pop(q3)) = 2.
- Chain: size(cycle(q3)) = 2 + 1 = 3.
The peek proof:
- The FIFO peek axiom reduces peek(q3) to a, so cycle(q3) = push(pop(q3), a).
- With pop(q3) = push(push(empty, b), c) from above, this is push(push(push(empty, b), c), a).
- The FIFO peek axiom fires on the outer two pushes: peek(cycle(q3)) = peek(push(empty, b)) = b.
Both invariants hold for every choice of a, b, c.
Cycle as a 3-rotation
Cycling a 3-element queue three times should bring it back to the original.
a, b, c = Ints('a b c')
q3 = push(push(push(empty, a), b), c)
s = Solver()
for ax in queue_spec:
    s.add(ax)
s.add(Not(cycle(cycle(cycle(q3))) == q3))
print(s.check()) # unsat
unsat in 18 ms. The recursive pop axiom fires many times (once per cycle, several times per nesting level), the FIFO peek axiom fires several times, and the result canonicalises to the original term.
The work hides inside s.check(). Z3 walked the term, matched each subterm against an axiom's pattern, instantiated, and chained the resulting equalities. This pattern-matching strategy has a name: E-matching.
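The chain of axiom firings can be replayed by hand. The sketch below is a toy rewriter, not Z3's E-matching: it orients the four peek and pop axioms left to right and applies them until nothing matches. Terms are nested tuples, elements are strings, and the names push_t, cycle_t, and norm are illustrative ones introduced here.

```python
EMPTY = ('empty',)

def push_t(q, x):
    return ('push', q, x)

def cycle_t(q):
    # Same shape as the wrapper: push(pop(q), peek(q)), built as a term.
    return push_t(('pop', q), ('peek', q))

def norm(t):
    """Rewrite a term bottom-up with the peek and pop axioms, left to right."""
    if isinstance(t, str):          # an element such as 'a'
        return t
    head, args = t[0], [norm(arg) for arg in t[1:]]
    if head in ('peek', 'pop') and args[0][0] == 'push':
        _, inner, last = args[0]
        if inner == EMPTY:          # base cases
            return last if head == 'peek' else EMPTY
        if inner[0] == 'push':      # recursive cases
            if head == 'peek':
                return norm(('peek', inner))
            return norm(push_t(('pop', inner), last))
    return (head, *args)

q3 = push_t(push_t(push_t(EMPTY, 'a'), 'b'), 'c')
rotated = push_t(push_t(push_t(EMPTY, 'b'), 'c'), 'a')

print(norm(cycle_t(q3)) == rotated)                # True
print(norm(cycle_t(cycle_t(cycle_t(q3)))) == q3)   # True
```

A directed rewriter only works because these axioms happen to shrink terms. Z3 treats each instance as an undirected equality and lets congruence closure do the chaining.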
Without the spec
A natural question, asked without thinking: is cycle the identity? Drop the axioms entirely and ask Z3 directly.
a, b, c = Ints('a b c')
q3 = push(push(push(empty, a), b), c)
s = Solver()
s.add(cycle(q3) == q3)
print(s.check()) # sat
sat. Z3 picked an interpretation of push, pop, peek where cycling does nothing. Maybe pop is the identity. Maybe push ignores its second argument. With no axioms, Q, push, pop, peek mean nothing, and uninterpreted is literal: Z3 is free to pick any functions that satisfy the constraint.
Add the queue spec back and re-run the same query, with the additional fact that the elements are distinct so the rotation is observable:
s = Solver()
for ax in queue_spec:
    s.add(ax)
s.add(a != b, b != c, a != c)
s.add(cycle(q3) == q3)
print(s.check()) # unsat
unsat in 6 ms. The spec rules out the trivial models. Now cycle must move elements around in a specific way, and on a queue with three distinct elements that movement is visible.
A correct answer to the wrong question. Same code, same query, two different results. The spec was the difference. Without it, Z3 answered a problem we did not mean to ask: "is there any interpretation of these symbols that makes my claim true?" The honest answer was yes, and it was useless. The solver answers the question its inputs encode.
When the spec drifts
You don't drop the spec. You import the wrong one.
The team next door also ships a stack library, with the same five names: empty, push, pop, peek, size. Their docs say:
- peek(push(q, x)) = x (LIFO peek)
- pop(push(q, x)) = q (LIFO pop)
Counting is the same. peek and pop look at the most recently pushed element instead of the oldest. Same operation names, opposite discipline.
Imagine your build system pulled in the stack header by mistake. Your cycle wrapper compiles. Z3, run against the stack spec, decides:
stack_spec = [
    size(empty) == 0,
    ForAll([q, x], size(push(q, x)) == size(q) + 1),
    ForAll([q, x], peek(push(q, x)) == x),
    ForAll([q, x], pop(push(q, x)) == q),
]
a, b, c = Ints('a b c')
q3 = push(push(push(empty, a), b), c)
s = Solver()
for ax in stack_spec:
    s.add(ax)
s.add(Not(cycle(q3) == q3))
print(s.check()) # unsat -- "cycle is the identity"
unsat in 6 ms. Z3 has proved that cycle is the identity.
Walk through it. Under stack semantics, pop(push(push(push(empty,a),b),c)) = push(push(empty,a),b) (LIFO pop strips the top). And peek(push(push(push(empty,a),b),c)) = c (LIFO peek reads the top). So
cycle(q3) = push(pop(q3), peek(q3)) = push(push(push(empty,a),b),c) = q3.
The proof is honest. Under the stack spec, cycle really is the identity. The problem is that your library is a queue, and against a queue your cycle rotates.
Two specs, same code, opposite answers, both proofs valid. The verifier did not lie. The spec did. This is the failure mode the formal-methods community calls spec drift: a verified guarantee that no longer corresponds to what the system actually does. It is one of the few ways a verifier can ship a wrong answer with full confidence.
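The same drift is visible at runtime. A minimal sketch, assuming two hypothetical tuple-backed libraries that share operation names: the wrapper is built once and linked against each. make_cycle and both sets of lambdas are illustrative stand-ins, not the real libraries.

```python
def make_cycle(push, pop, peek):
    """Build the cycle wrapper on top of whichever library was linked in."""
    def cycle(q):
        return push(pop(q), peek(q))
    return cycle

# Hypothetical queue library: front of the queue is index 0.
queue_cycle = make_cycle(
    push=lambda q, x: q + (x,),
    pop=lambda q: q[1:],
    peek=lambda q: q[0],
)

# Hypothetical stack library: top of the stack is the last index.
stack_cycle = make_cycle(
    push=lambda q, x: q + (x,),
    pop=lambda q: q[:-1],
    peek=lambda q: q[-1],
)

q3 = ('a', 'b', 'c')
print(queue_cycle(q3))  # ('b', 'c', 'a')
print(stack_cycle(q3))  # ('a', 'b', 'c')
```

The wrapper's source is byte-for-byte identical in both cases. Only the library behind the names changed, which is exactly what the two specs encode.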
Every proof on this page came back in milliseconds. Practice continues with more ForAll examples and closes at the cliff, where small changes to an axiom flip Z3 from instant to unknown.
Stating what you know
Z3's built-in theories don't know that your abs returns
non-negative integers, or that your decoder undoes your encoder.
Facts about your uninterpreted functions are what ForAll is for.
A non-negative function
from z3 import Function, IntSort, Int, ForAll, Solver, Not
f = Function('f', IntSort(), IntSort())
x, y = Int('x'), Int('y')
s = Solver()
s.add(ForAll([x], f(x) >= 0))
s.add(Not(f(y) >= 0))
print(s.check()) # unsat
unsat. Z3 instantiates the axiom at x := y, derives
f(y) >= 0, contradicts the negation.
Drop the axiom and Z3 picks freely:
s = Solver()
s.add(Not(f(y) >= 0))
print(s.check()) # sat
print(s.model()) # f(y) = -1 (or some other negative)
No axiom, no constraint. Uninterpreted is literal.
Injectivity from a round-trip spec
Suppose your encoder and decoder satisfy the round-trip law
decode(encode(x)) = x for every x.
Does it follow that encode is injective, i.e., distinct inputs
give distinct outputs? You can prove it from the spec alone.
from z3 import Function, IntSort, Ints, ForAll, Solver
encode = Function('encode', IntSort(), IntSort())
decode = Function('decode', IntSort(), IntSort())
x, a, b = Ints('x a b')
s = Solver()
s.add(ForAll([x], decode(encode(x)) == x))
s.add(a != b)
s.add(encode(a) == encode(b))
print(s.check()) # unsat
unsat. Two distinct inputs cannot land at the same encoded
output if decoding gets you back to where you started.
The proof Z3 found uses the axiom twice and EUF once. The axiom
fires at x := a, giving decode(encode(a)) = a; and at
x := b, giving decode(encode(b)) = b. From encode(a) ==
encode(b), EUF concludes decode(encode(a)) == decode(encode(b)).
Substitute: a == b, contradicting a != b.
Two layers of reasoning:
- A universal axiom, instantiated at the ground terms in the formula.
- EUF closing the loop with congruence.
Specs that compose
A ForAll axiom is a re-usable rule. Multiple axioms can chain
through a single query, and the solver combines them with the
built-in arithmetic and equality reasoning you already have.
Clamp respects its upper bound
clamp is the bound-a-value-to-an-interval idiom found in
graphics, audio, signal processing. You don't have source for it.
The library gives you a behavioral spec:
clamp(v, lo, hi) = min(max(v, lo), hi)
Plus the upper-bound spec for min:
min(x, y) <= x
min(x, y) <= y
max stays uninterpreted; we don't need its spec for this
property. From the three axioms, prove that clamp never
exceeds hi.
from z3 import Function, IntSort, Ints, ForAll, Solver, Not
mmin = Function('min', IntSort(), IntSort(), IntSort())
mmax = Function('max', IntSort(), IntSort(), IntSort())
clamp = Function('clamp', IntSort(), IntSort(), IntSort(), IntSort())
x, y, v, lo, hi = Ints('x y v lo hi')
s = Solver()
s.add(ForAll([x, y], mmin(x, y) <= x))
s.add(ForAll([x, y], mmin(x, y) <= y))
s.add(ForAll([v, lo, hi],
             clamp(v, lo, hi) == mmin(mmax(v, lo), hi)))
s.add(Not(clamp(v, lo, hi) <= hi))
print(s.check()) # unsat
unsat. Two axioms cooperating. The clamp spec fires at
(v, lo, hi), unfolding to min(max(v, lo), hi). That
introduces a fresh min(...) ground term, which fires the
second min axiom at (max(v, lo), hi). Substitute:
clamp(v, lo, hi) <= hi.
The min axioms here are partial. They say min returns some
value at most each argument; they don't pin which one. That's
enough to prove clamp <= hi (the upper bound holds either way)
but not enough for clamp >= lo, which would need a stronger
spec forcing min(x, y) to actually return one of its arguments,
a lower-bound spec for max, and the precondition lo <= hi.
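A concrete function makes the gap visible. The sketch below uses a deliberately odd low_min that satisfies both upper-bound axioms but undershoots its arguments, and checks the bounds on a small grid. low_min and the grid are illustrative, and Python's built-in max stands in for the uninterpreted max.

```python
from itertools import product

def low_min(x, y):
    # A deliberately odd "min": satisfies both upper-bound axioms,
    # but returns a value strictly below both arguments.
    return min(x, y) - 1

def clamp(v, lo, hi):
    # The clamp spec, with Python's built-in max standing in for max.
    return low_min(max(v, lo), hi)

grid = range(-3, 4)
axioms_hold = all(low_min(x, y) <= x and low_min(x, y) <= y
                  for x, y in product(grid, grid))
upper_ok = all(clamp(v, lo, hi) <= hi
               for v, lo, hi in product(grid, grid, grid))
lower_fails = [(v, lo, hi) for v, lo, hi in product(grid, grid, grid)
               if lo <= hi and clamp(v, lo, hi) < lo]

print(axioms_hold, upper_ok, len(lower_fails) > 0)  # True True True
```

For example, clamp(-3, 0, 3) comes out as -1 here: the spec as written permits it, so no solver could prove the lower bound from these axioms.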
Min over a chain
A min over a chain of nested calls is at most any individual
element. With the same upper-bound axioms:
mmin = Function('min', IntSort(), IntSort(), IntSort())
x, y, a, b, c = Ints('x y a b c')
s = Solver()
s.add(ForAll([x, y], mmin(x, y) <= x))
s.add(ForAll([x, y], mmin(x, y) <= y))
s.add(Not(mmin(mmin(a, b), c) <= a))
print(s.check()) # unsat
unsat. The axiom fires twice. At (a, b), it gives
min(a, b) <= a. At (min(a, b), c), it gives
min(min(a, b), c) <= min(a, b). Z3 chains them via integer
arithmetic: min(min(a, b), c) <= a.
Same axiom, two ground terms, two instantiations. Deeper nesting fires more times. This is what "ForAll" is doing under the hood even when the proof feels obvious: walking the formula's terms, matching each one against the axiom's pattern, asserting the instance.
The cliff
ForAll works until it doesn't. Two demos show where.
Three near-identical axioms
Predict before running. For each axiom below, will Z3 return a model?
from z3 import Function, IntSort, Int, ForAll, Solver
x = Int('x')
f = Function('f', IntSort(), IntSort())
axioms = [
    ('f(x) > x',      ForAll([x], f(x) > x)),
    ('f(x) / 2 == x', ForAll([x], f(x) / 2 == x)),
    ('f(x) == 2 * x', ForAll([x], f(x) == 2 * x)),
]
for label, axiom in axioms:
    s = Solver()
    s.set('timeout', 5000)  # 5-second cap
    s.add(axiom)
    print(f'{label:20s} {s.check()}')
All three describe f as relating each integer to something
that depends on it. Run them.
f(x) > x             unknown
f(x) / 2 == x        unknown
f(x) == 2 * x        sat
f(x) == 2 * x is a defining equation; Z3 builds the obvious
model where f doubles its argument. The other two constrain f
without defining it: f(x) > x is an inequality, and
f(x) / 2 == x is integer division, which allows both 2 * x and
2 * x + 1. Models exist (f(x) = x + 1 satisfies the first), but
Z3's heuristics for synthesizing one in bounded time give up.
f(x) > x and f(x) == 2 * x are syntactic neighbors, and Z3
treats them as if they live in different countries.
Same formula, different trigger
Now an axiom that is the same in both runs except for one annotation, with opposite results.
from z3 import Function, IntSort, Int, Ints, ForAll, Solver
f = Function('f', IntSort(), IntSort())
g = Function('g', IntSort(), IntSort())
a, b, c = Ints('a b c')
x = Int('x')
# Pattern f(g(x)).
s = Solver()
s.set(auto_config=False, mbqi=False)
s.add(ForAll(x, f(g(x)) == x, patterns=[f(g(x))]))
s.add(g(a) == c, g(b) == c, a != b)
print(s.check()) # unknown
# Pattern g(x).
s = Solver()
s.set(auto_config=False, mbqi=False)
s.add(ForAll(x, f(g(x)) == x, patterns=[g(x)]))
s.add(g(a) == c, g(b) == c, a != b)
print(s.check()) # unsat
The axiom says "f inverts g." Three ground facts say g(a) =
c, g(b) = c, a ≠ b.
Apply f to both sides of g(a) = c. By the axiom, f(g(a)) =
a, so a = f(c). By the same argument with b: b = f(c). So
a = b, contradicting a ≠ b. The formula is unsatisfiable.
But with the trigger f(g(x)), Z3 returns unknown. Switch
the trigger to g(x) and Z3 returns unsat in microseconds.
The trigger tells Z3 when to instantiate the axiom. Pattern
f(g(x)) looks for ground terms shaped like f(g(_)) in the
formula; there are none. Only g(a) and g(b) appear. The
axiom never fires, the contradiction is never derived, Z3 gives
up.
Pattern g(x) looks for ground terms shaped like g(_). Two
of those: g(a) and g(b). The axiom fires at each, equality
reasoning closes the loop.
Triggers are a little programming language for controlling when each axiom fires. Pick one poorly and a provable formula becomes "unknown."
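The matching step itself is small enough to sketch. The toy matcher below is purely syntactic, whereas Z3 matches modulo the equalities it already knows, so treat it as an illustration of the idea only. Terms are nested tuples, strings starting with '?' are pattern variables, and all names are introduced here for illustration.

```python
def match(pattern, term, bindings=None):
    """Syntactically match a pattern against a ground term.

    Returns a dict of variable bindings, or None if the match fails.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith('?'):
        if pattern in bindings and bindings[pattern] != term:
            return None
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, str) or isinstance(term, str):
        return bindings if pattern == term else None
    if len(pattern) != len(term) or pattern[0] != term[0]:
        return None
    for p, t in zip(pattern[1:], term[1:]):
        bindings = match(p, t, bindings)
        if bindings is None:
            return None
    return bindings

def subterms(term):
    """Yield a term and all of its subterms."""
    yield term
    if not isinstance(term, str):
        for arg in term[1:]:
            yield from subterms(arg)

# Ground terms occurring in the facts g(a) == c, g(b) == c, a != b.
facts = [('g', 'a'), 'c', ('g', 'b'), 'a', 'b']
ground = {t for fact in facts for t in subterms(fact)}

def instantiations(pattern):
    """List the values of ?x for which the pattern matches a ground term."""
    found = (match(pattern, t) for t in ground)
    return sorted(m['?x'] for m in found if m is not None)

print(instantiations(('f', ('g', '?x'))))  # []
print(instantiations(('g', '?x')))         # ['a', 'b']
```

An empty list means the axiom is never instantiated. Two matches mean two instances, f(g(a)) == a and f(g(b)) == b, which is all the proof needs.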
Source
The queue and stack axioms are standard equational specifications of algebraic data types, the style introduced by Goguen, Thatcher, and Wagner ("Initial algebra semantics," 1977) and developed in functional-programming pedagogy by Bird and Wadler. The cycle wrapper and the spec-drift framing are original.
The trigger demos are based on examples from the Z3 quantifier instantiation literature, in particular the discussion of pattern selection in Michał Moskal, "Programming with triggers" (SMT 2009), and the worked examples in the Z3 programming guide on advanced quantifier handling.