  (Week 3)

Practice

From SAT to SMT: richer primitives for encoding, and the right abstraction for performance.

Where We Left Off

For two weeks we have emphasized SAT: boolean variables, clauses, and the CDCL algorithm that makes modern solvers fast. But our very first demo used integers and arithmetic to solve the xkcd restaurant puzzle, and those are not booleans. Z3 accepted them without complaint. This lecture starts our exploration of how theory solvers go beyond booleans and extend SAT to SMT.

Under the Hood

Here is the xkcd restaurant problem from Week 1, slightly simplified. Six menu items, integer quantities, total must be exactly $15.05:

from z3 import *

x1 = Int('x1')
x2 = Int('x2')
x3 = Int('x3')
x4 = Int('x4')
x5 = Int('x5')
x6 = Int('x6')

s = Solver()

s.add(x1 >= 0)
s.add(x2 >= 0)
s.add(x3 >= 0)
s.add(x4 >= 0)
s.add(x5 >= 0)
s.add(x6 >= 0)

s.add(215*x1 + 275*x2 + 335*x3 + 355*x4 + 420*x5 + 580*x6 == 1505)

Every Z3 solver has a method called .sexpr() that shows the representation Z3 is actually working with. This representation is called SMT-LIB, and it is the standard input language for SMT solvers:

(declare-fun x1 () Int)
(declare-fun x2 () Int)
(declare-fun x3 () Int)
(declare-fun x4 () Int)
(declare-fun x5 () Int)
(declare-fun x6 () Int)
(assert (>= x1 0))
(assert (>= x2 0))
(assert (>= x3 0))
(assert (>= x4 0))
(assert (>= x5 0))
(assert (>= x6 0))
(assert (= (+ (* 215 x1)
              (* 275 x2)
              (* 335 x3)
              (* 355 x4)
              (* 420 x5)
              (* 580 x6))
           1505))

Notice (declare-fun x1 () Int). The variables are declared as integers, not booleans. The constraints use arithmetic operations like * and + and >=. None of this is SAT. This is SMT: Satisfiability Modulo Theories. The "modulo theories" part means Z3 is reasoning about integers using the rules of integer arithmetic, not just boolean satisfiability.

The Python API is a thin layer over this representation. When you write Int('x1'), Z3 creates an integer variable in its theory of integer arithmetic. When you write s.add(x1 >= 0), Z3 generates (assert (>= x1 0)) in its internal representation.
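
To make the correspondence between the Python description and the SMT-LIB text concrete, here is a minimal sketch that produces the same assertions by hand, using only the standard library. The helper to_smtlib and the constants PRICES and TOTAL are hypothetical names introduced for this illustration; they are not part of Z3, and Z3's own printer may lay out whitespace differently.

```python
# Hypothetical helper (not part of Z3): render the xkcd constraints as
# SMT-LIB text, mirroring the shape of the .sexpr() output shown above.
PRICES = [215, 275, 335, 355, 420, 580]   # menu prices in cents
TOTAL = 1505                              # target total in cents

def to_smtlib(prices, total):
    names = [f"x{i}" for i in range(1, len(prices) + 1)]
    # One declaration and one non-negativity assertion per variable.
    lines = [f"(declare-fun {name} () Int)" for name in names]
    lines += [f"(assert (>= {name} 0))" for name in names]
    # The weighted sum must equal the total.
    terms = " ".join(f"(* {p} {name})" for p, name in zip(prices, names))
    lines.append(f"(assert (= (+ {terms}) {total}))")
    return "\n".join(lines)

script = to_smtlib(PRICES, TOTAL)
print(script)
```

The point of the sketch is only that each Python-level constraint maps to one SMT-LIB assertion; the real API builds an expression tree rather than strings.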

Encoding with Theories: Sudoku

SAT solvers work with boolean variables and clauses. To solve a problem that is not naturally boolean, you have to encode it as one. Sometimes that encoding is painful. Sudoku is a good example.

The puzzle

A sudoku board is a 9x9 grid divided into nine 3x3 blocks. Some cells are filled in (the givens). The goal: fill every empty cell with a digit from 1 to 9 so that each row, each column, and each 3x3 block contains all nine digits exactly once.

Five rules define a valid solution:

  1. Givens: pre-filled cells keep their values
  2. Cell range: each cell holds exactly one value from 1 to 9
  3. Row: each row contains all nine values, no repeats
  4. Column: each column contains all nine values, no repeats
  5. Block: each 3x3 block contains all nine values, no repeats

The rules are easy to state. The question is how to tell a solver about them.

In code, we represent the board as a 2D list board[r][c] where r is the row (0 to 8) and c is the column (0 to 8). A given cell holds its value (1 to 9). A blank cell holds 0.

Both our SAT and SMT encodings solve this puzzle instantly. [Figure: the solved board, with the values found by the solver shown in blue.]

The SAT encoding

In pure SAT there are no integers. We need a way to represent "cell (r, c) has value v" using only booleans. The standard approach: create a boolean variable $x_{r,c,v}$ for every cell and every possible value. If $x_{r,c,v}$ is true, cell (r, c) holds value v.

# If n = 3, then size = 9; if n = 4, then size = 16, etc.
size = n * n

# Create the solver and the variables.
s = Solver()

# Create one boolean variable per row, column, and value.
# x[r][c][v] = "cell (r,c) has value v"
# v is 0-indexed internally (0 to 8); we convert at I/O
x = [[[
  Bool(f'x_{r}_{c}_{v}')
  for v in range(size) ]
  for c in range(size) ]
  for r in range(size) ]

That is a 9×9×9 matrix: 9 rows, 9 columns, 9 possible values. 729 boolean variables for an 81-cell puzzle. Now we encode each rule as clauses over these variables.

Givens. If cell (r, c) has a given value g (where board[r][c] is 1-indexed), we assert the corresponding boolean variable as a unit clause. The board uses values 1 to 9; our variables are 0-indexed, so we subtract 1:

$x_{r,c,g-1}$

for r in range(size):
    for c in range(size):
        if board[r][c] != 0:
            s.add(x[r][c][board[r][c] - 1])

Cell range. Each cell has at least one value (a disjunction over all values for that cell) and at most one value (pairwise exclusion over all values for that cell). For each cell (r, c):

At least one:

$\bigvee_{v} x_{r,c,v}$

At most one:

$\bigwedge_{v_1 < v_2} (\lnot x_{r,c,v_1} \lor \lnot x_{r,c,v_2})$

# At least one value per cell
# x[r][c] is a list of 9 booleans (one per value).
# Z3's Or() accepts a list: Or([a, b, c]) means a ∨ b ∨ c.
for r in range(size):
    for c in range(size):
        s.add(Or(x[r][c]))

# At most one value per cell
for r in range(size):
    for c in range(size):
        for v1 in range(size):
            for v2 in range(v1 + 1, size):
                s.add(Or(Not(x[r][c][v1]), Not(x[r][c][v2])))
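
The claim that one disjunction plus pairwise exclusion means "exactly one" can be checked by brute force on a small group. The sketch below is solver-independent plain Python; exactly_one_clauses and satisfies are hypothetical helpers written for this check, and a literal is represented as an (index, polarity) pair.

```python
from itertools import combinations, product

def exactly_one_clauses(k):
    """Clauses over variables 0..k-1; a literal is (index, polarity)."""
    at_least_one = [[(i, True) for i in range(k)]]
    at_most_one = [[(i, False), (j, False)]
                   for i, j in combinations(range(k), 2)]
    return at_least_one + at_most_one

def satisfies(assignment, clauses):
    # A clause holds if at least one of its literals matches the assignment.
    return all(any(assignment[i] == pol for i, pol in clause)
               for clause in clauses)

k = 4
clauses = exactly_one_clauses(k)
# The clause set is satisfied by exactly the assignments with one True.
for assignment in product([False, True], repeat=k):
    assert satisfies(assignment, clauses) == (sum(assignment) == 1)
print(len(clauses))   # 1 + C(4, 2) = 7
```

For a group of nine variables the same construction gives 1 + 36 = 37 clauses, which is where the clause counts in this encoding come from.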

Rows. For each row r and value v, that value appears at least once in the row (a disjunction across columns) and at most once (pairwise exclusion across columns):

At least one per row:

$\bigvee_{c} x_{r,c,v}$

At most one per row:

$\bigwedge_{c_1 < c_2} (\lnot x_{r,c_1,v} \lor \lnot x_{r,c_2,v})$

# At least one of each value per row
for r in range(size):
    for v in range(size):
        s.add(Or([x[r][c][v] for c in range(size)]))

# At most one of each value per row
for r in range(size):
    for v in range(size):
        for c1 in range(size):
            for c2 in range(c1 + 1, size):
                s.add(Or(Not(x[r][c1][v]), Not(x[r][c2][v])))

Columns. The same structure, iterating over rows instead of columns. For each column c and value v:

At least one per column:

$\bigvee_{r} x_{r,c,v}$

At most one per column:

$\bigwedge_{r_1 < r_2} (\lnot x_{r_1,c,v} \lor \lnot x_{r_2,c,v})$

# At least one of each value per column
for c in range(size):
    for v in range(size):
        s.add(Or([x[r][c][v] for r in range(size)]))

# At most one of each value per column
for c in range(size):
    for v in range(size):
        for r1 in range(size):
            for r2 in range(r1 + 1, size):
                s.add(Or(Not(x[r1][c][v]), Not(x[r2][c][v])))

Blocks. The same pattern again. For each 3x3 block and each value, the value appears at least once and at most once among the nine cells of that block:

for br in range(n):
    for bc in range(n):
        cells = [ (br * n + r, bc * n + c)
                  for r in range(n)
                  for c in range(n) ]

        # At least one of each value per block
        for v in range(size):
            s.add(Or([x[r][c][v] for r, c in cells]))

        # At most one of each value per block
        for v in range(size):
            for i in range(len(cells)):
                for j in range(i + 1, len(cells)):
                    r1, c1 = cells[i]
                    r2, c2 = cells[j]
                    s.add(Or(Not(x[r1][c1][v]),
                             Not(x[r2][c2][v])))

Every rule follows the same pattern: a disjunction for "at least one" and pairwise exclusion for "at most one." The full SAT encoding produces 729 boolean variables and over 12,000 clauses. Some of these constraints are redundant. We include all of them to make the encoding's intent clear. Which ones could be dropped is a good exercise.
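
The variable and clause totals follow from simple counting, sketched here in plain Python. The arithmetic assumes the encoding exactly as written above (all four rule families, nothing dropped); the number of givens depends on the puzzle, so it is left out of the computed total.

```python
from math import comb

n = 3
size = n * n                            # 9

# Each rule family (cell, row, column, block) has size * size groups
# of `size` variables, so there are 4 * 81 groups in total.
groups = 4 * size * size

variables = size ** 3                   # one boolean per (row, column, value)
at_least_one = groups                   # one disjunction per group
at_most_one = groups * comb(size, 2)    # 36 pairwise exclusions per group
clauses_without_givens = at_least_one + at_most_one

print(variables, at_least_one, at_most_one, clauses_without_givens)
# 729 324 11664 11988
```

Each given adds one unit clause on top of the 11,988, so a puzzle with 13 or more givens lands above 12,000 clauses.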

Z3 does provide an AtMost constraint that would simplify the "at most one" clauses. But AtMost is a pseudo-boolean constraint, not pure SAT. Using it would already be stepping beyond booleans. The pairwise encoding here is what a pure SAT solver actually receives.

The SMT encoding

With integer variables, we have one variable per cell instead of one per cell-value pair:

size = n * n
s = Solver()

# One integer variable per cell
cells = [[
    Int(f'c_{r}_{c}')
    for c in range(size) ]
    for r in range(size) ]

That is a 9×9 matrix. 81 integer variables for an 81-cell puzzle. Two things changed. First, the variables are integers, not booleans. Since the puzzle is about integers, and the solver understands integers, the encoding is direct: no need to decompose values into booleans. Second, SMT gives us Distinct(), a built-in constraint that takes a list of terms of the same sort and asserts that they all take different values. This is not cheating. It is exactly what you get by moving from SAT to SMT: the theory provides richer primitives that match the structure of the problem. Now the same five rules:

Givens. A pre-filled cell is an equality constraint. No index conversion needed since the solver works with integers directly:

for r in range(size):
    for c in range(size):
        if board[r][c] != 0:
            s.add(cells[r][c] == board[r][c])

Cell range. Each cell is an integer between 1 and 9:

for r in range(size):
    for c in range(size):
        s.add(cells[r][c] >= 1)
        s.add(cells[r][c] <= size)

Rows. Each row has all different values:

for r in range(size):
    s.add(Distinct(cells[r]))

Columns. Each column has all different values:

for c in range(size):
    s.add(Distinct([cells[r][c] for r in range(size)]))

Blocks. Each 3x3 block has all different values:

for br in range(n):
    for bc in range(n):
        block = [ cells[br * n + r][bc * n + c]
                  for r in range(n)
                  for c in range(n) ]
        s.add(Distinct(block))

Together with the cell-range constraint, Distinct() covers both "at least one" and "at most one" in a single constraint: nine cells holding nine different values, all between 1 and 9, must use every value exactly once. The full SMT encoding uses 81 variables and 219 constraints. To be fair, each Distinct() constraint asks more of the solver than a single boolean clause does: the solver needs equality reasoning machinery to handle it. But the Python code we have to write is much shorter and simpler, and for many problems that tradeoff is worth making. The right choice depends on the problem and the team building the solution.
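
Why "all different" plus a range is enough to force every value to appear can be checked by enumeration on a smaller group. This is a plain-Python sketch, not solver code; size = 4 is an assumption chosen only to keep the enumeration tiny, standing in for 9.

```python
from itertools import product

size = 4   # small stand-in for 9 so the enumeration stays tiny
values = range(1, size + 1)

# Every way to fill `size` cells with in-range values that are all distinct.
distinct_rows = [row for row in product(values, repeat=size)
                 if len(set(row)) == size]

# Each such row already contains every value exactly once (pigeonhole).
assert all(sorted(row) == list(values) for row in distinct_rows)
print(len(distinct_rows))   # 4! = 24
```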

The contrast

Both encodings solve the same puzzle and produce the same answer. The difference is in the reduction.

The SAT encoding fights a mismatch: the problem is about integers, but the solver only understands booleans. The programmer has to build the integer representation (729 boolean variables for 81 cells) and manually decompose "all different" into pairwise exclusion clauses (over 12,000 of them).

The SMT encoding has no mismatch. The problem is about integers, and the solver understands integers. Each cell is one variable. "All different" is one call to Distinct(). The encoding says what it means.

A simpler reduction is easier to get right. When the encoding is 20 lines instead of 80, there are fewer places for the kind of mistake where the solver gives you a correct answer to the wrong question.
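
One inexpensive guard against answering the wrong question is to check the solver's output against the five rules directly, without reusing either encoding. The sketch below is plain Python; check_solution is a hypothetical helper, and the grid it is run on is a synthetic shift pattern built for the demonstration, not the puzzle from the lecture.

```python
def check_solution(board, solved, n=3):
    """Return True if `solved` is a valid completion of `board`."""
    size = n * n
    want = set(range(1, size + 1))
    # Rule 1: givens keep their values (0 marks a blank).
    for r in range(size):
        for c in range(size):
            if board[r][c] != 0 and solved[r][c] != board[r][c]:
                return False
    # Rules 2-5: every row, column, and block is exactly {1..size}.
    rows = [list(solved[r]) for r in range(size)]
    cols = [[solved[r][c] for r in range(size)] for c in range(size)]
    blocks = [[solved[br * n + r][bc * n + c]
               for r in range(n) for c in range(n)]
              for br in range(n) for bc in range(n)]
    return all(set(group) == want for group in rows + cols + blocks)

# A synthetic valid grid built from a shift pattern, not a real puzzle.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
          for r in range(9)]
# Keep every other cell as a "given" and blank out the rest.
board = [[solved[r][c] if (r + c) % 2 == 0 else 0 for c in range(9)]
         for r in range(9)]
print(check_solution(board, solved))   # True
```

Because the checker shares no code with the encoding, a bug in the reduction is unlikely to be mirrored in the check.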

Abstraction for Performance

The sudoku comparison showed that theories can make encoding easier. Theories can also let you hide details that do not matter. When the solver does not have to reason about the bits, it can answer questions much faster, and sometimes questions it would otherwise be unable to answer at all. You get to reason at the right level of detail for your actual problem.

The problem

Consider two functions on 64-bit bitvectors:

def sq(y):
    return y * y

def sqabs(y):
    return abs(y) * abs(y)

Are they equivalent for all $2^{64}$ inputs?

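To ask Z3 this question, we need to express the computations symbolically. We declare y as a symbolic 64-bit bitvector, then build Z3 expressions for y * y and abs(y) * abs(y). The bitvector theory has no built-in absolute value, so we write our own bvabs using If. On bitvectors, < in Z3's Python API is the signed comparison, which is what we want here: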

BW = 64

def bvabs(y):
    return If(y < 0, -y, y)

y = BitVec('y', BW)

sq    = y * y
sqabs = bvabs(y) * bvabs(y)

Now sq and sqabs are symbolic expressions, not concrete values. They represent the computation of y * y and bvabs(y) * bvabs(y) for an unknown y.
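
At a small width the same question can be settled by plain enumeration, which gives a feel for the wraparound semantics before any solver is involved. The sketch below assumes 8-bit two's-complement arithmetic and reuses the names sq, sqabs, and bvabs as ordinary Python functions on unsigned bit patterns; to_signed and MASK are helpers introduced for the illustration.

```python
BW = 8
MASK = (1 << BW) - 1

def to_signed(u):
    """Interpret an unsigned BW-bit pattern as a two's-complement integer."""
    return u - (1 << BW) if u >> (BW - 1) else u

def bvabs(u):
    # Mirrors If(y < 0, -y, y), with negation wrapping around at BW bits.
    return (-u) & MASK if to_signed(u) < 0 else u

def sq(u):
    return (u * u) & MASK

def sqabs(u):
    return (bvabs(u) * bvabs(u)) & MASK

mismatches = [u for u in range(1 << BW) if sq(u) != sqabs(u)]
print(mismatches)   # []
```

Enumeration is fine for 256 inputs and hopeless for $2^{64}$, which is why the symbolic formulation matters.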

How to ask the question

Mathematically, you might write our goal as:

$\forall y.\ sq(y) = sqabs(y)$

That is, "for all y, sq(y) = sqabs(y)." You can encode this directly in Z3 as:

s.add(ForAll([y], sq == sqabs))

This version does work at small bit widths. To decide a closed ForAll query, Z3 uses a technique called model-based quantifier instantiation (MBQI): it proposes a candidate model for the ground part of the problem, checks whether the quantified body holds in that model, and uses any counterexample as a concrete instantiation to refine. The loop is incomplete in general, and over bit-vector multiplication every refinement step requires a fresh bit-blasted check, so it scales poorly in the width. As the bit width grows, Z3 gives up:

| Bit width | Result | Time |
|---|---|---|
| 8 | sat | 0.02s |
| 16 | sat | ~4s |
| 32 | unknown | (times out) |
| 64 | unknown | (times out) |

We have seen sat and unsat before. unknown is a third possible result from check(). It means the solver could not decide the question. It is not saying the formula is true or false. It is saying "I gave up." For quantified queries over expensive theories, unknown is common. When you hit it, you have to find a different way to ask the question.

Even more interesting: if you rewrite the query by hand using the logical moves we walk through below, Z3 handles all four bit widths comfortably (well under a second up through 64 bits, compared to a timeout here). The rewrite is a logically equivalent transformation the solver could in principle do on its own. In practice, the manual version sidesteps Z3's quantifier heuristics and lands us in territory the solver handles efficiently.

To understand why, take a step back. So far we have only worked with formulas that have no quantifiers: the propositional formulas in Weeks 1 and 2, and now Z3 expressions over integer and bitvector variables. These all live in what is called the quantifier-free fragment of first-order logic. For the theories we use here (propositional logic, linear integer arithmetic, bitvectors, and equality with uninterpreted functions), the quantifier-free fragment is decidable: there is an algorithm that always terminates with a yes-or-no answer. SAT and SMT solvers exploit this. The problem is hard (NP-complete for many theories), but often tractable in practice.

Once you add quantifiers, the picture changes dramatically. Validity in full first-order logic is undecidable (Church and Turing, 1936-37). No algorithm can correctly answer every quantified query in general. Z3 does support quantifiers, but through incomplete heuristics: it may succeed, or it may not. With bitvectors and multiplication, "may not" is the common case. We will come back to all this in Week 5.

So we need a different way to ask the question. Fortunately we have one. It takes two steps.

Step 1: quantifier negation. A standard fact about how $\forall$ and $\exists$ relate says that "for all y, P(y)" and "there is no y where ¬P(y)" are the same statement:

$\forall y.\ P(y) \iff \lnot \exists y.\ \lnot P(y)$

"P holds for every input" and "no input makes P fail" are two ways of saying the same thing. This is pure quantifier logic, independent of validity or satisfiability. Notice that the right side still has a quantifier. We have not escaped quantified logic yet.

Step 2: check for unsat with a free variable. Now we apply the duality from Week 1: a formula is valid if and only if its negation is unsatisfiable. Our goal is $\forall y.\ P(y)$, which is the same as $\lnot \exists y.\ \lnot P(y)$ by step 1. Showing that it is valid is the same as showing that its negation $\exists y.\ \lnot P(y)$ is unsatisfiable:

$\forall y.\ P(y)$ valid $\iff$ $\exists y.\ \lnot P(y)$ unsat

We have turned "prove valid for all inputs" into "show unsatisfiable." And now the convenient part: when you hand Z3 a formula with a free variable and call s.check(), Z3 searches for an assignment to that variable that makes the formula true. Calling check() on ¬P(y) with y free is implicitly asking: does there exist a y such that ¬P(y) is true? This move (replacing a bound variable with a free one and letting the solver hunt for an assignment) is a lightweight form of what logicians call Skolemization.

Putting it all together: to check $\forall y.\ P(y)$, we hand Z3 $\lnot P(y)$ with y free. If Z3 says unsat, there is no y satisfying $\lnot P(y)$, so the original universal claim holds. If Z3 says sat, it has found a counterexample.

The rewritten query is in the quantifier-free fragment. No ForAll, no Exists, just a formula with a free variable. It is decidable. Solvers handle it well.

s = Solver()
s.add(Not(sq == sqabs))

result = s.check()
# sat   -> found a counterexample, property is violated
# unsat -> no counterexample exists, property holds for ALL inputs

UNSAT often means verified

To check that a property P holds for all inputs, hand Z3 Not(P) with the inputs as free variables and call s.check(). If Z3 returns unsat, no input violates P, so P is verified. If Z3 returns sat, the model is a counterexample.

This is not a Z3 performance trick. It is how solver-based verification works. When the solver returns unsat, it has established that no input in the entire space violates the property. In this style of verification, UNSAT is how the solver tells you "verified." Rosette's verify, CBMC, and other bounded model checkers all work this way: they search for counterexamples and report success when none exist.

Not(property) followed by checking for unsat is the verification pattern we will use for the rest of the course.
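
The shape of the pattern is easy to see without a solver: search for an input that violates the property, and treat "none found" as success. The sketch below does this by exhaustive search over 8-bit signed values. It is an analogy in plain Python, not how Z3 searches; find_counterexample and wrap are hypothetical helpers.

```python
BW = 8

def find_counterexample(prop, inputs):
    """Return the first input violating `prop`, or None if there is none."""
    for y in inputs:
        if not prop(y):        # plays the role of a model for Not(P)
            return y
    return None                # plays the role of unsat: P holds everywhere

def wrap(v):
    """Reduce an integer to the signed BW-bit two's-complement range."""
    v &= (1 << BW) - 1
    return v - (1 << BW) if v >> (BW - 1) else v

signed_inputs = range(-(1 << (BW - 1)), 1 << (BW - 1))

# A property that holds: y - y == 0 for every input.
print(find_counterexample(lambda y: wrap(y - y) == 0, signed_inputs))   # None
# A property that fails: "abs is never negative" breaks at the minimum value.
print(find_counterexample(lambda y: wrap(abs(y)) >= 0, signed_inputs))  # -128
```

The second result is the familiar two's-complement corner case: negating the most negative value wraps back to itself.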

Attempt 1: full bitvector semantics

With the counterexample formulation and real 64-bit multiplication:

s = Solver()
s.add(Not(sq == sqabs))
result = s.check()   # unsat in ~1 second

Correct: the functions are equivalent. But it takes about a second. That is another reduction under the hood. Z3 bit-blasts the 64-bit multiply into a circuit of boolean gates and hands it to the same CDCL engine we saw in Week 2. At 64 bits the circuit is big, and even CDCL takes a while to chew through it.

Attempt 2: uninterpreted multiply, no axiom

In Attempt 1, Z3 had to reason about bitvector multiplication all the way down to the bits. Z3 lets us hide that detail from the solver by declaring an uninterpreted function: umul, a function symbol with a fixed signature and no fixed meaning. Z3 knows umul takes two 64-bit values and returns a 64-bit value, and nothing else. It is free to pick any function that fits.

BV = BitVecSort(BW)

# umul takes two 64-bit BVs and returns a 64-bit BV.
# Z3 knows nothing else about it.
umul = Function('umul', BV, BV, BV)

Now rebuild sq and sqabs using umul instead of the built-in *, and ask the same counterexample question as before:

sq    = umul(y, y)
sqabs = umul(bvabs(y), bvabs(y))

s = Solver()
s.add(Not(sq == sqabs))
result = s.check()   # sat in 0.01 seconds

Fast. But wrong. check() returned sat, which means Z3 found a counterexample: a value of y where sq(y) differs from sqabs(y) in this model. We can pull the values out and see:

m = s.model()

print(f"y        = {m[y]}")
print(f"sq(y)    = {m.eval(sq)}")
print(f"sqabs(y) = {m.eval(sqabs)}")

# y        = 12189670989262225408
# sq(y)    = 18446744073709551615
# sqabs(y) = 0

m[y] gives the value Z3 picked for our symbolic variable. m.eval(expr) evaluates any expression in the model, including one that uses an uninterpreted function like umul.

In this "counterexample" sq(y) and sqabs(y) really do come out different. But we know the real functions are equivalent. For any y, $y \times y$ equals $(-y) \times (-y)$. So what happened?

Z3 is free to assign any function it likes to umul. We declared umul as a two-argument function from bitvectors to bitvectors and said nothing else about it. So Z3 was free to pick a umul where umul(y, y) is the max 64-bit value and umul(-y, -y) is zero. We can ask the model for umul directly and see the function Z3 chose:

print(m[umul])

# [(12189670989262225408, 12189670989262225408) -> 18446744073709551615,
#  else -> 0]

That is the entire function: a one-entry lookup table plus a default. umul(y, y) maps to the max 64-bit value because Z3 wrote that entry. Every other input, including umul(-y, -y), falls through to the else branch and returns 0. That is enough to make sq(y) and sqabs(y) disagree, and the counterexample is self-consistent. It is just not consistent with real bitvector multiplication.

So sat here is honest and useless. Z3 answered the question we actually asked: is there some umul that breaks the equivalence? Yes. That is not the question we meant to ask.
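
The model can be replayed by hand to see why the two expressions disagree. This plain-Python sketch copies the value of y and the single table entry from the output above; TABLE, MASK, and the Python functions umul and bvabs are stand-ins written for the illustration, not Z3 objects.

```python
BW = 64
MASK = (1 << BW) - 1

Y = 12189670989262225408                  # value of y from the model above
TABLE = {(Y, Y): 18446744073709551615}    # the model's single explicit entry

def umul(a, b):
    # Lookup table plus the model's "else -> 0" default.
    return TABLE.get((a, b), 0)

def bvabs(u):
    # The top bit is set exactly when the signed value is negative.
    return (-u) & MASK if u >> (BW - 1) else u

sq = umul(Y, Y)
sqabs = umul(bvabs(Y), bvabs(Y))
print(sq == MASK, sqabs)   # True 0
```

Since y has its top bit set, bvabs(y) is -y, a different bit pattern from y, so the second lookup misses the table and falls through to the default.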

If you get the reduction wrong, you get a correct answer to the wrong question.

Attempt 3: uninterpreted multiply with one axiom

The key property of real multiplication that makes sq and sqabs equivalent: squaring is unaffected by negation. $y \times y = (-y) \times (-y)$ for any y. We tell the solver just this one fact:

axiom = umul(y, y) == umul(-y, -y)

s = Solver()
s.add(axiom)
s.add(Not(sq == sqabs))
result = s.check()   # unsat in 0.02 seconds

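Fast and correct. The solver does not need to know anything else about multiplication. It reasons about equality between terms, one case at a time:

  1. If y < 0, then bvabs(y) is -y, so sqabs(y) is umul(-y, -y), which the axiom equates with umul(y, y), that is, sq(y).
  2. Otherwise bvabs(y) is y, so sqabs(y) is umul(y, y), which is sq(y) itself.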

Either way, sq(y) and sqabs(y) land in the same equivalence class of terms, and the solver reports unsat for "can they ever be different?" This kind of reasoning (tracking which terms are equal to which other terms) is called congruence closure, and it is exactly what the Theory phase covers after the break.
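
A rough way to see that the single axiom is enough: take an arbitrary binary function, force it to satisfy the axiom at one y, and observe that sq and sqabs agree at that y regardless of what the function does elsewhere. The sketch below samples random function tables at 4 bits in plain Python. It is a sanity check under those assumptions, not a proof and not how Z3 reasons; neg and the table representation are introduced for the illustration.

```python
import random

BW = 4
MASK = (1 << BW) - 1

def neg(u):
    return (-u) & MASK

def bvabs(u):
    return neg(u) if u >> (BW - 1) else u

rng = random.Random(0)
for y in range(1 << BW):
    for _ in range(100):
        # An arbitrary binary function on BW-bit values ...
        table = {(a, b): rng.randrange(1 << BW)
                 for a in range(1 << BW) for b in range(1 << BW)}
        # ... constrained only by the axiom umul(y, y) == umul(-y, -y).
        table[(neg(y), neg(y))] = table[(y, y)]
        sq = table[(y, y)]
        sqabs = table[(bvabs(y), bvabs(y))]
        assert sq == sqabs
print("no mismatch found")
```

The assertion can never fire because bvabs(y) is always either y or -y, which are exactly the two cases the solver considers.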

The scaling test

The full bitvector approach bit-blasts multiplication, so its cost grows with bit width. The uninterpreted function approach reasons about equality, so bit width barely matters:

| Bit width | Full BV | UF + axiom |
|---|---|---|
| 32 | ~0.04s | ~0.01s |
| 64 | ~0.6s | ~0.03s |
| 128 | ~20s | ~0.08s |
| 256 | unknown | ~0.2s |

At 256 bits, the full bitvector approach times out and Z3 returns unknown. The uninterpreted function approach finishes in about a fifth of a second. The proof Z3 finds here is term-level: it works on the shape of the expressions, not on the bit values. Z3 still does a little bit-level preprocessing on the BV arguments, which is why the UF + axiom times grow mildly with bit width. The growth is nothing like Full BV, which is what we would expect if the bits actually mattered for the proof.

The engineering skill

Choosing the right abstraction level is an engineering decision. Too concrete (full bitvector semantics) and the solver is slow. Too abstract (uninterpreted function with no axioms) and the solver gives a wrong answer. The right level of abstraction (uninterpreted function with the axioms your proof actually needs) is fast and correct.

This is the same tradeoff that shows up throughout software engineering: how much detail do you model? The solver is a tool. You decide what to tell it.

The SMT Architecture

We have used three theories so far today: integer arithmetic (the xkcd problem), bitvector arithmetic (sq and sqabs), and equality with uninterpreted functions (the umul move). Each of these is a separate reasoning system inside Z3. How do they all work together?

The picture of an SMT solver is a boolean skeleton plus a collection of specialized theory solvers:

graph TD
    accTitle: SMT solver architecture
    accDescr: An SMT solver routes the boolean structure of a formula to a CDCL engine and its theory literals to specialized theory solvers, one per theory.

    F["Query<br/>(logical formula)"]
    F --> B["Boolean<br/>skeleton"]
    F --> L["Theory<br/>literals"]
    B --> CDCL["CDCL<br/>(Week 2)"]
    L --> T1["Integer<br/>arithmetic"]
    L --> T2["Bitvector<br/>arithmetic"]
    L --> T3["Arrays"]
    L --> T4["Equality +<br/>uninterpreted<br/>functions"]
    CDCL <-.-> T1
    CDCL <-.-> T2
    CDCL <-.-> T3
    CDCL <-.-> T4

The CDCL engine from Week 2 handles the boolean structure: the ands, ors, and nots connecting everything together. Each theory solver handles conjunctions of literals in its own domain: the integer solver knows about +, *, <, =; the bitvector solver knows about bvadd, bvmul, and bit-blasting; the array solver knows about select and store; the equality solver knows about = and uninterpreted function symbols.

Every theory solver presents the same interface to the CDCL engine: "give me a conjunction of literals in my theory, and I will tell you whether it is satisfiable." The CDCL engine does not need to know anything else about the theory.
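
The interface can be pictured as a single method. The sketch below is a toy in plain Python, not Z3's internal API: TheorySolver, BoundsTheory, and the (var, op, const) literal format are all assumptions made for the illustration, and the toy theory only understands integer lower and upper bounds.

```python
from typing import Protocol

class TheorySolver(Protocol):
    """What the boolean engine needs from any theory."""
    def is_satisfiable(self, literals: list) -> bool: ...

class BoundsTheory:
    """Toy theory: literals are (var, op, const) with op in {'>=', '<='}."""
    def is_satisfiable(self, literals):
        lo, hi = {}, {}
        for var, op, const in literals:
            if op == ">=":
                lo[var] = max(lo.get(var, const), const)
            elif op == "<=":
                hi[var] = min(hi.get(var, const), const)
            else:
                raise ValueError(f"unsupported operator: {op}")
        # The conjunction is satisfiable iff no variable's bounds cross.
        return all(lo[v] <= hi[v] for v in lo if v in hi)

theory: TheorySolver = BoundsTheory()
print(theory.is_satisfiable([("x", ">=", 1), ("x", "<=", 9)]))   # True
print(theory.is_satisfiable([("x", ">=", 5), ("x", "<=", 4)]))   # False
```

Anything that answers this one question for conjunctions of its own literals can sit behind the same interface, which is what lets very different theories share one boolean engine.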

How the CDCL engine and the theory solvers cooperate — who talks to whom, when, and in what order — is the architecture of SMT. It is called DPLL(T), and it is Week 5 material. Today we are going to look inside one of these theory solvers: the equality solver. It is the simplest one, and it is the foundation that the others all build on.

What We Learned

Theories let you talk to the solver in the language of your problem. Today's key ideas:

How Does Z3 Decide This?

Before we go to break, one more question. Consider this formula:

$f^3(a) = a \;\land\; f^5(a) = a \;\land\; f(a) \neq a$

Here f is a completely uninterpreted function and a is a constant in some uninterpreted sort. Z3 knows nothing about f: it could be anything. Is there an assignment of f and a that satisfies all three constraints at once? Think about it for a minute before reading on.

In Z3 we need one more piece we have not used yet: an uninterpreted sort. Declaring S = DeclareSort('S') tells Z3 "there is some set S; I do not care what is in it." Then a is an element of S, and f is a function from S to S. Nothing about S is fixed; Z3 is free to pick any set it likes when it builds a model.

S = DeclareSort('S')
f = Function('f', S, S)
a = Const('a', S)

# Build nested applications step by step
f1 = f(a)
f2 = f(f1)
f3 = f(f2)
f4 = f(f3)
f5 = f(f4)

s = Solver()
s.add(f3 == a)
s.add(f5 == a)
s.add(f1 != a)

result = s.check()   # unsat in ~5 milliseconds

Z3 says unsat. No such f and a exist. And it figures this out in about five milliseconds. Remember that f is uninterpreted: Z3 did not enumerate all possible functions, or all possible values for a. It reasoned about the equality structure of the formula and determined that the constraints contradict each other.
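
As a sanity check on the answer, small finite domains can be searched exhaustively in plain Python. This is explicitly not what Z3 does, and it only covers the domain sizes it tries, so it confirms the result on those sizes rather than proving it. The helpers apply_n and has_model are hypothetical names for this sketch.

```python
from itertools import product

def apply_n(f, a, n):
    """Apply the function table f to a, n times."""
    for _ in range(n):
        a = f[a]
    return a

def has_model(domain_size):
    elems = range(domain_size)
    # A function S -> S is a tuple: f[i] is the image of element i.
    for f in product(elems, repeat=domain_size):
        for a in elems:
            if (apply_n(f, a, 3) == a and apply_n(f, a, 5) == a
                    and f[a] != a):
                return True
    return False

print([has_model(k) for k in range(1, 6)])
# [False, False, False, False, False]
```

Dropping the third constraint's partner, for example keeping only f(f(f(a))) = a and f(a) ≠ a, makes a model appear as soon as the domain has three elements (a 3-cycle), so the search is not vacuous.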

How? That is the subject of the Theory phase. The algorithm is called congruence closure, and it is what we will spend the next fifty minutes on.

Demo Code

All demo files for this phase are in the course code repo. Clone the repo and run them locally with python3 sudoku-smt.py etc. (requires z3-solver; see Setup).