Practice

Two live demos before any theory: an NP-complete puzzle and a real verification bug.

The Mental Model

You already know how to declare what you want. SQL says find rows matching these conditions. Solvers extend this idea to correctness: find me all inputs where this property breaks, or prove it never breaks.

The tool we use is Z3, a solver built at Microsoft Research and now widely used in industry. AWS runs billions of solver queries per day.

Warm-Up: The Restaurant Problem

xkcd #287 poses a question: which combinations of appetizers sum to exactly $15.05?

This is a variant of the subset-sum problem, which is NP-complete in general. Z3 answers it in milliseconds.

xkcd287.py

from z3 import Int, Or, Solver, sat

# Menu prices in cents (avoids floating point).
menu = {
    'mixed_fruit':       215,
    'french_fries':      275,
    'side_salad':        335,
    'hot_wings':         355,
    'mozzarella_sticks': 420,
    'sampler_plate':     580,
}

# How many of each item do we order?
order = {name: Int(name) for name in menu}

s = Solver()

# Each quantity is non-negative.
for item in order.values():
    s.add(item >= 0)

# Total must be exactly $15.05.
s.add(sum(order[name] * price for name, price in menu.items()) == 1505)

# Find ALL solutions.
while s.check() == sat:
    m = s.model()
    items = [(name, m[order[name]].as_long())
             for name in menu if m[order[name]].as_long() > 0]
    desc = ', '.join(f'{c}x {name}' for name, c in items)
    print(f"  {desc}")
    # Block this solution and search for the next.
    s.add(Or(*[order[name] != m[order[name]] for name in menu]))

Running it:

  7x mixed_fruit
  1x mixed_fruit, 2x hot_wings, 1x sampler_plate

Two solutions. The solver found both and proved there are no others.

The pattern is: create variables, add constraints, call check(). If it returns sat, call model() to read out values. The blocking loop at the end adds a constraint that rules out the current solution. It is a standard technique for enumerating all solutions.

(Credit: XKCD #287 by Randall Munroe; demo idea from Dennis Yurichev, SAT/SMT by Example, p. 47.)

The Real Demo: Verify, Debug, Synthesize

Two engineers each implement unsigned 32-bit division by 2. Are their implementations equivalent?

bvudiv2.py

from z3 import BitVec, BitVecVal, UDiv, LShR, Solver, ForAll, sat

def bvudiv2(x):
    """Unsigned division by 2 (reference implementation)."""
    return UDiv(x, BitVecVal(2, 32))

def bvudiv2_a(x):
    """Unsigned division by 2 via arithmetic right shift."""
    return x >> 1

Act 1: Verify

We ask Z3 to find any 32-bit input where the two implementations disagree:

bvudiv2.py

x = BitVec('x', 32)
s = Solver()
s.add(bvudiv2(x) != bvudiv2_a(x))

if s.check() == sat:
    cex = s.model()
    xval = cex[x].as_long()
    print(f"NOT equivalent! Counterexample: x = {xval}")
    print(f"bvudiv2(x)   = {cex.evaluate(bvudiv2(x))}")
    print(f"bvudiv2_a(x) = {cex.evaluate(bvudiv2_a(x))}")

Output:

NOT equivalent! Counterexample: x = 2147483648
bvudiv2(x)   = 1073741824
bvudiv2_a(x) = 3221225472

The solver checked all 4,294,967,296 possible 32-bit inputs and found a disagreement in milliseconds.

Act 2: Debug

The counterexample is $x = 2^{31} = 0x80000000$ , the bit pattern with only the top bit set. As unsigned: 2147483648. As signed: $- 2^{31}$ (INT_MIN).

Notice the two outputs: bvudiv2 returns $2^{30} = 1073741824$ (correct: $2^{31} \div 2 = 2^{30}$ ), but bvudiv2_a returns $3221225472$ . Where does that wrong value come from?

The top bit of 0x80000000 is 1. Python's >> on a Z3 BitVec is arithmetic right shift: it copies the top bit into vacated positions. So 0x80000000 >> 1 = 0xC0000000 = 3221225472. Logical right shift fills vacated positions with zero: LShR(0x80000000, 1) = 0x40000000 = 1073741824. That is the answer we want.

We fix it and verify:

bvudiv2.py

def bvudiv2_fix(x):
    return LShR(x, 1)   # logical right shift

s2 = Solver()
s2.add(bvudiv2(x) != bvudiv2_fix(x))
print("Correct!" if s2.check() != sat else "Still broken.")

Output:

Correct!

The solver verified our fix holds for all $2^{32}$ inputs.

Act 3: Synthesize

Can the solver find the correct shift amount, rather than us supplying it?

bvudiv2.py

shift_amount = BitVec('shift_amount', 32)
s3 = Solver()
s3.add(ForAll([x], bvudiv2(x) == LShR(x, shift_amount)))

if s3.check() == sat:
    m = s3.model()
    print(f"Found it! LShR(x, {m[shift_amount]}) is correct for all inputs.")

Output:

Found it! LShR(x, 1) is correct for all inputs.

ForAll quantifies over all possible inputs. The solver searches for a value of shift_amount such that the equivalence holds universally. We will not explain ForAll in depth here. That is Week 8 material. For now, just notice that the same tool that finds counterexamples can also synthesize correct implementations.

Verify, Debug, Synthesize

Verify: assert a property, ask if it can be violated, get a counterexample or a proof
Debug: read the counterexample to understand why the bug exists
Synthesize: leave a hole in your program, assert correctness, let the solver fill the hole

Same tool throughout. By Week 8 you will do all three for your own programs. The next phase explains the mechanics: how a solver actually decides satisfiability.

Common Pitfalls

>> is arithmetic shift on BitVec. Python's >> on a Z3 BitVec is arithmetic right shift. This demo is a worked example of exactly why that matters. Use LShR when you want logical right shift.

Int variables are unbounded integers, not machine integers. z3.Int('x') creates a variable over the mathematical integers — no overflow, no bitwidth. Use BitVec('x', 32) when you mean a 32-bit quantity. Mixing them up produces formulas about the wrong type of number.

If your encoding is wrong, the solver's answer is meaningless. The solver finds inputs satisfying the constraints you gave it. If those constraints do not faithfully encode your actual problem, the result tells you nothing about that problem. Every verification result is only as good as its encoding.

This is the central risk in solver-based reasoning: a correct answer to the wrong question. The solver cannot tell you whether your encoding is right. That judgment requires you.

Full Demo Code

The complete demo files for this phase. Copy them locally and run them with python xkcd287.py etc. (requires z3-solver; see Setup).

xkcd287.py

"""XKCD 287: the NP-complete restaurant order.

"We'd like exactly $15.05 of appetizers, please."
Find combinations of menu items that sum to exactly $15.05.

This is a subset-sum / knapsack problem — NP-complete in general.
Z3 solves it in milliseconds. Then we enumerate ALL solutions.

Credit: XKCD comic by Randall Munroe (xkcd.com/287).
        Demo idea from Dennis Yurichev, SAT/SMT by Example (p.47).
"""

from z3 import Int, Or, Solver, sat

# Menu prices in cents (avoids floating point).
menu = {
    'mixed_fruit':       215,
    'french_fries':      275,
    'side_salad':        335,
    'hot_wings':         355,
    'mozzarella_sticks': 420,
    'sampler_plate':     580,
}

# How many of each item do we order?
order = {name: Int(name) for name in menu}

s = Solver()

# Each quantity is non-negative.
for item in order.values():
    s.add(item >= 0)

# Total must be exactly $15.05.
s.add(sum(order[name] * price for name, price in menu.items()) == 1505)

# Find ALL solutions.
while s.check() == sat:
    m = s.model()
    items = [(name, m[order[name]].as_long())
             for name in menu if m[order[name]].as_long() > 0]
    desc = ', '.join(f'{c}x {name}' for name, c in items)
    total = sum(c * menu[name] for name, c in items)
    print(f"  {desc}  (${total / 100:.2f})")

    # Block this solution and search for the next one.
    s.add(Or(*[order[name] != m[order[name]] for name in menu]))

bvudiv2.py

"""bvudiv2: verify, debug, synthesize — the three-act solver demo.

Two implementations of unsigned 32-bit division by 2:
  bvudiv2   — uses Z3's built-in UDiv (correct)
  bvudiv2_a — uses arithmetic right shift (buggy!)

Act 1 (Verify):     "Are they equivalent for ALL 32-bit inputs?"
Act 2 (Debug):      "Where exactly do they disagree?"
Act 3 (Synthesize): "Can the solver FIND the correct shift?"
"""

from z3 import BitVec, BitVecVal, UDiv, LShR, Solver, ForAll, sat

# ── The two implementations ─────────────────────────────────────────

def bvudiv2(x):
    """Unsigned division by 2 (reference implementation)."""
    return UDiv(x, BitVecVal(2, 32))

def bvudiv2_a(x):
    """Unsigned division by 2 via arithmetic right shift (BUGGY)."""
    return x >> 1  # arithmetic shift — sign-extends the top bit!

# ── Act 1: Verify ───────────────────────────────────────────────────

print("=== Act 1: Verify ===")
x = BitVec('x', 32)
s = Solver()
s.add(bvudiv2(x) != bvudiv2_a(x))

if s.check() == sat:
    cex = s.model()
    xval = cex[x].as_long()
    print(f"  NOT equivalent! Counterexample: x = {xval}")
    print(f"  bvudiv2(x)   = {cex.evaluate(bvudiv2(x))}")
    print(f"  bvudiv2_a(x) = {cex.evaluate(bvudiv2_a(x))}")
else:
    print("  Equivalent for all inputs.")

# ── Act 2: Debug ────────────────────────────────────────────────────

print(f"\n=== Act 2: Debug ===")
print(f"  x = {xval} = 2^31 = 0x80000000")
print(f"  As signed 32-bit: {xval if xval < 2**31 else xval - 2**32}  (INT_MIN)")
print(f"  The top bit is 1.")
print(f"  Arithmetic right shift (>>) copies the top bit: 0x80000000 >> 1 = 0xC0000000")
print(f"  Logical right shift fills with 0:               0x80000000 >> 1 = 0x40000000")
print(f"  For unsigned division by 2, we want 0x40000000 = 2^30 = {2**30}.")

def bvudiv2_fix(x):
    """Unsigned division by 2 via logical right shift (correct)."""
    return LShR(x, 1)

s2 = Solver()
s2.add(bvudiv2(x) != bvudiv2_fix(x))
if s2.check() == sat:
    print(f"  Still broken!")
else:
    print(f"  LShR(x, 1) is correct for all 2^32 inputs.")

# ── Act 3: Synthesize ───────────────────────────────────────────────
# Uses ForAll — Week 8 material. For now, just watch.

print(f"\n=== Act 3: Synthesize ===")
shift_amount = BitVec('shift_amount', 32)
s3 = Solver()
s3.add(ForAll([x], bvudiv2(x) == LShR(x, shift_amount)))

if s3.check() == sat:
    m = s3.model()
    print(f"  Found it! LShR(x, {m[shift_amount]}) is correct for all inputs.")
else:
    print(f"  No solution.")