(Week 7)

Practice

A small absolute-value function with an INT_MIN overflow, a chain of assignments where naive encoding blows up, a loop checked by bounded model checking, and the Zune leap-year bug.

The verification spectrum

Approaches to checking whether a program does what you want form a spectrum from cheap and shallow to expensive and certain. The trade-off is effort against the strength of what you can claim.

Ad-hoc tests run a handful of inputs and inspect the results by eye. They are cheap to write and they find shallow bugs on the inputs they cover. They say nothing about the inputs you did not test.

Property-based testing generates random inputs that satisfy some precondition and checks an output property. The engineer writes the property and the framework generates the inputs. Coverage improves on a per-property basis, but the framework still says nothing about inputs it did not generate.

Symbolic execution and bounded model checking reason about each path through the program symbolically and consider entire equivalence classes of inputs at once. With loop unrolling at depth k, these tools prove no bug occurs within k iterations of any loop, and make no claim beyond. This is what we build today.

Static analysis abstracts the program at the type or value-flow level and reasons over the abstraction. The abstraction is sound by design but loses precision, so real programs trigger many false alarms.

Interactive proof assistants check a formal proof the engineer writes by hand. The proof obligation can require significant annotation and expertise. In exchange, the engineer gets a machine-checked correctness claim that holds without any bound.

What we build today lives in the middle band: fully automatic, sound up to a loop-unrolling bound, aimed at finding bugs rather than producing total-correctness proofs. Where you sit on the spectrum determines how much annotation the tool needs from you. The rest of Practice builds the engine, exercises it on a few short programs, and applies it to a real bug from a shipped device.

The Zune freeze

On December 31, 2008, every Zune 30GB media player in the world froze for 24 hours. The firmware contained a ConvertDays routine that converted "days since January 1, 1980" into a year by decrementing days year by year. A stripped-down model of the routine:

zune_days.py

def zune_days(days, is_leap):
    while days > 365:
        if is_leap == 1:
            if days > 366:
                days = days - 366
        else:
            days = days - 365
    return days

The devices stayed bricked until January 1, 2009, when the calendar advanced past midnight on its own. By the end of Practice, we will have built a tool that takes a function like this one, hands it to Z3, and gets back the exact input that triggers the bug.

Symbolic execution by hand

Before automating anything, run symbolic execution by hand on a small program.

swap.py

def f(x, y):
    if x > y:
        x = x + y
        y = x - y
        x = x - y
        if x - y > 0:
            assert False
    return (x, y)

The outer if x > y guards a three-line in-place swap. After the swap, the inner if x - y > 0 would trip an assertion. Symbolic execution decides whether any input can actually reach that assertion.

Start with symbols, not numbers

Give each input a fresh symbolic name: x becomes the symbol A, y becomes the symbol B. The symbolic state is a map from variable name to symbolic expression:

graph TD
    accTitle: Initial symbolic state for the swap example
    accDescr: A single node showing the initial state map x maps to A and y maps to B.
    s0["x ↦ A
y ↦ B"]

Nothing has executed yet. The two inputs are unconstrained symbols.

Fork at the first branch

The first statement is if x > y. We evaluate the guard in the current state: substituting x ↦ A and y ↦ B, the guard becomes A > B. There are two ways the program can go from here, so we fork.

graph TD
    accTitle: First branch of the execution tree
    accDescr: Initial state at the top forks into two children. The left edge is labeled A greater than B and leads to an unchanged state. The right edge is labeled A less than or equal to B and leads to a feasible leaf that returns the inputs unchanged.
    s0["x ↦ A
y ↦ B"]
    s0 -->|A > B| s1["x ↦ A
y ↦ B"]
    s0 -->|A ≤ B| ok1["x ↦ A
y ↦ B
(return)"]
    classDef feasible fill:#d9f5c5,stroke:#5a8a3a
    class ok1 feasible

On the right branch (A ≤ B), the body is skipped and control falls to return (x, y). The state is still x ↦ A, y ↦ B. This leaf is feasible: any input with A ≤ B reaches it.

The left branch enters the swap.

Walk through the assignments, assuming `A > B`

Now follow the left branch of the fork: the case where A > B holds. Once inside this branch, the body executes three assignments in sequence. At each step, we substitute the current symbolic expression for each variable on the right-hand side, simplify, and update the state.

graph TD
    accTitle: State after each assignment in the A greater than B branch
    accDescr: A chain of three state updates labeled with the path condition A greater than B. Starting state x maps to A, y maps to B. After x equals x plus y, the state is x maps to A plus B, y maps to B. After y equals x minus y, the state is x maps to A plus B, y maps to A. After x equals x minus y, the state is x maps to B, y maps to A.
    s1["path condition: A > B
state: x ↦ A, y ↦ B"]
    s1 -->|x = x + y| s2["x ↦ A+B
y ↦ B"]
    s2 -->|y = x - y| s3["x ↦ A+B
y ↦ A"]
    s3 -->|x = x - y| s4["x ↦ B
y ↦ A"]

The substitution at each step:

x = x + y. Substitute into the right-hand side using x ↦ A, y ↦ B: the new value of x is A + B. Update the state to x ↦ A+B, y ↦ B.
y = x - y. Substitute into the right-hand side using x ↦ A+B, y ↦ B: the new value of y is (A+B) - B, which simplifies to A. Update the state to x ↦ A+B, y ↦ A.
x = x - y. Substitute into the right-hand side using x ↦ A+B, y ↦ A: the new value of x is (A+B) - A, which simplifies to B. Update the state to x ↦ B, y ↦ A.

After the three assignments, x holds the symbolic expression B and y holds A. The values have swapped, exactly as the integer arithmetic would predict at runtime. The path condition is still just A > B: straight-line code adds nothing to it.
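The three substitutions can be replayed mechanically. The following is a minimal sketch, assuming symbolic expressions are limited to linear combinations of A and B stored as coefficient dictionaries. The helper names add and sub are illustrative and not part of the engine.

```python
def add(p, q):
    """Add two linear expressions stored as {symbol: coefficient}."""
    keys = p.keys() | q.keys()
    out = {k: p.get(k, 0) + q.get(k, 0) for k in keys}
    return {k: v for k, v in out.items() if v != 0}  # drop cancelled terms


def sub(p, q):
    """Subtract q from p."""
    return add(p, {k: -v for k, v in q.items()})


state = {"x": {"A": 1}, "y": {"B": 1}}      # x ↦ A, y ↦ B
state["x"] = add(state["x"], state["y"])    # x = x + y  gives A + B
state["y"] = sub(state["x"], state["y"])    # y = x - y  gives A
state["x"] = sub(state["x"], state["y"])    # x = x - y  gives B
print(state)
```

The simplification step from the walkthrough is the filter that drops zero coefficients: (A+B) - B leaves only A.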

Fork at the second branch

The next statement is if x - y > 0. We evaluate the guard in the current state. With x ↦ B and y ↦ A, the guard becomes B - A > 0. The fork produces two more children, completing the tree.

graph TD
    accTitle: Full execution tree for the swap example
    accDescr: The complete execution tree. The root state x maps to A, y maps to B forks on A greater than B versus A less than or equal to B. The right branch reaches a feasible leaf returning the unchanged state. The left branch passes through three assignments. After x equals x plus y, the state is x maps to A plus B, y maps to B. After y equals x minus y, the state is x maps to A plus B, y maps to A. After x equals x minus y, the state is x maps to B, y maps to A. From there, fork on B minus A greater than 0 versus less than or equal to 0. The B minus A greater than 0 leaf would reach assert false but is infeasible because the path condition requires both A greater than B and B greater than A. The B minus A less than or equal to 0 leaf is feasible and returns the swapped pair.
    s0["x ↦ A
y ↦ B"]
    s0 -->|A > B| s1["x ↦ A
y ↦ B"]
    s0 -->|A ≤ B| ok1["x ↦ A
y ↦ B
(return)"]
    s1 -->|x = x + y| s2["x ↦ A+B
y ↦ B"]
    s2 -->|y = x - y| s3["x ↦ A+B
y ↦ A"]
    s3 -->|x = x - y| s4["x ↦ B
y ↦ A"]
    s4 -->|B - A > 0| bad["x ↦ B
y ↦ A
(assert false)"]
    s4 -->|B - A ≤ 0| ok2["x ↦ B
y ↦ A
(return)"]
    classDef feasible fill:#d9f5c5,stroke:#5a8a3a
    classDef infeasible fill:#f5c7c5,stroke:#a04040
    class ok1,ok2 feasible
    class bad infeasible

The tree has three leaves. Reading each one's path condition (the conjunction of all guards on the way down from the root):

A ≤ B. Feasible. Returns (A, B), the inputs unchanged.
A > B ∧ B - A ≤ 0. Feasible. Returns (B, A), the swapped pair.
A > B ∧ B - A > 0. This is where assert False would fire. But the two guards together require A > B (from the outer fork) and B > A (rearranging B - A > 0). No input satisfies both. The leaf is infeasible.

The assert False is unreachable because the path that would reach it has an unsatisfiable path condition. Treating A and B as unbounded mathematical integers, as this hand derivation does, the function is safe on every input.
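A concrete sanity check of the three leaves, offered as a sketch rather than a proof: it runs the swap on a small grid of Python integers (which are unbounded) and records which leaf each input reaches. The function name leaf and the sample range are assumptions made for illustration.

```python
def leaf(a, b):
    """Return which leaf of the execution tree the concrete input reaches."""
    x, y = a, b
    if x > y:
        x = x + y
        y = x - y
        x = x - y
        if x - y > 0:
            return "assert"      # the leaf the tree marks infeasible
        return "swapped"
    return "unchanged"


R = range(-20, 21)
reached = {leaf(a, b) for a in R for b in R}
print(sorted(reached))
```

On this sample only the two feasible leaves appear. A finite sample cannot rule the third leaf out in general. The unsatisfiable path condition is what does that.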

What the picture defined

Three terms came out of drawing the tree:

Symbolic state. A map from variable name to a symbolic expression. At each node, the state records what we know about every variable in scope. Assignments update the state by substituting into the right-hand side.
Path condition. The conjunction of guards along the path from the root to a node. The path condition for the bad leaf is (A > B) ∧ (B - A > 0). A leaf is feasible when its path condition has a satisfying assignment and infeasible when it does not.
Execution tree. The whole picture. Branches in the program produce forks in the tree. Straight-line statements extend the current path. Leaves correspond to return statements and assertion sites.

We just did symbolic execution by hand. Mechanically, the same four steps repeat: walk the program's AST, fork at branches, update the state at assignments, and check each leaf's path condition for satisfiability. The fourth step is what we use Z3 for.

Another way to describe what we did: we compiled a Python function into one SMT formula per leaf of the tree. The Python subset (named mini IMP in the next section) is the source language, Z3 expressions are the object code, and Z3 itself is the backend solver. The next sections automate this compilation.

A first SE engine

The engine automates the compilation we just did by hand. It accepts a Python subset called mini IMP: integers, assignment, if/else, while, return, assert, plus a marker assume(C) we will use later. Anything outside the subset raises a clear error from the engine.

We start the engine on a small absolute-value function and a one-line claim:

my_abs.py

def my_abs(x):
    if x < 0:
        r = -x
    else:
        r = x
    assert r >= 0
    return r

The claim looks obvious. We check it with symbolic execution.

Running the engine produces two paths. One is safe, the other reveals a counterexample at x = INT_MIN, where -x overflows back to INT_MIN and the assertion fails. The rest of this section walks through how the engine got there.

An interpreter on symbolic state

An ordinary interpreter walks the program with a concrete state: a map from variable name to a concrete value. At an assignment x = E, it evaluates E in the state and updates the state with the result.

The symbolic execution engine is just an interpreter on symbolic state: each variable maps to a Z3 expression instead of a concrete number. Each input parameter starts mapped to a fresh symbolic value (x ↦ x_0, etc.). Evaluating x + 1 in this state produces the Z3 expression x_0 + 1, not a number.

In pseudocode, the engine is three functions.

eval evaluates an expression against the current state and returns a Z3 expression:

def eval(expr, state):
    match expr:
        case Const(n):                  # 5, 0, True
            return n

        case Var(x):                    # x, r, n
            return state[x]

        case BinOp(left, op, right):    # x + 1, a * b
            return apply(op, eval(left, state), eval(right, state))

        case Compare(left, op, right):  # x < 0, n == 0
            return apply_cmp(op, eval(left, state), eval(right, state))

        case UnaryOp(op, operand):      # -x, not c
            return apply_unary(op, eval(operand, state))

Var(x) looks up x in the state and returns the Z3 expression currently bound to it. The recursive cases evaluate their sub-expressions the same way and combine the results. apply, apply_cmp, and apply_unary dispatch on the operator and emit the corresponding Z3 operation, sometimes simplifying along the way (0 + x_0 collapses to x_0, for instance).

execute_stmts is the engine's main loop. It threads a list of paths through a sequence of statements:

def execute_stmts(stmts, paths):
    """Thread paths through a sequence of statements.
       Each path is (state, pc). Returns the resulting paths."""
    for stmt in stmts:
        next_paths = []
        for (s, p) in paths:
            next_paths.extend(execute(stmt, s, p))
        paths = next_paths
    return paths

At each step, every current path passes through the statement. Branches multiply paths. Assignments update a single path's state and leave the list size unchanged. The list of paths at the end of the body is the union of all leaves the engine has reached.

execute handles a single statement. Each statement type has its own rule:

Assignment `x = E`: evaluate E against the current state, update the state to map x to the result.
If / else `if C: ... else: ...`: evaluate C, fork into two paths, extend each path's path condition with C or ¬C, then recurse on the branch body.
Assertion `assert C`: evaluate C, ask Z3 whether path condition ∧ ¬C is satisfiable. If yes, the assertion can fail on this path.
Return `return E`: record E's symbolic value as this path's return value.
While loop: covered in "Loops by bounded unrolling" below.

def execute(stmt, state, pc):
    match stmt:
        case Assign(x, expr):
            v = eval(expr, state)
            return [(state | {x: v}, pc)]

        case If(cond, then_body, else_body):
            c = eval(cond, state)
            return execute_stmts(then_body, [(state, pc + [c])]) \
                 + execute_stmts(else_body, [(state, pc + [Not(c)])])

        case Assert(cond):
            c = eval(cond, state)
            check_assert(pc, c)            # ask Z3 if pc ∧ ¬c is SAT; if so, report
            return [(state, pc + [c])]     # "check, then assume"

        case Return(expr):
            return [(state, pc, eval(expr, state))]

        case While(cond, body):
            ...      # see "Loops by bounded unrolling" below

The engine starts with the function's parameters bound to fresh symbolic values and an empty path condition. Calling execute_stmts(function_body, [(initial_state, [])]) produces the full list of leaves.

These are the moves we made by hand on the swap example, with Z3 doing the feasibility check at the end instead of us inspecting the path condition by eye.
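The pseudocode can be approximated in a few dozen lines of plain Python. This is a sketch under stated assumptions, not the course engine: it parses source with the standard ast module, represents symbolic values as strings instead of Z3 expressions, performs no feasibility check, and passes assert and return through unchanged. The names sym_eval, step, and run are hypothetical.

```python
import ast

BIN = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
CMP = {ast.Lt: "<", ast.LtE: "<=", ast.Gt: ">", ast.GtE: ">=", ast.Eq: "=="}


def sym_eval(node, state):
    """Evaluate an expression node to a symbolic string."""
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return state[node.id]
    if isinstance(node, ast.BinOp):
        l, r = sym_eval(node.left, state), sym_eval(node.right, state)
        return f"({l} {BIN[type(node.op)]} {r})"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return f"(-{sym_eval(node.operand, state)})"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        l = sym_eval(node.left, state)
        r = sym_eval(node.comparators[0], state)
        return f"({l} {CMP[type(node.ops[0])]} {r})"
    raise NotImplementedError(ast.dump(node))


def run(stmts, paths):
    """Thread every (state, pc) path through a statement list."""
    for stmt in stmts:
        paths = [p for state, pc in paths for p in step(stmt, state, pc)]
    return paths


def step(stmt, state, pc):
    if isinstance(stmt, ast.Assign):
        name = stmt.targets[0].id
        return [({**state, name: sym_eval(stmt.value, state)}, pc)]
    if isinstance(stmt, ast.If):
        c = sym_eval(stmt.test, state)
        return (run(stmt.body, [(state, pc + [c])])
                + run(stmt.orelse, [(state, pc + [f"not {c}"])]))
    if isinstance(stmt, (ast.Assert, ast.Return)):
        return [(state, pc)]     # not modelled in this sketch
    raise NotImplementedError(ast.dump(stmt))


SRC = """
def my_abs(x):
    if x < 0:
        r = -x
    else:
        r = x
    return r
"""
fn = ast.parse(SRC).body[0]
init = {a.arg: a.arg + "_0" for a in fn.args.args}
paths = run(fn.body, [(init, [])])
for state, pc in paths:
    print(pc, state["r"])
```

Swapping the strings for Z3 expressions and adding a solver call at assert is what turns this path enumerator into the engine described above.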

Walking through `my_abs`

The engine starts my_abs with the initial state $\sigma_{0} = \{x \mapsto x_{0}\}$, where $x_{0}$ is a fresh symbolic 32-bit BitVec, and an empty path condition.

Line 2: if x < 0:. Evaluating the guard in the current state gives the Z3 expression $x_{0} < 0$. The engine forks into two paths.

Path	state	PC after line 2
A (then branch)	$\{x \mapsto x_{0}\}$	$x_{0} < 0$
B (else branch)	$\{x \mapsto x_{0}\}$	$\neg (x_{0} < 0)$

The body of each branch executes on its respective path.

Line 3 on Path A: r = -x. Evaluating -x in Path A's state gives $-x_{0}$. The state updates to map r to $-x_{0}$.

Path A	state	PC
after line 3	$\{x \mapsto x_{0},\ r \mapsto -x_{0}\}$	$x_{0} < 0$

Line 5 on Path B: r = x. Evaluating x in Path B's state gives $x_{0}$. The state updates to map r to $x_{0}$.

Path B	state	PC
after line 5	$\{x \mapsto x_{0},\ r \mapsto x_{0}\}$	$\neg (x_{0} < 0)$

Line 6: assert r >= 0. Each path evaluates the assertion in its state and asks Z3 whether PC ∧ ¬assertion is satisfiable.

For Path A:

from z3 import BitVec, Not, Solver

x0 = BitVec("x0", 32)      # fresh symbolic 32-bit input
s = Solver()
s.add(x0 < 0)              # PC
s.add(Not(-x0 >= 0))       # negation of the assertion
s.check()

For Path B:

s = Solver()
s.add(Not(x0 < 0))         # PC
s.add(Not(x0 >= 0))        # negation of the assertion
s.check()

Z3 returns:

Path A: sat, with model $x_{0} = -2147483648 = -2^{31}$. The assertion can fail on this path.
Path B: unsat. The assertion holds on this path.

After the assertion check, the engine follows the "check, then assume" convention of intermediate verification languages (IVLs): it adds the assertion condition to the path condition. Downstream code may rely on $r \geq 0$ holding.

Line 7: return r. Each path records its return value and terminates. The complete output of symbolic_execute(my_abs):

Path 0:  PC = (x < 0),    return = -x0,  assertion CAN fail
    Counterexample: x = -2147483648  (INT_MIN = -2^31)
Path 1:  PC = ¬(x < 0),   return = x0,   assertion holds

In 32-bit two's complement, INT_MIN has no positive counterpart. Negating it overflows back to INT_MIN, which is negative, so assert r >= 0 fails on Path A. The "obvious" my_abs is wrong on exactly one input out of four billion.
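The overflow can be reproduced without a solver. The sketch below simulates signed 32-bit wraparound on top of Python's unbounded integers. The helper wrap32 and the name my_abs32 are illustrative assumptions, not engine functions.

```python
INT_MIN = -(1 << 31)


def wrap32(v):
    """Reduce a Python int to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v


def my_abs32(x):
    """my_abs with the negation performed in 32-bit arithmetic."""
    r = wrap32(-x) if x < 0 else x
    return r


print(my_abs32(-5), my_abs32(INT_MIN))   # 5 -2147483648
```

Negating INT_MIN gives 2^31, which does not fit in a signed 32-bit word and wraps back to INT_MIN, so r >= 0 fails on that one input.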

This is the same shape as the bvudiv2 demo from L01: ask Z3 for a counterexample and read off the model. In L01 we wrote the equivalence formula by hand. Here the engine writes it directly from the source code, one query per path.

Encoding matters

The pseudocode substituted each right-hand side directly into the state. That works for my_abs. On slightly bigger programs the substituted expressions blow up. As with any compiler, the choice of intermediate representation matters.

Consider a chain of doubling assignments:

cascade.py

def cascade(x0):
    x1 = x0 + x0
    x2 = x1 + x1
    x3 = x2 + x2
    # ... up to xN
    return xN

Naive substitution

The naive encoding substitutes eagerly. After x1 = x0 + x0, the engine maps x1 directly to the Z3 expression x0 + x0. After x2 = x1 + x1, the engine substitutes:

$$x_{2} = (x_{0} + x_{0}) + (x_{0} + x_{0})$$

After x3 = x2 + x2:

$$x_{3} = ((x_{0} + x_{0}) + (x_{0} + x_{0})) + ((x_{0} + x_{0}) + (x_{0} + x_{0}))$$

The expression tree doubles in size per step. At depth N it has on the order of 2^N leaves, all of them the same symbol x0. Z3 still has to walk the whole tree to construct the formula.

Dynamic single assignment

The DSA encoding (the engine's default) introduces a fresh symbolic variable per assignment. x1 = x0 + x0 becomes a fresh variable x1_sym together with the equality constraint x1_sym == x0 + x0. The state maps the source variable x1 to the fresh symbol x1_sym. The size of any single expression stays small. The equalities accumulate in a side list called extra, kept on the path next to its pc and state.

At depth 3 the engine's state is:

state = {x0: x0, x1: x1_sym, x2: x2_sym, x3: x3_sym}
extra = [
    x1_sym == x0 + x0,
    x2_sym == x1_sym + x1_sym,
    x3_sym == x2_sym + x2_sym,
]
return value = x3_sym

The return value is a single variable. The extras are three equalities, each of constant size. The total formula has size linear in N.
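A sketch of how the DSA state and side list grow for cascade, using strings in place of Z3 expressions. The function name cascade_dsa is hypothetical, and the x{i}_sym naming simply mirrors the listing above.

```python
def cascade_dsa(n):
    """Build the DSA state and equality list for an n-step doubling chain."""
    state = {"x0": "x0"}
    extra = []
    for i in range(1, n + 1):
        prev = state[f"x{i - 1}"]            # symbol currently bound to x{i-1}
        fresh = f"x{i}_sym"                  # fresh symbol for this assignment
        extra.append(f"{fresh} == {prev} + {prev}")
        state[f"x{i}"] = fresh
    return state, extra


state, extra = cascade_dsa(3)
for eq in extra:
    print(eq)
```

Each equality mentions only two symbols, so the list grows by one constant-size entry per assignment.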

Comparing sizes

Both encodings give Z3 the same answer to any query. They differ in formula size, which is what Z3 has to traverse to produce that answer.

N	naive expression size (nodes)	DSA return-expression size (nodes)
1	3	1
2	7	1
4	31	1
8	511	1
16	131,071	1

The naive expression at depth 16 has more nodes than there are letters on this page. The DSA expression at depth 16 is a single variable plus 16 small equality constraints.
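The naive column follows a simple recurrence: each step builds one + node over two copies of the previous tree, so size(N) = 2·size(N-1) + 1 with size(0) = 1. A short sketch (the function name naive_size is ours) reproduces the table:

```python
def naive_size(n):
    """Node count of the fully substituted expression for xN."""
    size = 1                      # x0 alone is a single leaf
    for _ in range(n):
        size = 2 * size + 1       # one '+' node over two copies of the subtree
    return size


for n in (1, 2, 4, 8, 16):
    print(n, naive_size(n))
```

The closed form is 2^(N+1) - 1, which is where the exponential growth comes from.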

The trick is called dynamic single assignment. It is the same idea LLVM uses in its SSA intermediate representation. Verification IRs like Why3 use it. The engine ships both modes and defaults to DSA.

One rule changes

Only the assignment rule from the earlier pseudocode changes. Instead of substituting the right-hand side into the state, the engine binds the variable to a fresh symbol and records the equality on a side list extra:

case Assign(x, expr):
    v = eval(expr, state)
    fresh = new_symbol(x)               # fresh Z3 var, e.g. x__7
    return [(state | {x: fresh}, pc, extra + [fresh == v])]

A path now carries (state, pc, extra). Every other rule from the pseudocode is unchanged. At an assertion, the engine asks Z3 about pc ∧ extra ∧ ¬assertion: the path condition, the accumulated DSA equalities, and the negated assertion.

A symbolic-execution engine, like any compiler, has IR choices that change the cost of the queries it produces. Production tools spend serious engineering effort here. Naive SE struggles with the assignment rule, and DSA is the standard fix. The full engine, in either mode, is about 300 lines. You could write something close in an afternoon.

Loops by bounded unrolling

So far the engine handles only straight-line and branchy code. Adding loops takes a different mechanism. A while loop can run an unbounded number of iterations, but the formula handed to Z3 has to be finite. The standard approach is to unroll.

The unrolling transformation

The depth-k unrolling of while C do S rewrites the loop as a chain of k nested conditionals:

$$\text{while } C \text{ do } S \;\longrightarrow\; \underbrace{\text{if } C \text{ then } (S;\ \text{if } C \text{ then } (S;\ \dots))}_{k \text{ copies}}$$

After unrolling, the program is loop-free, and the SE rules for if and assignment apply. Each "if C then" pair produces a fork: one path takes the body one more time, one path exits the loop.

For k = 3 and a loop body S, the rewritten form is:

if C:
    S
    if C:
        S
        if C:
            S

The engine forks at each level. Paths can exit at the first if, the second, or the third, depending on when the loop guard becomes false. Paths that still have C true after the third body get flagged bounded_out: the engine reports no result for them.
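The unrolling is a purely textual transformation. The sketch below generates the depth-k form as source text, with a hypothetical bounded_out() marker standing in for the engine's flag at the bottom of the chain. The function name unroll is also an assumption.

```python
import textwrap


def unroll(cond, body, k):
    """Return the depth-k unrolling of `while cond: body` as source text."""
    if k == 0:
        # Guard still true after k bodies: the path is flagged, not explored.
        return f"if {cond}:\n    bounded_out()"
    inner = unroll(cond, body, k - 1)
    return f"if {cond}:\n" + textwrap.indent(body + "\n" + inner, "    ")


print(unroll("i < n", "s = s + i\ni = i + 1", 2))
```

Every level contributes one fork, so a loop whose body has no branches yields k + 2 paths at depth k: k + 1 exits and one bounded-out path.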

`sum_to_n` at increasing depths

A classic example:

sum_to_n.py

def sum_to_n(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        s = s + i
        i = i + 1
    assert s == n * (n - 1) // 2
    return s

The assertion claims that the loop computes the triangular number 0 + 1 + ... + (n-1) = n(n-1)/2. Run the engine at increasing depths:

k = 0:  2 paths  (1 completed, 1 bounded-out)   safe
k = 1:  3 paths  (2 completed, 1 bounded-out)   safe
k = 2:  4 paths  (3 completed, 1 bounded-out)   safe
k = 3:  5 paths  (4 completed, 1 bounded-out)   safe
k = 5:  7 paths  (6 completed, 1 bounded-out)   safe
k = 8: 10 paths  (9 completed, 1 bounded-out)   safe

At each k, the engine examines all complete paths (the loop exits within k iterations) and finds the assertion holds on each. The one bounded-out path at each depth (where n > k) is flagged but not checked.

The check is honest within the bound. The engine has proven the assertion holds for every input n with 0 <= n <= k. Beyond k, it makes no claim. For unbounded soundness, an alternative is to pick a loop invariant, cut the loop, and check at the level of the invariant rather than at every iteration.
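For sum_to_n, the path that exits after exactly j iterations has a path condition that pins n to j, so the completed paths at depth k can be enumerated concretely. The sketch below does that and checks the assertion on each one. It is a stand-in for the engine's output, and the function name bmc_sum_to_n is hypothetical.

```python
def bmc_sum_to_n(k):
    """Enumerate the depth-k paths of sum_to_n and check the assertion on each."""
    completed_ok = []
    for iterations in range(k + 1):
        n = iterations               # PC of this path reduces to n == iterations
        s, i = 0, 0
        while i < n:
            s, i = s + i, i + 1
        completed_ok.append(s == n * (n - 1) // 2)
    return {
        "paths": len(completed_ok) + 1,   # plus the single bounded-out path (n > k)
        "completed": len(completed_ok),
        "safe": all(completed_ok),
    }


for k in (0, 1, 2, 3, 5, 8):
    print(k, bmc_sum_to_n(k))
```

The counts match the table: k + 2 paths, of which k + 1 complete and are checked.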

Catching a bug

Now make a small change: replace s = s + i with s = s + i + 1. Re-run BMC:

k = 0:  2 paths   safe
k = 1:  3 paths   BUG: 1 violation(s)
    counterexample: n = 1

At depth 1 the engine finds the bug. With n = 1, the loop runs once. After the iteration, s = 0 + 0 + 1 = 1. The claimed value is n*(n-1)//2 = 1*0//2 = 0. The assertion fails.

The bound k = 1 was enough. For this bug, a single loop iteration exposes the off-by-one.
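The same counterexample can be found by direct search over small inputs. A minimal sketch, with a buggy flag and the helper smallest_violation introduced purely for illustration:

```python
def sum_to_n(n, buggy=False):
    """Concrete sum_to_n; buggy=True uses s = s + i + 1 in the body."""
    s, i = 0, 0
    while i < n:
        s = s + i + (1 if buggy else 0)
        i = i + 1
    return s


def smallest_violation(limit, buggy):
    """Smallest n in [0, limit] that violates the assertion, or None."""
    for n in range(limit + 1):
        if sum_to_n(n, buggy) != n * (n - 1) // 2:
            return n
    return None


print(smallest_violation(8, buggy=True))    # 1
print(smallest_violation(8, buggy=False))   # None
```

With n = 0 the loop body never runs, so the buggy and correct versions agree, which is why depth 0 reports safe.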

When the bound becomes a proof

A bounded-out path carries the path condition "the input drove the loop guard true through k iterations." That PC might be satisfiable, or it might not be. A bounded-out path with an UNSAT PC is no path at all: no input can reach the bottom of the unrolling with the guard still true. When every bounded-out path is infeasible, BMC has covered every behavior of the program. The bound is no longer a bound, and the result is a full proof.

CBMC names this condition by adding an unwinding assertion at the bottom of the unrolling that says "the loop guard is false here." If that assertion is never violated under any input, the loop terminates within k iterations for every input. This is the same idea expressed as an assertion.

Compare sum_to_n with no upper limit on n against the same loop with assume(n <= 5). Count the feasible bounded-out paths at each depth: bounded-out paths whose PC Z3 reports SAT.

  k |  unbounded n |   n <= 5
 ---+--------------+----------
  0 |            1 |        1
  1 |            1 |        1
  2 |            1 |        1
  3 |            1 |        1
  4 |            1 |        1
  5 |            1 |        0
  6 |            1 |        0
  7 |            1 |        0

In the unbounded column the bounded-out path stays feasible at every k: for any k, the input n = k+1 keeps the loop alive past the unrolling. In the capped column the feasible count drops to zero at k = 5. The cap forces termination within 5 iterations, so the PC i < n after 5 iterations is UNSAT, and BMC has verified the assertion for every input that satisfies the assumption.
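Both columns can be recomputed from the path condition alone. After k iterations i equals k, so the bounded-out condition reduces to n > k together with the assumes. The sketch below searches for a witness directly. The function name and the cap parameter are assumptions for illustration.

```python
def feasible_bounded_out(k, cap=None):
    """1 if some n >= 0 (and n <= cap, when given) satisfies n > k, else 0."""
    # Uncapped, n = k + 1 is always a witness, so searching up to k + 1 suffices.
    upper = k + 1 if cap is None else cap
    return int(any(n > k for n in range(upper + 1)))


for k in range(8):
    print(k, feasible_bounded_out(k), feasible_bounded_out(k, cap=5))
```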

What the bound costs

When saturation does not happen, BMC trades unbounded soundness for full automation. Within the bound, BMC is sound and complete: every counterexample is real, and no counterexample exists if the engine reports none. Outside the bound, BMC makes no claim.

For programs with naturally bounded loops (firmware, embedded code, smart contracts, parsers with bounded input), saturation usually does happen and BMC is enough. For programs whose loops range over inputs of unknown size, BMC gives bug-finding power but not a correctness proof.

Production tools in this space (CBMC, KLEE, JPF) add three things our engine does not have: heap models for pointer-rich code, environment models for system calls and external interfaces, and search heuristics for picking which branch to explore first. All of them ultimately explore a bounded portion of the program's behaviors.

The Zune bug

We saw the Zune freeze at the start of Practice. The bug is a missing else. When days reaches exactly 366 in a leap year, the inner conditional falls through without changing days, and the outer loop runs again on identical state. The same model, with assumes in place and the missing branch called out:

zune_days.py

def zune_days(days, is_leap):
    assume(days >= 1)
    assume(days <= 1000)
    assume(is_leap == 0 or is_leap == 1)
    while days > 365:
        if is_leap == 1:
            if days > 366:
                days = days - 366
            # BUG: no else branch
        else:
            days = days - 365
    return days

The stuck control flow

flowchart TD
    accTitle: Zune loop control flow at days = 366, is_leap = 1
    accDescr: When days = 366 and is_leap = 1, the inner if-condition (days > 366) is false. Control falls through the leap-year branch without decrementing days, returns to the outer while loop, finds days > 365 still true, and re-enters the same path indefinitely.
    start([Enter loop body])
    chk1{is_leap == 1?}
    chk2{days > 366?}
    dec1[days = days - 366]
    dec2[days = days - 365]
    fall[fall through with days unchanged]
    loop{days > 365?}
    start --> chk1
    chk1 -->|yes| chk2
    chk1 -->|no| dec2
    chk2 -->|yes| dec1
    chk2 -->|no| fall
    dec1 --> loop
    dec2 --> loop
    fall --> loop
    loop -->|yes| start
    loop -->|no| done([return])

When days = 366 and is_leap = 1, the path through the body is: enter loop, take the leap branch, check days > 366, find it false, fall through with days unchanged. Back at the loop guard, days > 365 is still true, so we re-enter the same body, take the same path, and fall through again. The state is unchanged from one iteration to the next. The loop runs forever.
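The stuck state is easy to observe concretely once the loop is given a step budget. A minimal sketch: the max_iters parameter and the None return value are additions for illustration and are not part of the firmware model.

```python
def zune_days(days, is_leap, max_iters=1000):
    """Concrete run of the model; returns None if the step budget runs out."""
    for _ in range(max_iters):
        if days <= 365:
            return days
        if is_leap == 1:
            if days > 366:
                days = days - 366
            # no else: days == 366 leaves the state unchanged
        else:
            days = days - 365
    return days if days <= 365 else None


print(zune_days(366, 0))   # 1
print(zune_days(367, 1))   # 1
print(zune_days(366, 1))   # None: the budget ran out
```

A concrete run can only report that the budget ran out. It cannot tell a long loop from an infinite one, which is the gap the bounded-out analysis below addresses.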

How BMC catches it

The engine catches most bugs through assertions: at each assert C it asks Z3 whether pc ∧ ¬C is satisfiable. The Zune routine has no assertion. Its bug is non-termination: the loop never reaches return. The engine cannot prove non-termination directly.

Bounded-out paths. The loop-unrolling section introduced the flag: after unrolling to depth k, any path that still satisfies the loop guard at the bottom is flagged bounded_out. The engine ran out of unrollings before the path finished. Earlier those were the "we don't know" cases, the price BMC pays for bug-finding. For Zune they are the result.

A bounded-out path's path condition encodes a precise question: "is there an input that satisfies the assumes and keeps days > 365 true through k iterations?" If Z3 says SAT, the model is an input that drove the loop through k bodies without exiting.

Run at k = 3. The engine produces 67 paths through the depth-3 unrolling. Most return cleanly. Two are bounded-out with satisfiable PCs. For each of those, ask Z3 to minimize days and keep the smallest witness:

from z3 import Optimize

opt = Optimize()
for c in path.constraints():
    opt.add(c)
opt.minimize(path.state['days'])
opt.check()
witness = opt.model()      # smallest-days input on this path

The smallest witness is days = 366, is_leap = 1. That looks like the Zune freeze: the 366th day of a leap year.

Looking across depths. A single k is not enough. The k = 3 witness only says "the loop ran at least 3 rounds on this input." Maybe at k = 4 the loop exits. The saturation table earlier showed exactly that pattern: feasible bounded-out paths fall to zero once k reaches the length of the longest run. So we sweep k and watch what happens to the witness:

   k | feasible bounded-out | min-days witness
  ---+----------------------+----------------------
   1 |          3           | days = 366, is_leap = 1
   2 |          2           | days = 366, is_leap = 1
   3 |          2           | days = 366, is_leap = 1
   4 |          2           | days = 366, is_leap = 1
   5 |          2           | days = 366, is_leap = 1

The feasible bounded-out count does not decay, and the minimum-days witness does not move. Compare that to the capped sum_to_n saturation table earlier, where feasible bounded-out hit zero at k = 5 and stayed there. The Zune table is the opposite picture: the witness persists at every depth we try. That stability is the fingerprint of a real infinite loop, distinct from a loop that merely runs for many iterations.
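Because the assumes restrict the input space to 2,000 cases, the sweep can be cross-checked by brute force. The sketch below groups inputs by the branch trace they follow through the first k bodies and keeps the traces whose guard is still true afterwards. The trace labels and the function name are hypothetical, and this replaces the solver with enumeration rather than reproducing the engine.

```python
def bounded_out_paths(k):
    """Map each feasible bounded-out branch trace at depth k to its first witness."""
    witnesses = {}
    for is_leap in (0, 1):
        for start in range(1, 1001):          # the assume() range on days
            days, trace = start, []
            while days > 365 and len(trace) < k:
                if is_leap == 1:
                    if days > 366:
                        days -= 366
                        trace.append("leap-dec")
                    else:
                        trace.append("leap-stuck")   # the missing else
                else:
                    days -= 365
                    trace.append("dec")
            if len(trace) == k and days > 365:       # guard still true at the bottom
                witnesses.setdefault(tuple(trace), (start, is_leap))
    return witnesses


for k in range(1, 6):
    paths = bounded_out_paths(k)
    print(k, len(paths), min(paths.values()))
```

From k = 2 onward the two surviving traces both end in repeated leap-stuck steps, one starting at days = 366 and one at days = 732.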

Why the witness is stable. Step the loop body once with days = 366 and is_leap = 1. The outer if is_leap == 1 fires. The inner if days > 366 is false: 366 is not greater than 366. With no else branch on the inner if, the body falls through. days is unchanged. Back at the loop guard, days > 365 is still true. The state at the top of iteration 2 matches the state at the top of iteration 1, byte for byte. Z3 sees the same PC at every depth, so it returns the same witness.

The body of the firmware model is seven lines long.

Catching it with an assert

The bounded-out witness is what you reach for when the program has no assertion. We can also inject an assertion that turns non-termination into a normal assertion failure. The standard pattern is a progress assertion: record some measure at the top of the loop body and assert at the bottom that the measure strictly decreased.

For Zune, days is the natural measure. The loop guard while days > 365 only exits when days has been driven low enough, so every body must reduce days. Add days_old = days at the top and assert days < days_old at the bottom:

zune_with_progress.py

def zune_days(days, is_leap):
    assume(days >= 1)
    assume(days <= 1000)
    assume(is_leap == 0 or is_leap == 1)
    while days > 365:
        days_old = days
        if is_leap == 1:
            if days > 366:
                days = days - 366
        else:
            days = days - 365
        assert days < days_old      # progress: each iteration must reduce days
    return days

Run BMC at k = 1. The engine reports an assertion violation on the assert line with counterexample days = 366, is_leap = 1. This is the canonical Zune input, found at minimum depth through the engine's normal assertion-checking pathway.

Why this works as a termination check. A strictly decreasing measure with a lower bound forces termination. days is bounded below by 365 (the loop exits at days <= 365), and the progress assert claims days strictly decreases on every iteration. If both hold for every input, the loop terminates. On the bug input, the body falls through without changing days, the assert fails, and Z3 returns the input that triggered the failure.

The progress assert turns non-termination into an ordinary assertion failure, so the engine's main pathway handles it. Both approaches arrive at the same days = 366, is_leap = 1 input. The progress-assertion pattern generalizes: any loop with a well-founded measure (a loop variant) admits a progress assertion that catches non-termination as a normal assertion violation.
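The progress check can also be exercised by enumeration over the assumed input range. The sketch below runs at most k loop bodies per input and reports the inputs on which days < days_old fails. The function name progress_violations is an assumption for illustration.

```python
def progress_violations(k):
    """Inputs whose first k loop bodies include one that fails the progress check."""
    found = []
    for is_leap in (0, 1):
        for start in range(1, 1001):          # the assume() range on days
            days = start
            for _ in range(k):
                if days <= 365:
                    break                     # loop exited normally
                days_old = days
                if is_leap == 1:
                    if days > 366:
                        days = days - 366
                else:
                    days = days - 365
                if not days < days_old:       # the progress assertion
                    found.append((start, is_leap))
                    break
    return found


print(progress_violations(1))   # [(366, 1)]
print(progress_violations(2))   # [(366, 1), (732, 1)]
```

At depth 1 the only violating input is days = 366 with is_leap = 1. Deeper unrollings add inputs such as days = 732 that reach the same stuck state after one decrement.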

Tools in this space

The engine we built is a small version of a real category of tools. A few production examples worth knowing for further reading:

Category	Examples	Used for
Bounded SE / BMC	CBMC, KLEE, JPF	Bug-finding in C, LLVM bitcode, and Java with heap models and search heuristics far beyond ours
Concolic testing	SAGE, CUTE, DART	Run the program concretely, collect path conditions, negate them to drive the next concrete run. SAGE reportedly found roughly a third of the bugs discovered by file fuzzing during the development of Windows 7
Static analysis	Astrée, Infer	Used in Airbus avionics (Astrée) and across Facebook's codebase (Infer)
Interactive verification	Rocq, Isabelle/HOL, Lean, Dafny, Why3, F*	Used in CompCert (verified C compiler, proved in Rocq) and seL4 (verified microkernel, proved in Isabelle/HOL)