Practice
We walk a Hoare-logic hand-proof, derive the WP rules that mechanize it, build the loop-cut transformation that handles while, and verify sum_to_n for every input with one invariant annotation.
Where L07 left off
L07 unrolled loops and asked Z3 about each path within a bound. The engine checked that no assertion could fail within k iterations. The bound was honest: every counterexample was real, and no counterexample existed if the engine reported none. Beyond k, BMC made no claim.
For programs with naturally bounded loops, that was enough. The capped sum_to_n with assume(n <= 5) saturated at k = 5: no input could drive the loop past five iterations, so the unrolling covered every behavior. For unbounded n, no finite k suffices.
We verify the unbounded version with a loop invariant: one predicate per loop, capturing what is true at every iteration. Once a loop has an invariant, the engine verifies the program for every input, regardless of how many times the loop runs.
Euclid's algorithm
Euclid's algorithm computes the greatest common divisor of a and b by repeated subtraction.
def gcd(a, b):
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
Faster variants use modulo, but the subtraction form has the simplest loop body to reason about. Trace it on three inputs:
| call | (a, b) at each step | returns |
|---|---|---|
| `gcd(12, 8)` | (12, 8) → (4, 8) → (4, 4) | 4 |
| `gcd(15, 6)` | (15, 6) → (9, 6) → (3, 6) → (3, 3) | 3 |
| `gcd(7, 5)` | (7, 5) → (2, 5) → (2, 3) → (2, 1) → (1, 1) | 1 |
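The traces above can be reproduced with a few lines of Python. This is a minimal sketch; `gcd_trace` is a hypothetical helper name, not part of the course engine, and it simply records the pair `(a, b)` at every step of the subtraction loop.

```python
def gcd_trace(a, b):
    """Run subtraction-based Euclid and record (a, b) at every step."""
    states = [(a, b)]
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
        states.append((a, b))
    return states


for call in [(12, 8), (15, 6), (7, 5)]:
    trace = gcd_trace(*call)
    # The last recorded state has a == b; either component is the result.
    print(call, " -> ".join(map(str, trace)), "returns", trace[-1][0])
```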
Each call terminates and returns what we would call the gcd. For positive a and b, three properties should hold:
- The loop exits with `a == b` and a positive result.
- The result divides both the original `a` and the original `b`.
- The result is the largest such divisor.
Together these three properties characterize the gcd.
L07's BMC could verify this for inputs that drive the loop through at most k iterations. It could not give a claim about every positive a, b. For that we need a property that holds at every iteration, regardless of how many times the loop runs.
When a > b, the body replaces (a, b) with (a - b, b). What is preserved between the old and new state?
Suppose d divides a and d divides b. Then d divides a - b, since the difference of two multiples of d is itself a multiple of d. Going the other way: if d divides a - b and d divides b, then d divides (a - b) + b = a. So the common divisors of (a, b) are exactly the common divisors of (a - b, b), and their largest member is the same:
$$\gcd(a, b) = \gcd(a - b, b)$$
The other branch (b := b - a when a < b) is symmetric. So gcd(a, b) does not change as the loop runs. This is the loop's invariant.
At loop exit, a == b, and $\gcd(a, a) = a$. Combined with the invariant, the returned value is $\gcd(a_0, b_0)$, where $a_0$ and $b_0$ are the original inputs. We have an informal proof that the algorithm computes the gcd of the original inputs.
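The informal argument can be spot-checked at run time. The sketch below is an illustration under stated assumptions: `gcd_checked` and `common_divisors` are hypothetical names, and Python's `math.gcd` serves as the reference gcd. It asserts the invariant at every loop head and compares the common-divisor sets before and after one subtraction.

```python
import math


def gcd_checked(a, b):
    """Subtraction-based Euclid with the invariant asserted at every iteration."""
    assert a > 0 and b > 0
    a0, b0 = a, b
    while a != b:
        assert math.gcd(a, b) == math.gcd(a0, b0)  # invariant at loop head
        if a > b:
            a = a - b
        else:
            b = b - a
    assert math.gcd(a, b) == math.gcd(a0, b0)      # invariant still holds at exit
    return a


def common_divisors(a, b):
    """All positive integers dividing both a and b."""
    return {d for d in range(1, max(a, b) + 1) if a % d == 0 and b % d == 0}


print(common_divisors(12, 8) == common_divisors(12 - 8, 8))
print(gcd_checked(15, 6))
```

A passing run is evidence for the tested inputs only; the proof for every input is what the rest of the section builds.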
The rest of Practice builds the formal machinery that turns this kind of informal argument into a mechanically checkable proof: Hoare logic, the weakest-precondition rules, and the loop-cut transformation that gives the engine something finite to work with.
Hoare logic by hand
A Hoare triple $\{P\}\ S\ \{Q\}$ is a claim about partial correctness: if the program $S$ runs from a state satisfying the precondition $P$ and terminates, then the resulting state satisfies the postcondition $Q$. Termination itself is not part of the claim.
Hoare logic gives one inference rule per statement form. A proof of $\{P\}\ S\ \{Q\}$ is a tree whose leaves are axiom instances and whose root concludes the triple. The six rules for mini IMP:
⊢ {P} skip {P}

⊢ {Q[x ↦ E]} x := E {Q}

⊢ {P} S₁ {R}    ⊢ {R} S₂ {Q}
─────────────────────────────
⊢ {P} S₁; S₂ {Q}

⊢ {P ∧ C} S₁ {Q}    ⊢ {P ∧ ¬C} S₂ {Q}
────────────────────────────────────
⊢ {P} if C then S₁ else S₂ {Q}

⊢ {I ∧ C} S {I}
─────────────────────────────
⊢ {I} while C do S {I ∧ ¬C}

P ⇒ P′    ⊢ {P′} S {Q′}    Q′ ⇒ Q
───────────────────────────────────
⊢ {P} S {Q}
The two axiom-style rules (skip and assignment) have no premises. The four others have premises above the bar and the conclusion below.
The rule of consequence has three premises: two implications and one Hoare triple. It lets us strengthen a precondition or weaken a postcondition before applying another rule. In practice, almost every proof uses consequence at multiple steps to reshape an assertion into something the next rule's pattern accepts.
The while rule requires a loop invariant $I$: a predicate that holds before the loop runs, stays true across every body iteration, and combines with $\neg C$ at loop exit to imply the postcondition.
A worked proof
We claim $\{x \le n\}\ \mathtt{while}\ (x < n)\ \mathtt{do}\ x := x + 1\ \{x = n\}$. The invariant is $x \le n$.
{x ≤ n}                     // precondition
while (x < n) do
    {x ≤ n ∧ x < n}         // while rule entry: invariant + guard
    {x + 1 ≤ n}             // consequence (x ≤ n ∧ x < n ⇒ x + 1 ≤ n)
    x := x + 1
    {x ≤ n}                 // assignment rule: (x ≤ n)[x ↦ x+1] = x+1 ≤ n
{x ≤ n ∧ ¬(x < n)}          // while rule conclusion
{x = n}                     // consequence (x ≤ n ∧ x ≥ n ⇒ x = n)
The annotations name which rule justifies each line. They do not name a direction. Reading the proof top to bottom is forward reasoning: from the precondition, derive what holds after each step. L07's SE engine did this mechanically, producing one strongest postcondition per path. Reading bottom to top reverses the direction: from the postcondition, work out what must have held before each step. L08's engine implements that direction, producing the weakest precondition of the program.
Either reading reaches the same six annotated steps. Walking top to bottom: the precondition is $x \le n$, and the while rule asks us to prove that the invariant is preserved by one body iteration, given the invariant and the guard at entry.
The consequence rule on the next line reshapes the assertion. The assignment rule's pattern is:
$$\{Q[x \mapsto E]\}\ x := E\ \{Q\}$$
For the step $x := x + 1$ with desired postcondition $x \le n$:
- $E = x + 1$, the right-hand side of the assignment.
- $Q = (x \le n)$, the postcondition we want.
- $Q[x \mapsto E] = (x + 1 \le n)$, the substitution.
The required precondition is $x + 1 \le n$, exactly the reshaped assertion above.
The while rule concludes the loop with $x \le n \land \neg(x < n)$, and a final consequence step simplifies that to $x = n$.
Each rule application above is mechanical. What requires human judgment is the invariant itself and the choice of when to apply consequence. The rest of Practice automates the rule applications and asks the engineer to supply only the invariant.
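One way to build intuition for the proof outline is to turn each annotation into a run-time assertion. This is only a sketch (`x_to_n_annotated` is a hypothetical name): passing assertions on sample inputs do not prove the triple, but a failing one would expose a wrong annotation.

```python
def x_to_n_annotated(x, n):
    assert x <= n                      # precondition
    while x < n:
        assert x <= n and x < n        # while rule entry: invariant + guard
        assert x + 1 <= n              # consequence
        x = x + 1
        assert x <= n                  # assignment rule
    assert x <= n and not (x < n)      # while rule conclusion
    assert x == n                      # consequence
    return x


print(x_to_n_annotated(2, 7))
```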
Toward VC Gen
The mechanization is a verification condition generator (VC Gen). It compiles an annotated source program into a small batch of Z3 queries in three passes:
- Loop-cut the source into an intermediate verification language (IVL): the same mini IMP grammar with no `while` loops, plus the three IVL primitives `assert`, `assume`, and `havoc`.
- Walk the IVL backward with the WP rules to produce one verification condition (VC) formula per Hoare obligation.
- Dispatch each VC to Z3. If Z3 reports every VC valid, the original Hoare triple is valid.
graph TD
accTitle: VC Gen pipeline from annotated source to Z3
accDescr: An annotated source program is compiled by loop-cut into an intermediate verification language with no loops. The IVL is walked backward by the WP rules to produce one verification condition formula per Hoare obligation. Z3 then checks each verification condition.
src["annotated source
mini IMP + invariant"]
ivl["IVL
loop-free
(assert, assume, havoc, …)"]
vc["verification conditions
one per obligation"]
z3["Z3
VALID or counterexample"]
src -->|loop-cut| ivl
ivl -->|backward WP| vc
vc -->|check| z3

The IVL primitives are the same three we already know from L07. L07's engine walked them forward and produced one strongest postcondition per path. L08's engine reverses the direction, producing one weakest precondition per obligation. What changes between L07 and L08 is the direction of the walk, not the meaning of the primitives.
Compared to L07's BMC, we supply one invariant per loop and get soundness on every input, with no upper bound on iterations. The pipeline runs mechanically downstream of the invariant.
Weakest precondition: the rules
The weakest precondition of a statement $S$ relative to a postcondition $Q$, written $\mathrm{wp}(S, Q)$, is the weakest predicate $P$ such that $\{P\}\ S\ \{Q\}$ holds. "Weakest" means most permissive: any other valid precondition implies it. If $\mathrm{wp}(S, Q)$ reduces to $\mathit{true}$, every state satisfies $Q$ after running $S$.
The rules for the core loop-free constructs:
wp(skip, Q) = Q
wp(x := E, Q) = Q[x ↦ E]
wp(S₁; S₂, Q) = wp(S₁, wp(S₂, Q))
wp(if C then S₁ else S₂, Q) = (C → wp(S₁, Q)) ∧ (¬C → wp(S₂, Q))
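Computing $Q[x \mapsto E]$ means substituting $E$ for $x$ everywhere in $Q$. A one-line example: to compute $\mathrm{wp}(x := x + 1,\ Q)$ with the postcondition $Q = (x \le n)$, substitute $x + 1$ for $x$ in $Q$:
$$\mathrm{wp}(x := x + 1,\ x \le n) = (x \le n)[x \mapsto x + 1] = (x + 1 \le n)$$
To land at $x \le n$ after the assignment, we needed to start at $x + 1 \le n$.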
The sequence rule composes: $\mathrm{wp}(S_1; S_2, Q)$ computes the WP through $S_2$ first, then uses that result as the postcondition for $S_1$. The if rule splits into two branch-guarded WPs and conjoins them. Both directly mirror the corresponding Hoare inference rules from the previous section.
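The four rules translate almost line for line into code. The sketch below is an assumption-laden illustration, not the course engine: statements are encoded as hypothetical tuples, and predicates and expressions are Python functions over a state dictionary, so substitution `Q[x ↦ E]` becomes "evaluate Q in the state where x holds the value of E". A real VC generator would build symbolic formulas instead.

```python
# Hypothetical statement encoding:
#   ("skip",), ("assign", var, expr_fn), ("seq", s1, s2), ("if", cond_fn, s1, s2)
# Predicates and expressions are functions from a state dict to a value.

def wp(stmt, Q):
    kind = stmt[0]
    if kind == "skip":
        return Q
    if kind == "assign":
        _, var, expr = stmt
        # Q[x -> E]: evaluate Q in the state where var holds E's value.
        return lambda s: Q({**s, var: expr(s)})
    if kind == "seq":
        _, s1, s2 = stmt
        return wp(s1, wp(s2, Q))          # WP through s2 first, then s1
    if kind == "if":
        _, cond, s1, s2 = stmt
        wp1, wp2 = wp(s1, Q), wp(s2, Q)
        # (C -> wp1) and (not C -> wp2), written with or/not
        return lambda s: (not cond(s) or wp1(s)) and (cond(s) or wp2(s))
    raise ValueError(f"unknown statement: {kind}")


incr = ("assign", "x", lambda s: s["x"] + 1)
post = lambda s: s["x"] <= s["n"]
pre = wp(incr, post)                      # behaves like x + 1 <= n
print(pre({"x": 2, "n": 3}), pre({"x": 3, "n": 3}))
```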
Walking WP on max_two
def max_two(x, y):
    if x > y:
        m = x
    else:
        m = y
    assert m >= x
    assert m >= y
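The two assertions at the end are the postcondition we want at the end of the program:
$$Q = (m \ge x) \land (m \ge y)$$
We walk WP backward through the if-else.
Then branch. Apply the assignment rule to $m := x$:
$$\mathrm{wp}(m := x,\ Q) = (x \ge x) \land (x \ge y)$$
Since $x \ge x$ is always true, this simplifies to $x \ge y$.
Else branch. Apply the assignment rule to $m := y$:
$$\mathrm{wp}(m := y,\ Q) = (y \ge x) \land (y \ge y)$$
Since $y \ge y$ is always true, this simplifies to $y \ge x$.
The if rule combines the two branch WPs with their guards:
$$(x > y \to x \ge y) \land (\neg(x > y) \to y \ge x)$$
Each conjunct is a tautology. $x > y \to x \ge y$ holds because $x > y$ implies $x \ge y$. The negation $\neg(x > y)$ is $x \le y$, which is the same as $y \ge x$. So $\neg(x > y) \to y \ge x$ is also true.
The precondition is therefore $\mathit{true}$. max_two is correct on every input.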
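As a sanity check on the hand computation, the combined formula can be evaluated over a small grid of integers. This is a sketch under an obvious assumption: a finite range (the hypothetical constant `RANGE`) stands in for the solver's proof over all integers, and `max_two_vc` is an illustrative name for the if-rule formula.

```python
from itertools import product


def max_two(x, y):
    if x > y:
        m = x
    else:
        m = y
    return m


def max_two_vc(x, y):
    """(x > y -> x >= y) and (not (x > y) -> y >= x), written with or/not."""
    return (not (x > y) or x >= y) and ((x > y) or y >= x)


RANGE = range(-10, 11)  # finite stand-in for "every integer"
vc_holds = all(max_two_vc(x, y) for x, y in product(RANGE, RANGE))
post_holds = all(max_two(x, y) >= x and max_two(x, y) >= y
                 for x, y in product(RANGE, RANGE))
print(vc_holds, post_holds)
```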
graph TD
accTitle: WP walk backward through max_two
accDescr: The postcondition Q at the end of the program splits via the assignment rule into a branch WP on the then side and a branch WP on the else side. The if rule combines both branches into a single program precondition, which reduces to true.
Q["postcondition
(m ≥ x) ∧ (m ≥ y)"]
then_wp["wp through m := x
x ≥ y"]
else_wp["wp through m := y
y ≥ x"]
pre["program precondition
true"]
Q -->|m ↦ x| then_wp
Q -->|m ↦ y| else_wp
then_wp -->|if rule| pre
else_wp -->|if rule| pre

We applied three rules backward through the program: the assignment rule twice (once per branch) and the if rule once. Each step was a substitution or a guard-conditioned conjunction. The engine performs exactly this walk mechanically.
Cutting the loop
The four rules above handle skip, assignment, sequence, and if. For while, no analogous rule exists.
A while loop can iterate any number of times. To produce a finite WP for it, we need a formula that holds regardless of iteration count. The loop invariant is exactly that formula.
L07 hit the same wall going forward: SP for while was not finitely expressible either, and L07 used bounded unrolling to dodge the question. We use the invariant to give the loop a finite WP.
The transformation
For a loop while C do S annotated with invariant I, the loop-cut transformation produces this IVL fragment:
assert I                   // invariant holds on entry
havoc x ; havoc y ; ...    // forget current values of loop targets
assume I                   // arbitrary iteration that satisfies the invariant
if C
then
    S                      // run one body iteration
    assert I               // invariant is preserved
    assume false           // cut analysis here
else
    skip                   // loop exited; continue past
The control flow of the cut form:
flowchart TD
accTitle: Loop-cut transformation: the cut form's control flow
accDescr: The cut form's control flow goes through an assert of the invariant on entry, a havoc of all loop targets to forget the current state, an assume of the invariant to fix an arbitrary iteration, and a conditional on the loop guard. The then-branch runs the body, asserts the invariant is preserved, and assumes false to cut the analysis. The else-branch continues past the loop with the invariant and the negated guard known.
entry([state on entry])
a1[assert I]
h[havoc loop targets]
a2[assume I]
c{C?}
body[run body S]
a3[assert I]
dead[assume false]
exit([continue past loop
¬C ∧ I known])
entry --> a1
a1 --> h
h --> a2
a2 --> c
c -->|true| body
c -->|false| exit
body --> a3
a3 --> dead
classDef obligation fill:#d9f5c5,stroke:#5a8a3a
class a1,a3 obligation

The cut form is what the engine analyzes, not what the program executes. It is a fixed-size IVL fragment that captures what the engine needs to check to certify the loop. Walking each piece:
- `assert I` at the top requires the invariant to hold when we first reach the loop. This is the loop's entry obligation.
- `havoc x; havoc y; ...` runs once for every variable the body assigns. Whatever the current concrete values are, we forget them. We will reason about an arbitrary iteration, not a specific one.
- `assume I` after the havocs. The invariant promises something about the post-havoc state. We get back exactly what `I` says, no more.
- `if C ... else skip` splits the analysis on the loop guard. The then-branch covers the case where another iteration would run. The else-branch covers the case where the loop has exited.
- Then-branch: the engine examines one body iteration symbolically. After the body, `assert I` requires that the body preserved the invariant. This is the preservation obligation. The `assume false` that follows cuts the analysis here: further iterations are already captured by the invariant, so the engine does not chase them.
- Else-branch: `skip`, representing the analysis path where the loop has exited. The state known here is `¬C ∧ I` (loop exited, invariant still holds). The program's postcondition, asserted downstream, must follow from this state. This third check is the sufficiency obligation: the invariant must be sufficient, combined with the loop-exit condition, to imply the postcondition.
The three obligations (entry, preservation, sufficiency) are what the engine reports separately when it verifies a program with a loop.
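The transformation itself is a small syntactic rewrite. The sketch below assumes a hypothetical tuple encoding of IVL statements with conditions kept as plain strings; `seq` and `loop_cut` are illustrative names, not the course engine's API.

```python
def seq(*stmts):
    """Right-nested sequence of statements; a single statement is returned as-is."""
    result = stmts[-1]
    for s in reversed(stmts[:-1]):
        result = ("seq", s, result)
    return result


def loop_cut(cond, body, inv, targets):
    """Build the cut form for `while cond do body` with invariant `inv`.

    `targets` lists the variables the body assigns.
    """
    havocs = [("havoc", x) for x in targets]
    then_branch = seq(body, ("assert", inv), ("assume", "false"))
    return seq(
        ("assert", inv),                      # entry obligation
        *havocs,                              # forget loop targets
        ("assume", inv),                      # arbitrary iteration
        ("if", cond, then_branch, ("skip",)), # preservation vs. exit
    )


cut = loop_cut("x < n", ("assign", "x", "x + 1"), "x <= n", ["x"])
print(cut)
```

The result contains no `while` node, so a backward WP walk over it terminates after a fixed number of steps.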
Three more WP rules
The cut form uses assert, assume, and havoc. Their WP rules complete the table:
wp(assert C, Q) = C ∧ Q
wp(assume C, Q) = C → Q
wp(havoc x, Q) = ∀x. Q
assert C requires both C here and Q downstream. assume C weakens what must be proven from Q to C → Q: we only need to prove Q on paths where C actually holds. havoc x strips knowledge of x, so to guarantee Q after the havoc we need Q to hold for every possible x. That is what the universal quantifier in the rule captures.
The universal quantifier from havoc is the same quantifier we saw in L06 with ForAll. There it was a feature to use directly. Here it shows up naturally because we are summarizing every possible state of the havoced variable.
Each piece of the cut form has a known WP rule. Walking the cut form backward produces a finite WP for the original loop.
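The three rules can be sketched in the same semantic style, with predicates as Python functions over a state dictionary. One loud assumption: the universal quantifier for `havoc` is approximated by enumerating a small hypothetical `DOMAIN`, whereas the real engine hands an actual `∀` to Z3. The helper names are illustrative.

```python
DOMAIN = range(-5, 6)  # hypothetical finite stand-in for "all integers"


def wp_assert(C, Q):
    # C must hold here and Q must hold downstream.
    return lambda s: C(s) and Q(s)


def wp_assume(C, Q):
    # Q only needs to hold on paths where C holds.
    return lambda s: (not C(s)) or Q(s)


def wp_havoc(var, Q):
    # forall var. Q, approximated by trying every value in DOMAIN.
    return lambda s: all(Q({**s, var: v}) for v in DOMAIN)


Q = lambda s: s["x"] + 1 > 0
nonneg = lambda s: s["x"] >= 0

guarded = wp_havoc("x", wp_assume(nonneg, Q))   # forall x. x >= 0 -> x + 1 > 0
unguarded = wp_havoc("x", Q)                    # forall x. x + 1 > 0
print(guarded({"x": -3}), unguarded({"x": 3}))
```

The starting value of `x` does not matter in either call: after `havoc`, only what a later `assume` restores is known.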
Verifying loops
We run the engine on five programs. Each run takes an annotated function, applies the loop-cut, walks the IVL backward, and produces one VC per obligation. The engine dispatches each VC to Z3 and reports per-obligation results.
x_to_n
The program we hand-proved earlier, now mechanized:
def x_to_n(x, n):
    assume(x <= n)
    while x < n:
        invariant(x <= n)
        x = x + 1
    assert x == n
Engine output:
entry : VALID
preserved : VALID
sufficiency : VALID
Three obligations, all valid. The hand-proof we walked earlier is the same proof the engine discharged in microseconds.
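The three obligations for x_to_n are small enough to spell out directly. The sketch below assumes a finite search range (the hypothetical constant `R`) in place of Z3, so it can only fail to find counterexamples within that range; it is not a proof.

```python
from itertools import product

R = range(-6, 7)  # hypothetical finite search range standing in for Z3


def inv(x, n):
    return x <= n


def guard(x, n):
    return x < n


# Entry: the assumed precondition implies the invariant.
entry = all(inv(x, n) for x, n in product(R, R) if x <= n)
# Preservation: invariant and guard imply the invariant after x = x + 1.
preserved = all(inv(x + 1, n) for x, n in product(R, R)
                if inv(x, n) and guard(x, n))
# Sufficiency: invariant and negated guard imply the postcondition x == n.
sufficiency = all(x == n for x, n in product(R, R)
                  if inv(x, n) and not guard(x, n))
print(entry, preserved, sufficiency)
```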
sum_to_n
The program L07 verified with BMC up to a chosen depth k. With an invariant, the engine verifies it for every input.
def sum_to_n(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s == i * (i - 1) // 2 and 0 <= i and i <= n)
        s = s + i
        i = i + 1
    assert s == n * (n - 1) // 2
The invariant says two things: $s$ equals the partial sum so far ($s = i(i-1)/2$), and $i$ stays in the loop's range. Engine output:
entry : VALID
preserved : VALID
sufficiency : VALID
What each obligation said concretely:
- Entry. From the assumed precondition $n \ge 0$ and the initial state $s = 0,\ i = 0$, the invariant holds: $0 = 0 \cdot (0 - 1)/2$ and $0 \le 0 \le n$.
- Preservation. From an arbitrary state satisfying the invariant and the loop guard $i < n$, the body produces $s' = s + i$ and $i' = i + 1$. The invariant on the new state, $s' = i'(i' - 1)/2$, expands to $s + i = (i + 1)\,i/2$. Using $s = i(i - 1)/2$, the right side is $(i + 1)\,i/2 = i(i - 1)/2 + i = s + i$, matching. The range $0 \le i + 1 \le n$ follows from $0 \le i$ and $i < n$.
- Sufficiency. From the invariant and the loop exit $\neg(i < n)$, the constraints $i \le n$ and $i \ge n$ force $i = n$, so $s = n(n - 1)/2$, which is the postcondition.
Each obligation reduces to a small algebraic check. L07's BMC needed one query per path through a depth-$k$ unrolling. With an invariant, three queries cover every iteration count.
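The same three checks can be written out as executable predicates. This is a sketch, assuming small hypothetical search ranges (`R` and `S_RANGE`) as a stand-in for Z3; it illustrates what each obligation quantifies over rather than proving anything beyond those ranges.

```python
from itertools import product

R = range(-4, 9)         # hypothetical range for i and n
S_RANGE = range(-4, 40)  # hypothetical range for s


def inv(s, i, n):
    return s == i * (i - 1) // 2 and 0 <= i <= n


# Entry: n >= 0 with s = 0, i = 0 implies the invariant.
entry = all(inv(0, 0, n) for n in R if n >= 0)
# Preservation: from any state with I and i < n, the body re-establishes I.
preserved = all(inv(s + i, i + 1, n)
                for s, i, n in product(S_RANGE, R, R)
                if inv(s, i, n) and i < n)
# Sufficiency: I and not (i < n) imply the postcondition.
sufficiency = all(s == n * (n - 1) // 2
                  for s, i, n in product(S_RANGE, R, R)
                  if inv(s, i, n) and not (i < n))
print(entry, preserved, sufficiency)
```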
sum_to_n with a buggy body
Change the body to s = s + i + 1. The bug is L07's familiar off-by-one.
def sum_to_n_buggy(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s == i * (i - 1) // 2 and 0 <= i and i <= n)
        s = s + i + 1   # bug
        i = i + 1
    assert s == n * (n - 1) // 2
Engine output:
entry : VALID
preserved : NOT VALID
counterexample: i = 0, n = 1, s = 0
sufficiency : VALID
The preservation obligation fails. Walking the counterexample step by step:
| step | i | s | n | invariant check |
|---|---|---|---|---|
| body entry | 0 | 0 | 1 | $0 = 0 \cdot (-1)/2$ and $0 \le 0 \le 1$: ✓ |
| after `s = s + i + 1` | 0 | 1 | 1 | (intermediate state, not checked) |
| after `i = i + 1` | 1 | 1 | 1 | $1 = 1 \cdot 0/2 = 0$: ✗ |

The state at body entry satisfies the invariant. After the buggy body, $i = 1$ and $s = 1$. The invariant claims $s = 1 \cdot 0/2 = 0$, but $s$ actually equals $1$. Preservation is broken.
flowchart TD
accTitle: State walk showing preservation failure for sum_to_n with a buggy body
accDescr: At body entry with i=0, s=0, n=1, the invariant 0 equals 0 times negative one divided by 2 holds. After running the buggy body s equals s plus i plus 1 followed by i equals i plus 1, the state becomes i=1, s=1, n=1. The invariant claim is now 1 equals 1 times 0 divided by 2 equals 0, which is false.
entry["state at body entry
i = 0, s = 0, n = 1
invariant 0 = 0·(-1)/2 = 0: ✓"]
after["state after body
i = 1, s = 1, n = 1
invariant claim: 1 = 1·0/2 = 0: ✗"]
entry -->|run buggy body| after
classDef bug fill:#f5c7c5,stroke:#a04040
class after bug

Entry and sufficiency still pass on this program. They check what the invariant promises at the loop's boundary. The body's interior is invisible to them. The bug is exactly where the body fails to preserve the invariant, and per-obligation reporting localizes it there.
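The reported counterexample can be replayed by hand in a few lines. The sketch below uses hypothetical helper names (`buggy_body`, `breaks_preservation`) and checks the preservation obligation for a single state rather than searching the way Z3 does.

```python
def inv(s, i, n):
    return s == i * (i - 1) // 2 and 0 <= i <= n


def buggy_body(s, i):
    s = s + i + 1   # bug: off by one
    i = i + 1
    return s, i


def breaks_preservation(s, i, n):
    """True when the state satisfies I and the guard, yet I fails after the body."""
    if not (inv(s, i, n) and i < n):
        return False            # not a legal starting state for the obligation
    s2, i2 = buggy_body(s, i)
    return not inv(s2, i2, n)


# Replay the engine's counterexample: i = 0, n = 1, s = 0.
print(breaks_preservation(s=0, i=0, n=1))
```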
sum_to_n with a too-weak invariant
The previous run was a buggy program with a correct invariant. Now the program is correct, but the invariant is too weak to imply the postcondition.
def sum_to_n_weak(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(0 <= i and i <= n)   # too weak: says nothing about s
        s = s + i
        i = i + 1
    assert s == n * (n - 1) // 2
The invariant has been weakened: it only tracks the range of $i$, dropping the relationship between $s$ and $i$. Engine output:
entry : VALID
preserved : VALID
sufficiency : NOT VALID
counterexample: i = 6, n = 6, s = 16
Sufficiency fails. The counterexample is a state where the invariant holds ($0 \le 6 \le 6$ ✓) and the loop has exited ($i = n$ so $\neg(i < n)$), but $s = 16$ does not equal the postcondition value $6 \cdot 5/2 = 15$.
The counterexample is not a real execution of the program. If we actually ran sum_to_n_weak(6), the loop would compute $s = 0 + 1 + 2 + 3 + 4 + 5 = 15$, and the assertion would pass. Z3 picked $s = 16$ because nothing in the weak invariant rules it out. After the loop-cut, the invariant is everything the engine knows about $s$ at the exit. If the invariant does not pin down $s$, the engine cannot prove the postcondition.
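This is the false-alarm direction of the WP theorem: the Hoare triple $\{P\}\ S\ \{Q\}$ is valid if every VC of the cut program is valid, but the converse does not hold. A weak invariant leaves the engine with a weak assumption at loop exit, and Z3 reports failure even when the program is correct.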
The engineering fix is to strengthen the invariant, not change the program. Adding $s = i(i-1)/2$ back to the invariant turns the weak version into the version from the previous run, which verifies. Picking a strong-enough invariant is the engineer's job. The engine reports honestly whether the invariant supplied so far is enough.
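The gap between the spurious state and the real execution is easy to see side by side. In this sketch the names `weak_inv`, `strong_inv`, `post`, and `run_sum_to_n` are hypothetical; the state `(s, i, n) = (16, 6, 6)` is the counterexample reported above.

```python
def weak_inv(s, i, n):
    return 0 <= i <= n


def strong_inv(s, i, n):
    return s == i * (i - 1) // 2 and 0 <= i <= n


def post(s, n):
    return s == n * (n - 1) // 2


def run_sum_to_n(n):
    """Concrete execution of the (correct) loop body."""
    s, i = 0, 0
    while i < n:
        s, i = s + i, i + 1
    return s


s, i, n = 16, 6, 6   # counterexample state reported for sufficiency
admitted_by_weak = weak_inv(s, i, n) and not (i < n)      # legal exit state under weak I
admitted_by_strong = strong_inv(s, i, n) and not (i < n)  # ruled out by strong I
print(admitted_by_weak, post(s, n))
print(admitted_by_strong)
print(run_sum_to_n(6), post(run_sum_to_n(6), 6))
```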
Back to Euclid
The natural loop invariant for Euclid's algorithm is $\gcd(a, b) = \gcd(a_0, b_0)$, where $a_0$ and $b_0$ are the initial values. The engine has no built-in gcd function and no way to declare one with the axioms it would need. We encode the same content using ghost variables: extra integer variables whose only job is to track relationships between the loop's current state and its initial values.
We capture the initial values in ghost_a and ghost_b, set once before the loop and never modified inside it. Then we maintain the relationships ghost_a == p * a + q * b and ghost_b == r * a + s * b. Initially p = 1, q = 0, r = 0, s = 1, so ghost_a == a and ghost_b == b. Each subtraction in the loop body updates the four coefficients to keep both relationships true.
def euclid_sub(a, b):
    assume(a > 0)
    assume(b > 0)
    ghost_a = a
    ghost_b = b
    p = 1
    q = 0
    r = 0
    s = 1
    while a != b:
        invariant(a > 0 and b > 0
                  and ghost_a == p * a + q * b
                  and ghost_b == r * a + s * b)
        if a > b:
            a = a - b
            q = p + q
            s = r + s
        else:
            b = b - a
            p = p + q
            r = r + s
    assert (a == b and a > 0
            and ghost_a == (p + q) * a
            and ghost_b == (r + s) * a)
Engine output:
entry : VALID
preserved : VALID
preserved : VALID
sufficiency : VALID
The four obligations break down as one entry, two preservations (one per branch of the inner if), and one sufficiency. At loop exit a == b, so the sufficiency assert reduces to ghost_a == (p + q) * a and ghost_b == (r + s) * a. The original inputs are both integer multiples of the final value of a. The result is a common divisor of both inputs, the second of the three correctness properties from the top of Practice.
The third property, that the result is the largest common divisor, is not verified by this run. Proving it takes a separate argument parameterized by an arbitrary common divisor d: show that the loop preserves "d divides a" and "d divides b" through the subtraction, so d divides the result at exit. That argument runs as its own demo file (euclid_sub_max.py) and verifies cleanly. Together the two demos characterize the result as the gcd: it is a common divisor of the inputs, and every common divisor of the inputs divides it.
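The ghost-variable bookkeeping can be exercised on concrete inputs. This sketch runs the loop with the invariant as a run-time assertion; `euclid_sub_checked` and `inv` are hypothetical names, and a passing run covers only the tested inputs, unlike the engine's VCs.

```python
def inv(a, b, ghost_a, ghost_b, p, q, r, s):
    return (a > 0 and b > 0
            and ghost_a == p * a + q * b
            and ghost_b == r * a + s * b)


def euclid_sub_checked(a, b):
    """Return (result, ghost_a // result, ghost_b // result), checking I each iteration."""
    assert a > 0 and b > 0
    ghost_a, ghost_b = a, b
    p, q, r, s = 1, 0, 0, 1
    while a != b:
        assert inv(a, b, ghost_a, ghost_b, p, q, r, s)
        if a > b:
            a, q, s = a - b, p + q, r + s
        else:
            b, p, r = b - a, p + q, r + s
    # At exit a == b, so both inputs are integer multiples of a.
    assert ghost_a == (p + q) * a and ghost_b == (r + s) * a
    return a, p + q, r + s


print(euclid_sub_checked(12, 8))
```

For `(12, 8)` the returned multipliers are 3 and 2, matching 12 = 3·4 and 8 = 2·4.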
Summary
| program | entry | preserved | sufficiency |
|---|---|---|---|
| `x_to_n` | VALID | VALID | VALID |
| `sum_to_n` | VALID | VALID | VALID |
| `sum_to_n` with buggy body | VALID | NOT VALID | VALID |
| `sum_to_n` with weak invariant | VALID | VALID | NOT VALID |
| `euclid_sub` | VALID | VALID | VALID |
A small batch of Z3 queries per program decides correctness for every input. Last week BMC at depth k covered inputs that exit within k iterations. With one invariant per loop, this batch covers every iteration count regardless of how long the loop runs, and per-obligation reporting localizes whichever obligation fails.
Tools in this space
Production tools that implement exactly this VC Gen architecture include Dafny, Why3, F*, and Verus. Each adds richer source languages, more expressive assertion logics, and dedicated tooling for editing and managing annotations on top of the VC Gen we just built.
| Tool | Notes |
|---|---|
| Dafny | Auto-active verification language; the canonical example of this VC Gen architecture. Compiles to Boogie (the IVL most WP-based VC Gen tools share). Used at AWS and Microsoft. |
| Why3 | Verification platform with multiple SMT backends. Used as the verification engine for Frama-C/WP (C) and SPARK (Ada). |
| F* | Language with computational effects and WP-style verification. Used in Project Everest for the verified TLS stack and EverCrypt cryptographic library. |
| Verus | Rust subset verified by an SMT solver. Adds memory-and-functional correctness proofs to Rust code. |
L07 listed Dafny under interactive verification. Dafny's verifier is exactly this VC Gen: a compiler from annotated source through an IVL to a small batch of Z3 queries. The trade-off stayed consistent across every example: one invariant per loop, a small fixed batch of Z3 queries, unbounded soundness.