  (Week 8)

Practice

We walk a Hoare-logic hand-proof, derive the WP rules that mechanize it, build the loop-cut transformation that handles while, and verify sum_to_n for every input with one invariant annotation.

Where L07 left off

L07 unrolled loops and asked Z3 about each path within a bound. The engine checked that no assertion could fail within k iterations. The bound was honest: every counterexample was real, and if the engine reported none, none existed within k iterations. Beyond k, BMC made no claim.

For programs with naturally bounded loops, that was enough. The capped sum_to_n with assume(n <= 5) saturated at k = 5: no input could drive the loop past five iterations, so the unrolling covered every behavior. For unbounded n, no finite k suffices.

We verify the unbounded version with a loop invariant: one predicate per loop, capturing what is true at every iteration. Once a loop has an invariant, the engine verifies the program for every input, regardless of how many times the loop runs.

Euclid's algorithm

Euclid's algorithm computes the greatest common divisor of a and b by repeated subtraction.

gcd.py
def gcd(a, b):
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a

Faster variants use modulo, but the subtraction form has the simplest loop body to reason about. Trace it on three inputs:

call          (a, b) at each step                          returns
gcd(12, 8)    (12, 8) → (4, 8) → (4, 4)                    4
gcd(15, 6)    (15, 6) → (9, 6) → (3, 6) → (3, 3)           3
gcd(7, 5)     (7, 5) → (2, 5) → (2, 3) → (2, 1) → (1, 1)   1
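
The traces in the table can be reproduced by recording (a, b) before the loop and after every step. This is a minimal sketch; gcd_trace is a hypothetical helper name introduced here for illustration, not part of the lecture's code.

```python
def gcd_trace(a, b):
    """Run subtraction-based Euclid and record (a, b) at each step."""
    states = [(a, b)]
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
        states.append((a, b))
    return states


for a, b in [(12, 8), (15, 6), (7, 5)]:
    trace = gcd_trace(a, b)
    # The returned value is the first component of the final state.
    print(" → ".join(str(state) for state in trace), "returns", trace[-1][0])
```

The last state always has a == b, which is the loop's exit condition.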

Each call terminates and returns what we would call the gcd. For positive a and b, three properties should hold:

Together these three properties characterize the gcd.

L07's BMC could verify this for inputs that drive the loop through at most k iterations. It could make no claim about every positive a, b. For that we need a property that holds at every iteration, regardless of how many times the loop runs.

When a > b, the body replaces (a, b) with (a - b, b). What is preserved between the old and new state?

Suppose d divides a and d divides b. Then d divides a - b, since the difference of two multiples of d is itself a multiple of d. Going the other way: if d divides a - b and d divides b, then d divides (a - b) + b = a. So the common divisors of (a, b) are exactly the common divisors of (a - b, b), and their largest member is the same:

gcd(a, b) = gcd(a − b, b)

The other branch (b := b - a when a < b) is symmetric. So gcd(a, b) does not change as the loop runs. This is the loop's invariant.

At loop exit, a == b, and gcd(a, a) = a. Combined with the invariant, the returned value is gcd(a_orig, b_orig), where a_orig and b_orig are the original inputs. We have an informal proof that the algorithm computes the gcd of the original inputs.
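
The invariant can also be exercised at run time. The sketch below asserts it at every loop head, using the standard library's math.gcd purely as an oracle. The name gcd_checked and the sampled input range are assumptions of this sketch, and running it tests finitely many inputs rather than proving anything.

```python
from math import gcd as reference_gcd  # oracle only; the algorithm never calls it


def gcd_checked(a, b):
    """Subtraction-based Euclid that asserts the invariant at every iteration."""
    expected = reference_gcd(a, b)  # gcd of the original inputs
    while a != b:
        assert reference_gcd(a, b) == expected  # invariant at the loop head
        if a > b:
            a = a - b
        else:
            b = b - a
    assert reference_gcd(a, b) == expected  # invariant still holds at exit
    return a


# Sample every pair of positive inputs below 30.
for a in range(1, 30):
    for b in range(1, 30):
        assert gcd_checked(a, b) == reference_gcd(a, b)
```

No assertion fires on the sampled inputs, which is consistent with the argument above but says nothing about inputs outside the sample.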

The rest of Practice builds the formal machinery that turns this kind of informal argument into a mechanically checkable proof: Hoare logic, the weakest-precondition rules, and the loop-cut transformation that gives the engine something finite to work with.

Hoare logic by hand

A Hoare triple {P} S {Q} is a claim about partial correctness: if the program S runs from a state satisfying the precondition P and terminates, then the resulting state satisfies the postcondition Q. Termination itself is not part of the claim.

Hoare logic gives one inference rule per statement form. A proof of {P} S {Q} is a tree whose leaves are axiom instances and whose root concludes the triple. The six rules for mini IMP:

                                              ⊢ {P} S₁ {R}   ⊢ {R} S₂ {Q}
       ⊢ {P} skip {P}                         ─────────────────────────────
                                                    ⊢ {P} S₁; S₂ {Q}


       ⊢ {Q[x ↦ E]} x := E {Q}                ⊢ {P ∧ C} S₁ {Q}   ⊢ {P ∧ ¬C} S₂ {Q}
                                              ────────────────────────────────────
                                                 ⊢ {P} if C then S₁ else S₂ {Q}


       ⊢ {I ∧ C} S {I}                        P ⇒ P′   ⊢ {P′} S {Q′}   Q′ ⇒ Q
       ─────────────────────────────          ───────────────────────────────────
       ⊢ {I} while C do S {I ∧ ¬C}                       ⊢ {P} S {Q}

The two axiom-style rules (skip and assignment) have no premises. The four others have premises above the bar and the conclusion below.

The rule of consequence has three premises: two implications and one Hoare triple. It lets us strengthen a precondition or weaken a postcondition before applying another rule. In practice, almost every proof uses consequence at multiple steps to reshape an assertion into something the next rule's pattern accepts.

The while rule requires a loop invariant I: a predicate that holds before the loop runs, stays true across every body iteration, and combines with ¬C at loop exit to imply the postcondition.

A worked proof

We claim {x ≤ n} while x < n do x := x + 1 {x = n}. The invariant is x ≤ n.

{x ≤ n}                          // precondition
while (x < n) do
  {x ≤ n  ∧  x < n}              // while rule entry: invariant + guard
  {x + 1 ≤ n}                    // consequence (x ≤ n ∧ x < n  ⇒  x + 1 ≤ n)
  x := x + 1
  {x ≤ n}                        // assignment rule: (x ≤ n)[x ↦ x+1] = x+1 ≤ n
{x ≤ n  ∧  ¬(x < n)}              // while rule conclusion
{x = n}                          // consequence (x ≤ n ∧ x ≥ n  ⇒  x = n)

The annotations name which rule justifies each line. They do not name a direction. Reading the proof top to bottom is forward reasoning: from the precondition, derive what holds after each step. L07's SE engine did this mechanically, producing one strongest postcondition per path. Reading bottom to top reverses the direction: from the postcondition, work out what must have held before each step. L08's engine implements that direction, producing the weakest precondition of the program.

Either reading reaches the same six annotated steps. Walking top to bottom: the precondition is x ≤ n, and the while rule asks us to prove that the invariant is preserved by one body iteration, given the invariant and the guard at entry.

The consequence rule on the next line reshapes the assertion. The assignment rule's pattern is:

{Q[x ↦ E]} x := E {Q}

For the step x := x + 1 with desired postcondition x ≤ n, the pattern instantiates with E = x + 1 and Q = (x ≤ n), so Q[x ↦ E] = (x + 1 ≤ n).

The required precondition is x + 1 ≤ n, exactly the reshaped assertion above.

The while rule concludes the loop with x ≤ n ∧ ¬(x < n), and a final consequence step simplifies that to x = n.

Each rule application above is mechanical. What requires human judgment is the invariant itself and the choice of when to apply consequence. The rest of Practice automates the rule applications and asks the engineer to supply only the invariant.
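
The two consequence steps in the worked proof are plain implications over integers, so they can be sanity-checked by brute force. This is a minimal sketch over a small finite window (the window and the helper names are assumptions of this sketch); it is a test, not a proof.

```python
from itertools import product

WINDOW = range(-10, 11)  # finite sample of integer values for x and n


def implies(p, q):
    return (not p) or q


def consequence_steps_hold():
    """Check both consequence steps of the hand proof on every (x, n) in WINDOW."""
    for x, n in product(WINDOW, WINDOW):
        # Step 1: x ≤ n ∧ x < n  ⇒  x + 1 ≤ n
        if not implies(x <= n and x < n, x + 1 <= n):
            return False
        # Step 2: x ≤ n ∧ ¬(x < n)  ⇒  x = n
        if not implies(x <= n and not (x < n), x == n):
            return False
    return True


print(consequence_steps_hold())
```

Step 1 relies on x and n being integers: over the reals, x < n would not imply x + 1 ≤ n.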

Toward VC Gen

The mechanization is a verification condition generator (VC Gen). It compiles an annotated source program into a small batch of Z3 queries in three passes:

  1. Loop-cut the source into an intermediate verification language (IVL): the same mini IMP grammar with no while loops, plus the three IVL primitives assert, assume, and havoc.
  2. Walk the IVL backward with the WP rules to produce one verification condition (VC) formula per Hoare obligation.
  3. Dispatch each VC to Z3. If Z3 reports every VC valid, the original Hoare triple is valid.

Figure: VC Gen pipeline from annotated source to Z3. The annotated source (mini IMP + invariant) is compiled by loop-cut into a loop-free IVL (assert, assume, havoc, …). The IVL is walked backward by the WP rules to produce verification conditions, one per obligation. Z3 checks each one and reports VALID or a counterexample.

The IVL primitives are the same three we already know from L07. L07's engine walked them forward and produced one strongest postcondition per path. L08's engine reverses the direction, producing one weakest precondition per obligation. What changes between L07 and L08 is the direction of the walk, not the meaning of the primitives.

Compared to L07's BMC, we supply one invariant per loop and get soundness on every input, with no upper bound on iterations. The pipeline runs mechanically downstream of the invariant.

Weakest precondition: the rules

The weakest precondition of a statement S relative to a postcondition Q, written wp(S,Q), is the weakest predicate P such that {P} S {Q} holds. "Weakest" means most permissive: any other valid precondition implies it. If wp(S,Q) reduces to true, every state satisfies Q after running S.

The rules for the core loop-free constructs:

wp(skip, Q)                       = Q

wp(x := E, Q)                     = Q[x ↦ E]

wp(S₁; S₂, Q)                     = wp(S₁, wp(S₂, Q))

wp(if C then S₁ else S₂, Q)       = (C → wp(S₁, Q)) ∧ (¬C → wp(S₂, Q))

Computing wp(x:=E,Q) means substituting E for x everywhere in Q. A one-line example: to compute wp(x:=x+1, Q) with Q the postcondition x=6, substitute x+1 for x in Q:

wp(x := x + 1, x = 6) = (x = 6)[x ↦ x + 1] = (x + 1 = 6) ⟺ x = 5

To land at x=6 after the assignment, we needed to start at x=5.

The sequence rule composes: wp(S₁; S₂, Q) computes the WP through S₂ first, then uses that result as the postcondition for S₁. The if rule splits Q into two branch-guarded WPs and conjoins them. Both directly mirror the corresponding Hoare inference rules from the previous section.
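
One way to make the four rules concrete is to treat each statement as a function from a postcondition to a precondition, with predicates and expressions represented as Python functions over a state dictionary. This is a hedged sketch: the names skip, assign, seq, and if_else are hypothetical, and the lecture's engine presumably builds Z3 formulas instead of Python closures.

```python
def skip():
    # wp(skip, Q) = Q
    return lambda Q: Q


def assign(x, E):
    # wp(x := E, Q) = Q[x ↦ E]: evaluate Q in the state where x holds E's value
    return lambda Q: (lambda s: Q({**s, x: E(s)}))


def seq(S1, S2):
    # wp(S1; S2, Q) = wp(S1, wp(S2, Q))
    return lambda Q: S1(S2(Q))


def if_else(C, S1, S2):
    # wp(if C then S1 else S2, Q) = (C → wp(S1, Q)) ∧ (¬C → wp(S2, Q))
    def transformer(Q):
        pre_then, pre_else = S1(Q), S2(Q)
        return lambda s: (not C(s) or pre_then(s)) and (C(s) or pre_else(s))
    return transformer


increment = assign("x", lambda s: s["x"] + 1)
post = lambda s: s["x"] == 6

pre = increment(post)                 # wp(x := x + 1, x = 6)
print(pre({"x": 5}), pre({"x": 4}))   # True False

pre_twice = seq(increment, increment)(post)
print(pre_twice({"x": 4}))            # True
```

The closure for assign never rewrites a formula. It evaluates Q in an updated state, which is the semantic counterpart of the syntactic substitution Q[x ↦ E].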

Walking WP on max_two

max_two.py
def max_two(x, y):
    if x > y:
        m = x
    else:
        m = y
    assert m >= x
    assert m >= y

The two assertions at the end are the postcondition we want at the end of the program:

Q = (m ≥ x) ∧ (m ≥ y)

We walk WP backward through the if-else.

Then branch. Apply the assignment rule to m:=x:

wp(m := x, Q) = Q[m ↦ x] = (x ≥ x) ∧ (x ≥ y)

Since x ≥ x is always true, this simplifies to x ≥ y.

Else branch. Apply the assignment rule to m:=y:

wp(m := y, Q) = Q[m ↦ y] = (y ≥ x) ∧ (y ≥ y)

Since y ≥ y is always true, this simplifies to y ≥ x.

The if rule combines the two branch WPs with their guards:

wp(if x > y then m := x else m := y, Q) = (x > y → x ≥ y) ∧ (¬(x > y) → y ≥ x)

Each conjunct is a tautology. x > y → x ≥ y holds because > implies ≥. The negation ¬(x > y) is x ≤ y, which is the same as y ≥ x. So ¬(x > y) → y ≥ x is also true.

The precondition is therefore true. max_two is correct on every input.

Figure: WP walk backward through max_two. The postcondition (m ≥ x) ∧ (m ≥ y) splits via the assignment rule into the then-branch WP x ≥ y (through m := x) and the else-branch WP y ≥ x (through m := y). The if rule combines both branches into the program precondition, which reduces to true.

We applied three rules backward through the program: the assignment rule twice (once per branch) and the if rule once. Each step was a substitution or a guard-conditioned conjunction. The engine performs exactly this walk mechanically.
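
As a sanity check on the walk, the derived formula can be evaluated on a finite sample and compared against running max_two directly. This is a sketch under the assumption that a sample of small integers is representative; the helper names are hypothetical.

```python
from itertools import product


def max_two_vc(x, y):
    """The formula derived above: (x > y → x ≥ y) ∧ (¬(x > y) → y ≥ x)."""
    then_part = (not x > y) or x >= y
    else_part = (x > y) or y >= x
    return then_part and else_part


def max_two_asserts_hold(x, y):
    """Run max_two directly and report whether both assertions pass."""
    m = x if x > y else y
    return m >= x and m >= y


SAMPLE = range(-20, 21)  # finite sample; an assumption of this sketch
print(all(max_two_vc(x, y) and max_two_asserts_hold(x, y)
          for x, y in product(SAMPLE, repeat=2)))
```

Both checks agree on every sampled pair, matching the conclusion that the precondition reduces to true.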

Cutting the loop

The four rules above handle skip, assignment, sequence, and if. For while, no analogous rule exists:

wp(while C do S,Q)=?

A while loop can iterate any number of times. To produce a finite WP for it, we need a formula that holds regardless of iteration count. The loop invariant is exactly that formula.

L07 hit the same wall going forward: SP for while was not finitely expressible either, and L07 used bounded unrolling to dodge the question. We use the invariant to give the loop a finite WP.

The transformation

For a loop while C do S annotated with invariant I, the loop-cut transformation produces this IVL fragment:

assert I                  // invariant holds on entry
havoc x ; havoc y ; ...   // forget current values of loop targets
assume I                  // arbitrary iteration that satisfies the invariant
if C
  then
    S                     // run one body iteration
    assert I              // invariant is preserved
    assume false          // cut analysis here
  else
    skip                  // loop exited; continue past

The control flow of the cut form:

Figure: Loop-cut transformation, control flow of the cut form. State on entry → assert I → havoc loop targets → assume I → branch on C. If C is true: run body S → assert I → assume false. If C is false: continue past the loop with ¬C ∧ I known. The two assert I nodes are highlighted as proof obligations.

The cut form is what the engine analyzes, not what the program executes. It is a fixed-size IVL fragment that captures what the engine needs to check to certify the loop. Walking each piece:

The three obligations (entry, preservation, sufficiency) are what the engine reports separately when it verifies a program with a loop.

Three more WP rules

The cut form uses assert, assume, and havoc. Their WP rules complete the table:

wp(assert C, Q)                   = C ∧ Q
wp(assume C, Q)                   = C → Q
wp(havoc x, Q)                    = ∀x. Q

assert C requires both C here and Q downstream. assume C weakens what must be proven from Q to C → Q: we only need to prove Q on paths where C actually holds. havoc x strips knowledge of x, so to guarantee Q after the havoc we need Q to hold for every possible x. That is what the universal quantifier in the rule captures.

The universal quantifier from havoc is the same quantifier we saw in L06 with ForAll. There it was a feature to use directly. Here it shows up naturally because we are summarizing every possible state of the havoced variable.

Each piece of the cut form has a known WP rule. Walking the cut form backward produces a finite WP for the original loop.
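
The loop-cut and the seven WP rules fit in a short sketch. Statements are tuples, conditions and expressions are Python functions over a state dictionary, and validity is checked by enumerating a small finite DOMAIN in place of Z3. All of these representation choices, and the names loop_cut, wp, and is_valid, are assumptions of this sketch rather than the lecture's engine. In particular, the finite domain makes havoc's ∀ a bounded check, not a proof.

```python
from itertools import product

DOMAIN = range(-3, 6)  # finite stand-in for the integers (assumption of this sketch)


def seq(*stmts):
    """Right-nest statements into binary ("seq", S1, S2) nodes."""
    result = stmts[-1]
    for stmt in reversed(stmts[:-1]):
        result = ("seq", stmt, result)
    return result


def loop_cut(stmt):
    """Replace every while loop by its cut form."""
    kind = stmt[0]
    if kind == "seq":
        return ("seq", loop_cut(stmt[1]), loop_cut(stmt[2]))
    if kind == "if":
        return ("if", stmt[1], loop_cut(stmt[2]), loop_cut(stmt[3]))
    if kind == "while":
        _, C, I, targets, body = stmt
        havocs = [("havoc", x) for x in targets]
        then_branch = seq(loop_cut(body), ("assert", I), ("assume", lambda s: False))
        return seq(("assert", I), *havocs, ("assume", I),
                   ("if", C, then_branch, ("skip",)))
    return stmt


def wp(stmt, Q):
    """Weakest precondition of a loop-free IVL statement, as a predicate on states."""
    kind = stmt[0]
    if kind == "skip":
        return Q
    if kind == "assign":
        _, x, E = stmt
        return lambda s: Q({**s, x: E(s)})
    if kind == "seq":
        return wp(stmt[1], wp(stmt[2], Q))
    if kind == "if":
        _, C, S1, S2 = stmt
        pre_then, pre_else = wp(S1, Q), wp(S2, Q)
        return lambda s: (not C(s) or pre_then(s)) and (C(s) or pre_else(s))
    if kind == "assert":
        C = stmt[1]
        return lambda s: C(s) and Q(s)
    if kind == "assume":
        C = stmt[1]
        return lambda s: (not C(s)) or Q(s)
    if kind == "havoc":
        x = stmt[1]
        return lambda s: all(Q({**s, x: v}) for v in DOMAIN)
    raise ValueError(f"no WP rule for {kind}")


def is_valid(program, variables):
    """Check the program's VC on every initial state drawn from DOMAIN."""
    vc = wp(loop_cut(program), lambda s: True)
    return all(vc(dict(zip(variables, values)))
               for values in product(DOMAIN, repeat=len(variables)))


inv = lambda s: s["x"] <= s["n"]
program = seq(
    ("assume", inv),
    ("while", lambda s: s["x"] < s["n"], inv, ["x"],
     ("assign", "x", lambda s: s["x"] + 1)),
    ("assert", lambda s: s["x"] == s["n"]),
)
print(is_valid(program, ["x", "n"]))  # True
```

This sketch folds all obligations into one formula. An engine that reports entry, preservation, and sufficiency separately would collect each assert as its own VC instead.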

Verifying loops

We run the engine on five programs. Each run takes an annotated function, applies the loop-cut, walks the IVL backward, and produces one VC per obligation. Z3 checks each VC, and the engine reports per-obligation results.

x_to_n

The program we hand-proved earlier, now mechanized:

x_to_n.py
def x_to_n(x, n):
    assume(x <= n)
    while x < n:
        invariant(x <= n)
        x = x + 1
    assert x == n

Engine output:

  entry       : VALID
  preserved   : VALID
  sufficiency : VALID

Three obligations, all valid. The hand-proof we walked earlier is the same proof the engine discharged in microseconds.

sum_to_n

The program L07 verified with BMC up to a chosen depth k. With an invariant, the engine verifies it for every input.

sum_to_n.py
def sum_to_n(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s == i * (i - 1) // 2 and 0 <= i and i <= n)
        s = s + i
        i = i + 1
    assert s == n * (n - 1) // 2

The invariant says two things: s equals the partial sum so far (i(i − 1)/2), and i stays in the loop's range. Engine output:

  entry       : VALID
  preserved   : VALID
  sufficiency : VALID

What each obligation said concretely:

Each obligation reduces to a small algebraic check. L07's BMC needed one query per path through a depth-k unrolling. With an invariant, three queries cover every iteration count.
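
One reading of the three obligations for sum_to_n is as three implications over s, i, and n. The sketch below checks them by brute force over small windows. The windows and the helper names are assumptions of this sketch, and enumeration is a sanity check, not a substitute for the Z3 queries.

```python
from itertools import product

I_N_WINDOW = range(-3, 9)   # sampled values for i and n
S_WINDOW = range(-3, 40)    # sampled values for s


def inv(s, i, n):
    return s == i * (i - 1) // 2 and 0 <= i <= n


def implies(p, q):
    return (not p) or q


def check_obligations():
    # Entry: n ≥ 0, s = 0, i = 0 establish the invariant.
    entry = all(implies(n >= 0, inv(0, 0, n)) for n in I_N_WINDOW)
    # Preservation: invariant ∧ guard imply the invariant after one body run.
    preserved = all(
        implies(inv(s, i, n) and i < n, inv(s + i, i + 1, n))
        for s, i, n in product(S_WINDOW, I_N_WINDOW, I_N_WINDOW))
    # Sufficiency: invariant ∧ ¬guard imply the postcondition.
    sufficiency = all(
        implies(inv(s, i, n) and not i < n, s == n * (n - 1) // 2)
        for s, i, n in product(S_WINDOW, I_N_WINDOW, I_N_WINDOW))
    return {"entry": entry, "preserved": preserved, "sufficiency": sufficiency}


print(check_obligations())
```

Preservation comes down to the identity i(i − 1)/2 + i = (i + 1)i/2, and sufficiency to the fact that i ≤ n together with ¬(i < n) forces i = n.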

sum_to_n with a buggy body

Change the body to s = s + i + 1. The bug is L07's familiar off-by-one.

sum_to_n_buggy.py
def sum_to_n_buggy(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s == i * (i - 1) // 2 and 0 <= i and i <= n)
        s = s + i + 1          # bug
        i = i + 1
    assert s == n * (n - 1) // 2

Engine output:

  entry       : VALID
  preserved   : NOT VALID
    counterexample: i = 0, n = 1, s = 0
  sufficiency : VALID

The preservation obligation fails. Walking the counterexample step by step:

step                   i   s   n   invariant check
body entry             0   0   1   0 = 0·(−1)/2 = 0 and 0 ≤ 0 ≤ 1: ✓
after s = s + i + 1    0   1   1   (intermediate state, not checked)
after i = i + 1        1   1   1   1 = 1·0/2 = 0: ✗

The state at body entry satisfies the invariant. After the buggy body, s = 1 and i = 1. The invariant claims s = i(i − 1)/2 = 0, but s actually equals 1. Preservation is broken.

Figure: State walk showing the preservation failure for sum_to_n with a buggy body. State at body entry: i = 0, s = 0, n = 1; invariant 0 = 0·(−1)/2 = 0: ✓. After running the buggy body: i = 1, s = 1, n = 1; invariant claim 1 = 1·0/2 = 0: ✗.

Entry and sufficiency still pass on this program. They check what the invariant promises at the loop's boundary. The body's interior is invisible to them. The bug is exactly where the body fails to preserve the invariant, and per-obligation reporting localizes it there.
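
The reported counterexample can be replayed directly: start from the state i = 0, n = 1, s = 0, run the buggy body once, and evaluate the invariant before and after. This is a minimal sketch with hypothetical helper names.

```python
def inv(s, i, n):
    return s == i * (i - 1) // 2 and 0 <= i <= n


def buggy_body(s, i):
    s = s + i + 1  # the bug: should be s + i
    i = i + 1
    return s, i


s, i, n = 0, 0, 1          # counterexample state reported by the engine
assert i < n               # the guard holds, so the body runs
before = inv(s, i, n)
s_after, i_after = buggy_body(s, i)
after = inv(s_after, i_after, n)
print(before, (s_after, i_after), after)  # True (1, 1) False
```

The invariant holds before the body and fails after it, which is exactly what a failed preservation obligation means.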

sum_to_n with a too-weak invariant

The previous run was a buggy program with a correct invariant. Now the program is correct, but the invariant is too weak to imply the postcondition.

sum_to_n_weak.py
def sum_to_n_weak(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(0 <= i and i <= n)   # too weak: says nothing about s
        s = s + i
        i = i + 1
    assert s == n * (n - 1) // 2

The invariant has been weakened: it only tracks the range of i, dropping the relationship between s and i. Engine output:

  entry       : VALID
  preserved   : VALID
  sufficiency : NOT VALID
    counterexample: i = 6, n = 6, s = 16

Sufficiency fails. The counterexample is a state where the invariant holds (0 ≤ 6 ≤ 6 ✓) and the loop has exited (i ≥ n, so i = 6 = n), but s = 16 does not equal the postcondition value n(n − 1)/2 = 15.

The counterexample is not a real execution of the program. If we actually ran sum_to_n_weak(6), the loop would compute s = 0 + 1 + 2 + 3 + 4 + 5 = 15, and the assertion would pass. Z3 picked s = 16 because nothing in the weak invariant rules it out. After the loop-cut, the invariant is everything the engine knows about s at the exit. If the invariant does not pin down s, the engine cannot prove the postcondition.

This is the false-alarm direction of the WP theorem: {P} S {Q} is valid if P ⇒ wp(S, Q), but once a loop has been cut against a supplied invariant, the converse does not hold. A too-weak invariant produces a VC that is not valid, and Z3 reports failure even when the program is correct.

The engineering fix is to strengthen the invariant, not change the program. Adding s = i(i − 1)/2 back to the invariant turns the weak version into the version from the previous run, which verifies. Picking a strong-enough invariant is the engineer's job. The engine reports honestly whether the invariant supplied so far is enough.
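
The spurious state can be examined in plain Python. It satisfies the weak invariant and the exit condition but not the postcondition, and the strengthened invariant rules it out. This is a small sketch with hypothetical helper names.

```python
def weak_inv(s, i, n):
    return 0 <= i <= n


def strong_inv(s, i, n):
    return s == i * (i - 1) // 2 and 0 <= i <= n


def post(s, n):
    return s == n * (n - 1) // 2


s, i, n = 16, 6, 6            # counterexample state reported by the engine
exited = not (i < n)          # the loop guard is false

print(weak_inv(s, i, n), exited, post(s, n))   # True True False
print(strong_inv(s, i, n))                     # False
```

Because the strong invariant is false in this state, the sufficiency implication holds there vacuously and the state no longer counts as a counterexample.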

Back to Euclid

The natural loop invariant for Euclid's algorithm is gcd(a, b) = gcd(a_orig, b_orig). The engine has no built-in gcd function and no way to declare one with the axioms it would need. We encode the same content using ghost variables: extra integer variables whose only job is to track relationships between the loop's current state and its initial values.

We capture the initial values in ghost_a and ghost_b, set once before the loop and never modified inside it. Then we maintain the relationships ghost_a == p * a + q * b and ghost_b == r * a + s * b. Initially p = 1, q = 0, r = 0, s = 1, so ghost_a == a and ghost_b == b. Each subtraction in the loop body updates the four coefficients to keep both relationships true.

euclid_sub.py
def euclid_sub(a, b):
    assume(a > 0)
    assume(b > 0)
    ghost_a = a
    ghost_b = b
    p = 1
    q = 0
    r = 0
    s = 1
    while a != b:
        invariant(a > 0 and b > 0
                  and ghost_a == p * a + q * b
                  and ghost_b == r * a + s * b)
        if a > b:
            a = a - b
            q = p + q
            s = r + s
        else:
            b = b - a
            p = p + q
            r = r + s
    assert (a == b and a > 0
            and ghost_a == (p + q) * a
            and ghost_b == (r + s) * a)

Engine output:

  entry       : VALID
  preserved   : VALID
  preserved   : VALID
  sufficiency : VALID

The four obligations break down as one entry, two preservations (one per branch of the inner if), and one sufficiency. At loop exit a == b, so the sufficiency assert reduces to ghost_a == (p + q) * a and ghost_b == (r + s) * a. The original inputs are both integer multiples of the final value of a. The result is a common divisor of both inputs, the second of the three correctness properties from the top of Practice.
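
The ghost-variable bookkeeping can be watched at run time by turning the invariant into an executable assertion. The sketch below mirrors euclid_sub in plain Python. The name euclid_sub_checked and the returned tuple are assumptions of this sketch, and a run checks one input rather than every input.

```python
def euclid_sub_checked(a, b):
    """Run euclid_sub with the invariant asserted at every loop head and at exit."""
    assert a > 0 and b > 0
    ghost_a, ghost_b = a, b
    p, q, r, s = 1, 0, 0, 1

    def invariant():
        return (a > 0 and b > 0
                and ghost_a == p * a + q * b
                and ghost_b == r * a + s * b)

    while a != b:
        assert invariant()
        if a > b:
            a = a - b
            q = p + q
            s = r + s
        else:
            b = b - a
            p = p + q
            r = r + s
    assert invariant()
    # At exit a == b, so ghost_a == (p + q) * a and ghost_b == (r + s) * a.
    return a, p + q, r + s


result, mult_a, mult_b = euclid_sub_checked(12, 8)
print(result, mult_a, mult_b)  # 4 3 2
```

For the inputs 12 and 8 the result is 4 with multipliers 3 and 2, so 12 = 3 · 4 and 8 = 2 · 4, which exhibits the result as a common divisor of both inputs.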

The third property, that the result is the largest common divisor, is not verified by this run. Proving it takes a separate argument parameterized by an arbitrary common divisor d: show that the loop preserves "d divides a" and "d divides b" through the subtraction, so d divides the result at exit. That argument runs as its own demo file (euclid_sub_max.py) and verifies cleanly. Together the two demos characterize the result as the gcd: it is a common divisor of the inputs, and every common divisor of the inputs divides it.

Summary

program                         entry   preserved   sufficiency
x_to_n                          VALID   VALID       VALID
sum_to_n                        VALID   VALID       VALID
sum_to_n with buggy body        VALID   NOT VALID   VALID
sum_to_n with weak invariant    VALID   VALID       NOT VALID
euclid_sub                      VALID   VALID       VALID

A small batch of Z3 queries per program decides correctness for every input. Last week BMC at depth k covered inputs that exit within k iterations. With one invariant per loop, this batch covers every iteration count regardless of how long the loop runs, and per-obligation reporting localizes whichever obligation fails.

Tools in this space

Production tools that implement exactly this VC Gen architecture include Dafny, Why3, F*, and Verus. Each adds richer source languages, more expressive assertion logics, and dedicated tooling for editing and managing annotations on top of the VC Gen we just built.

Tool     Notes
Dafny    Auto-active verification language; the canonical example of this VC Gen architecture. Compiles to Boogie (the IVL most WP-based VC Gen tools share). Used at AWS and Microsoft.
Why3     Verification platform with multiple SMT backends. Used as the verification engine for Frama-C/WP (C) and SPARK (Ada).
F*       Language with computational effects and WP-style verification. Used in Project Everest for the verified TLS stack and EverCrypt cryptographic library.
Verus    Rust subset verified by an SMT solver. Adds memory-and-functional correctness proofs to Rust code.

L07 listed Dafny under interactive verification. Dafny's verifier is exactly this VC Gen: a compiler from annotated source through an IVL to a small batch of Z3 queries. The trade-off stayed consistent across every example: one invariant per loop, a small fixed batch of Z3 queries, unbounded soundness.