(Week 8)

Theory

The engine in Practice emits one or more verification conditions per program. Theory names what those conditions mean, gives a diagnostic procedure for when the engine says NOT VALID, and shows what termination would take.

Where Practice Left Off

A verification condition (VC) is a formula of the form $P ⟹ wp (S, Q)$ that the engine hands to Z3. The engine reports VALID when the implication holds in every state, that is, when Z3 finds its negation unsatisfiable.

For Practice's sum_to_n, the engine emitted three VCs:

entry       : VALID
preserved   : VALID
sufficiency : VALID

When every VC is VALID, the Hoare triple $\{P\}\ S\ \{Q\}$ is valid: every terminating execution of $S$ from a state satisfying $P$ ends in a state satisfying $Q$. This is soundness of WP. The proof mirrors L07's SP soundness proof: the same six cases, with the arrows reversed.

The converse fails. A correct program can produce a non-valid VC if the invariant is too weak. Practice's sum_to_n_weak was exactly this: the program is correct, but the engine reports NOT VALID. Distinguishing real bugs from too-weak invariants is the engineering work the next section discusses.

L07 stated in §7 that SP and WP are dual: $\{P\}\ S\ \{Q\}$ is valid if and only if $P ⟹ wp (S, Q)$ is valid, equivalently if and only if $sp (S, P) ⟹ Q$ is valid. The two formulas look different. The verification question is the same. L07's SE engine asked Z3 one question per execution path. L08's WP engine asks Z3 one question per obligation.
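The duality can be checked by brute force on a toy state space. The sketch below is not the L07 or L08 engine. It assumes a state is a single integer x, a hypothetical statement x = x + 1, and predicates represented as finite sets, so sp is an image and wp is a preimage.

```python
from itertools import combinations

STATES = range(-2, 3)  # toy state space: a state is just the value of x


def stmt(x):
    """Hypothetical statement S: x = x + 1."""
    return x + 1


def sp(pre):
    """Strongest postcondition as a set: where S can land when started in pre."""
    return {stmt(x) for x in pre}


def wp(post):
    """Weakest precondition as a set: starting states that S sends into post."""
    return {x for x in STATES if stmt(x) in post}


def triple_valid(pre, post):
    """{pre} S {post}: every run from pre ends in post (S always terminates here)."""
    return all(stmt(x) in post for x in pre)


def subsets(values):
    """Yield every subset of a finite collection."""
    values = list(values)
    for size in range(len(values) + 1):
        for combo in combinations(values, size):
            yield set(combo)


def duality_holds():
    """Compare the three formulations for every pair of predicates P, Q."""
    outputs = [stmt(x) for x in STATES]
    return all(
        triple_valid(p, q) == (p <= wp(q)) == (sp(p) <= q)
        for p in subsets(STATES)
        for q in subsets(outputs)
    )


print(duality_holds())  # prints True
```

The three checks agree on every pair of sets, which is the finite-state version of the statement that the forward and backward formulas ask the same question.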

Production tools split. CBMC and KLEE work forward by SP. Dafny, Why3, F*, and Verus work backward by WP. Forward gives bug-finding granularity per path. Backward gives correctness proofs from invariant annotations.

Inductive invariants

Walking through `sum_to_n_weak`

Practice's sum_to_n_weak is the false-alarm case. The program is correct, but the supplied invariant is too weak and the engine reports NOT VALID. The walkthrough below diagnoses the failure and strengthens the invariant.

The supplied invariant tracked only the range of $i$ :

invariant(0 <= i and i <= n)

The engine reported:

entry       : VALID
preserved   : VALID
sufficiency : NOT VALID
    counterexample: i = 6, n = 6, s = 16

Sufficiency fails. Z3 produced a state that satisfies the invariant ( $0 \leq 6 \leq 6$ ✓) and the loop-exit condition ( $i = n = 6$ , so $\neg (i < n)$ holds), but violates the postcondition ( $s = 16$ , while $n (n - 1) / 2 = 15$ ). The state is not a real execution. It is a state the invariant allows but the postcondition forbids.

The diagnosis: the invariant doesn't constrain $s$ . At loop exit (when $i = n$ ), the postcondition requires $s = n (n - 1) / 2$ . Add the partial-sum formula as a conjunct:

invariant(0 <= i and i <= n and s == i * (i - 1) // 2)

The engine now reports:

entry       : VALID
preserved   : VALID
sufficiency : VALID

The program is unchanged. Only the invariant got stronger.
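The sufficiency check can be imitated without Z3 by searching a small box of states. The sketch below is an assumption-laden stand-in for the engine: the function names are hypothetical, the guard i < n and the postcondition s = n(n - 1)/2 are taken from the walkthrough, and a bounded search can only find counterexamples, not prove validity.

```python
from itertools import product


def weak_inv(i, n, s):
    return 0 <= i <= n


def strong_inv(i, n, s):
    return 0 <= i <= n and s == i * (i - 1) // 2


def guard(i, n, s):
    return i < n


def post(i, n, s):
    return s == n * (n - 1) // 2


def sufficiency_counterexamples(inv, bound=8, s_bound=30):
    """States in a small box that satisfy I and not C but violate Q."""
    return [
        (i, n, s)
        for i, n, s in product(range(bound), range(bound), range(s_bound))
        if inv(i, n, s) and not guard(i, n, s) and not post(i, n, s)
    ]


# The engine's state i = 6, n = 6, s = 16 is one of many the weak invariant allows.
print((6, 6, 16) in sufficiency_counterexamples(weak_inv))  # prints True
print(sufficiency_counterexamples(strong_inv))              # prints []
```

With the partial-sum conjunct in place, the only states that satisfy the invariant at exit are those with s = n(n - 1)/2, so the search comes back empty.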

The diagnostic flowchart

When the engine reports NOT VALID, the failed obligation localizes the problem:

flowchart TD
    accTitle: Diagnosing a NOT VALID verdict by which obligation failed
    accDescr: When the engine reports NOT VALID, the diagnostic depends on which obligation failed. Entry failure means the precondition does not establish the invariant; the fix is to strengthen the precondition or weaken the invariant at entry. Preservation failure means the body breaks the invariant; this is either a real bug in the body or the invariant is missing a conjunct that the body relies on. Sufficiency failure means the invariant combined with the negated guard does not imply the postcondition; the invariant is too weak at exit and needs a conjunct that captures what the postcondition requires.
    nv["engine reports NOT VALID"]
    obl{which obligation?}
    nv --> obl
    obl -->|entry| e["P does not establish I.
Strengthen P, or
weaken I at the entry."]
    obl -->|preservation| p["Body breaks I.
Either there is a bug,
or I is missing a conjunct."]
    obl -->|sufficiency| s["I + ¬C does not imply Q.
I is too weak at exit.
Add a conjunct for Q."]

The walk on sum_to_n_weak took the sufficiency branch.

What "inductive and sufficient" means

An invariant $I$ for a loop while C do S is inductive and sufficient for postcondition $Q$ when three conditions hold:

Entry. $P ⟹ I$. The invariant holds when the loop is first reached.
Preservation. $\{I \land C\}\ S\ \{I\}$. One iteration of the body preserves the invariant.
Sufficiency. $(I \land \neg C) ⟹ Q$. The invariant combined with loop exit implies the postcondition.

These are exactly the three obligations the engine emits per loop. Per-obligation reporting is a direct readout of which condition the supplied invariant fails to satisfy.
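A minimal sketch of per-obligation reporting, assuming a finite box of states instead of Z3. Everything here is hypothetical scaffolding: check_loop is not the engine's API, the shape of sum_to_n (s = 0, i = 0, body s = s + i then i = i + 1) is inferred from its invariant, and too_strong is an invented invariant that exercises the entry branch.

```python
from collections import namedtuple
from itertools import product

State = namedtuple("State", "i n s")


def check_loop(pre, inv, guard, body, post, states):
    """Map each obligation to its first counterexample in `states`, or None."""
    def first(bad):
        return next((st for st in states if bad(st)), None)

    return {
        "entry": first(lambda st: pre(st) and not inv(st)),
        "preserved": first(lambda st: inv(st) and guard(st) and not inv(body(st))),
        "sufficiency": first(lambda st: inv(st) and not guard(st) and not post(st)),
    }


def pre(st):
    """State at the loop head on first arrival (assumed initialization)."""
    return st.n >= 0 and st.i == 0 and st.s == 0


def guard(st):
    return st.i < st.n


def body(st):
    return State(st.i + 1, st.n, st.s + st.i)


def post(st):
    return st.s == st.n * (st.n - 1) // 2


def strong(st):
    return 0 <= st.i <= st.n and st.s == st.i * (st.i - 1) // 2


def too_strong(st):
    """Hypothetical invariant that wrongly excludes the entry state i = 0."""
    return 1 <= st.i <= st.n and st.s == st.i * (st.i - 1) // 2


BOX = [State(*v) for v in product(range(6), range(6), range(16))]
for name, inv in (("strong", strong), ("too_strong", too_strong)):
    found = check_loop(pre, inv, guard, body, post, BOX)
    print(name, {k: "VALID" if v is None else "NOT VALID" for k, v in found.items()})
```

On this box the strong invariant passes all three checks, while the too-strong one fails only entry, matching the first branch of the flowchart.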

Pause: predict the failure

Predict 1. Consider this annotated program:

def double(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s == 2 * i and i <= n)
        s = s + 1
        i = i + 1
    assert s == 2 * n

Which obligation fails, and why?

Answer

Preservation. Starting from s = 2i and i < n, the body produces s' = 2i + 1 and i' = i + 1. The invariant on the new state requires s' = 2i' = 2i + 2, but the actual s' is 2i + 1. The body breaks the invariant.

By the flowchart, preservation failure means either a bug or a missing conjunct. Here it is a bug: the body should be s = s + 2. The diagnostic localized the failure to the body, which is where the bug lives.
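The same prediction can be checked by a bounded search. This is a sketch under assumptions: the helper names are hypothetical, and a small box of states stands in for Z3.

```python
from itertools import product


def inv(i, n, s):
    return s == 2 * i and i <= n


def buggy_body(i, n, s):
    """Body as written: s = s + 1; i = i + 1."""
    return i + 1, n, s + 1


def fixed_body(i, n, s):
    """Body with the fix: s = s + 2; i = i + 1."""
    return i + 1, n, s + 2


def preservation_counterexample(body, bound=6):
    """First state with I and C whose successor breaks I, paired with that successor."""
    for i, n, s in product(range(bound), range(bound), range(2 * bound)):
        if inv(i, n, s) and i < n and not inv(*body(i, n, s)):
            return (i, n, s), body(i, n, s)
    return None


print(preservation_counterexample(buggy_body))  # prints ((0, 1, 0), (1, 1, 1))
print(preservation_counterexample(fixed_body))  # prints None
```

The first iteration already breaks the invariant: from i = 0, s = 0 the buggy body reaches i = 1, s = 1, where s == 2 * i is false.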

Predict 2. Now consider a different program:

def accumulate(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s >= 0)
        s = s + i
        i = i + 1
    assert s >= 0

The program is correct. The postcondition holds on every input. Which obligation does the engine report NOT VALID, and why?

Answer

Preservation. The invariant says nothing about i. After the havoc-then-assume that opens the cut form, i is a fresh symbol constrained only by the guard i < n and the invariant s >= 0, and the invariant doesn't mention i. Z3 picks i = -100 at body entry: the invariant holds (s = 0, 0 >= 0), but after s = s + i the new s is -100, which fails s >= 0 on the new state.

The invariant is true on every reachable state of the actual program. s is a sum of non-negatives starting from 0, so s never goes negative. But the engine cannot see this from the invariant alone. The invariant does not carry past the havoc the information that i is non-negative.

The fix is to add 0 <= i (or 0 <= i and i <= n) to the invariant. Now the engine knows i is non-negative at body entry, and preservation goes through.

This is the true-but-not-inductive trap. An invariant can hold at every reachable state and still fail preservation, because the engine reasons about the post-havoc state instead of the actual execution. "Inductive" means strong enough for the engine to prove preservation through its abstract reasoning. The invariant has to be self-supporting in that reasoning. Being true on every actual execution is a weaker property.
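A bounded search shows both the failure and the fix. The sketch assumes a small box of states in place of Z3, and the helper names are hypothetical.

```python
from itertools import product


def weak_inv(i, n, s):
    return s >= 0


def strong_inv(i, n, s):
    return s >= 0 and 0 <= i


def body(i, n, s):
    """One iteration of accumulate's loop: s = s + i; i = i + 1."""
    return i + 1, n, s + i


def preservation_counterexample(inv, lo=-4, hi=5):
    """Search a small box for a state with I and C whose successor breaks I."""
    for i, n, s in product(range(lo, hi), range(0, hi), range(lo, hi)):
        if inv(i, n, s) and i < n and not inv(*body(i, n, s)):
            return (i, n, s)
    return None


print(preservation_counterexample(weak_inv))    # a state with negative i
print(preservation_counterexample(strong_inv))  # prints None
```

Any state with negative i and small s works as a counterexample for the weak invariant. The engine's i = -100, s = 0 is one of them.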

Finding invariants in practice

The diagnostic procedure is mechanical once an invariant is on the page. Choosing the right invariant in the first place is the harder part. It requires understanding what the program actually does and what relationship between variables the postcondition needs to see at exit. Production tools (Dafny, Why3, F*) require the engineer to write the invariant. Research efforts (Daikon's dynamic invariant detection, recent ML-based methods) try to infer plausible invariants automatically. Human-supplied invariants remain the norm in shipping verification tools.

Stronger and weaker

Across this lecture and the last, we have called invariants and preconditions "stronger" or "weaker." Those words have a set-theoretic meaning that ties the rule of consequence, the false-alarm direction, and the word "weakest" in WP together.

Predicates as sets of states

A predicate over the program's variables describes a set: the states where the predicate holds. The state with $x = 3, y = 5$ is in the predicate $x > 0$ and outside the predicate $x < 0$ .

A Hoare triple $\{P\}\ S\ \{Q\}$ is a claim about how $S$ moves states between sets. Every state in the $P$-set, run through $S$ and terminating, lands in the $Q$-set.

flowchart LR
    accTitle: Hoare triple as a state transformer between sets
    accDescr: A Hoare triple with precondition P and postcondition Q is a claim about how the statement S moves states between sets. Every state in the P-set, run through S and terminating, lands in the Q-set.
    P["P-set
(starting states)"]
    Q["Q-set
(ending states)"]
    P -->|S| Q

Stronger means smaller

The implication $A ⟹ B$ means every state in the $A$ -set is also in the $B$ -set. As sets, $A \subseteq B$ .

In set terms, stronger means smaller: a stronger predicate is more restrictive, ruling out more states than a weaker one. The everyday meanings of "strong" and "weak" invert here.

flowchart TD
    accTitle: Stronger predicates describe smaller sets
    accDescr: A weaker predicate covers a larger set of states. A stronger predicate covers a smaller subset of those states. The example shows x greater than 0 as the weaker outer set with x greater than 5 as the stronger inner subset.
    subgraph weaker["weaker: x > 0"]
        stronger["stronger: x > 5"]
    end

The rule of consequence from Practice replaces a precondition with a stronger one and a postcondition with a weaker one. The replacement shrinks the input set and grows the output set. Both moves keep the triple valid: a smaller input set means $S$ has to handle fewer starting states, and a larger output set means $S$ has more room to land in.

$$\frac{P ⟹ M \qquad \{M\}\ S\ \{N\} \qquad N ⟹ Q}{\{P\}\ S\ \{Q\}}$$

The implications above the bar are the set inclusions: $P$ sits inside $M$ , and $N$ sits inside $Q$ .

The weakest precondition $wp (S, Q)$ is the largest set $P$ such that $\{P\}\ S\ \{Q\}$ holds. It is the most permissive set of starting states from which every terminating run of $S$ ends in $Q$.
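The set reading can be made concrete on a toy state space. The sketch below assumes a state is one integer x in a small range, a hypothetical statement x = x + 1, and example predicates P, M, N, Q chosen only for illustration. It checks the two inclusions above the bar, both triples, and that P fits inside wp(S, Q).

```python
STATES = range(-10, 11)  # toy state space: a state is the value of x


def stmt(x):
    """Hypothetical statement S: x = x + 1."""
    return x + 1


def as_set(pred):
    return {x for x in STATES if pred(x)}


def triple_valid(pre, post):
    """{pre} S {post} on the toy state space (S always terminates here)."""
    return all(post(stmt(x)) for x in as_set(pre))


def wp(post):
    """Largest set of starting states that S sends into post."""
    return {x for x in STATES if post(stmt(x))}


def P(x):
    return x > 5  # stronger than M


def M(x):
    return x > 0


def N(x):
    return x > 1


def Q(x):
    return x > 0  # weaker than N


print(as_set(P) <= as_set(M), as_set(N) <= as_set(Q))  # the two implications
print(triple_valid(M, N), triple_valid(P, Q))          # premise and conclusion
print(sorted(wp(Q)))                                   # x >= 0 within the range
print(as_set(P) <= wp(Q))                              # a valid P fits inside wp
```

Adding any state outside wp(S, Q), such as x = -1, to the precondition breaks the triple, which is what "largest" means here.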

Invariants as sets

A loop invariant $I$ defines a set of states. The three obligations from the previous section are claims about how three sets sit:

flowchart TD
    accTitle: Three nested sets for inductive invariants
    accDescr: The reachable states at the loop head sit inside the invariant I-set. The I-set intersected with the loop-exit set sits inside the postcondition Q-set.
    subgraph Q["postcondition Q"]
        subgraph I["invariant I"]
            R["reachable states"]
        end
    end

Entry. The reachable states at the loop head sit inside the $I$ -set.
Preservation. The $I$ -set maps into itself across one body iteration. The invariant is closed under the loop body.
Sufficiency. The $I$ -set intersected with the loop-exit set $\neg C$ sits inside the $Q$ -set.

A too-weak invariant has an $I$ -set too large to fit inside $Q$ at exit. A too-strong invariant has an $I$ -set too small to contain the reachable states. The diagnostic flowchart from the previous section identifies which boundary the invariant crossed.

Why "true on every reachable state" is not enough

Recall accumulate. The invariant $s \geq 0$ holds on every reachable state, and the engine still reports preservation as NOT VALID.

The $s \geq 0$ set is much bigger than the reachable set. It contains states where $i$ is large and negative, states no execution of accumulate ever produces. When the engine checks preservation, the loop-cut transformation havocs every loop target and assumes only the invariant. Z3 picks a starting state from anywhere inside the $I$ -set (the middle box in the three-box picture above), including states outside the reachable subset (the inner box). It picks $i = - 100$ , runs the body, and the new $s = - 100$ falls outside the $I$ -set.
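The gap between the two boxes can be measured directly. The sketch below is an illustration under assumptions: it collects the loop-head states of accumulate by running it for small n, builds a bounded slice of the $I$-set, and applies the body to each. The guard i < n is dropped, which is harmless here because n is not constrained from above. All names are hypothetical.

```python
from itertools import product


def inv(i, s):
    return s >= 0


def body(i, s):
    """One iteration of accumulate's loop: s = s + i; i = i + 1."""
    return i + 1, s + i


def reachable_loop_heads(max_n=6):
    """Run accumulate for n = 0..max_n, recording each (i, s) seen at the loop head."""
    seen = set()
    for n in range(max_n + 1):
        i, s = 0, 0
        seen.add((i, s))
        while i < n:
            i, s = body(i, s)
            seen.add((i, s))
    return seen


def stays_inside(starts):
    """Does the body send every I-state in `starts` to another I-state?"""
    return all(inv(*body(i, s)) for i, s in starts if inv(i, s))


REACHABLE = reachable_loop_heads()
I_BOX = {(i, s) for i, s in product(range(-100, 101), range(0, 50)) if inv(i, s)}

print(all(inv(i, s) for i, s in REACHABLE))  # prints True: reachable sits inside I
print(stays_inside(REACHABLE))               # prints True: real executions stay in I
print(stays_inside(I_BOX))                   # prints False: some I-states escape
```

Starting from real executions, the body never leaves the $I$-set. Starting from anywhere in the $I$-set, it does, and that second question is the one the engine asks.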

"Inductive" in set terms means the invariant set is closed under the loop body: the body maps every state in $I$ to another state in $I$ . "True on every reachable state" is a weaker condition: it requires the invariant to contain the reachable subset, with no constraint on closure under the body.

Termination

Partial correctness, dramatized

Run the engine on this program:

def loop_forever():
    i = 0
    while True:
        invariant(True)
        i = i + 1
    assert False

Engine output:

entry       : VALID
preserved   : VALID
sufficiency : VALID

By soundness of WP, the Hoare triple $\{true\}$ loop_forever() $\{false\}$ holds. Reading this literally, the engine has proved False.

$\{P\}\ S\ \{Q\}$ says: if the program terminates from a $P$-state, then the result satisfies $Q$. This program never terminates from any state. The if-clause is unsatisfiable, and the implication holds vacuously. The engine has correctly proved partial correctness. Partial correctness gives no guarantee about programs that loop forever.
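The vacuity is also visible at the level of the three obligations. As a worked sketch, assume the engine treats the missing assume as the precondition $true$, and instantiate the obligations with $I = true$, $C = true$, and $Q = false$:

$$
\begin{aligned}
\text{entry:} \quad & true ⟹ true \\
\text{preserved:} \quad & \{true \land true\}\ i = i + 1\ \{true\} \\
\text{sufficiency:} \quad & (true \land \neg true) ⟹ false
\end{aligned}
$$

The antecedent of the sufficiency obligation simplifies to $false$, so the implication is valid with nothing to prove. No state satisfies the invariant and the negated guard together, which is the formula-level trace of a loop that never exits.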

Everything we have done so far is partial correctness. The verifier reports nothing about termination.

Ranking functions

To prove termination, augment the loop with a ranking function: a non-negative integer expression that strictly decreases on every iteration.

The total-correctness while rule adds one premise to the partial-correctness version and strengthens the other:

$$\frac{I \land C ⟹ R \geq 0 \qquad \{I \land C \land R = r_{0}\}\ S\ \{I \land R < r_{0}\}}{\{I\}\ \text{while } C \text{ do } S\ \{I \land \neg C\}}$$

(The partial version had only the second premise, without the $R = r_{0}$ and $R < r_{0}$ clauses.)

While the loop is running, $R$ is non-negative. One body iteration both preserves the invariant and strictly decreases $R$ . A strictly-decreasing non-negative integer cannot decrease forever, so the loop terminates.

For Practice's x_to_n with body x = x + 1 and guard x < n, a natural ranking function is n - x. The body grows $x$ by 1 each iteration, so $n - x$ decreases by 1. The invariant x <= n gives $n - x \geq 0$ . The loop terminates within $n - x_{initial}$ iterations.
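Both ranking premises can be checked by brute force on a small box of states. This is a sketch under assumptions, not the engine: x_to_n is taken to be just the loop described above, and the helper names are hypothetical.

```python
from itertools import product


def rank(x, n):
    """Candidate ranking function R = n - x."""
    return n - x


def ranking_counterexample(bound=8):
    """First state with I and C that violates a ranking premise, else None."""
    for x, n in product(range(-bound, bound), repeat=2):
        if x <= n and x < n:               # invariant I and guard C
            r0 = rank(x, n)
            x_next = x + 1                 # loop body: x = x + 1
            preserved = x_next <= n        # I holds again
            if not (r0 >= 0 and preserved and rank(x_next, n) < r0):
                return (x, n)
    return None


def iterations(x, n):
    """Run the loop and count how many times the body executes."""
    count = 0
    while x < n:
        x += 1
        count += 1
    return count


print(ranking_counterexample())        # prints None
print(iterations(2, 7), rank(2, 7))    # prints 5 5
```

The iteration count matches the initial rank, consistent with the bound of $n - x_{initial}$ iterations.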

The L07 connection

L07's Zune section used a related pattern. Inside the loop body, save the current value of the measure into a fresh variable, then assert at the bottom of the body that the new value is strictly smaller. If the assertion ever fails, the loop has run a body iteration without progress, which is the fingerprint of non-termination.

Side by side:

| L07 progress assertion (inline) | L08 `decreases` clause (annotation) |
| --- | --- |
| Save the measure: `days_old = days` | Declare the measure: `decreases days` |
| Run the body | Run the body |
| Assert it decreased: `assert days < days_old` | Engine checks the rank-decrease obligation automatically |

The two encodings carry the same proof obligation. L07 inserted the obligation inline as an assertion because L07's engine only handled assertions. L08's annotation version is what production tools expose to the engineer.
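The inline encoding looks like this in plain Python. The loop below is a reconstruction of the widely circulated Zune leap-year loop, not L07's actual code. The function names and the 1980 origin year are assumptions for illustration, and the assertion is checked at run time here instead of being discharged by an engine.

```python
def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def year_from_days(days, year=1980):
    """Convert a day count into a calendar year, with an inline progress assertion."""
    while days > 365:
        days_old = days                  # save the measure
        if is_leap(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366 in a leap year falls through with days unchanged
        else:
            days -= 365
            year += 1
        # progress assertion: the measure must have strictly decreased
        assert days < days_old, "no progress: the loop would not terminate"
    return year


print(year_from_days(367))   # prints 1981
try:
    year_from_days(366)      # last day of a leap year
except AssertionError as err:
    print("caught:", err)
```

Without the assertion, the call with 366 days spins forever. With it, the iteration that makes no progress is reported at once, which is the obligation a decreases clause asks the engine to prove for all inputs.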

In production

Dafny, Why3, and F* accept decreases clauses on loops and on recursive functions, where the same idea applies. When the engineer writes a decreases annotation, the tool checks the rank-decreasing obligation as part of its normal VC Gen output. Total correctness is one annotation away from partial.

Mini IMP does not implement decreases. The engine in lectures/l08/demos/ checks only partial-correctness obligations. The ranking-function rule above is the textbook artifact. Production tools have the rule built in.

Verification as reduction

Verification this week joins SAT, theory solvers, and SMT in the same reduction pattern. Every problem in this course has reduced to a question Z3 can dispatch. VC Gen is the reduction for "is this program correct?" with the loop invariant as the human input the engineer supplies.