Theory
The engine in Practice emits one or more verification conditions per program. Theory names what those conditions mean, gives a diagnostic procedure for when the engine says NOT VALID, and shows what termination would take.
Where Practice Left Off
The engine in Practice emits one or more verification conditions per program. A verification condition (VC) is an implication, a formula of the form $A \Rightarrow B$, that the engine hands to Z3. Z3 returns VALID when the implication is valid.
For Practice's sum_to_n, the engine emitted three VCs:
entry : VALID
preserved : VALID
sufficiency : VALID
When every VC is VALID, the Hoare triple $\{P\}\ S\ \{Q\}$ is valid: every terminating execution of $S$ from a state satisfying $P$ ends in a state satisfying $Q$. This is soundness of WP. The proof mirrors L07's SP soundness proof: the same six cases, with the arrows reversed.
The converse fails. A correct program can produce a non-valid VC if the invariant is too weak. Practice's sum_to_n_weak was exactly this: the program is correct, but the engine reports NOT VALID. Distinguishing real bugs from too-weak invariants is the engineering work the next section discusses.
L07 stated in §7 that SP and WP are dual: $\{P\}\ S\ \{Q\}$ is valid if and only if $\mathrm{sp}(S, P) \Rightarrow Q$ is valid, equivalently if and only if $P \Rightarrow \mathrm{wp}(S, Q)$ is valid. The two formulas look different. The verification question is the same. L07's SE engine asked Z3 one question per execution path. L08's WP engine asks Z3 one question per obligation.
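The duality can be checked by hand on a small finite state space. The sketch below is an illustration only: the statement `x = x + 3`, the range of states, and the helper names `run`, `forward_valid`, and `backward_valid` are all assumptions made for the example, and enumeration stands in for the Z3 query.

```python
# Finite stand-in for the state space: one integer variable x.
STATES = range(-6, 7)

def run(x):
    """The statement S, here x = x + 3 (an illustrative choice)."""
    return x + 3

def forward_valid(p, q):
    """SP view: collect what S produces from P-states, then check Q on it."""
    produced = {run(x) for x in STATES if p(x)}
    return all(q(y) for y in produced)

def backward_valid(p, q):
    """WP view: check that each P-state is one from which S lands in Q."""
    wp = lambda x: q(run(x))
    return all(wp(x) for x in STATES if p(x))

p = lambda x: x >= 0
for q in (lambda y: y >= 3, lambda y: y >= 4):
    # The two views agree on every (P, Q) pair.
    print(forward_valid(p, q), backward_valid(p, q))
```

The two functions answer the same question in different orders: the forward one pushes the $P$-states through the statement, the backward one pulls $Q$ back through it.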
Production tools split. CBMC and KLEE work forward by SP. Dafny, Why3, F*, and Verus work backward by WP. Forward gives bug-finding granularity per path. Backward gives correctness proofs from invariant annotations.
Inductive invariants
Walking through sum_to_n_weak
Practice's sum_to_n_weak is the false-alarm case. The program is correct, but the supplied invariant is too weak and the engine reports NOT VALID. The walkthrough below diagnoses the failure and strengthens the invariant.
The supplied invariant tracked only the range of $i$:
invariant(0 <= i and i <= n)
The engine reported:
entry : VALID
preserved : VALID
sufficiency : NOT VALID
counterexample: i = 6, n = 6, s = 16
Sufficiency fails. Z3 produced a state that satisfies the invariant ($0 \le 6 \le 6$ ✓) and the loop-exit condition ($i = n = 6$, so $\neg(i < n)$ holds), but violates the postcondition ($s = 16$, while $n(n-1)/2 = 15$). The state is not a real execution. It is a state the invariant allows but the postcondition forbids.
The diagnosis: the invariant doesn't constrain $s$. At loop exit (when $i = n$), the postcondition requires $s = n(n-1)/2$. Add the partial-sum formula as a conjunct:
invariant(0 <= i and i <= n and s == i * (i - 1) // 2)
The engine now reports:
entry : VALID
preserved : VALID
sufficiency : VALID
The program is unchanged. Only the invariant got stronger.
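The sufficiency check from this walkthrough can be imitated by brute force over a small range of states. This is a sketch, not the engine: the guard `i < n` and the postcondition `s == n * (n - 1) // 2` are inferred from the text above and are not copied from Practice, and enumeration replaces the Z3 query.

```python
from itertools import product

def weak_inv(i, n, s):
    return 0 <= i <= n

def strong_inv(i, n, s):
    return 0 <= i <= n and s == i * (i - 1) // 2

def exits(i, n):
    # Negated guard; the guard i < n is an assumption about Practice's loop.
    return not (i < n)

def post(n, s):
    # Postcondition inferred from the strengthened invariant at i == n.
    return s == n * (n - 1) // 2

def sufficiency_counterexamples(inv, bound=8, s_bound=40):
    """States that satisfy inv and loop exit but violate the postcondition."""
    return [
        (i, n, s)
        for i, n, s in product(range(bound), range(bound), range(s_bound))
        if inv(i, n, s) and exits(i, n) and not post(n, s)
    ]

weak = sufficiency_counterexamples(weak_inv)
strong = sufficiency_counterexamples(strong_inv)
print((6, 6, 16) in weak)  # the state Z3 reported is one of many
print(strong)              # no counterexample in the searched range
```

A bounded search can only fail to find a counterexample. It does not prove validity the way a VALID answer from Z3 does.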
The diagnostic flowchart
When the engine reports NOT VALID, the failed obligation localizes the problem:
flowchart TD
accTitle: Diagnosing a NOT VALID verdict by which obligation failed
accDescr: When the engine reports NOT VALID, the diagnostic depends on which obligation failed. Entry failure means the precondition does not establish the invariant; the fix is to strengthen the precondition or weaken the invariant at entry. Preservation failure means the body breaks the invariant; this is either a real bug in the body or the invariant is missing a conjunct that the body relies on. Sufficiency failure means the invariant combined with the negated guard does not imply the postcondition; the invariant is too weak at exit and needs a conjunct that captures what the postcondition requires.
nv["engine reports NOT VALID"]
obl{which obligation?}
nv --> obl
obl -->|entry| e["P does not establish I.
Strengthen P, or
weaken I at the entry."]
obl -->|preservation| p["Body breaks I.
Either there is a bug,
or I is missing a conjunct."]
obl -->|sufficiency| s["I + ¬C does not imply Q.
I is too weak at exit.
Add a conjunct for Q."]
The walk on sum_to_n_weak took the sufficiency branch.
What "inductive and sufficient" means
An invariant $I$ for a loop while C do S is inductive and sufficient for postcondition $Q$ when three conditions hold:
- Entry. $P \Rightarrow I$. The invariant holds when the loop is first reached.
- Preservation. $\{I \land C\}\ S\ \{I\}$. One iteration of the body preserves the invariant.
- Sufficiency. $I \land \neg C \Rightarrow Q$. The invariant combined with loop exit implies the postcondition.
These are exactly the three obligations the engine emits per loop. Per-obligation reporting is a direct readout of which condition the supplied invariant fails to satisfy.
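The three conditions can be sketched as a bounded checker that reports a counterexample per obligation. Everything here is illustrative: `check_loop` is a hypothetical helper, not the engine's API, and the loop is x_to_n with body `x = x + 1`, guard `x < n`, and invariant `x <= n`. The precondition `x <= n` and postcondition `x == n` are assumptions.

```python
from itertools import product

def check_loop(pre, inv, guard, body, post, states):
    """Return the first counterexample found per obligation, or None."""
    report = {"entry": None, "preserved": None, "sufficiency": None}
    for st in states:
        # Entry: P => I at the loop head.
        if report["entry"] is None and pre(st) and not inv(st):
            report["entry"] = st
        # Preservation: from I and C, one body iteration re-establishes I.
        if report["preserved"] is None and inv(st) and guard(st) and not inv(body(st)):
            report["preserved"] = st
        # Sufficiency: I and not C => Q.
        if report["sufficiency"] is None and inv(st) and not guard(st) and not post(st):
            report["sufficiency"] = st
    return report

states = list(product(range(-5, 6), repeat=2))  # states are (x, n) pairs

common = dict(
    pre=lambda st: st[0] <= st[1],
    guard=lambda st: st[0] < st[1],
    body=lambda st: (st[0] + 1, st[1]),
    post=lambda st: st[0] == st[1],
    states=states,
)
good = check_loop(inv=lambda st: st[0] <= st[1], **common)
weak = check_loop(inv=lambda st: True, **common)
print(good)
print(weak)
```

With the invariant `x <= n` no obligation has a counterexample in range. With the trivial invariant `True`, entry and preservation still pass, and only sufficiency reports a state.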
Pause: predict the failure
Predict 1. Consider this annotated program:
def double(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s == 2 * i and i <= n)
        s = s + 1
        i = i + 1
    assert s == 2 * n
Which obligation fails, and why?
Answer
Preservation. From s = 2i and i < n, the body produces s = 2i + 1 and i = i + 1. The invariant claim on the new state is s = 2(i+1) = 2i + 2, but the actual new s is 2i + 1. The body breaks the invariant.
By the flowchart, preservation failure means either a bug or a missing conjunct. Here it is a bug: the body should be s = s + 2. The diagnostic localized the failure to the body, which is where the bug lives.
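The preservation argument for double can be replayed by enumeration. The sketch below searches a small range for a state that satisfies the invariant and the guard but whose successor breaks the invariant. The helper name `first_preservation_failure` and the search bound are assumptions for illustration.

```python
from itertools import product

def inv(s, i, n):
    return s == 2 * i and i <= n

def buggy_body(s, i, n):
    return s + 1, i + 1, n   # the body as written

def fixed_body(s, i, n):
    return s + 2, i + 1, n   # the repaired body

def first_preservation_failure(body, bound=6):
    """First state with I and guard whose successor violates I, else None."""
    rng = range(-bound, bound + 1)
    for s, i, n in product(rng, rng, rng):
        if inv(s, i, n) and i < n and not inv(*body(s, i, n)):
            return (s, i, n)
    return None

print(first_preservation_failure(buggy_body) is not None)
print(first_preservation_failure(fixed_body))
```

Changing only the body makes the failure disappear, which matches the bug branch of the flowchart: the invariant was already strong enough.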
Predict 2. Now consider a different program:
def accumulate(n):
    assume(n >= 0)
    s = 0
    i = 0
    while i < n:
        invariant(s >= 0)
        s = s + i
        i = i + 1
    assert s >= 0
The program is correct. The postcondition holds on every input. Which obligation does the engine report NOT VALID, and why?
Answer
Preservation. The invariant says nothing about i. After the havoc-then-assume that opens the cut form, i is a fresh symbol constrained only by s >= 0, which doesn't mention i. Z3 picks i = -100 at body entry: the invariant holds (s = 0, 0 >= 0), but after s = s + i the new s is -100, which fails s >= 0 on the new state.
The invariant is true on every reachable state of the actual program. s is a sum of non-negatives starting from 0, so s never goes negative. But the engine cannot see this from the invariant alone. The invariant does not carry past the havoc the information that i is non-negative.
The fix is to add 0 <= i (or 0 <= i and i <= n) to the invariant. Now the engine knows i is non-negative at body entry, and preservation goes through.
This is the true-but-not-inductive trap. An invariant can hold at every reachable state and still fail preservation, because the engine reasons about the post-havoc state instead of the actual execution. "Inductive" means strong enough for the engine to prove preservation through its abstract reasoning. The invariant has to be self-supporting in that reasoning. Being true on every actual execution is a weaker property.
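The gap between "true on every reachable state" and "inductive" shows up directly when both are computed for accumulate. In this sketch, `loop_heads` runs the real loop to collect reachable states, while `preservation_failures` enumerates arbitrary states the way the post-havoc check does. Both helper names and the bounds are assumptions for illustration.

```python
from itertools import product

def loop_heads(n):
    """Concrete (s, i) states seen at the loop head when running accumulate(n)."""
    s, i = 0, 0
    heads = [(s, i)]
    while i < n:
        s, i = s + i, i + 1
        heads.append((s, i))
    return heads

def body(s, i):
    return s + i, i + 1

def weak_inv(s, i):
    return s >= 0

def strong_inv(s, i):
    return s >= 0 and 0 <= i

def preservation_failures(inv, n=3, bound=5):
    """Arbitrary states with I and guard whose successor violates I."""
    rng = range(-bound, bound + 1)
    return [(s, i) for s, i in product(rng, rng)
            if inv(s, i) and i < n and not inv(*body(s, i))]

holds_on_reachable = all(
    weak_inv(s, i) for n in range(20) for s, i in loop_heads(n)
)
print(holds_on_reachable)                        # true on every real execution tried
print(len(preservation_failures(weak_inv)) > 0)  # yet not inductive
print(preservation_failures(strong_inv))         # adding 0 <= i closes the gap
```

Every failing state for the weak invariant has a negative `i`, which no run of accumulate produces.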
Finding invariants in practice
The diagnostic procedure is mechanical once an invariant is on the page. Choosing the right invariant in the first place is the harder part. It requires understanding what the program actually does and what relationship between variables the postcondition needs to see at exit. Production tools (Dafny, Why3, F*) require the engineer to write the invariant. Research efforts (Daikon's dynamic invariant detection, recent ML-based methods) try to infer plausible invariants automatically. Human-supplied invariants remain the norm in shipping verification tools.
Stronger and weaker
Across this lecture and the last, we have called invariants and preconditions "stronger" or "weaker." Those words have a set-theoretic meaning that ties the rule of consequence, the false-alarm direction, and the word "weakest" in WP together.
Predicates as sets of states
A predicate over the program's variables describes a set: the states where the predicate holds. For example, a state with $x = 3$ is in the predicate $x > 0$ and outside the predicate $x > 5$.
A Hoare triple $\{P\}\ S\ \{Q\}$ is a claim about how $S$ moves states between sets. Every state in the $P$-set, run through $S$ and terminating, lands in the $Q$-set.
flowchart LR
accTitle: Hoare triple as a state transformer between sets
accDescr: A Hoare triple with precondition P and postcondition Q is a claim about how the statement S moves states between sets. Every state in the P-set, run through S and terminating, lands in the Q-set.
P["P-set
(starting states)"]
Q["Q-set
(ending states)"]
P -->|S| Q
Stronger means smaller
The implication $A \Rightarrow B$ means every state in the $A$-set is also in the $B$-set. As sets, $A \subseteq B$.
In set terms, stronger means smaller: a stronger predicate is more restrictive, ruling out more states than a weaker one. The everyday meanings of "strong" and "weak" invert here.
flowchart TD
accTitle: Stronger predicates describe smaller sets
accDescr: A weaker predicate covers a larger set of states. A stronger predicate covers a smaller subset of those states. The example shows x greater than 0 as the weaker outer set with x greater than 5 as the stronger inner subset.
subgraph weaker["weaker: x > 0"]
stronger["stronger: x > 5"]
end
The rule of consequence from Practice replaces a precondition with a stronger one and a postcondition with a weaker one. The replacement shrinks the input set and grows the output set. Both moves keep the triple valid: a smaller input set means $S$ has to handle fewer starting states, and a larger output set means $S$ has more room to land in.
$$\frac{P' \Rightarrow P \qquad \{P\}\ S\ \{Q\} \qquad Q \Rightarrow Q'}{\{P'\}\ S\ \{Q'\}}$$
The implications above the bar are the set inclusions: $P'$ sits inside $P$, and $Q$ sits inside $Q'$.
The weakest precondition $\mathrm{wp}(S, Q)$ is the largest set $P$ such that $\{P\}\ S\ \{Q\}$ holds. It is the most permissive set of starting states from which $S$ is guaranteed to reach $Q$.
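The set reading can be made concrete on a finite universe. The sketch below treats predicates as Python sets over one integer variable and uses the statement `x = x + 1` as an illustrative assumption. `states`, `step`, and `triple_holds` are hypothetical helper names.

```python
UNIVERSE = range(-10, 11)  # finite stand-in for the states of one variable x

def states(pred):
    """The set of states where the predicate holds."""
    return {x for x in UNIVERSE if pred(x)}

def step(x):
    """The statement S, here x = x + 1 (an illustrative choice)."""
    return x + 1

def triple_holds(p_set, q):
    """{P} S {Q}: every state in the P-set lands in Q after S."""
    return all(q(step(x)) for x in p_set)

q = lambda x: x > 5
weaker, stronger = states(lambda x: x > 0), states(lambda x: x > 5)
wp_set = states(lambda x: q(step(x)))  # every state from which S reaches Q

print(stronger < weaker)                      # stronger means smaller
print(triple_holds(wp_set, q))                # wp itself is a valid precondition
print(triple_holds(stronger, q))              # shrinking the input set keeps validity
print(triple_holds(wp_set, lambda x: x > 0))  # growing the output set keeps validity
print(triple_holds(wp_set | {4}, q))          # one state beyond wp breaks the triple
```

The last line is what "weakest" means: no valid precondition set can contain a state outside `wp_set`.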
Invariants as sets
A loop invariant defines a set of states. The three obligations from the previous section are claims about how three sets sit:
flowchart TD
accTitle: Three nested sets for inductive invariants
accDescr: The reachable states at the loop head sit inside the invariant I-set. The I-set intersected with the loop-exit set sits inside the postcondition Q-set.
subgraph Q["postcondition Q"]
subgraph I["invariant I"]
R["reachable states"]
end
end
- Entry. The reachable states at the loop head sit inside the $I$-set.
- Preservation. The $I$-set maps into itself across one body iteration. The invariant is closed under the loop body.
- Sufficiency. The $I$-set intersected with the loop-exit set sits inside the $Q$-set.
A too-weak invariant has an $I$-set too large to fit inside $Q$ at exit. A too-strong invariant has an $I$-set too small to contain the reachable states. The diagnostic flowchart from the previous section identifies which boundary the invariant crossed.
Why "true on every reachable state" is not enough
Recall accumulate. The invariant $s \ge 0$ holds on every reachable state, and the engine still reports preservation as NOT VALID.
The set $s \ge 0$ is much bigger than the reachable set. It contains states where $i$ is large and negative, states no execution of accumulate ever produces. When the engine checks preservation, the loop-cut transformation havocs every loop target and assumes only the invariant. Z3 picks a starting state from anywhere inside the $I$-set (the middle box in the three-box picture above), including states outside the reachable subset (the inner box). It picks $i = -100$, runs the body, and the new $s$ falls outside the $I$-set.
"Inductive" in set terms means the invariant set is closed under the loop body: the body maps every state in $I$ to another state in $I$. "True on every reachable state" is a weaker condition: it requires the invariant to contain the reachable subset, with no constraint on closure under the body.
Termination
Partial correctness, dramatized
Run the engine on this program:
def loop_forever():
    i = 0
    while True:
        invariant(True)
        i = i + 1
    assert False
Engine output:
entry : VALID
preserved : VALID
sufficiency : VALID
By soundness of WP, the Hoare triple $\{\mathit{true}\}$ loop_forever() $\{\mathit{false}\}$ holds. Reading this literally, the engine has proved False.
$\{P\}\ S\ \{Q\}$ says: if the program terminates from a $P$-state, then the result satisfies $Q$. This program never terminates from any state. The if-clause is unsatisfiable, and the implication holds vacuously. The engine has correctly proved partial correctness. Partial correctness gives no guarantee about programs that loop forever.
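Writing out the three obligations for loop_forever shows where the vacuity enters. This is a hand derivation, assuming the engine treats the missing assume as precondition $\mathit{true}$ and instantiates the obligations with $I = \mathit{true}$, $C = \mathit{true}$, and $Q = \mathit{false}$:

```latex
\begin{align*}
\text{entry:}       &\quad \mathit{true} \Rightarrow \mathit{true} \\
\text{preserved:}   &\quad \{\mathit{true} \land \mathit{true}\}\ i := i + 1\ \{\mathit{true}\} \\
\text{sufficiency:} &\quad \mathit{true} \land \neg\mathit{true} \Rightarrow \mathit{false}
\end{align*}
```

The sufficiency hypothesis simplifies to $\mathit{false}$, so the implication is valid whatever the postcondition says. No state ever satisfies the loop-exit condition.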
Everything we have done so far is partial correctness. The verifier reports nothing about termination.
Ranking functions
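To prove termination, augment the loop with a ranking function $r$: a non-negative integer expression that strictly decreases on every iteration.
The augmented while rule has two premises, where the partial-correctness version had one. Here $r_0$ is a fresh logical variable that records the value of $r$ before the body runs:
$$\frac{I \land C \Rightarrow r \ge 0 \qquad \{I \land C \land r = r_0\}\ S\ \{I \land r < r_0\}}{\{I\}\ \texttt{while}\ C\ \texttt{do}\ S\ \{I \land \neg C\}}$$
(This is the total-correctness while rule; the partial version had only the second premise, and without the $r < r_0$ clause or the $r = r_0$ bookkeeping it refers to.)
While the loop is running, $r$ is non-negative. One body iteration both preserves the invariant and strictly decreases $r$. A strictly-decreasing non-negative integer cannot decrease forever, so the loop terminates.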
For Practice's x_to_n with body x = x + 1 and guard x < n, a natural ranking function is n - x. The body grows $x$ by 1 each iteration, so $n - x$ decreases by 1. The invariant x <= n gives $n - x \ge 0$. The loop terminates within $n - x$ iterations, measured from the value of x at loop entry.
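The two rank premises can be watched at run time on the x_to_n loop. The sketch below checks them with plain Python assertions on one execution at a time. It is a dynamic check, not a proof, and the precondition `x <= n` is an assumption.

```python
def x_to_n(x, n):
    """Run the loop, checking the ranking function n - x on every iteration."""
    assert x <= n                    # assumed precondition; establishes the invariant
    bound = n - x                    # value of the rank at loop entry
    iterations = 0
    while x < n:
        rank_before = n - x
        assert rank_before >= 0      # invariant and guard give a non-negative rank
        x = x + 1
        assert n - x < rank_before   # the body strictly decreases the rank
        iterations += 1
    assert iterations <= bound       # the rank at entry bounds the iteration count
    return x, iterations

print(x_to_n(2, 7))
print(x_to_n(3, 3))
```

A verifier discharges the same two checks once, symbolically, for all states satisfying the invariant and the guard, where this version only tests the executions it is given.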
The L07 connection
L07's Zune section used a related pattern. Inside the loop body, save the current value of the measure into a fresh variable, then assert at the bottom of the body that the new value is strictly smaller. If the assertion ever fails, the loop has run a body iteration without progress, which is the fingerprint of non-termination.
Side by side:
| L07 progress assertion (inline) | L08 decreases clause (annotation) |
|---|---|
| Save the measure: days_old = days | Declare the measure: decreases days |
| Run the body | Run the body |
| Assert it decreased: assert days < days_old | Engine checks the rank-decrease obligation automatically |
The two encodings carry the same proof obligation. L07 inserted the obligation inline as an assertion because L07's engine only handled assertions. L08's annotation version is what production tools expose to the engineer.
In production
Dafny, Why3, and F* accept decreases clauses on loops and on recursive functions, where the same idea applies. When the engineer writes a decreases annotation, the tool checks the rank-decreasing obligation as part of its normal VC Gen output. Total correctness is one annotation away from partial.
Mini IMP does not implement decreases. The engine in lectures/l08/demos/ checks only partial-correctness obligations. The ranking-function rule above is the textbook artifact. Production tools have the rule built in.
Verification as reduction
Verification this week joins SAT, theory solvers, and SMT in the same reduction pattern. Every problem in this course has reduced to a question Z3 can dispatch. VC Gen is the reduction for "is this program correct?" with the loop invariant as the human input the engineer supplies.