Practice

PEC is a 2009 tool for proving compiler optimizations correct. We walk through it as a case study in solver-aided programming.

Compiler optimizations

A compiler takes source code and produces an executable. Modern compilers rewrite the program along the way for performance: reordering operations, hoisting computations out of loops, inlining functions, eliminating dead code. Every rewrite has to preserve the program's meaning. When one does not, the compiler silently changes what we wrote.

Compilers have bugs

Csmith generates random C programs and compares outputs across multiple compilers, flagging inconsistencies as compiler bugs (PLDI 2011). It has reported over 325 distinct bugs in mainstream C compilers. The GCC and LLVM bugs by compiler stage (paper Table 4):

Stage	GCC	LLVM
Front end	0	10
Middle end	49	75
Back end	17	74
Unclassified	13	43
Total	79	202
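
The differential-testing idea behind Csmith can be sketched in a few lines. This is a toy under stated assumptions, not Csmith itself: the "programs" are random arithmetic expressions, the two "compilers" are two evaluators, and `eval_buggy` carries a bug we planted on purpose. All names here (`gen_expr`, `eval_reference`, `eval_buggy`, `differential_test`) are hypothetical.

```python
import random

def gen_expr(rng, depth):
    """Build a random arithmetic expression as a nested tuple."""
    if depth == 0 or rng.random() < 0.3:
        return ("const", rng.randint(-3, 3))
    op = rng.choice(["add", "sub", "mul"])
    return (op, gen_expr(rng, depth - 1), gen_expr(rng, depth - 1))

def eval_reference(e):
    """Reference 'compiler': evaluate the expression directly."""
    if e[0] == "const":
        return e[1]
    a, b = eval_reference(e[1]), eval_reference(e[2])
    return {"add": a + b, "sub": a - b, "mul": a * b}[e[0]]

def eval_buggy(e):
    """Second 'compiler' with a planted bug: it rewrites x * 2 to x + 2."""
    if e[0] == "const":
        return e[1]
    a, b = eval_buggy(e[1]), eval_buggy(e[2])
    if e[0] == "mul" and e[2] == ("const", 2):
        return a + 2  # wrong strength reduction (a correct one would be a + a)
    return {"add": a + b, "sub": a - b, "mul": a * b}[e[0]]

def differential_test(programs):
    """Return the programs on which the two implementations disagree."""
    return [p for p in programs if eval_reference(p) != eval_buggy(p)]

rng = random.Random(0)  # fixed seed so the run is repeatable
corpus = [gen_expr(rng, 3) for _ in range(1000)]
mismatches = differential_test(corpus)
print(f"{len(mismatches)} of {len(corpus)} random programs expose a disagreement")
```

A disagreement only says that at least one implementation is wrong. Deciding which one, and reducing the program to a small reproducer, is separate work.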

EMI is complementary. It takes a real program, mutates code that does not execute on the test input, and checks that the optimizer still produces equivalent output across the variants (PLDI 2014). Over eleven months, EMI reported 147 confirmed unique bugs against GCC and LLVM. The breakdown by bug kind (paper Table 2):

Kind	GCC	LLVM
Wrong code	46	49
Crash	23	10
Performance	10	9
Total	79	68

Optimization passes concentrate the bugs, and most are wrong-code: the program compiles cleanly but produces wrong output.
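
EMI's core check can also be sketched on a toy. Everything below is an assumption made for illustration: the tiny statement IR, the one-statement-at-a-time pruning (the real tool prunes random subsets of unexecuted statements in C programs), and `buggy_optimize`, an optimizer with a planted bug that drops the guard of a branch whose body is a single assignment.

```python
def run(prog, env, executed):
    """Interpret a toy program; record the ids of statements that execute."""
    for stmt in prog:
        executed.add(stmt["id"])
        if stmt["op"] == "set":
            env[stmt["var"]] = stmt["value"]
        elif stmt["op"] == "if_nonzero" and env.get(stmt["var"], 0) != 0:
            run(stmt["body"], env, executed)
    return env

def all_ids(prog):
    """Collect the ids of every statement, including nested ones."""
    ids = set()
    for stmt in prog:
        ids.add(stmt["id"])
        if stmt["op"] == "if_nonzero":
            ids |= all_ids(stmt["body"])
    return ids

def delete_stmt(prog, victim):
    """Return a copy of the program without the statement whose id is victim."""
    out = []
    for stmt in prog:
        if stmt["id"] == victim:
            continue
        if stmt["op"] == "if_nonzero":
            stmt = {**stmt, "body": delete_stmt(stmt["body"], victim)}
        out.append(stmt)
    return out

def buggy_optimize(prog):
    """Planted bug: a branch with a single 'set' in its body loses its guard."""
    out = []
    for stmt in prog:
        body = stmt.get("body", [])
        if stmt["op"] == "if_nonzero" and len(body) == 1 and body[0]["op"] == "set":
            out.append(body[0])
        else:
            out.append(stmt)
    return out

def emi_check(prog, inputs, optimize):
    """Prune each unexecuted statement in turn; report variants that misbehave."""
    executed = set()
    expected = run(prog, dict(inputs), executed)
    failures = []
    for victim in sorted(all_ids(prog) - executed):
        variant = optimize(delete_stmt(prog, victim))
        if run(variant, dict(inputs), set()) != expected:
            failures.append(victim)
    return failures

prog = [
    {"id": 1, "op": "set", "var": "x", "value": 0},
    {"id": 2, "op": "if_nonzero", "var": "c", "body": [
        {"id": 3, "op": "set", "var": "x", "value": 1},
        {"id": 4, "op": "set", "var": "y", "value": 2},
    ]},
]
print(emi_check(prog, {"c": 0}, buggy_optimize))  # ids whose removal exposes the bug
print(emi_check(prog, {"c": 0}, lambda p: p))     # an identity "optimizer" passes
```

The original program does not trigger the planted bug because its branch has two statements. Deleting either dead statement leaves a one-statement branch, and the optimized variant then disagrees with the original on the same input.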

A buggy optimization

The C program below should return immediately. GCC -O3 used to compile it into an infinite loop.

emi-figure-3.c

int a, b, c, d, e;
int main() {
  for (b = 4; b > -30; b--)
    for (; c;)
      for (;;) {
        e = a > 2147483647 - b;
        if (d) break;
      }
  return 0;
}

Global variables in C default to 0. So c is 0, and the condition of the second for loop is always false. The innermost loop never executes: the outer loop only counts b down from 4 to -30, and main returns almost immediately.

Two of GCC's loop optimizations interacted badly. Partial Redundancy Elimination identified 2147483647 - b as invariant for the inner loop, and Loop Invariant Motion hoisted it out. After hoisting, the expression overflowed for the negative values b takes during execution. GCC's signed-overflow analysis flagged this as undefined behavior, and the compiler emitted non-terminating code on that path.
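
The arithmetic behind the overflow is easy to check. The sketch below assumes a 32-bit int, so INT_MAX is 2147483647, and uses Python's unbounded integers to see which values of b would push 2147483647 - b past INT_MAX. In the source program the expression is never evaluated, because c is 0; the hoisted copy is evaluated on every outer iteration.

```python
INT_MAX = 2**31 - 1  # 2147483647, assuming a 32-bit int

# Inside the outer loop, b takes the values 4, 3, ..., -29.
b_values = range(4, -30, -1)

# In exact arithmetic, INT_MAX - b exceeds INT_MAX exactly when b is negative.
overflowing = [b for b in b_values if INT_MAX - b > INT_MAX]

print(len(b_values), "outer iterations")
print(len(overflowing), "of them overflow, from b =", overflowing[0],
      "down to b =", overflowing[-1])
```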

EMI found this miscompilation (PLDI 2014, GCC PR 58731).

Of course, this example feels contrived; it was automatically generated in a project to find compiler bugs. But programs an optimizer actually sees often look stranger than that. C++ template instantiation emits code no human writes. Inlining fuses caller and callee, exposing dead branches that did not exist in the source. Code generators of every kind, including autotuners, DSL compilers, and ML frameworks, produce C that humans never type. The optimizer has to be correct on all of them.

Testing finds bugs that testing happens to hit. To trust an optimization on every input, we have to prove it correct.

Verifying optimizations

Two approaches to verified compiler optimization have been around for decades:

	Translation validation	A priori
Timing	After each compilation	Once, before the compiler ships
Scope	This specific input vs this specific output	A class of inputs vs a class of outputs
Cost	Per compilation	One-time

Translation validation runs the optimizer, then checks that this specific output preserves the semantics of this specific input. Pnueli et al. introduced the idea (TACAS 1998); Necula extended it to GCC's optimizer (PLDI 2000); Alive2 (Lopes et al., PLDI 2021) does it at LLVM scale today and regularly finds optimizer bugs in mainline. Checking equivalence of two concrete programs is often easier than reasoning statically about the optimization code itself, but the cost is paid on every compilation.

A priori verification proves the transformation correct before the compiler ships. Cobalt and Rhodium (Lerner et al., PLDI 2003 and POPL 2005) proved single-statement rewrites. CompCert (Leroy, POPL 2006) verified a whole optimizing compiler, and PEC (Kundu, Tatlock, Lerner, PLDI 2009) extended the approach to many-to-many rewrites including loop optimizations. Alive (Lopes et al., PLDI 2015) brought it to LLVM peephole rules at production scale. The cost is paid once, but the proof obligations get harder as the optimizations get more complex.

PEC

PEC verifies compiler optimizations expressed as parameterized rewrite rules. A single rule can describe an optimization that fires on infinitely many concrete programs, and PEC proves the rule correct once and for all. PEC is expressive: it can support loop optimizations (software pipelining, unrolling, peeling, interchange, fusion) and classical scalar optimizations (common subexpression elimination, copy propagation, branch folding).

PEC proves a rule correct by checking that two parameterized programs are equivalent: the program before the rewrite and the program after. If the two are equivalent under the rule's side conditions, the rewrite is sound for every concrete instance.

This raises four questions:

What is a parameterized program?
How can we represent compiler optimizations as rewrites between parameterized programs?
What does it mean for two parameterized programs to be equivalent?
How can a solver check that?

A parameterized program

A parameterized program contains metavariables: placeholders that stand for arbitrary program pieces. Here is one:

I := 0;
while (I < E) {
  S;
  I++;
}

The three metavariables stand for different kinds of program text:

I stands for any program variable (a name like k, i, or counter).
E stands for any expression (a value-producing piece like 100, n, or length - 1).
S stands for any statement (a single operation or a block of straight-line code).

A parameterized program represents the set of all concrete programs you can produce by replacing each metavariable with text drawn from its category. A concrete program matches the parameterized program under a substitution: a mapping from each metavariable to the text that fills its place. Here is one substitution for the program above:

Metavariable	Substitutes for
`I`	`k`
`E`	`100`
`S`	`a[k] += k`

Applying the substitution produces:

k := 0;
while (k < 100) {
  a[k] += k;
  k++;
}

The parameterized version represents this concrete program along with infinitely many others, one for each substitution.
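
Applying a substitution is mechanical. Here is a minimal sketch that treats the parameterized program as text and replaces each metavariable wherever it appears as a whole word. This is an illustration only: a real tool works on syntax trees and checks that each replacement belongs to the right category, and the helper name `instantiate` is hypothetical.

```python
import re

TEMPLATE = """\
I := 0;
while (I < E) {
  S;
  I++;
}"""

def instantiate(template, subst):
    """Replace each metavariable (matched as a whole word) with its text.

    A single regex pass means substituted text is never substituted again.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, subst)) + r")\b")
    return pattern.sub(lambda m: subst[m.group(1)], template)

theta = {"I": "k", "E": "100", "S": "a[k] += k"}
print(instantiate(TEMPLATE, theta))
```

A different substitution, such as `{"I": "i", "E": "n", "S": "sum += i"}`, yields a different concrete program from the same template.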

Rewrite rules

PEC models compiler optimizations as rewrite rules. Each rule is a pair of parameterized programs plus side conditions on how they may be instantiated. The FIND program is what the rule expects to see in the input. The REPLACE program is what the rule produces. The compiler applies the rule by matching FIND against actual code and substituting REPLACE.

The loop peeling rule:

FIND
  I := 0;
  while (I < E) {
    S;
    I++;
  }

REPLACE
  I := 0;
  while (I < E - 1) {
    S;
    I++;
  }
  S;
  I++;

WHERE
  E > 0
  S does not modify I, E

The rule shifts one iteration of the loop out of the body. The side conditions constrain when the rule applies:

E > 0 ensures the loop has at least one iteration to peel. If E = 0, the original loop runs zero times; the REPLACE version would still run S; I++ once and produce a different result.
S does not modify I or E ensures the rewrite preserves the loop's behavior. If S could write I, the peeled iteration's I++ would step a different counter than expected. A write to E would shift the loop bound.
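
We can watch both side conditions matter by running the two sides of the rule directly. The sketch below is an assumption-laden model: the state is a dictionary, S is passed in as a Python function, and `add_index` and `skip_ahead` are hypothetical instantiations of S chosen to satisfy and to violate the second side condition.

```python
def run_find(E, S):
    """FIND: I := 0; while (I < E) { S; I++ }. Returns the final state."""
    st = {"I": 0, "E": E, "acc": 0}
    while st["I"] < st["E"]:
        S(st)
        st["I"] += 1
    return st

def run_replace(E, S):
    """REPLACE: the same loop up to E - 1, then one peeled S; I++."""
    st = {"I": 0, "E": E, "acc": 0}
    while st["I"] < st["E"] - 1:
        S(st)
        st["I"] += 1
    S(st)
    st["I"] += 1
    return st

def add_index(st):
    """A well-behaved S: writes only acc."""
    st["acc"] += st["I"]

def skip_ahead(st):
    """Violates the side condition: S writes I."""
    st["acc"] += st["I"]
    st["I"] += 1

# Side conditions hold: the two sides agree.
print(run_find(3, add_index) == run_replace(3, add_index))
# E > 0 violated: REPLACE still runs S; I++ once.
print(run_find(0, add_index), run_replace(0, add_index))
# "S does not modify I" violated: the peeled iteration runs one time too many.
print(run_find(4, skip_ahead), run_replace(4, skip_ahead))
```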

Applying the rule to a concrete loop:

k := 0;
while (k < 100) {
  a[k] += k;
  k++;
}

The substitution I ↦ k, E ↦ 100, S ↦ a[k] += k makes the FIND pattern match. The side conditions check: 100 > 0 holds, and a[k] += k writes only to a[k], not k or 100. Applying the same substitution to REPLACE produces:

k := 0;
while (k < 99) {
  a[k] += k;
  k++;
}
a[k] += k;
k++;

Both programs produce the same final state. The peeled version splits the original's 100 iterations into 99 in the loop plus one in the peeled tail.
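
The claim is easy to confirm by running both versions. The sketch assumes `a` starts as 100 zeros, which the example above leaves unspecified, and adds a hypothetical counter `trips` to show how the iterations are split.

```python
def original():
    a, k, trips = [0] * 100, 0, 0
    while k < 100:
        a[k] += k
        k += 1
        trips += 1
    return k, a, trips

def peeled():
    a, k, trips = [0] * 100, 0, 0
    while k < 99:
        a[k] += k
        k += 1
        trips += 1
    a[k] += k  # the peeled 100th iteration
    k += 1
    return k, a, trips

k1, a1, trips1 = original()
k2, a2, trips2 = peeled()
print(k1 == k2 and a1 == a2)  # same final k and same array contents
print(trips1, trips2)         # loop trips taken by each version
```

One run on one starting state is a test, not a proof. The rest of these notes are about closing that gap.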

Equivalence

L07's symbolic-execution engine reasoned about straight-line program fragments using the strongest postcondition. Each path through the code became a Z3 query. For programs with loops, the engine unrolled to a bound and reasoned within it.

L08 extended the approach to unbounded loops. The cost was loop invariants: we had to provide one for each loop, and the WP engine turned the annotated program into Z3 obligations.

PEC asks a stronger question: do two parameterized programs produce equivalent results, for every instantiation of their metavariables and every starting state?

The picture is two parallel runs from the same starting state:

flowchart TB
  accTitle: Two parallel runs from the same starting state
  accDescr: From σ, run FIND and REPLACE; both final states σ₁' and σ₂' must agree on live variables.

  start([starting state σ])
  F[run FIND]
  R[run REPLACE]
  e1([state σ₁'])
  e2([state σ₂'])
  agree{{agree on live vars}}

  start --> F --> e1
  start --> R --> e2
  e1 --> agree
  e2 --> agree

The live variables are the variables the rest of the program might use. Temporary variables introduced by the rewrite do not need to match.
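
A small sketch shows why the comparison is restricted to live variables. The rewrite below is a hand-written common subexpression elimination, not a PEC rule, and the variable names and the helper `agree_on` are hypothetical. The rewritten program introduces a temporary `t`, so the full final states differ even though every variable the rest of the program could read has the same value.

```python
def before(state):
    s = dict(state)
    s["x"] = s["a"] * s["b"] + s["c"]
    s["y"] = s["a"] * s["b"] - s["c"]
    return s

def after(state):
    s = dict(state)
    s["t"] = s["a"] * s["b"]  # temporary introduced by the rewrite
    s["x"] = s["t"] + s["c"]
    s["y"] = s["t"] - s["c"]
    return s

def agree_on(live, s1, s2):
    """True when the two states give every live variable the same value."""
    return all(s1.get(v) == s2.get(v) for v in live)

start = {"a": 3, "b": 4, "c": 5}
s1, s2 = before(start), after(start)
print(s1 == s2)                                        # False: s2 also holds t
print(agree_on({"a", "b", "c", "x", "y"}, s1, s2))     # True
```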

The loop-peeling example above is one such pair. Under the substitution I ↦ k, E ↦ 100, S ↦ a[k] += k, both the original and the peeled loop end with k = 100 and a holding the same sums. Formally: for every substitution $θ$ satisfying the rule's side conditions and every starting state $σ$, running $θ(\mathrm{FIND})$ and $θ(\mathrm{REPLACE})$ from $σ$ produces final states that agree on the live variables.

L08 verified a program against a property (a postcondition). PEC verifies a property that links two programs: their final states must agree on the live variables. The next section shows how PEC discharges this obligation, including how it handles the loops inside FIND and REPLACE.

Pairs of programs come up beyond compiler optimization. Refactoring a function, replacing one library with a faster equivalent, lifting a binary back to C: all of these are equivalence-checking problems with the same shape. The PEC technique gives you a starting point.

PEC's algorithm

L08 reasoned about loops with a user-supplied invariant. PEC tries to find the invariants on its own.

PEC discharges the equivalence obligation in three steps:

Find synchronization points in the two CFGs.
Generate invariants at each sync point.
Check that every path between sync points preserves its invariants. Strengthen on failure.

We trace the three steps through loop peeling.

Synchronization points

A synchronization point is a pair of locations, one in FIND and one in REPLACE, where PEC tracks an invariant linking the two program states. PEC seeds the sync points from the CFGs:

An entry sync point pairs the two program starts; an exit sync point pairs the two program ends.
Interior sync points sit at statement metavariables. PEC walks both CFGs in lockstep and inserts a sync point at each occurrence; when S sits inside a loop, the sync point cuts the loop into bounded path segments.
PEC prunes paths the side conditions render infeasible. In loop peeling, the path that exits the loop immediately (I ≥ E) cannot fire: I = 0 at the loop test and side condition E > 0 together rule it out.
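
The pruning check in the last item asks whether any state satisfies both the side condition and the path's branch condition. PEC puts that question to its solver; the sketch below only enumerates a small range of values for E, which is evidence rather than proof. The helper `path_is_feasible` and the chosen range are assumptions.

```python
def path_is_feasible(path_condition, side_condition, states):
    """A path is feasible if some state satisfies both conditions."""
    return any(side_condition(s) and path_condition(s) for s in states)

# At the first loop test I is 0; try a small range of loop bounds E.
states = [{"I": 0, "E": e} for e in range(-50, 51)]

def side_condition(s):
    return s["E"] > 0

def exit_immediately(s):
    return s["I"] >= s["E"]  # the loop test fails on entry

def enter_body(s):
    return s["I"] < s["E"]   # the loop test succeeds on entry

print(path_is_feasible(exit_immediately, side_condition, states))  # False: pruned
print(path_is_feasible(enter_body, side_condition, states))        # True: kept
```

Without the side condition the immediate-exit path is feasible again (take E = 0), which is why the rule needs E > 0.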

Figure (slide "Find Synchronization Points"): PEC walks both CFGs in lockstep, marking sync points (dashed red) at the entry, the exit, and the boundaries of statement metavariables; the two interior points around the loop body are labeled A and B. Side conditions prune infeasible paths.

Initial invariants

An invariant at a sync point is a predicate over the two program states $σ_{1}, σ_{2}$. These predicates compose into the formulas PEC sends to the solver in step 3. PEC seeds each one with $σ_{1} = σ_{2}$ (the states agree here) and conjoins any branch conditions taken along the path to this sync point.

We write $eval(σ, e)$ for the value of expression $e$ when its variables are read from state $σ$, the same lookup L07's SE engine did when substituting state into an expression to build a path constraint. The seeded invariants for loop peeling:

Sync point	Invariant
Entry	$σ_{1} = σ_{2}$
A	$σ_{1} = σ_{2} \land eval (σ_{1}, I < E) \land eval (σ_{2}, I < E - 1)$
B	$σ_{1} = σ_{2} \land eval (σ_{1}, I < E) \land eval (σ_{2}, I \geq E - 1)$
Exit	$σ_{1} = σ_{2}$

Invariant B captures the geometry of peeling: FIND is still inside the loop ($I < E$) while REPLACE has just exited ($I \geq E - 1$). The peeled iteration on the REPLACE side runs after.

The solver query

For each path between two sync points, PEC builds one query for the solver. The query says: if the predecessor invariant holds and we execute the FIND and REPLACE paths from there in parallel, the successor invariant holds at the other end.

We write $step(σ, p)$ for the state after executing program fragment $p$ starting in state $σ$; this is the same forward execution that L07's SE engine performed statement by statement. Let $p_{1}$ be the FIND path S; I++; I < E and let $p_{2}$ be the REPLACE path S; I++; I >= E - 1: one trip through the loop body, ending in the branch each side takes (FIND stays in the loop, REPLACE leaves it). The obligation says that starting from invariant A, executing $p_{1}$ and $p_{2}$ in parallel lands at invariant B:

$$\forall σ_{1}, σ_{2} .\; A(σ_{1}, σ_{2}) \land σ_{1}' = step(σ_{1}, p_{1}) \land σ_{2}' = step(σ_{2}, p_{2}) \Rightarrow B(σ_{1}', σ_{2}')$$

Figure (slide "Check Invariants"): the ATP (automated theorem prover) query for one path through the loop body, with the path from sync point A through S, I++, and the branch to sync point B highlighted in both CFGs. One path between sync points becomes one solver query.
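
To make the obligation concrete, here is a brute-force stand-in for the solver query on the path from A to B. It is a sketch under loud assumptions: PEC keeps S abstract and asks a theorem prover, whereas this code fixes S to `acc += I` and enumerates a small finite set of states. The names `body`, `skip_S`, and `check_A_to_B` are hypothetical.

```python
from itertools import product

def body(s):
    """One loop trip, S; I++, with S instantiated as acc += I (an assumption)."""
    return {"I": s["I"] + 1, "E": s["E"], "acc": s["acc"] + s["I"]}

def skip_S(s):
    """A broken REPLACE body that forgets S; used to show a failing query."""
    return {"I": s["I"] + 1, "E": s["E"], "acc": s["acc"]}

def inv_A(s1, s2):
    return s1 == s2 and s1["I"] < s1["E"] and s2["I"] < s2["E"] - 1

def inv_B(s1, s2):
    return s1 == s2 and s1["I"] < s1["E"] and s2["I"] >= s2["E"] - 1

def check_A_to_B(states, step1, step2):
    """Return (pairs that take the path, pairs that violate the obligation)."""
    checked, bad = 0, []
    for s1, s2 in product(states, repeat=2):
        if not inv_A(s1, s2):
            continue
        t1, t2 = step1(s1), step2(s2)
        # Branch conditions that select this path: FIND stays in the loop,
        # REPLACE leaves it.
        if t1["I"] < t1["E"] and t2["I"] >= t2["E"] - 1:
            checked += 1
            if not inv_B(t1, t2):
                bad.append((s1, s2))
    return checked, bad

states = [{"I": i, "E": e, "acc": a}
          for i in range(6) for e in range(6) for a in range(3)]

checked, bad = check_A_to_B(states, body, body)
print(checked, len(bad))
checked_broken, bad_broken = check_A_to_B(states, body, skip_S)
print(checked_broken, len(bad_broken))
```

With matching bodies no pair violates the obligation. With the broken body the two `acc` values drift apart, so the successor states no longer satisfy the equality in B.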

PEC sends each obligation to the solver. Two outcomes:

Valid. The invariant holds across this path. Move on.
Invalid. The current invariants are not yet a simulation: some path does not preserve them. PEC strengthens the predecessor invariant by adding the weakest precondition of the successor invariant under the failing path, then retries every path that touches it. The strengthening loop converges in practice but is bounded to guarantee termination; exceeding the bound rejects the rule.
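
The strengthening loop can be sketched over finite sets. This is a toy model, not PEC's implementation: an invariant is the set of state pairs it allows, so conjoining the weakest precondition becomes filtering that set. Rejecting the rule when the entry invariant would have to be narrowed is an assumption of this sketch, as are the name `strengthen` and the example transition system.

```python
def strengthen(invariants, paths, entry, max_rounds=10):
    """Refine invariants until every path preserves them, or give up.

    invariants: dict from sync point to the set of allowed state pairs.
    paths: list of (pred, succ, step), where step(pair) is the successor pair.
    Returns the refined invariants, or None if the rule is rejected.
    """
    inv = {point: set(pairs) for point, pairs in invariants.items()}
    seeded_entry = set(inv[entry])
    for _ in range(max_rounds):
        changed = False
        for pred, succ, step in paths:
            # inv[pred] AND wp(path, inv[succ]): keep only the pairs whose
            # successor satisfies the successor invariant.
            keep = {s for s in inv[pred] if step(s) in inv[succ]}
            if keep != inv[pred]:
                inv[pred], changed = keep, True
        if not changed:
            # Assumption: the entry invariant must survive unchanged.
            return inv if inv[entry] == seeded_entry else None
    return None  # bound exceeded: reject the rule

N = 4
all_pairs = {(x1, x2) for x1 in range(N) for x2 in range(N)}
equal = {(x, x) for x in range(N)}
seed = {"entry": equal, "loop": all_pairs, "exit": equal}

good_paths = [
    ("entry", "loop", lambda s: s),
    ("loop", "loop", lambda s: ((s[0] + 1) % N, (s[1] + 1) % N)),
    ("loop", "exit", lambda s: s),
]
# A broken REPLACE side that steps by 2 instead of 1.
bad_paths = [
    ("entry", "loop", lambda s: s),
    ("loop", "loop", lambda s: ((s[0] + 1) % N, (s[1] + 2) % N)),
    ("loop", "exit", lambda s: s),
]

accepted = strengthen(seed, good_paths, "entry")
rejected = strengthen(seed, bad_paths, "entry")
print(accepted is not None and accepted["loop"] == equal)
print(rejected is None)
```

In the accepted run the loop invariant starts out too weak (any pair), fails on the path to the exit, and is strengthened to equality. In the rejected run strengthening empties the loop invariant and then has to narrow the entry invariant, so the sketch gives up.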

When every path returns Valid, the invariants hold at every sync point: the two parameterized programs agree on the live variables. The rule is proven correct once and for all, for every substitution that satisfies the side conditions.

Why this works

The sync points and their invariants form a simulation relation between FIND and REPLACE: related states step to related states. PEC's ATP queries (the obligations it sends to the automated theorem prover) check that the candidate relation is in fact a simulation; the strengthening loop refines the candidate until they all pass. By induction over execution traces, the relation propagates from entry to exit. The exit invariant forces $σ_{1} = σ_{2}$, so the two programs agree on the live variables. The XCert follow-up (PLDI 2010) mechanized this argument in Rocq.

Going deeper

For readers who want to dig in:

Talk. The PLDI 2009 talk walks the same material slide by slide.
Code. The PEC repository is about 2,000 lines of OCaml. The main pieces:
src/synch.ml finds sync points by walking the two CFGs in lockstep.
src/check.ml builds the per-path obligations and runs the strengthening loop. The obligation function (~30 lines) is the algorithm we walked above.
src/semantics.ml defines the operator semantics PEC hands to the solver as background axioms.
test/relate/p-loop-peel-01.rwr is the loop-peeling rule we used as the running example, in PEC's rewrite-rule DSL.

Theory next: what writing this kind of tool looks like today.