  (Week 9)

Practice

PEC is a 2009 tool for proving compiler optimizations correct. We walk through it as a case study in solver-aided programming.

Compiler optimizations

A compiler takes source code and produces an executable. Modern compilers rewrite the program along the way for performance: reordering operations, hoisting computations out of loops, inlining functions, eliminating dead code. Every rewrite has to preserve the program's meaning. When one does not, the compiler silently changes what we wrote.

Compilers have bugs

CSmith generates random C programs and compares outputs across multiple compilers, flagging inconsistencies as compiler bugs (PLDI 2011). It has reported over 325 distinct bugs in mainstream C compilers. The GCC and LLVM bugs by compiler stage (paper Table 4):

| Stage | GCC | LLVM |
| --- | --- | --- |
| Front end | 0 | 10 |
| Middle end | 49 | 75 |
| Back end | 17 | 74 |
| Unclassified | 13 | 43 |
| Total | 79 | 202 |

EMI is complementary. It takes a real program, mutates code that does not execute on the test input, and checks that the optimizer still produces equivalent output across the variants (PLDI 2014). Over eleven months, EMI reported 147 confirmed unique bugs against GCC and LLVM. The breakdown by bug kind (paper Table 2):

| Kind | GCC | LLVM |
| --- | --- | --- |
| Wrong code | 46 | 49 |
| Crash | 23 | 10 |
| Performance | 10 | 9 |
| Total | 79 | 68 |

Bugs concentrate in the optimization passes, and most are wrong-code bugs: the program compiles cleanly but produces wrong output.

A buggy optimization

The C program below should return immediately. GCC -O3 used to compile it into an infinite loop.

emi-figure-3.c
int a, b, c, d, e;
int main() {
  for (b = 4; b > -30; b--)
    for (; c;)
      for (;;) {
        e = a > 2147483647 - b;
        if (d) break;
      }
  return 0;
}

Global variables in C without an explicit initializer are initialized to 0. So c is 0, and the condition of the second for loop is always false. The innermost loop never executes, and main returns immediately.

Two of GCC's loop optimizations interacted badly. Partial Redundancy Elimination identified 2147483647 - b as invariant for the inner loop, and Loop Invariant Motion hoisted it out. After hoisting, the expression overflowed for the negative values b takes during execution. GCC's signed-overflow analysis flagged this as undefined behavior, and the compiler emitted non-terminating code on that path.
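The arithmetic behind the overflow can be checked by hand. The sketch below models C's 32-bit int with Python's unbounded integers and asks for which values of b the hoisted expression 2147483647 - b leaves the int range. The helper name overflows_int32 is illustrative and not part of GCC or EMI, and the 32-bit width of int is an assumption about the target.

```python
INT_MAX = 2**31 - 1   # 2147483647, assuming a 32-bit signed int
INT_MIN = -2**31

def overflows_int32(b: int) -> bool:
    """True if INT_MAX - b does not fit in a 32-bit signed int."""
    result = INT_MAX - b          # exact, because Python ints are unbounded
    return not (INT_MIN <= result <= INT_MAX)

# The outer loop runs b = 4, 3, ..., -29 (it stops when b > -30 fails).
outer_values = list(range(4, -30, -1))
overflowing = [b for b in outer_values if overflows_int32(b)]

print(len(outer_values))                # 34
print(overflowing[0], overflowing[-1])  # -1 -29
```

In the source program the expression sits inside a loop that never runs, so it is never evaluated. Once hoisted, it is evaluated for every b, including the 29 negative values that overflow.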

EMI found this miscompilation (PLDI 2014, GCC PR 58731).

Of course, this example feels contrived; it was automatically generated in a project to find compiler bugs. But programs an optimizer actually sees often look stranger than that. C++ template instantiation emits code no human writes. Inlining fuses caller and callee, exposing dead branches that did not exist in the source. Code generators of every kind, including autotuners, DSL compilers, and ML frameworks, produce C that humans never type. The optimizer has to be correct on all of them.

Testing finds bugs that testing happens to hit. To trust an optimization on every input, we have to prove it correct.

Verifying optimizations

Two approaches to verified compiler optimization have been around for decades:

| | Translation validation | A priori |
| --- | --- | --- |
| Timing | After each compilation | Once, before the compiler ships |
| Scope | This specific input vs this specific output | A class of inputs vs a class of outputs |
| Cost | Per compilation | One-time |

Translation validation runs the optimizer, then checks that this specific input mapped to this specific output preserving semantics. Pnueli et al. introduced the idea (TACAS 1998); Necula extended it to GCC's optimizer (PLDI 2000); Alive2 (Lopes et al., PLDI 2021) does it at LLVM scale today and finds optimizer bugs in mainline regularly. Checking equivalence of two concrete programs is often easier than reasoning statically about the optimization code itself, but the cost is paid every compilation.

A priori verification proves the transformation correct before the compiler ships. Cobalt and Rhodium (Lerner et al., PLDI 2003 and POPL 2005) proved single-statement rewrites. CompCert (Leroy, POPL 2006) verified a whole optimizing compiler, and PEC (Kundu, Tatlock, Lerner, PLDI 2009) extended the approach to many-to-many rewrites including loop optimizations. Alive (Lopes et al., PLDI 2015) brought it to LLVM peephole rules at production scale. The cost is paid once, but the proof obligations get harder as the optimizations get more complex.

PEC

PEC verifies compiler optimizations expressed as parameterized rewrite rules. A single rule can describe an optimization that fires on infinitely many concrete programs, and PEC proves the rule correct once and for all. PEC is expressive: it can support loop optimizations (software pipelining, unrolling, peeling, interchange, fusion) and classical scalar optimizations (common subexpression elimination, copy propagation, branch folding).

PEC proves a rule correct by checking that two parameterized programs are equivalent: the program before the rewrite and the program after. If the two are equivalent under the rule's side conditions, the rewrite is sound for every concrete instance.

This raises four questions:

  1. What is a parameterized program?
  2. How can we represent compiler optimizations as rewrites between parameterized programs?
  3. What does it mean for two parameterized programs to be equivalent?
  4. How can a solver check that?

A parameterized program

A parameterized program contains metavariables: placeholders that stand for arbitrary program pieces. Here is one:

I := 0;
while (I < E) {
  S;
  I++;
}

The three metavariables stand for different kinds of program text:

  - I: a variable
  - E: an expression
  - S: a statement

A parameterized program represents the set of all concrete programs you can produce by replacing each metavariable with text drawn from its category. A concrete program matches the parameterized program under a substitution: a mapping from each metavariable to the text that fills its place. Here is one substitution for the program above:

| Metavariable | Replaced by |
| --- | --- |
| I | k |
| E | 100 |
| S | a[k] += k |

Applying the substitution produces:

k := 0;
while (k < 100) {
  a[k] += k;
  k++;
}

The parameterized version represents this concrete program along with infinitely many others, one for each substitution.
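Applying a substitution can be sketched in a few lines. The version below treats the parameterized program as text and replaces each metavariable where it appears as a whole word. This is a simplification for illustration: the function name instantiate is hypothetical, and a real tool would presumably match on syntax trees, not strings.

```python
import re

TEMPLATE = """\
I := 0;
while (I < E) {
  S;
  I++;
}"""

def instantiate(template: str, subst: dict[str, str]) -> str:
    """Replace all metavariables in a single pass, so substituted text is never rescanned."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, subst)) + r")\b")
    return pattern.sub(lambda m: subst[m.group(1)], template)

concrete = instantiate(TEMPLATE, {"I": "k", "E": "100", "S": "a[k] += k"})
print(concrete)
# k := 0;
# while (k < 100) {
#   a[k] += k;
#   k++;
# }
```

Doing the replacement in one pass matters: replacing I first and then E would also rewrite any E that appeared inside the text substituted for I.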

Rewrite rules

PEC models compiler optimizations as rewrite rules. Each rule is a pair of parameterized programs plus side conditions on how they may be instantiated. The FIND program is what the rule expects to see in the input. The REPLACE program is what the rule produces. The compiler applies the rule by matching FIND against actual code and substituting REPLACE.

The loop peeling rule:

FIND
  I := 0;
  while (I < E) {
    S;
    I++;
  }

REPLACE
  I := 0;
  while (I < E - 1) {
    S;
    I++;
  }
  S;
  I++;

WHERE
  E > 0
  S does not modify I, E

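The rule moves the last iteration of the loop out of the loop body. The side conditions constrain when the rule applies:

  - E > 0: the original loop runs at least once, so there is an iteration to peel.
  - S does not modify I, E: the loop counter and the loop bound evolve the same way in both programs.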

Applying the rule to a concrete loop:

k := 0;
while (k < 100) {
  a[k] += k;
  k++;
}

The substitution I ↦ k, E ↦ 100, S ↦ a[k] += k makes the FIND pattern match. The side conditions check: 100 > 0 holds, and a[k] += k writes only to a[k], not k or 100. Applying the same substitution to REPLACE produces:

k := 0;
while (k < 99) {
  a[k] += k;
  k++;
}
a[k] += k;
k++;

Both programs produce the same final state. The peeled version splits the original's 100 iterations into 99 in the loop plus one in the peeled tail.
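One way to sanity-check the claim is to run both concrete programs and compare their final states. The sketch below transliterates the two loops into Python; the function names run_find and run_replace and the array size are choices made for this example. It also shows what goes wrong when the side condition E > 0 is violated.

```python
def run_find(E: int, n: int = 128):
    """Original loop: k := 0; while (k < E) { a[k] += k; k++ }."""
    a, k = [0] * n, 0
    while k < E:
        a[k] += k
        k += 1
    return k, a

def run_replace(E: int, n: int = 128):
    """Peeled loop: E - 1 iterations in the loop, then one peeled copy of the body."""
    a, k = [0] * n, 0
    while k < E - 1:
        a[k] += k
        k += 1
    a[k] += k   # peeled S
    k += 1      # peeled I++
    return k, a

# Side condition E > 0 holds: both versions end in the same state.
assert all(run_find(E) == run_replace(E) for E in range(1, 101))
print(run_find(100)[0], run_replace(100)[0])   # 100 100

# Side condition violated (E = 0): the peeled tail runs an iteration the original never does.
print(run_find(0)[0], run_replace(0)[0])       # 0 1
```

Testing like this covers one substitution and a finite range of bounds. It is a sanity check, not the proof over all substitutions that PEC aims for.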

Equivalence

L07's symbolic-execution engine reasoned about straight-line program fragments using the strongest postcondition. Each path through the code became a Z3 query. For programs with loops, the engine unrolled to a bound and reasoned within it.

L08 extended the approach to unbounded loops. The cost was loop invariants: we had to provide one for each loop, and the WP engine turned the annotated program into Z3 obligations.

PEC asks a stronger question: do two parameterized programs produce equivalent results, for every instantiation of their metavariables and every starting state?

The picture is two parallel runs from the same starting state:

flowchart TB
  accTitle: Two parallel runs from the same starting state
  accDescr: From σ, run FIND and REPLACE; both final states σ₁' and σ₂' must agree on live variables.

  start([starting state σ])
  F[run FIND]
  R[run REPLACE]
  e1([state σ₁'])
  e2([state σ₂'])
  agree{{agree on live vars}}

  start --> F --> e1
  start --> R --> e2
  e1 --> agree
  e2 --> agree

The live variables are the variables the rest of the program might use. Temporary variables introduced by the rewrite do not need to match.

The loop-peeling example above is one such pair. Under the substitution I ↦ k, E ↦ 100, S ↦ a[k] += k, both the original and the peeled loop end with k = 100 and a holding the same sums. Formally: for every substitution θ satisfying the rule's side conditions and every starting state σ, running θ(FIND) and θ(REPLACE) from σ produces final states that agree on the live variables.
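The same statement can be written as a single formula. The notation here is chosen for this sketch and may differ from the PEC paper's: $\mathrm{step}(\sigma, p)$ is the state after running program $p$ from state $\sigma$, $\mathit{SC}(\theta)$ says that substitution $\theta$ satisfies the rule's side conditions, and $|_{\mathit{Live}}$ restricts a state to the live variables.

```latex
\forall \theta,\ \sigma.\quad
  \mathit{SC}(\theta) \;\Rightarrow\;
  \mathrm{step}\bigl(\sigma,\ \theta(\mathrm{FIND})\bigr)\big|_{\mathit{Live}}
  \;=\;
  \mathrm{step}\bigl(\sigma,\ \theta(\mathrm{REPLACE})\bigr)\big|_{\mathit{Live}}
```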

L08 verified a program against a property (a postcondition). PEC verifies a property that links two programs: their final states must agree on the live variables. The next section shows how PEC discharges this obligation, including how it handles the loops inside FIND and REPLACE.

Pairs of programs come up beyond compiler optimization. Refactoring a function, replacing one library with a faster equivalent, lifting a binary back to C: all of these are equivalence-checking problems with the same shape. The PEC technique gives you a starting point.

PEC's algorithm

L08 reasoned about loops with a user-supplied invariant. PEC tries to find the invariants on its own.

PEC discharges the equivalence obligation in three steps:

  1. Find synchronization points in the two CFGs.
  2. Generate invariants at each sync point.
  3. Check that every path between sync points preserves its invariants. Strengthen on failure.

We trace the three steps through loop peeling.

Synchronization points

A synchronization point is a pair of locations, one in FIND and one in REPLACE, where PEC tracks an invariant linking the two program states. PEC seeds the sync points from the CFGs:

[Slide: "Find Synchronization Points". Left panel: traverse in lockstep; stop at statement metavariables; prune infeasible paths. Right panel: the FIND and REPLACE control-flow graphs side by side, with dashed red lines marking sync points at the entry, at two interior locations around the loop body (labeled A and B), and at the exit.]
PEC walks both CFGs in lockstep, marking sync points (dashed red) at the entry, the exit, and the boundaries of statement metavariables. Side conditions prune infeasible paths.

Initial invariants

An invariant at a sync point is a predicate over the two program states $\sigma_1, \sigma_2$. These predicates compose into the formulas PEC sends to the solver in step 3. PEC seeds each one with $\sigma_1 = \sigma_2$ (the states agree here) and conjoins any branch conditions taken along the path to this sync point.

We write $\mathrm{eval}(\sigma, e)$ for the value of expression $e$ when its variables are read from state $\sigma$, the same lookup L07's SE engine did when substituting state into an expression to build a path constraint. The seeded invariants for loop peeling:

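| Sync point | Invariant |
| --- | --- |
| Entry | $\sigma_1 = \sigma_2$ |
| A | $\sigma_1 = \sigma_2 \land \mathrm{eval}(\sigma_1, I < E) \land \mathrm{eval}(\sigma_2, I < E - 1)$ |
| B | $\sigma_1 = \sigma_2 \land \mathrm{eval}(\sigma_1, I < E) \land \mathrm{eval}(\sigma_2, I \ge E - 1)$ |
| Exit | $\sigma_1 = \sigma_2$ |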

Invariant B captures the geometry of peeling: FIND is still inside the loop ($I < E$) while REPLACE has just exited ($I \ge E - 1$). The peeled iteration on the REPLACE side runs after.
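To see what the seeded invariants say about concrete states, the sketch below encodes A and B as Python predicates and evaluates them for E = 100. Representing a state as a dict and an expression as a callable is a choice made for this example, and the names eval_, inv_A, and inv_B are illustrative.

```python
def eval_(state: dict, expr) -> bool:
    """Value of expr with its variables read from state (expr is a Python callable here)."""
    return expr(state)

def inv_A(s1: dict, s2: dict) -> bool:
    """Both programs are inside their loops."""
    return (s1 == s2
            and eval_(s1, lambda s: s["I"] < s["E"])
            and eval_(s2, lambda s: s["I"] < s["E"] - 1))

def inv_B(s1: dict, s2: dict) -> bool:
    """FIND is inside its loop; REPLACE has just left its loop."""
    return (s1 == s2
            and eval_(s1, lambda s: s["I"] < s["E"])
            and eval_(s2, lambda s: s["I"] >= s["E"] - 1))

E = 100
at_A = [i for i in range(120) if inv_A({"I": i, "E": E}, {"I": i, "E": E})]
at_B = [i for i in range(120) if inv_B({"I": i, "E": E}, {"I": i, "E": E})]
print(at_A[0], at_A[-1])  # 0 98
print(at_B)               # [99]
```

Over the integers, B holds only when I = E - 1, which is exactly the iteration that the REPLACE program peels off.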

The solver query

For each path between two sync points, PEC builds one query for the solver. The query says: if the predecessor invariant holds and we execute the FIND and REPLACE paths from there in parallel, the successor invariant holds at the other end.

We write $\mathrm{step}(\sigma, p)$ for the state after executing program fragment $p$ starting in state $\sigma$; this is the same forward execution that L07's SE engine performed statement by statement. Let $p_1$ be the FIND path S; I++; I < E and let $p_2$ be the REPLACE path S; I++; I >= E - 1, each one trip through the loop body. The obligation says that starting from invariant A, executing $p_1$ and $p_2$ in parallel lands at invariant B:

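$$\forall \sigma_1, \sigma_2.\ A(\sigma_1, \sigma_2) \land \sigma_1' = \mathrm{step}(\sigma_1, p_1) \land \sigma_2' = \mathrm{step}(\sigma_2, p_2) \Rightarrow B(\sigma_1', \sigma_2')$$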
[Slide: "Check Invariants". Left panel: the ATP query for one path through the loop body. Right panel: the two CFGs with the path from sync point A down through S, I++, and the branch to sync point B highlighted in green.]
One path between sync points becomes one solver query.

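PEC sends each obligation to the solver. Two outcomes:

  - Valid: the path preserves its invariant.
  - Invalid: the check fails, and PEC strengthens the invariant and checks again (step 3 above).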

When every path returns Valid, the invariants hold at every sync point: the two parameterized programs agree on the live variables. The rule is proven correct once and for all, for every substitution that satisfies the side conditions.

Why this works

The sync points and their invariants form a simulation relation between FIND and REPLACE: related states step to related states. PEC's ATP (automated theorem prover) queries check that the candidate relation is in fact a simulation; the strengthening loop refines the candidate until they all pass. By induction over execution traces, the relation propagates from entry to exit. The exit invariant forces $\sigma_1 = \sigma_2$, so the two programs agree on the live variables. The XCert follow-up (PLDI 2010) mechanized this argument in Rocq.
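The induction can be watched on a concrete run. The sketch below executes the peeled-loop pair side by side and asserts the relation at every sync point it passes. It is a single concrete execution with one hypothetical S, meant only to illustrate "related states step to related states"; PEC's own check is symbolic. The names lockstep and body are illustrative.

```python
def S(state: dict) -> dict:
    """Hypothetical loop body: accumulates I into x; does not modify I or E."""
    return {**state, "x": state["x"] + state["I"]}

def body(state: dict) -> dict:
    """S; I++"""
    state = S(state)
    return {**state, "I": state["I"] + 1}

def lockstep(E: int):
    """Run FIND and REPLACE side by side, recording each sync point reached."""
    s1 = {"I": 0, "E": E, "x": 0}   # FIND state after I := 0
    s2 = dict(s1)                   # REPLACE state after I := 0
    trace = ["Entry"]
    assert s1 == s2
    # Sync point A: both programs are about to run their loop bodies.
    while s1["I"] < s1["E"] and s2["I"] < s2["E"] - 1:
        trace.append("A")
        assert s1 == s2
        s1, s2 = body(s1), body(s2)
    # Sync point B: FIND is still in its loop, REPLACE runs the peeled copy.
    assert s1 == s2 and s1["I"] < s1["E"] and s2["I"] >= s2["E"] - 1
    trace.append("B")
    s1, s2 = body(s1), body(s2)
    # Exit: FIND's loop condition is now false and the states agree.
    assert not (s1["I"] < s1["E"]) and s1 == s2
    trace.append("Exit")
    return trace, s1, s2

trace, s1, s2 = lockstep(3)
print(trace)         # ['Entry', 'A', 'A', 'B', 'Exit']
print(s1 == s2)      # True
```

Calling lockstep(0) fails the assertion at B, which mirrors how the side condition E > 0 rules out the path on which the peeled iteration would have no counterpart in FIND.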

Going deeper

For readers who want to dig in:

Theory next: what writing this kind of tool looks like today.