Practice
Four theory solvers, four engineering problems. Picking the right theory is a reduction choice.
Where We Left Off
Last week we opened one theory solver all the way up. The theory of equality and uninterpreted functions had four axioms, one decision procedure (congruence closure), and a worked algorithm you traced by hand. Z3 has more theory solvers than that.
Today we meet four more: linear real arithmetic (LRA), linear integer arithmetic (LIA), bitvectors (BV), and arrays. Each one decides a different kind of problem. Each one is the right reach for some engineering situation and the wrong reach for others.
The thread across the four demos is a question you already have a catch phrase for. If you get the reduction wrong, the solver gives you a correct answer to the wrong question. Today that catch phrase gets four different meanings: four distinct ways the wrong-theory choice goes wrong.
LRA: Resource Allocation and Optimization
The problem
A campus dining hall builds a grain bowl from three ingredients: tofu, beans, and rice. Nutrition and cost per kilogram cooked, using rough USDA averages:
| Ingredient | Cost per kg | Protein per kg | Carbs per kg |
|---|---|---|---|
| Tofu | $3.00 | 170g | 20g |
| Beans | $2.50 | 90g | 240g |
| Rice | $1.50 | 30g | 280g |
For every kilogram of bowl, hit the dietary targets: at least 80g of protein and at least 200g of carbohydrates. (A 500g bowl at these ratios has 40g of protein and 100g of carbs, a solid meal for an adult.) What's the cheapest blend that meets both floors?
The linear program
Before writing any code, state the problem formally.
Let t, b, and r be kilograms of tofu, beans, and rice per kilogram of output bowl. All three are continuous quantities. A bowl with 0.28 kg of tofu is a meaningful portion, not a rounding error.
Objective. Minimize cost per kilogram of bowl:

cost = 3.00 t + 2.50 b + 1.50 r

Constraints. Non-negativity, the blend-sum, and the two nutrition floors:

t >= 0,  b >= 0,  r >= 0
t + b + r = 1
170 t + 90 b + 30 r >= 80      (protein)
20 t + 240 b + 280 r >= 200    (carbs)
Three unknowns, a linear objective, a few linear inequalities. This is a linear program. Our job is to hand it to a solver.
The encoding
Translating the LP to Z3 is nearly mechanical.
from z3 import Real, Optimize
t = Real('t')
b = Real('b')
r = Real('r')
opt = Optimize()
opt.add(t >= 0, b >= 0, r >= 0)
opt.add(t + b + r == 1) # 1 kg of bowl
opt.add(170*t + 90*b + 30*r >= 80) # protein floor
opt.add( 20*t + 240*b + 280*r >= 200) # carbs floor
cost = opt.minimize(3.00*t + 2.50*b + 1.50*r)
opt.check()
Three things are new. The variables are real-valued:
Real('t') declares a variable that ranges over the rationals
(Z3 uses rationals internally; for linear arithmetic the
difference between rationals and reals does not matter). The
solver we create is Optimize, not Solver: it answers
feasibility and finds an optimum. And opt.minimize(...)
registers a linear objective and returns a handle we query for
the optimal value.
Everything else is familiar. opt.add(...) adds constraints
exactly as Solver does. opt.check() runs the solver. This
time it does more than sat/unsat: it also finds the point that
minimizes the objective.
The result
m = opt.model()
print(m[t]) # 7/25
print(m[b]) # 9/50
print(m[r]) # 27/50
print(opt.lower(cost)) # 21/10
Per kilogram of bowl: 7/25 = 0.28 kg tofu, 9/50 = 0.18 kg beans, 27/50 = 0.54 kg rice. Minimum cost 21/10 = $2.10. Notice Z3 reported the answer as an exact rational, not a float. That is a feature of LRA: the solver reasons over rationals and returns exact answers.
m[t], m[b], and m[r] are the values at the minimum;
opt.lower(cost) is the best lower bound Z3 could prove for the
objective (for our problem, equal to the achieved value).
Where the numbers come from
Why those particular fractions? The blend-sum constraint lets us eliminate one variable. Substituting r = 1 - t - b into the two nutrition floors collapses the problem from three unknowns to two:

140 t + 60 b >= 50     (protein)
260 t + 40 b <= 80     (carbs)

Plus t >= 0, b >= 0, and t + b <= 1 (so that r is non-negative). Five inequalities in two unknowns cut out a polygon in the (t, b) plane. Every point inside that polygon is a feasible bowl, with r determined by the blend-sum.
The cost, written in t and b alone:

cost = 1.50 + 1.50 t + b
Cost grows with both t and b, so cheaper bowls lie toward the lower-left of the plot. The dark-gray dashed line in the figure is the isocost contour through the optimum: every blend on that line costs exactly \$2.10. The optimum sits at the vertex where this contour first touches the polygon coming from outside.
Rice has no axis of its own in this plot. It is the slack r = 1 - t - b, determined by the other two variables. The green blend-cap line is the rice-zero contour (everything on it has t + b = 1, so r = 0), and rice grows as you move perpendicular to the blend cap toward the origin. The lighter dotted line in the figure is an iso-rice contour. Our optimum has r = 27/50, so the contour passes just past the optimum on the blend-cap side.
Solving the 2×2 system at the binding vertex,

140 t + 60 b = 50
260 t + 40 b = 80

gives t = 7/25, b = 9/50, and r = 1 - 7/25 - 9/50 = 27/50. That's the answer Z3 reported.
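As a sanity check, that 2×2 system can be solved directly in pure Python with exact rationals (no Z3 needed; the coefficients are the ones derived above):

```python
from fractions import Fraction as F

# Binding constraints after eliminating r = 1 - t - b:
#   140 t + 60 b = 50   (protein floor, tight)
#   260 t + 40 b = 80   (carbs floor, tight)
# Solve the 2x2 system by Cramer's rule.
a11, a12, c1 = F(140), F(60), F(50)
a21, a22, c2 = F(260), F(40), F(80)
det = a11 * a22 - a12 * a21
t = (c1 * a22 - a12 * c2) / det
b = (a11 * c2 - c1 * a21) / det
r = 1 - t - b
cost = 3 * t + F(5, 2) * b + F(3, 2) * r
print(t, b, r, cost)   # 7/25 9/50 27/50 21/10
```

The exact fractions match Z3's model, which is reassuring: both the solver and the hand derivation land on the same vertex.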
The optimum of a linear objective over a feasible polytope always sits at a vertex. A solver doesn't search the interior. It walks from vertex to vertex, looking for a lower cost, until no adjacent vertex improves. That algorithm is called simplex, and it is the Theory phase.
What if we used integers?
The dining hall scales ingredients by the gram, so integers seem
natural. Swap Real for Int and run the same problem:
from z3 import Int, Optimize
t = Int('t')
b = Int('b')
r = Int('r')
opt = Optimize()
opt.add(t >= 0, b >= 0, r >= 0)
opt.add(t + b + r == 1)
opt.add(170*t + 90*b + 30*r >= 80)
opt.add( 20*t + 240*b + 280*r >= 200)
cost = opt.minimize(3.00*t + 2.50*b + 1.50*r)
opt.check()
m = opt.model()
print(m[t], m[b], m[r]) # 0 1 0
print(opt.lower(cost)) # 5/2
Z3 returns sat. But look at the answer: t = 0, b = 1, r = 0. Pure beans, at $2.50 per kilogram.
What happened? The constraint t + b + r == 1 with t, b, r >= 0 over integers forces exactly one of them to be 1 and the others to be 0. The solver checked each option: pure tofu fails the carbs floor; pure rice fails the protein floor; pure beans satisfies both. Beans it is.
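That three-way case split can be reproduced by hand in plain Python, no solver required:

```python
# (cost, protein, carbs) per kg for each pure-ingredient bowl
options = {'tofu':  (3.00, 170, 20),
           'beans': (2.50, 90, 240),
           'rice':  (1.50, 30, 280)}

# Keep the options that meet both nutrition floors.
feasible = {name for name, (c, p, cb) in options.items()
            if p >= 80 and cb >= 200}
print(feasible)   # {'beans'}: tofu fails carbs, rice fails protein
```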
Pure beans is a correct answer, but to a different question. Our LRA blend costs $2.10 per kilogram. The integer "solution" costs $2.50, a 19% penalty. Worse, the solver reported sat silently, without any hint that the blend you actually wanted was outside its search space.
Wrong theory, wrong question
You meant to ask about ingredient ratios: continuous
fractions of tofu, beans, and rice per kilogram of output.
Writing Int changed the question to "which single
ingredient is cheapest while hitting both floors?" The answer
to that question is pure beans. It is even feasible. It is
also not what you meant, and the solver has no way to tell
you that.
You reach for LRA when the quantities in your problem are genuinely continuous. Bulk mixing ratios, timing budgets in seconds, fluid volumes, currency amounts at arbitrary precision: LRA is the right theory. Rounding to integers changes the problem, sometimes quietly.
How it scales
Three ingredients is still a toy. Real LPs have hundreds or thousands of variables. Here is the same blending problem with the ingredient list extended. Each new ingredient cycles through the tofu, beans, and rice profiles, and prices rise with the index. Timings:
| Ingredients | Solver time |
|---|---|
| 3 | 0.0005s |
| 10 | 0.0008s |
| 100 | 0.0060s |
| 500 | 0.0572s |
Growth is mild. Linear programming is in P: there is a polynomial-time decision procedure, and practical LP solvers are fast on realistic inputs. LRA is the one theory today that sits in P. The other three are NP-complete.
What we take away
LRA is the solver you reach for when the problem is linear in continuous quantities and you care about feasibility or an optimum. The whole category falls under this one theory: resource allocation, scheduling with fractional times, mixture and blending problems, electrical and mechanical equilibrium, finance under simplified assumptions.
After the break, we look inside. The decision procedure is called simplex, and it walks the corners of the feasible polytope. The optimum we just saw was one of those corners: the unique vertex where both nutrition constraints bind at once.
LIA: Integer Constraints and Why They're Harder
The problem
A compiler considers a nested loop that might benefit from reordering. Reordering inner and outer loops can improve cache locality, enable vectorization, or expose parallelism. The legality question is always the same: do any two distinct iterations access the same memory address in a conflicting way?
Concretely:
for i in range(N):
for j in range(N):
A[3*i + 1] = f(A[5*j + 3])
Each iteration writes to address 3*i + 1 and reads from
address 5*j + 3. The write and read strides do not match, and
the offsets shift them past each other. The reorder is legal
iff no write-read pair hits the same address across distinct
iterations.
The dependence query
Ask the solver directly: does there exist (i, j, i', j') with each index in [0, N) and (i, j) != (i', j') such that 3i + 1 = 5j' + 3?
Decision variables. Four integer indices, one per iteration coordinate: i and j for the writing iteration, i' and j' for the reading one.
Constraints. Iteration bounds, the write-read conflict equation, and distinctness:

0 <= i, j, i', j' < N
3i + 1 = 5j' + 3
(i, j) != (i', j')
If the system is satisfiable, a conflicting iteration pair exists and the reorder is unsafe. If it is unsatisfiable, no such pair exists and the reorder is legal.
The encoding at N = 4
from z3 import Int, Solver, Or
i = Int('i')
j = Int('j')
ip = Int('ip')
jp = Int('jp')
N = 4
s = Solver()
s.add(0 <= i, i < N)
s.add(0 <= j, j < N)
s.add(0 <= ip, ip < N)
s.add(0 <= jp, jp < N)
s.add(3*i + 1 == 5*jp + 3)
s.add(Or(i != ip, j != jp))
print(s.check()) # unsat
Z3 returns unsat. At N = 4 the reorder is legal. The
interesting question is why.
Where the integer solutions live
Rearrange the conflict equation: 3i - 5j' = 2. This is a linear Diophantine equation, a linear equation we are asking to solve over the integers. The first question is whether any integer solution exists at all, ignoring the iteration bounds for a moment. The answer is yes. The greatest common divisor of the coefficients on the left side is gcd(3, 5) = 1, and 1 divides the right-hand side 2, so integer solutions are possible. (If the GCD did not divide the right-hand side, no integer solution could exist.)
Once one integer solution exists, a whole family exists, evenly spaced along the solution line. For our equation, the family is

i = 4 + 5k,  j' = 2 + 3k,  for every integer k.

Enumerating a few values: k = -1 gives (-1, -1), k = 0 gives (4, 2), k = 1 gives (9, 5). Integer solutions exist; they are just spread out.
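A few lines of plain Python confirm the family mechanically, using the parametrization derived above:

```python
# Solutions of 3*i + 1 == 5*jp + 3 along the line i = 4 + 5k, jp = 2 + 3k
family = [(4 + 5 * k, 2 + 3 * k) for k in range(-2, 3)]
for i, jp in family:
    assert 3 * i + 1 == 5 * jp + 3   # every member solves the conflict equation

def in_box(i, jp, N):
    """Is the point inside the half-open iteration box [0, N)^2?"""
    return 0 <= i < N and 0 <= jp < N

print([p for p in family if in_box(*p, 4)])   # []        -> no conflict at N = 4
print([p for p in family if in_box(*p, 5)])   # [(4, 2)]  -> conflict at N = 5
```

Consecutive family members are 5 apart in i, so a box of width 4 can easily miss them all.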
The compiler's real question is not "do integer solutions exist anywhere?" but "do any integer solutions exist inside the iteration box 0 <= i, j' < N?" This is where things get harder.
The figure makes the sensitivity to N concrete. The Diophantine line 3i - 5j' = 2 crosses the box, and over the reals it provides an infinite supply of solutions inside the box. None of those real solutions are integer points. The nearest integer solution, (i, j') = (4, 2), sits on the boundary of the N = 4 box and outside of it. Only when we enlarge the box to N = 5 does the first integer solution fit inside.
At N = 4, no conflict exists in the integer-lattice version of the problem. Z3 proves this by real integer-programming work, not by a one-line arithmetic trick: it has to reason about the coefficients, the GCD, and the bound box together.
What if we used LRA?
Integer-constrained variables are harder to reason about than
real-valued ones. One tempting shortcut is to relax the
question to LRA (an LP relaxation) and hope the relaxed answer
agrees. Swap Int for Real and run the same system at
N = 4:
from z3 import Real, Solver, Or
i = Real('i')
j = Real('j')
ip = Real('ip')
jp = Real('jp')
# ... same constraints as before ...
s.add(3*i + 1 == 5*jp + 3)
s.add(Or(i != ip, j != jp))
print(s.check()) # sat
print(s.model())
# witness: i = 24/11, jp = 10/11
LRA returns sat. The line 3i + 1 = 5j' + 3 crosses the box over the reals, and LRA picks a fractional point on that line. The witness (i, j') = (24/11, 10/11) does satisfy the conflict equation, and it does lie in [0, 4) × [0, 4) as a region of the real plane. It just does not correspond to any actual iteration of the loop. Loops iterate at integer points.
Wrong theory, wrong question
A compiler that trusted the LP relaxation would refuse to reorder this loop, citing a phantom dependence at (i, j') = (24/11, 10/11) that no real iteration could ever exhibit. Correct answer to the wrong question, paid for with a lost optimization.
This is the honest version of the LRA-vs-LIA tradeoff. LRA is decidable in polynomial time and often gives the answer you want. But when the answer hinges on which integer points are in your box, the LP relaxation cannot tell.
When conflicts do appear
The answer changes with one more iteration. Run the same encoding at N = 5:
# Same constraints, N = 5 instead of 4
s.check() # sat
print(s.model())
# witness: i = 4, j = 3, ip = 0, jp = 2
The integer solution (i, j') = (4, 2) now fits the box. Z3 returns sat and hands back a concrete iteration pair. At i = 4 the writing iteration touches address 3*4 + 1 = 13. At j' = 2 a reading iteration touches address 5*2 + 3 = 13. Same address, distinct iterations, and the compiler must respect their order.
The answer flipped from unsat to sat just by making the
iteration range one larger. Nothing about the equation
changed; what changed is which lattice points fit the box.
This sensitivity to the iteration bound is exactly what makes
LIA harder than LRA. The decision procedure has to reason
about the integer lattice and the box together, not just the
feasibility of a continuous region.
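At these bounds the whole dependence query is small enough to brute-force in plain Python, which confirms both solver answers:

```python
from itertools import product

def conflicts(N):
    """All write-read conflicts between distinct iterations in [0, N)^2."""
    return [(i, j, ip, jp)
            for i, j, ip, jp in product(range(N), repeat=4)
            if 3 * i + 1 == 5 * jp + 3 and (i, j) != (ip, jp)]

print(len(conflicts(4)))   # 0: matches unsat at N = 4
print(conflicts(5)[0])     # a concrete witness at N = 5, e.g. (4, 0, 0, 2)
```

Brute force works at N = 5 because the box has 625 candidate tuples; the point of LIA is that the solver's answer does not degrade as N grows.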
What we take away
LIA is the solver you reach for when the problem is linear in integer quantities. Compiler loop analysis, scheduling with integer counts, integer-weighted assignment and packing: all land in this theory. Integer quantities are not a cosmetic detail. A real-valued relaxation can quietly change the question and return a phantom answer that does not correspond to any real iteration.
The cost is complexity. LIA is NP-complete. The decision procedure can no longer walk a convex polytope's corners directly, because the answer might sit between corners at an integer point that is not a vertex of the polytope. After the break, we look inside. The algorithm is called branch-and-bound, and it layers integer refinement on top of simplex.
BV: Bit-Level Semantics and Production Correctness
The problem
Binary search is simple. Lower bound, upper bound, midpoint, compare, recurse. It is the textbook algorithm every introductory course teaches and every standard library ships. And for nine years Java's standard library shipped a version of it that was wrong, because of one line of code that assumed mathematical integer semantics when the machine actually provides finite-width ones.
Josh Bloch, in a 2006 Google blog post titled "Extra, Extra: Read All About It: Nearly All Binary Searches and Mergesorts Are Broken," documented that the midpoint calculation (low + high) / 2 overflows when the array is large enough for low + high to exceed the maximum 32-bit signed integer, 2^31 - 1.
Our L04 BV demo uses this same scenario. The underlying question is simple: is this midpoint calculation correct? The correct answer depends entirely on which theory of integer arithmetic you mean.
Bits wrap
Before the main event, a one-line reminder of how bit-level
arithmetic differs from the mathematical kind. Take an 8-bit
unsigned slot, a variable of type unsigned char in C, and
compute 200 + 100.
Mathematically, 200 + 100 = 300. On an 8-bit unsigned slot the answer is 44. The nine-bit sum only has eight bits of storage, so the leading 1 is discarded and the stored result is 300 - 256 = 44. Z3's bitvector arithmetic behaves the same way as the hardware:
from z3 import BitVecVal, simplify
a = BitVecVal(200, 8)
b = BitVecVal(100, 8)
print(simplify(a + b)) # 44
Modular wraparound is the defining feature of the bitvector theory. It is also the source of every binary-search-style bug that tries to treat machine integers as mathematical ones.
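The same wraparound can be reproduced without Z3 by masking to 8 bits, which is exactly what the hardware does:

```python
MASK8 = 0xFF                 # keep only the low 8 bits

a, b = 200, 100
print((a + b) & MASK8)       # 44: the ninth bit of 300 is discarded
print((a + b) % 256)         # 44: equivalently, arithmetic mod 2**8
```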
The property we want to verify
Return to binary search. The midpoint should satisfy lo <= mid <= hi whenever lo <= hi. If that invariant ever fails, the algorithm can index outside the array.
Encode the property as a counterexample search: ask the solver for any pair (lo, hi) in the valid input range 0 <= lo <= hi < 2^32 where mid = (lo + hi) / 2 falls outside [lo, hi]. unsat means no counterexample exists and the property holds; sat means the model is a concrete bug witness.
What Int says
Try Int first. That is, pretend machine integers are
mathematical integers.
from z3 import Int, Solver, And, Not
W = 32 # 32-bit input range
lo = Int('lo')
hi = Int('hi')
s = Solver()
s.add(0 <= lo, lo <= hi, hi < 2**W)
mid = (lo + hi) / 2
s.add(Not(And(lo <= mid, mid <= hi)))
print(s.check()) # unsat
Int returns unsat. In mathematical integer arithmetic the
midpoint is always between its endpoints. That is a true fact
about mathematical integers.
Wrong theory, wrong question
Int answered the question "is the midpoint formula correct for mathematical integers?" The actual binary search runs on 32-bit machine integers, where lo + hi can wrap around to something below lo. Int silently confirms a property that the real code does not satisfy.
What BV says
Switch to BitVec for the right semantics. Index arithmetic on
real hardware is unsigned 32-bit, so the comparison uses
ULE (unsigned less-or-equal) and the division uses UDiv
(unsigned division):
from z3 import BitVec, Solver, And, Not, ULE, UDiv
lo = BitVec('lo', 32)
hi = BitVec('hi', 32)
s = Solver()
s.add(ULE(lo, hi))
mid = UDiv(lo + hi, 2)
s.add(Not(And(ULE(lo, mid), ULE(mid, hi))))
print(s.check()) # sat
print(s.model())
BV returns sat with a concrete witness. The exact model varies from run to run, but every witness has the same shape. One representative pair (chosen here for readability, not necessarily the model your run prints):

| Field | Value |
|---|---|
| lo | 1 |
| hi | 4294967295 (2^32 - 1) |
| lo + hi mathematically | 4294967296 (2^32) |
| lo + hi at 32 bits | 0 |
| computed mid | 0 |

The mathematical sum overflows 32 bits. The high bit is dropped. The computed midpoint is below the lower bound lo. A binary search using this midpoint on an array large enough for lo + hi to overflow 32 bits can land anywhere outside [lo, hi]. The JDK shipped this bug; Python and C standard libraries have shipped variants of it. BV catches it because bit-level semantics are what the problem is about.
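The failure is easy to reproduce outside the solver by emulating unsigned 32-bit arithmetic with a mask (plain Python; the concrete endpoints below are just one choice of overflowing pair):

```python
M32 = (1 << 32) - 1                # unsigned 32-bit mask

def buggy_mid(lo, hi):
    # (lo + hi) / 2 with 32-bit wraparound, as the hardware computes it
    return ((lo + hi) & M32) // 2

def fixed_mid(lo, hi):
    # Bloch's fix: lo + (hi - lo) / 2, still in 32-bit arithmetic
    return (lo + ((hi - lo) & M32) // 2) & M32

lo, hi = 1, M32                    # the sum overflows 32 bits
print(buggy_mid(lo, hi))           # 0: below lo, invariant broken
print(fixed_mid(lo, hi))           # 2147483648: inside [lo, hi]
```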
The fix
Bloch's fix is

mid = lo + (hi - lo) / 2

The subtraction stays within 32 bits whenever lo <= hi, the division by two shrinks it further, and adding the result back to lo gives a value inside [lo, hi] regardless of how large the endpoints are. BV verifies this:
mid = lo + UDiv(hi - lo, 2)
s = Solver()
s.add(ULE(lo, hi))
s.add(Not(And(ULE(lo, mid), ULE(mid, hi))))
print(s.check()) # unsat
unsat now means: for every pair (lo, hi) in the 32-bit unsigned range with lo <= hi, the fixed midpoint is between them. The fix is verified, not just tested.
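At 8 bits the same claim is small enough to check exhaustively in plain Python, a nice complement to the solver's 32-bit proof:

```python
M8 = (1 << 8) - 1                  # unsigned 8-bit world

def fixed_mid(lo, hi):
    # Bloch's fix in 8-bit arithmetic: lo + (hi - lo) / 2
    return (lo + ((hi - lo) & M8) // 2) & M8

# Every ordered pair in the 8-bit range keeps lo <= mid <= hi.
ok = all(lo <= fixed_mid(lo, hi) <= hi
         for lo in range(256) for hi in range(lo, 256))
print(ok)   # True
```

Exhaustive checking stops scaling around 16 bits; the BV solver's bit-blasting is what makes the 32-bit claim tractable.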
What we take away
BV is the solver you reach for whenever correctness depends on the actual bit-level behavior of machine integers. Overflow checks, shift and mask tricks, crypto masking, fixed-point arithmetic, hardware verification, compiler correctness: all live here. An integer-level analysis cannot see overflow; it cannot see a sign extension; it cannot see a shift past the word boundary.
The cost is complexity. BV is NP-complete. The decision procedure works by unrolling every bitvector variable into a vector of booleans and every operator into a boolean circuit. A 32-bit addition becomes a 32-full-adder ripple-carry circuit; the whole property becomes a CNF formula that the SAT engine from Week 2 can attack. Theory phase shows how bit-blasting works in detail, using the same CDCL machinery we already have.
Arrays: Immutable Maps, Mutable Models
The problem
When do two sequences of stores to an index-to-value map produce the same final map? This is not a question about mutation. The theory of arrays has no mutation. The question is whether two array values, each a total function from indices to values, agree at every index.
This is the theory verification tools use to model mutable state. A heap snapshot is an array value. A sequence of memory writes becomes a sequence of array values, each a slightly-different snapshot. The engineer models evolving state as a walk through snapshots; the theory reasons about the snapshots themselves.
Arrays in Z3: select and store
The theory has two operations. Select(a, i) reads the value
at index i of array a. Store(a, i, v) returns a new
array value (a completely separate function) that agrees
with a everywhere except at index i, where it takes the
value v. The original array a is unchanged and still
there. Stores are functional, not in-place.
from z3 import Array, IntSort, Store, Select, simplify
a = Array('a', IntSort(), IntSort())
# Build a new array value by storing 4 at index 3
a1 = Store(a, 3, 4)
print(simplify(Select(a1, 3))) # 4
print(simplify(Select(a1, 5))) # a[5]
Two reads, two outcomes. Reading at index 3 returns the value
we stored there. Reading at index 5, a different index,
returns a[5]: a symbolic read into the original array a,
untouched by the store. Z3 applied the read-over-write
axiom to simplify each lookup without any manual hints.
The same axiom, stated once and for all, is

select(store(a, i, v), j) = v             if i = j
select(store(a, i, v), j) = select(a, j)  otherwise

One equation. That, plus the fact that store produces a new array value rather than modifying the old one, is the whole theory. Everything else we ask the solver reduces to applying that equation.
You can think of the array as a function from indices to
values, and store as returning a slightly-different function.
The transcript framing from one prior offering:
"arrays are just the theory of functions in disguise."
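That framing can be made literal in a few lines of plain Python: an array is a function, and store returns a new function that shadows one index. This is a model of the theory for intuition, not how Z3 implements it:

```python
def const_array(v):
    """The array that maps every index to v."""
    return lambda i: v

def store(a, i, v):
    """A new array agreeing with a everywhere except at index i."""
    return lambda j: v if j == i else a(j)

a  = const_array(0)
a1 = store(a, 3, 4)
print(a1(3))   # 4: read-over-write at the stored index
print(a1(5))   # 0: falls through to the original array
print(a(3))    # 0: the original array is untouched
```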
What the theory does not give you
Two things the theory deliberately omits. Students coming from Python or C tend to assume they are there; they are not.
No length. Array(name, I, V) is a total function from
every value of the index sort I to a value of the value sort
V. Select(a, -1), Select(a, 10**100), and Select(a, 0)
are all valid expressions, and each returns some value of type
V. The theory does not know about array sizes, out-of-range
indices, or buffer overflows. If your problem is about a
10-slot buffer, declare the array as usual and add constraints
that every index you read or store at lies in [0, 10). The
solver will reason under those constraints. Outside them, the
array is still defined; you just are not asking about it.
No mutation. Every store produces a new array value; the
old one is unchanged and still there. There is no "mutating
the array" at the theory level. If you want to model a store
that changes over time, represent each point in time as a
separate array variable. A sequence of five writes to a heap
becomes five distinct array variables, each a snapshot of the
heap after one more write. The engineer writes the walk; the
theory reasons about the snapshots.
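The snapshot discipline looks like this in the same function-as-array model (the addresses and values are made up for illustration):

```python
def store(a, i, v):
    """A new array agreeing with a everywhere except at index i."""
    return lambda j: v if j == i else a(j)

heap0 = lambda i: 0                       # initial heap: all zeros
writes = [(3, 10), (5, 20), (3, 99)]      # (address, value) over time

# Each write produces a fresh snapshot; none of them mutate anything.
snapshots = [heap0]
for addr, val in writes:
    snapshots.append(store(snapshots[-1], addr, val))

print([s(3) for s in snapshots])   # [0, 10, 10, 99]
print(snapshots[1](3))             # 10: the old snapshot still reads 10
```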
When two stores commute
Two store operations commute when applying them in either order produces the same array value. Ask Z3 by searching for an index where the two resulting arrays disagree:
from z3 import Array, IntSort, Int, Store, Select, Solver
x = Array('x', IntSort(), IntSort())
# Order A: store 10 at index 3, then 20 at index 5
a1 = Store(x, 3, 10)
a2 = Store(a1, 5, 20)
# Order B: same two stores, reverse order
b1 = Store(x, 5, 20)
b2 = Store(b1, 3, 10)
# Find any index where the two arrays disagree
i = Int('i')
s = Solver()
s.add(Select(a2, i) != Select(b2, i))
print(s.check()) # unsat
unsat. No index distinguishes a2 from b2. The two stores
target different indices, and neither depends on the other's
result, so both orderings build the same array value.
The verification pattern carries over from earlier theories: we
asked for a counterexample (an index where the arrays
disagree) and got unsat, so the property "the arrays agree
everywhere" holds for all indices.
When two stores collide
Change one index so both stores target the same slot:
x = Array('x', IntSort(), IntSort())
# Order A: store 10 at 3, then 99 at 3
a1 = Store(x, 3, 10)
a2 = Store(a1, 3, 99)
# Order B: store 99 at 3, then 10 at 3
b1 = Store(x, 3, 99)
b2 = Store(b1, 3, 10)
i = Int('i')
s = Solver()
s.add(Select(a2, i) != Select(b2, i))
print(s.check()) # sat
print(s.model())
# witness i = 3
# a[3] = 99, b[3] = 10
sat, and Z3 hands us a concrete disagreement: index 3. In order A the final array value reads 99 at index 3; in order B it reads 10. Both sequences apply two stores at the same index; the last store always wins, and the two orders end with different last values, so the final arrays differ.
Why this matters for engineering
Compilers reordering memory writes, databases serializing transactions, filesystems reordering disk writes, merge algorithms combining concurrent edits: all of these systems model mutable state using arrays, and all of them ask the same theory question. Do these two sequences of stores produce the same final array value? The theory gives the exact condition, decidable by the solver in a handful of axiom applications.
What we take away
Arrays is the theory you reach for when you want to reason about indexed lookup tables: total functions from an index sort to a value sort. If you are modeling mutable state, the theory gives you the values, and you are responsible for writing the walk through them. Memory in a verifier, heap in a model checker, records in a database, cells in a filesystem: everything in that list is an engineering application that represents state as a sequence of array values, not as one mutable array.
The alternative would be to bit-blast the whole store into one
large BitVec with explicit offset arithmetic. That is
technically correct and practically impossible. A heap with
2^32 addressable bytes becomes a 4-gigabyte boolean
formula. The theory of arrays exists so we never have to write
that formula.
After the break, Theory phase pulls back the curtain on how the array solver actually works. The answer has a surprise in it, and the surprise is good news: we have already built most of the machinery it needs.
Four Problems, Four Theories
Four engineering problems, four theories, four distinct ways the wrong choice would have gone wrong.
Cafeteria blending wanted LRA because the quantities were
real-valued ratios; Int collapsed the problem to a choice of
one whole ingredient and quietly reported the wrong optimum. Compiler loop
reordering wanted LIA because iteration spaces are integer
lattices; the LP relaxation invented phantom dependences at
fractional coordinates. Binary-search correctness wanted BV
because machine integers wrap; integer-level reasoning silently
proved a real bug away. Memory-update equivalence wanted arrays
because select and store already capture the right
intuition; bit-blasting memory into one enormous boolean
formula would have been correct in principle and unusable in
practice.
Picking the theory is itself a reduction decision. Whatever question your problem is actually asking, your encoding either asks the solver that question or asks it a different one.
After the break, we look inside the four solvers and see how each answers the question its theory is built for.
Demo Code
All demo files for this phase are in the
course code repo.
Clone the repo and run them locally with python3 01-lra-blending.py
etc. (requires z3-solver; see Setup).