Congruence Closure: Deciding Equality

How does a solver reason about equality without knowing what the functions actually do?

In Practice you saw Z3 decide this formula in a few milliseconds:

f^{3} (a) = a \land f^{5} (a) = a \land f (a) \neq a

The function $f$ is completely uninterpreted. Z3 knows nothing about it except that it maps elements of some sort to elements of the same sort. And yet Z3 confidently reports unsat. How?

It did not try every possible interpretation of $f$ . There are infinitely many of those, over infinitely many possible domains. It reasoned about the structure of equality itself. The algorithm behind that reasoning is congruence closure.

Before we look at the algorithm, try the formula by hand. Is $f^{3} (a) = a \land f^{5} (a) = a \land f (a) \neq a$ satisfiable? If you say yes, give an $f$ and an $a$ that work. If you say no, explain why no $f$ can work.

Take a minute. It is harder than it looks.

What is a theory?

In Practice you saw Z3 handle two problems with rich structure. Sudoku-SMT declared Int variables and asked for them to be Distinct. sq/sqabs declared an uninterpreted function umul and added one axiom about it. In both cases, Z3 was reasoning about symbols that had specific meanings.

That is what a theory is: a signature (a collection of symbols) plus axioms (rules about what those symbols mean).

The theory of integer arithmetic fixes the meaning of $+$ , $-$ , $<$ , and the integer constants. Its axioms are the usual rules of integer arithmetic.
The theory of equality fixes the meaning of $=$ . Its axioms say that equality is reflexive, symmetric, and so on. We will look at them in a moment.

When Z3 sees $x + y = 5$ , the $+$ is interpreted by the integer arithmetic theory. When it sees $f (x) = f (y)$ , the $f$ is uninterpreted: the solver knows nothing about it except that it is a function. An uninterpreted function is a function whose behavior is constrained only by the equalities you explicitly assert about it, plus whatever axioms the theory of equality imposes on all functions.

Inside an SMT solver, each theory has its own theory solver with a simple contract: given a conjunction of literals from my theory, tell me whether they are satisfiable. That is the solver interface sketched at the end of Practice. The rest of this page zooms in on one theory solver: the one that decides equality and uninterpreted functions. The theory is called EUF, short for equality with uninterpreted functions.

The axioms of equality

The theory of equality has four axioms. The first three are the properties of any equivalence relation: reflexive, symmetric, and transitive. The fourth is function congruence: applying a function to equal arguments gives equal results.

Reflexivity. Every term equals itself.

\forall x . x = x

Symmetry. If $x$ equals $y$ , then $y$ equals $x$ .

\forall x, y . \; x = y \implies y = x

Transitivity. If $x$ equals $y$ and $y$ equals $z$ , then $x$ equals $z$ .

\forall x, y, z . \; x = y \land y = z \implies x = z

Together these three make $=$ an equivalence relation. They are the rules any reasonable notion of "the same thing" has to satisfy.

The same three facts can also be written as inference rules, with premises above a horizontal bar and the conclusion below. You read each rule as "if everything above the bar holds, then what is below the bar holds."

\frac{}{x = x} \; (\text{Refl}) \qquad \frac{x = y}{y = x} \; (\text{Sym}) \qquad \frac{x = y \quad y = z}{x = z} \; (\text{Trans})

Reflexivity has no premises, so the space above the bar is empty: the conclusion holds outright. Symmetry has one premise. Transitivity has two.

Function congruence. If $f$ is a function and its arguments are pairwise equal, then the results are equal.

\forall \bar{x}, \bar{y} . \; \bigwedge_{i} x_{i} = y_{i} \implies f (\bar{x}) = f (\bar{y})

Or as an inference rule:

\frac{x_{1} = y_{1} \quad \dots \quad x_{n} = y_{n}}{f (x_{1}, \dots, x_{n}) = f (y_{1}, \dots, y_{n})} \; (\text{Cong})

Adding this fourth axiom promotes the equivalence relation into a congruence relation. The name captures the idea: congruent inputs always produce congruent outputs.

Congruence goes forward, not backward

Function congruence says if the inputs are equal, then the outputs are equal. It does not say if the outputs are equal, then the inputs are equal. That reverse property has a name: injectivity. Congruence flows forward through function applications, not backward.

All functions satisfy congruence. That is part of what it means to be a function. Some functions are also injective, but not all.

To see why, consider squaring. The function $sq (x) = x \cdot x$ takes an integer to its square. It is a perfectly good function. But $sq (- 3) = sq (3) = 9$ , even though $- 3 \neq 3$ . Equal outputs do not let us conclude the inputs were equal.

Or in software terms: hash collisions. If hash("alice") == hash("bob"), that does not mean "alice" == "bob". Two different strings can hash to the same value. Hash functions are congruent (they are functions, after all) but they are not injective.
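The squaring example can be checked directly. This is a minimal Python sketch; the helper name sq comes from the text, and the particular integers are just illustrative choices.

```python
def sq(x: int) -> int:
    """Squaring: a perfectly good function that is not injective."""
    return x * x

# Congruence: equal inputs always give equal outputs.
x, y = 3, 1 + 2
assert x == y and sq(x) == sq(y)

# No injectivity: equal outputs do not force equal inputs.
assert sq(-3) == sq(3) and -3 != 3
```

The first assertion holds for every function you could write in place of sq. The second depends on sq specifically, which is why the solver may never assume it.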

This asymmetry is fiddly, and easy to get confused about when thinking about equality, functions, and congruence together. It is worth keeping straight because it shows up directly in the second worked example below: the solver happily reports $f (x) = f (y) \land x \neq y$ satisfiable, because congruence gives it no way to force $x = y$ .

These four axioms are all we need. The rest of this page is about how to decide whether a conjunction of equalities and disequalities is consistent with them. That is the job of congruence closure.

Terms and subterms

In SAT, formulas are built from boolean variables combined with operators like $\land$ , $\lor$ , and $\neg$ . Inside a theory like equality, the basic unit is no longer a boolean variable but a term: a variable, a constant, or a function symbol applied to other terms.

t ::= x \mid c \mid f (t_{1}, \dots, t_{n})

Terms are a way to name values. The term $a$ names some value. The term $f (a)$ names the value you get by applying $f$ to whatever $a$ names. A term by itself is not a claim about anything. It is just a handle on a value.

Truth claims come from predicates: symbols that take terms and produce a truth value. The theory of equality has a single predicate, $=$ , with its negation $\neq$ . So $f (x) = f (y)$ is a truth claim that $f (x)$ and $f (y)$ name the same value, and $f (a) \neq a$ is a truth claim that $f (a)$ and $a$ name different values. Congruence closure decides whether a given collection of such claims can all hold at once.

Other theories bring their own predicates. Integer arithmetic adds $<$ , $\leq$ , $>$ , and $\geq$ . Once such predicates are around, equality needs a fifth axiom, predicate congruence: a predicate applied to equal arguments gives the same truth value, mirroring function congruence. EUF has only $=$ , so the four axioms above suffice for this lecture, but the fifth matters once there are other predicates around. That is one of the reasons predicate congruence reappears in Lecture 5.

A subterm of a term is any term appearing inside it, including the term itself. It helps to draw terms as trees. Take $f (f (f (a)))$ . Its tree has three $f$ nodes stacked above the leaf $a$ :

graph TD
    accTitle: AST for the term f(f(f(a)))
    accDescr: Four nodes in a chain. Three f nodes stacked vertically, with a at the bottom.
    n3["f"] --> n2["f"]
    n2 --> n1["f"]
    n1 --> n0["a"]

Every subtree of the term tree is a subterm. The subtree at the leaf is $a$ . The subtree one level up is $f (a)$ . One more is $f (f (a))$ . The whole tree is $f (f (f (a)))$ . Four subtrees, four subterms:

$a$
$f (a)$
$f (f (a))$
$f (f (f (a)))$

The rule generalizes to functions with more than one argument. Take $g (a, h (b))$ . Its root is a $g$ node with two children: the leaf $a$ on the left, and an $h$ node whose only child is the leaf $b$ :

graph TD
    accTitle: AST for the term g(a, h(b))
    accDescr: Root g node with two children. Left child is a. Right child is an h node with a single child b.
    g["g"] --> ga["a"]
    g --> gh["h"]
    gh --> ghb["b"]

Subterms are subtrees, same rule. Four of them:

$a$
$b$
$h (b)$
$g (a, h (b))$
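The "subterms are subtrees" rule is short enough to write down. The sketch below assumes one particular encoding that is not from the lecture: a term is a Python tuple (symbol, arg1, ..., argn), so a leaf like $a$ is the 1-tuple ("a",). The function name subterms is likewise just an illustrative choice.

```python
def subterms(t):
    """Return the set of all subterms of t, including t itself.

    A term is a tuple (symbol, arg1, ..., argn); a leaf is a 1-tuple.
    """
    result = {t}
    for arg in t[1:]:          # recurse into each argument subtree
        result |= subterms(arg)
    return result

a, b = ("a",), ("b",)
fffa = ("f", ("f", ("f", a)))  # f(f(f(a)))
gahb = ("g", a, ("h", b))      # g(a, h(b))

assert len(subterms(fffa)) == 4
assert subterms(gahb) == {a, b, ("h", b), gahb}
```

Because the result is a set, a subterm that occurs several times is counted once, which matches how the lists above are written.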

Writing nested function applications out in full gets unwieldy fast. From here on we use a superscript shorthand: $f^{n} (t)$ means $f$ applied $n$ times to $t$ . For example:

$f^{3} (a)$ is $f (f (f (a)))$
$f^{5} (a)$ is $f (f (f (f (f (a)))))$
$f^{0} (a)$ is just $a$

The four subterms of $f^{3} (a)$ are:

$a$
$f (a)$
$f^{2} (a)$
$f^{3} (a)$
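The superscript shorthand is just repeated application. As a small sketch, again assuming terms are encoded as tuples (symbol, arg1, ..., argn) and using a hypothetical helper named iterate:

```python
def iterate(fn, n, t):
    """Build f^n(t): the unary symbol fn applied n times to the term t."""
    for _ in range(n):
        t = (fn, t)
    return t

a = ("a",)
assert iterate("f", 0, a) == a                        # f^0(a) is just a
assert iterate("f", 3, a) == ("f", ("f", ("f", a)))   # f^3(a) is f(f(f(a)))

# The four subterms of f^3(a), smallest first.
chain = [iterate("f", k, a) for k in range(4)]
```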

Congruence closure

The algorithm organizes terms into congruence classes: groups of terms it has decided must all name the same value. Initially every term is in its own singleton class. Equalities cause classes to merge. Each merge can trigger further merges through function congruence: if $x$ and $y$ end up in the same class, then $f (x)$ and $f (y)$ must be too, even if we never asserted that directly. One equality can ripple through the class structure this way. Disequalities are the final check: if both sides of a disequality ended up in the same class, the formula is unsatisfiable; otherwise it is satisfiable.

The three-step recipe

Given a conjunction of equalities and disequalities over terms, congruence closure does three things:

1. Place each subterm into its own congruence class.
2. Merge classes as required by each equality, propagating the resulting congruences.
3. Check each disequality. If both sides landed in the same class, report unsat. Otherwise report sat.

The first step is bookkeeping. The third step is a lookup. All the interesting work is in step 2, where merging one pair of terms can force further merges through function congruence. That is the propagation step, and it is where the algorithm earns its name.
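Before any clever data structures, the recipe can be run as a naive fixed-point loop. The following is a sketch under stated assumptions, not the efficient algorithm: terms are encoded as tuples (symbol, arg1, ..., argn), classes are plain frozensets, and the names naive_cc and subterms are hypothetical. It rescans every pair of terms after each change, which is exactly the cost an efficient implementation avoids.

```python
from itertools import combinations

def subterms(t):
    """All subterms of a tuple-encoded term, including t itself."""
    out = {t}
    for arg in t[1:]:
        out |= subterms(arg)
    return out

def naive_cc(equalities, disequalities):
    """Return 'sat' or 'unsat' for lists of (term, term) pairs."""
    # Step 1: every subterm starts in its own singleton class.
    terms = set()
    for s, t in equalities + disequalities:
        terms |= subterms(s) | subterms(t)
    cls = {t: frozenset([t]) for t in terms}   # term -> the class containing it

    def union(s, t):
        merged = cls[s] | cls[t]
        for u in merged:
            cls[u] = merged

    # Step 2: merge the asserted equalities, then propagate to a fixed point.
    for s, t in equalities:
        union(s, t)
    changed = True
    while changed:
        changed = False
        for s, t in combinations(terms, 2):
            if (cls[s] != cls[t] and s[0] == t[0]       # same symbol, different class
                    and len(s) == len(t) > 1            # both applications, same arity
                    and all(cls[x] == cls[y] for x, y in zip(s[1:], t[1:]))):
                union(s, t)                             # function congruence fires
                changed = True

    # Step 3: a disequality whose sides share a class is a contradiction.
    if any(cls[s] == cls[t] for s, t in disequalities):
        return "unsat"
    return "sat"

a, x, y = ("a",), ("x",), ("y",)
f = lambda t: ("f", t)
f3 = f(f(f(a)))
f5 = f(f(f3))
print(naive_cc([(f3, a), (f5, a)], [(f(a), a)]))   # unsat
print(naive_cc([(f(x), f(y))], [(x, y)]))          # sat
```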

Walkthrough: f³(a) = a ∧ f⁵(a) = a ∧ f(a) ≠ a

Back to the opening challenge. We will trace congruence closure by hand on the formula

f^{3} (a) = a \land f^{5} (a) = a \land f (a) \neq a .

Before running the algorithm, we collect every subterm that appears in the formula. The formula has four top-level terms: $a$ , $f (a)$ , $f^{3} (a)$ , and $f^{5} (a)$ . Taking all their subterms gives us six:

$a$
$f (a)$
$f^{2} (a)$
$f^{3} (a)$
$f^{4} (a)$
$f^{5} (a)$

$f^{2} (a)$ and $f^{4} (a)$ never appear as top-level terms but show up as subterms inside $f^{3} (a)$ and $f^{5} (a)$ .

Step 1 places each of the six into its own singleton class:

\{a\} \quad \{f (a)\} \quad \{f^{2} (a)\} \quad \{f^{3} (a)\} \quad \{f^{4} (a)\} \quad \{f^{5} (a)\}

graph LR
    accTitle: Initial state — six singleton classes
    accDescr: The six subterms a, f(a), f squared of a, f cubed of a, f to the fourth of a, and f to the fifth of a each sit alone in their own congruence class.
    subgraph c1 [ ]
        n0["a"]
    end
    subgraph c2 [ ]
        n1["f(a)"]
    end
    subgraph c3 [ ]
        n2["f²(a)"]
    end
    subgraph c4 [ ]
        n3["f³(a)"]
    end
    subgraph c5 [ ]
        n4["f⁴(a)"]
    end
    subgraph c6 [ ]
        n5["f⁵(a)"]
    end
    n0 ~~~ n1 ~~~ n2 ~~~ n3 ~~~ n4 ~~~ n5
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2,c3,c4,c5,c6 cc

Step 2 processes the two equalities, one at a time.

Process $f^{3} (a) = a$ . Merge the classes containing $f^{3} (a)$ and $a$ . They were both singletons, so this is a plain two-class merge:

\{a, f^{3} (a)\} \quad \{f (a)\} \quad \{f^{2} (a)\} \quad \{f^{4} (a)\} \quad \{f^{5} (a)\}

graph LR
    accTitle: After merging f cubed of a with a
    accDescr: a and f cubed of a are now in the same congruence class. The other four classes are unchanged.
    subgraph c1 [ ]
        n0["a"]
        n3["f³(a)"]
    end
    subgraph c2 [ ]
        n1["f(a)"]
    end
    subgraph c3 [ ]
        n2["f²(a)"]
    end
    subgraph c5 [ ]
        n4["f⁴(a)"]
    end
    subgraph c6 [ ]
        n5["f⁵(a)"]
    end
    n0 ~~~ n1 ~~~ n2 ~~~ n4 ~~~ n5
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2,c3,c5,c6 cc

Now the propagation check. Because $a$ and $f^{3} (a)$ are in the same class, function congruence forces $f (a)$ and $f (f^{3} (a)) = f^{4} (a)$ into the same class too. Merge them:

\{a, f^{3} (a)\} \quad \{f (a), f^{4} (a)\} \quad \{f^{2} (a)\} \quad \{f^{5} (a)\}

graph LR
    accTitle: After propagating to f of a and f to the fourth of a
    accDescr: f of a and f to the fourth of a are now in the same class by function congruence.
    subgraph c1 [ ]
        n0["a"]
        n3["f³(a)"]
    end
    subgraph c2 [ ]
        n1["f(a)"]
        n4["f⁴(a)"]
    end
    subgraph c3 [ ]
        n2["f²(a)"]
    end
    subgraph c6 [ ]
        n5["f⁵(a)"]
    end
    n0 ~~~ n1 ~~~ n2 ~~~ n5
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2,c3,c6 cc

That triggers another merge. $f (a)$ and $f^{4} (a)$ are now in the same class, so by the same rule, $f (f (a)) = f^{2} (a)$ and $f (f^{4} (a)) = f^{5} (a)$ must be too. Merge:

\{a, f^{3} (a)\} \quad \{f (a), f^{4} (a)\} \quad \{f^{2} (a), f^{5} (a)\}

graph LR
    accTitle: After propagating to f squared of a and f to the fifth of a
    accDescr: f squared of a and f to the fifth of a are now in the same class. The six subterms sit in three pairs.
    subgraph c1 [ ]
        n0["a"]
        n3["f³(a)"]
    end
    subgraph c2 [ ]
        n1["f(a)"]
        n4["f⁴(a)"]
    end
    subgraph c3 [ ]
        n2["f²(a)"]
        n5["f⁵(a)"]
    end
    n0 ~~~ n1 ~~~ n2
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2,c3 cc

Pause. Three neat pairs. That is the state after processing $f^{3} (a) = a$ together with all its propagation consequences. One asserted equality created three merges: the direct one, and two that rippled up through function congruence.

Process $f^{5} (a) = a$ . Now $f^{5} (a)$ is in the class $\{f^{2} (a), f^{5} (a)\}$ , and $a$ is in the class $\{a, f^{3} (a)\}$ . Merging them combines those two classes:

\{a, f^{2} (a), f^{3} (a), f^{5} (a)\} \quad \{f (a), f^{4} (a)\}

graph LR
    accTitle: After merging f to the fifth of a with a
    accDescr: The class containing f squared of a and f to the fifth of a merges with the class containing a and f cubed of a, forming a four-element class.
    subgraph c1 [ ]
        n0["a"]
        n2["f²(a)"]
        n3["f³(a)"]
        n5["f⁵(a)"]
    end
    subgraph c2 [ ]
        n1["f(a)"]
        n4["f⁴(a)"]
    end
    n0 ~~~ n1
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2 cc

Propagation again. This time the reasoning chains through several steps:

After the merge we just did, $a$ and $f^{2} (a)$ are in the same class.
Function congruence says $f (a)$ and $f (f^{2} (a)) = f^{3} (a)$ must also be in the same class. But right now they are not:
- $f^{3} (a)$ is in the big class $\{a, f^{2} (a), f^{3} (a), f^{5} (a)\}$ .
- $f (a)$ is in the other class $\{f (a), f^{4} (a)\}$ .
Merging $f (a)$ with $f^{3} (a)$ therefore merges those two classes entirely.

Everything collapses into one class:

\{a, f (a), f^{2} (a), f^{3} (a), f^{4} (a), f^{5} (a)\}

graph LR
    accTitle: Final state — one collapsed class
    accDescr: All six subterms a, f of a, f squared of a, f cubed of a, f to the fourth of a, and f to the fifth of a are now in the same congruence class.
    subgraph c1 [ ]
        n0["a"]
        n1["f(a)"]
        n2["f²(a)"]
        n3["f³(a)"]
        n4["f⁴(a)"]
        n5["f⁵(a)"]
    end
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1 cc

Every term is now congruent to every other term.

Step 3. Check the disequality $f (a) \neq a$ . But $f (a)$ and $a$ are in the same class. Contradiction. Report unsat.

Two equalities were enough. Function congruence turned them into five merges in total, which collapsed the six subterms into a single class. The disequality $f (a) \neq a$ cannot hold in that class.
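As a sanity check on the unsat verdict, we can search small finite domains for a counterexample by brute force. This sketch is a cross-check, not a proof: it only covers domains of up to max_size elements, and the names apply_n and find_model are hypothetical.

```python
from itertools import product

def apply_n(table, n, v):
    """Apply the function given by table (table[i] is f(i)) n times to v."""
    for _ in range(n):
        v = table[v]
    return v

def find_model(max_size):
    """Look for f and a with f^3(a) = a, f^5(a) = a, and f(a) != a
    over domains {0, ..., k-1} for every k up to max_size."""
    for k in range(1, max_size + 1):
        for table in product(range(k), repeat=k):   # every function on the domain
            for a in range(k):
                if (apply_n(table, 3, a) == a
                        and apply_n(table, 5, a) == a
                        and apply_n(table, 1, a) != a):
                    return table, a
    return None

print(find_model(4))   # None: no counterexample among these small domains
```

Enumeration like this blows up quickly as the domain grows and can never cover every domain. Congruence closure reaches the same answer for all domains at once with five merges.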

Contrast: f(x) = f(y) ∧ x ≠ y

Here is a much smaller example, different in shape, that drives home the congruence-goes-forward-not-backward point from the axioms section. Consider the formula:

f (x) = f (y) \land x \neq y

The subterms are $x$ , $y$ , $f (x)$ , and $f (y)$ . Each starts in its own singleton class:

\{x\} \quad \{y\} \quad \{f (x)\} \quad \{f (y)\}

graph LR
    accTitle: Initial state — four singleton classes
    accDescr: x, y, f(x), and f(y) each sit alone in their own congruence class.
    subgraph c1 [ ]
        nx["x"]
    end
    subgraph c2 [ ]
        ny["y"]
    end
    subgraph c3 [ ]
        nfx["f(x)"]
    end
    subgraph c4 [ ]
        nfy["f(y)"]
    end
    nx ~~~ ny ~~~ nfx ~~~ nfy
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2,c3,c4 cc

Process $f (x) = f (y)$ . Merge their two singleton classes:

\{x\} \quad \{y\} \quad \{f (x), f (y)\}

graph LR
    accTitle: After merging f(x) with f(y)
    accDescr: f(x) and f(y) are in the same class. x and y are each in their own class.
    subgraph c1 [ ]
        nx["x"]
    end
    subgraph c2 [ ]
        ny["y"]
    end
    subgraph c3 [ ]
        nfx["f(x)"]
        nfy["f(y)"]
    end
    nx ~~~ ny ~~~ nfx
    classDef cc fill:#fffde7,stroke:#f57c00
    class c1,c2,c3 cc

No propagation this time. Propagation fires when the merged terms have parents in the formula: other function applications with the merged terms as arguments. Here, $f (x)$ and $f (y)$ have no parents. Nothing in the formula contains them as an argument. The merge stops cold.

Check $x \neq y$ . The terms $x$ and $y$ are each in their own singleton class. Different classes. sat.

This is the congruence-goes-forward point made concrete. Merging the function applications $f (x)$ and $f (y)$ did not force the arguments $x$ and $y$ to merge. Function congruence says equal inputs give equal outputs. It does not say equal outputs force equal inputs. The solver happily reports $f (x) = f (y) \land x \neq y$ satisfiable.

The model makes this visible. If we ask the solver for actual values, it assigns each class a distinct integer, for example:

x    ↦ 0
y    ↦ 2
f(x) ↦ 1
f(y) ↦ 1

The terms $f (x)$ and $f (y)$ get the same integer because they are in the same class. The terms $x$ and $y$ get different integers because nothing has told the solver they must be equal. In other words, the model interprets $f$ as a non-injective function: two different inputs, one output. Congruence allows that. It only forbids equal inputs producing different outputs.
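The model can be checked against the formula by hand or in a few lines of Python. In this sketch the interpretation of $f$ is written as a dictionary that is defined only at the two points the formula mentions; that encoding is an assumption made for illustration.

```python
# The model from the text, written out as plain Python data.
x, y = 0, 2
f = {0: 1, 2: 1}      # f(0) = 1 and f(2) = 1; other inputs are unconstrained

assert f[x] == f[y]   # the equality f(x) = f(y) holds
assert x != y         # the disequality x != y holds
```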

Under the hood

We have seen what congruence closure does: organize terms into classes, merge them, check disequalities. The question now is how merge propagation is made efficient. When a merge happens, we do not rescan every term in the formula looking for new congruences. Each representative tracks its own parents in a field called ccpar, and we only look at pairs drawn from those sets. That one field is the whole trick.

The data structure

The algorithm operates on a directed acyclic graph of terms. Each subterm is a node in the DAG. Structurally identical subterms are the same node, not two copies: if $f (a, b)$ appears twice in the formula, both occurrences resolve to one node. This structural sharing matters. Without it, "the parent terms of this class" would not be a well-defined set; with it, every term has exactly one entry in the DAG and the parents are easy to enumerate.

Each node stores four things:

fn — the function symbol at the top of the term, or the constant name if it is a leaf
args — the list of argument nodes, empty for leaves
find — a pointer to this node's representative in the union-find structure (self if the node is its own representative)
ccpar — the set of nodes that have a member of this class as an argument, only meaningful when the node is a representative

In Python:

class Node:
    def __init__(self, fn, args):
        self.fn = fn
        self.args = args
        self.find = self       # self-loop means "I am the representative"
        self.ccpar = set()     # valid only at representatives

The find field is a self-loop at creation time: every new node is the representative of its own singleton class. As the algorithm merges classes, non-representative nodes get their find fields pointed elsewhere, and representative nodes accumulate ccpar entries. The moving parts of the algorithm are entirely in these two mutable fields.
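One way to get structural sharing and the initial ccpar sets is to create nodes through a small factory that remembers what it has already built. The sketch below assumes a hypothetical TermBank class with a mk method; neither name is from the lecture. It also assumes the whole DAG is built before any merge, so every argument is still its own representative when a parent registers with it.

```python
class Node:
    def __init__(self, fn, args):
        self.fn = fn
        self.args = args
        self.find = self       # self-loop means "I am the representative"
        self.ccpar = set()     # valid only at representatives

class TermBank:
    """Hypothetical helper: creates each structurally distinct term once."""
    def __init__(self):
        self.table = {}        # (fn, identities of the argument nodes) -> Node

    def mk(self, fn, *args):
        key = (fn, tuple(id(arg) for arg in args))
        node = self.table.get(key)
        if node is None:                 # first occurrence: create and register
            node = Node(fn, list(args))
            self.table[key] = node
            for arg in args:
                arg.ccpar.add(node)      # node is a parent of each argument
        return node

bank = TermBank()
a, b = bank.mk("a"), bank.mk("b")
fab = bank.mk("f", a, b)
ffab = bank.mk("f", fab, b)

assert bank.mk("f", a, b) is fab         # second occurrence resolves to the same node
assert b.ccpar == {fab, ffab}            # b is an argument of both applications
```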

The three procedures

The algorithm is three short procedures. Together they are fewer than 30 lines of Python.

find

def find(n):
    if n.find is n:            # n is its own representative
        return n
    n.find = find(n.find)      # path compression: cache the representative on n
    return n.find

find reads like the mathematical definition of "the representative of n." If n points at itself, n is its own representative. Otherwise the representative is that of the node n.find points to, obtained by recursing on n.find.

The assignment n.find = find(n.find) is path compression: before returning, we cache the representative on n directly. Next time someone asks find(n), the chain is one hop long. Over many calls, whole classes collapse into flat stars, and find becomes nearly free.
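To watch path compression happen, build a chain by hand and call find on its far end. This sketch uses a stripped-down Node with only a name and a find pointer; the chain of four nodes is an arbitrary illustrative choice.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.find = self               # every node starts as its own representative

def find(n):
    if n.find is n:                    # n is its own representative
        return n
    n.find = find(n.find)              # path compression
    return n.find

# Build the chain n3 -> n2 -> n1 -> n0 by pointing each node at its predecessor.
nodes = [Node(f"n{i}") for i in range(4)]
for child, parent in zip(nodes[1:], nodes):
    child.find = parent

assert nodes[3].find is nodes[2]                 # before: n3 is three hops from n0
assert find(nodes[3]) is nodes[0]
assert all(n.find is nodes[0] for n in nodes)    # after: every node points at n0
```

A single call flattened the entire chain, because the recursion rewrites the pointer of every node it passes through on the way back.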

congruent

def congruent(n1, n2):
    return (
        n1.fn == n2.fn                and  # same function symbol
        len(n1.args) > 0              and  # both are function applications
        len(n1.args) == len(n2.args)  and  # same arity
        all(find(a) is find(b)             # arguments pairwise in the same class
            for a, b in zip(n1.args, n2.args))
    )

congruent decides whether two nodes are congruent in the current state. Two nodes are congruent when they apply the same function symbol, have the same arity, and all their arguments are pairwise in the same equivalence class (their finds agree). Leaves never trigger this check because of the len(n1.args) > 0 guard: they can only become equivalent through explicit assertions.
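A few hand-built nodes show each clause of congruent at work. In this sketch the merge is simulated by setting a find pointer directly, which is a shortcut for illustration only; the Node here omits ccpar because congruent never reads it.

```python
class Node:
    def __init__(self, fn, args=()):
        self.fn = fn
        self.args = list(args)
        self.find = self

def find(n):
    if n.find is n:
        return n
    n.find = find(n.find)
    return n.find

def congruent(n1, n2):
    return (
        n1.fn == n2.fn                and  # same function symbol
        len(n1.args) > 0              and  # function applications only
        len(n1.args) == len(n2.args)  and  # same arity
        all(find(x) is find(y)             # arguments pairwise in the same class
            for x, y in zip(n1.args, n2.args))
    )

a, b = Node("a"), Node("b")
fa, fb, ga = Node("f", [a]), Node("f", [b]), Node("g", [a])

assert not congruent(fa, fb)   # a and b are still in different classes
a.find = b                     # simulate asserting a = b
assert congruent(fa, fb)       # now f(a) and f(b) are congruent
assert not congruent(fa, ga)   # different function symbols
assert not congruent(a, b)     # leaves are never reported congruent
```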

merge

def merge(n1, n2):
    r1 = find(n1)
    r2 = find(n2)
    if r1 is r2:               # already merged, nothing to do
        return

    p1 = set(r1.ccpar)         # snapshot parent sets BEFORE rewiring
    p2 = set(r2.ccpar)

    r1.find = r2               # r1 joins r2's class
    r2.ccpar = p1 | p2         # r2 absorbs both parent sets
    r1.ccpar = set()           # r1 is no longer a representative

    for t1 in p1:              # check every pair of parents, one from each side
        for t2 in p2:
            if find(t1) is not find(t2) and congruent(t1, t2):
                merge(t1, t2)  # newly congruent: cascade the merge

merge is the heart of the algorithm. Three things happen:

1. Find the two representatives. If the nodes are already in the same class (same representative), return immediately. There is nothing to do.
2. Snapshot the two classes' ccpar sets, then rewire: r1 joins r2's class, r2 absorbs both parent sets, and r1's own ccpar is cleared since it is no longer a representative.
3. Check every pair of parents with one from each original class. For any pair that has become congruent after the rewire, recursively call merge on them.

The snapshot in step 2 is the subtle bit. The rewire overwrites ccpar on both representatives, so if we read the parent sets after the rewire we cannot tell which side they came from. We need the pre-rewire partition to know which pairs to check. Snapshot first, rewire second.

The recursive call in step 3 is what produces propagation cascades. In the f³(a) walkthrough above, every "same rule, merge again" beat was one of these recursive calls firing.

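The other standard union-find optimization is union by rank: when rewiring, attach the shorter tree under the taller one so the resulting tree stays shallow. Applied to the rewire step in merge and combined with path compression in find, it gives amortized near-constant find. The pairwise parent check in merge is still quadratic in the worst case, though. Reaching $O (n \log n)$ overall additionally requires replacing that check with a hash-table lookup keyed on the argument classes, as in the Downey, Sethi, and Tarjan algorithm. We skip both here for clarity. Correctness does not need them.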
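For the curious, here is roughly what union by rank could look like when grafted onto merge. This is a sketch under assumptions: the rank field and the name merge_ranked are additions that do not appear in the lecture's code, and the constructor registers parents itself so the example is self-contained.

```python
class Node:
    def __init__(self, fn, args=()):
        self.fn = fn
        self.args = list(args)
        self.find = self
        self.ccpar = set()
        self.rank = 0                    # assumed extra field: bound on tree height
        for arg in self.args:
            find(arg).ccpar.add(self)    # register with the argument's representative

def find(n):
    if n.find is n:
        return n
    n.find = find(n.find)
    return n.find

def congruent(n1, n2):
    return (n1.fn == n2.fn and len(n1.args) > 0
            and len(n1.args) == len(n2.args)
            and all(find(x) is find(y) for x, y in zip(n1.args, n2.args)))

def merge_ranked(n1, n2):
    r1, r2 = find(n1), find(n2)
    if r1 is r2:
        return
    if r1.rank > r2.rank:                # keep the taller tree's root as r2
        r1, r2 = r2, r1
    if r1.rank == r2.rank:               # equal heights: the result grows by one
        r2.rank += 1
    p1, p2 = set(r1.ccpar), set(r2.ccpar)   # snapshot after choosing sides
    r1.find = r2
    r2.ccpar = p1 | p2
    r1.ccpar = set()
    for t1 in p1:
        for t2 in p2:
            if find(t1) is not find(t2) and congruent(t1, t2):
                merge_ranked(t1, t2)

# Rebuild a, f(a), ..., f^5(a) and replay the two equalities from the first walkthrough.
t = [Node("a")]
for _ in range(5):
    t.append(Node("f", [t[-1]]))
merge_ranked(t[3], t[0])
merge_ranked(t[5], t[0])
assert find(t[1]) is find(t[0])          # f(a) and a share a class, so f(a) != a fails
```

With ranks in play, either side of a merge may end up as the representative. The classes themselves come out the same.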

Walkthrough: f(a, b) = a ∧ f(f(a, b), b) ≠ a

Another worked example, this time traced through the data structure itself. Consider the formula:

f (a, b) = a \land f (f (a, b), b) \neq a

The four subterms are $a$ , $b$ , $f (a, b)$ , and $f (f (a, b), b)$ . We number them bottom up so the simplest term gets the smallest ID: $a$ is node 1, $b$ is node 2, $f (a, b)$ is node 3, and $f (f (a, b), b)$ is node 4. Because $b$ is an argument of both node 3 and node 4, structural sharing makes it a single node 2 with two incoming edges.

Initial state. The DAG is built once, before the algorithm runs. Every node is its own representative. Each function application registers itself in the ccpar set of each of its arguments' representatives: node 1 (a) is used only by node 3, so ccpar[1] = {3}; node 2 (b) is used by both node 3 and node 4, so ccpar[2] = {3, 4}; node 3 is used only by node 4, so ccpar[3] = {4}; node 4 has no parents, so ccpar[4] = {}.

graph TD
    accTitle: Initial DAG for f(a, b) = a and f(f(a, b), b) != a
    accDescr: Four nodes, all representatives. Node 4 is an f application with find 4 and empty ccpar, with children node 3 and node 2. Node 3 is an f application with find 3 and ccpar containing 4, with children node 1 and node 2. Node 1 is the leaf a with find 1 and ccpar containing 3. Node 2 is the leaf b with find 2 and ccpar containing 3 and 4.
    n4["4: f
find = 4
ccpar = {}"] --> n3["3: f
find = 3
ccpar = {4}"]
    n4 --> n2["2: b
find = 2
ccpar = {3, 4}"]
    n3 --> n1["1: a
find = 1
ccpar = {3}"]
    n3 --> n2
    classDef rep fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
    classDef follower fill:#f3e5f5,stroke:#7b1fa2,stroke-width:1px
    class n1,n2,n3,n4 rep
    linkStyle 0,1,2,3 stroke-width:2.5px

node	fn	args	find	ccpar
1	`a`	—	1	{3}
2	`b`	—	2	{3, 4}
3	`f`	[1, 2]	3	{4}
4	`f`	[3, 2]	4	{}

Process $f (a, b) = a$ : merge(3, 1).

Node 3 and node 1 are both representatives of their own singleton classes. Snapshot the parent sets: p1 = {4} from node 3 and p2 = {3} from node 1. Then rewire: node 3 joins node 1's class, and node 1 absorbs both parent sets.

graph TD
    accTitle: DAG state after merging f(a, b) with a
    accDescr: Node 3 now has find pointing at node 1, shown as a dashed follower arrow. Node 1's ccpar has grown to contain 3 and 4. Node 3's ccpar is empty. Nodes 1, 2, and 4 remain representatives with thick borders; node 3 has a thin border indicating it is a follower.
    n4["4: f
find = 4
ccpar = {}"] --> n3["3: f
find = 1
ccpar = {}"]
    n4 --> n2["2: b
find = 2
ccpar = {3, 4}"]
    n3 --> n1["1: a
find = 1
ccpar = {3, 4}"]
    n3 --> n2
    n3 -.-> n1
    classDef rep fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
    classDef follower fill:#f3e5f5,stroke:#7b1fa2,stroke-width:1px
    class n1,n2,n4 rep
    class n3 follower
    linkStyle 0,1,2,3 stroke-width:2.5px
    linkStyle 4 stroke:#7b1fa2

node	fn	args	find	ccpar
1	`a`	—	1	{3, 4}
2	`b`	—	2	{3, 4}
3	`f`	[1, 2]	1	{}
4	`f`	[3, 2]	4	{}

Now the propagation check. The only pair in p1 × p2 = {4} × {3} is (4, 3). Are nodes 4 and 3 congruent? Both are applications of f with two arguments, so function symbol and arity line up. At argument position 0, node 4's child is node 3 and node 3's child is node 1; find(3) = find(1) = 1, so they are in the same class. At position 1, both nodes have node 2 as their child, trivially in the same class as itself. Congruent, so the merge cascades.

Cascade: merge(4, 3).

Node 4 is its own representative, and node 3's representative is now node 1. Different classes, so we merge. Snapshot: p1 = {} since node 4 has no parents in the formula, and p2 = {3, 4} from node 1. Rewire: node 4 joins node 1's class, and the combined ccpar {} ∪ {3, 4} is the same set node 1 already had.

graph TD
    accTitle: DAG state after the propagation merges node 4 with node 3's class
    accDescr: Nodes 3 and 4 now have find pointing at node 1, shown as dashed follower arrows. Nodes 1 and 2 remain representatives with thick borders; nodes 3 and 4 have thin borders indicating they are followers. Node 1's ccpar contains 3 and 4.
    n4["4: f
find = 1
ccpar = {}"] --> n3["3: f
find = 1
ccpar = {}"]
    n4 --> n2["2: b
find = 2
ccpar = {3, 4}"]
    n3 --> n1["1: a
find = 1
ccpar = {3, 4}"]
    n3 --> n2
    n3 -.-> n1
    n4 -.-> n1
    classDef rep fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
    classDef follower fill:#f3e5f5,stroke:#7b1fa2,stroke-width:1px
    class n1,n2 rep
    class n3,n4 follower
    linkStyle 0,1,2,3 stroke-width:2.5px
    linkStyle 4,5 stroke:#7b1fa2

node	fn	args	find	ccpar
1	`a`	—	1	{3, 4}
2	`b`	—	2	{3, 4}
3	`f`	[1, 2]	1	{}
4	`f`	[3, 2]	1	{}

The propagation loop runs over {} × {3, 4}, which is empty. Nothing left to check. Return.

Check $f (f (a, b), b) \neq a$ .

Is find(4) = find(1)? Both are 1. The disequality is violated, and the formula is reported unsat.

One equality was enough. Function congruence turned it into two merges, which collapsed three of the four nodes into one class. The disequality $f (f (a, b), b) \neq a$ cannot hold in that class.
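The whole trace can be replayed in code. The sketch below repeats find, congruent, and merge from above so that it runs on its own. The one addition is an assumption: the Node constructor registers each application in its arguments' ccpar sets, which is valid here because the DAG is built before any merge. Variables n1 through n4 correspond to nodes 1 through 4.

```python
class Node:
    def __init__(self, fn, args=()):
        self.fn = fn
        self.args = list(args)
        self.find = self
        self.ccpar = set()
        for arg in self.args:          # no merges yet, so each argument is
            arg.ccpar.add(self)        # still its own representative

def find(n):
    if n.find is n:
        return n
    n.find = find(n.find)
    return n.find

def congruent(n1, n2):
    return (n1.fn == n2.fn and len(n1.args) > 0
            and len(n1.args) == len(n2.args)
            and all(find(x) is find(y) for x, y in zip(n1.args, n2.args)))

def merge(n1, n2):
    r1, r2 = find(n1), find(n2)
    if r1 is r2:
        return
    p1, p2 = set(r1.ccpar), set(r2.ccpar)   # snapshot before rewiring
    r1.find = r2
    r2.ccpar = p1 | p2
    r1.ccpar = set()
    for t1 in p1:
        for t2 in p2:
            if find(t1) is not find(t2) and congruent(t1, t2):
                merge(t1, t2)

n1 = Node("a")                 # node 1: a
n2 = Node("b")                 # node 2: b
n3 = Node("f", [n1, n2])       # node 3: f(a, b)
n4 = Node("f", [n3, n2])       # node 4: f(f(a, b), b)

# Initial state matches the first table.
assert n1.ccpar == {n3} and n2.ccpar == {n3, n4} and n3.ccpar == {n4}

merge(n3, n1)                  # process f(a, b) = a; the cascade merges node 4 as well

# Final state matches the last table.
assert n3.find is n1 and n4.find is n1 and n1.ccpar == {n3, n4}

# Check f(f(a, b), b) != a: both sides share a representative.
print("unsat" if find(n4) is find(n1) else "sat")   # unsat
```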

Why this matters

The congruence closure algorithm we just traced is the core of equality reasoning inside modern SMT solvers. Theory solvers for arithmetic, bitvectors, and arrays all need to reason about equalities between terms at some point, and solvers typically hand that work to a congruence closure engine. Learning this one algorithm gives you a window into how Z3 and its cousins actually work inside.

The complexity is polynomial. The pairwise version traced here is quadratic in the worst case on a formula with $n$ subterms. Refined implementations that add union by rank and find congruent parents by hashing instead of pairwise comparison run in $O (n \log n)$ . In practice this is fast enough that congruence closure is rarely the bottleneck. When SMT solvers are slow, it is almost always on other theories, not on equality reasoning.

Remember the $sq (y) = sqabs (y)$ proof from Practice? The version where we replaced real multiplication with an uninterpreted function and one axiom, and Z3 decided it in milliseconds? When Z3 processed our axiom $umul (y, y) = umul (- y, - y)$ alongside the negated property, congruence closure is what produced the contradiction. The same find, congruent, merge loop you just traced by hand was running inside Z3.

The same term DAG plus ccpar structure is the foundation of e-graphs, used in compiler optimization and program synthesis. Equality saturation runs congruence closure in a loop, growing the graph by applying rewrite rules until it reaches a fixed point, then picking the best representative from each class. Tools like egg (a general-purpose e-graph library) and Herbie (floating-point accuracy) are built on this idea. The algorithm you just traced is the engine underneath.

Next week (Lecture 4) we survey the other theory solvers: linear real and integer arithmetic, bitvectors, arrays. Each has its own decision procedure, and each follows the same "give me a conjunction of literals, tell me sat or unsat" contract we sketched for the theory solver interface. Lecture 5 shows how the theory solvers and CDCL plug together as DPLL(T), the architecture of modern SMT.