Theory
The mechanics underneath the solver: propositional logic, normal forms, and the DPLL algorithm that searches for satisfying assignments.
Propositional Logic
Propositional logic is a formal language. Like any formal language it has two parts: syntax (what counts as a well-formed expression) and semantics (what the expressions mean). We will build both from the ground up, then use them to define the property SAT solvers decide.
Syntax
Every propositional formula is built from three kinds of piece:
- Variables: p, q, r, s, ...
- Constants: ⊤ (top) and ⊥ (bot).
- Connectives: ¬ (not), ∧ (and), ∨ (or), → (implies), ↔ (iff).
The constants ⊤ and ⊥ stand for true and false. We will make that precise in Semantics.
The grammar says how to combine them:
A formula is either a variable, a constant, or a compound built by applying a connective to one or two smaller formulas. Every compound formula has smaller formulas inside it, all the way down to the atoms.
Operator precedence lets us drop parentheses when the grouping is unambiguous. In order from tightest to loosest: ¬, ∧, ∨, →, ↔. Implication and biconditional associate to the right. A few examples of how precedence resolves unparenthesized formulas:
| As written | Fully parenthesized |
|---|---|
| ¬p ∧ q | (¬p) ∧ q |
| p ∨ q ∧ r | p ∨ (q ∧ r) |
| p → q → r | p → (q → r) |
| ¬p ∧ q ∨ r → s | ((¬p ∧ q) ∨ r) → s |
| ¬p ∧ q ∨ (r → s) | (¬p ∧ q) ∨ (r → s) |
The last row is the formula we will use as a running example throughout this section. Its parentheses are doing real work: without the right-hand pair, precedence would give us the previous row, a different formula.
Formulas as Trees
The grammar is recursive: each compound formula is built by applying a connective to one or two smaller formulas, which are themselves built the same way. That is the definition of a tree. Every propositional formula has a natural tree structure falling out of the grammar, with the outermost connective at the root and the atoms at the leaves.
Here is the running formula as a tree:
graph TD
accTitle: Expression tree of the running formula
accDescr: Tree for (not p and q) or (r implies s). The root is a disjunction. Its left child is a conjunction whose children are a negation (with child p) and q. Its right child is an implication with children r and s.
root["∨"]
L["∧"]
R["⟹"]
N["¬"]
p["p"]
q["q"]
r["r"]
s["s"]
root --- L
root --- R
L --- N
L --- q
N --- p
R --- r
R --- s

Each internal node is labeled with a connective and has one or two children that are themselves trees. Each leaf is a variable or a constant. The shape of the tree is the shape of the formula: the root is the outermost connective, its left subtree is everything to the left of the disjunction, and its right subtree is everything to the right.
Point at any internal node and you can read off a subformula. The ¬ node is the subformula ¬p. The ∧ node is ¬p ∧ q. The → node is r → s. The root ∨ node is the whole formula (¬p ∧ q) ∨ (r → s). A subformula is exactly a subtree: pick an internal node, take everything hanging below it, and that is the subformula rooted at that node. This goes in both directions. Every subformula of the running formula corresponds to exactly one subtree of the diagram.
Every transformation we write in the rest of this section walks this tree. The normal-form conversions (core, NNF, DNF, CNF) recurse on the grammar, which is the same as recursing on the tree. When you read the pseudocode in those sections, picture the function moving from the root of the tree toward the leaves, rebuilding subformulas as it goes. The tree view is the mental model that makes every later algorithm on this page make sense.
Semantics
Syntax says what counts as a well-formed formula. Semantics says what the formula means. A formula's meaning is a truth value (true or false), but that truth value depends on what the variables stand for. To pin it down we need an interpretation.
An interpretation is an assignment of truth values to variables: a total function from each variable to either true or false. "Total" means every variable gets exactly one value: there are no unassigned variables, and no variable gets both. In Python, an interpretation is a dict like {'p': False, 'q': True, 'r': False, 's': False} that includes every variable appearing in the formula. The constants ⊤ and ⊥ are already fixed to true and false, so together with the interpretation we know the truth value at every leaf of the tree. From there we can evaluate bottom-up, applying the truth-table rule for each connective at each internal node until we reach the root.
In pseudocode, evaluation is a straight recursion on the formula, matching the grammar case by case:
eval(f, I):
match f:
case ⊤: true
case ⊥: false
case x: I(x)
case ¬a: not eval(a, I)
case a ∧ b: eval(a, I) and eval(b, I)
case a ∨ b: eval(a, I) or eval(b, I)
case a → b: (not eval(a, I)) or eval(b, I)
case a ↔ b: eval(a, I) = eval(b, I)
The base cases (⊤, ⊥, variables) read off a truth value directly. The compound cases recurse on the immediate subformulas and combine the results with the boolean operation that corresponds to the connective. The combining rules are the standard truth tables for the boolean operators:
| a | ¬a |
|---|---|
| true | false |
| false | true |

| a | b | a ∧ b | a ∨ b | a → b | a ↔ b |
|---|---|---|---|---|---|
| true | true | true | true | true | true |
| true | false | false | true | false | false |
| false | true | false | true | true | false |
| false | false | false | false | true | true |
Each row of a truth table is one possible combination of values for the inputs. The columns on the right give the value of the compound expression for that row's inputs. To evaluate, say, a → b when a is true and b is false, find the row with a = true and b = false, then read across to the a → b column: false. Implication is the one connective whose table sometimes surprises people. An implication is true whenever its premise is false, regardless of the conclusion. The only way to make a → b false is to have a true premise and a false conclusion. The biconditional is true exactly when both sides agree.
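The implication table can be checked mechanically rather than memorized. A minimal Python sketch (the helper name `implies` is ours, not part of the text's pseudocode) that enumerates the four rows of a → b:

```python
from itertools import product

# truth-table rule for a -> b: false only when the premise is true
# and the conclusion is false
def implies(a, b):
    return (not a) or b

# enumerate the four rows in table order: (T,T), (T,F), (F,T), (F,F)
rows = [(a, b, implies(a, b)) for a, b in product([True, False], repeat=2)]
for a, b, v in rows:
    print(f"{a!s:5}  {b!s:5}  {v}")
```

The one False in the output is exactly the true-premise, false-conclusion row.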
A worked trace. Take the running formula (¬p ∧ q) ∨ (r → s) under the interpretation I = {p ↦ false, q ↦ true, r ↦ true, s ↦ false}. Evaluation proceeds from the leaves up the tree. Each node below is tagged with its evaluation step number and the value computed at that node:
graph TD
accTitle: Evaluation trace of the running formula under I
accDescr: Same tree as before with each node tagged by a step number and computed truth value. Step 1 is p, false. Step 2 is the negation of p, true. Step 3 is q, true. Step 4 is the conjunction of not p and q, true. Step 5 is r, true. Step 6 is s, false. Step 7 is the implication from r to s, false. Step 8 is the root disjunction, true.
root["`∨
*8: true*`"]
L["`∧
*4: true*`"]
R["`⟹
*7: false*`"]
N["`¬
*2: true*`"]
p["`p
*1: false*`"]
q["`q
*3: true*`"]
r["`r
*5: true*`"]
s["`s
*6: false*`"]
root --- L
root --- R
L --- N
L --- q
N --- p
R --- r
R --- s

The step numbers trace a left-to-right, bottom-up walk: evaluate the left subtree completely, then the right subtree, then combine at the root. Reading the tree in that order:
- Step 1: p is false (from I).
- Step 2: ¬p is true (negation of step 1).
- Step 3: q is true (from I).
- Step 4: ¬p ∧ q is true (conjunction of steps 2 and 3).
- Step 5: r is true (from I).
- Step 6: s is false (from I).
- Step 7: r → s is false (implication from step 5 to step 6).
- Step 8: (¬p ∧ q) ∨ (r → s) is true (disjunction of steps 4 and 7).
Different interpretations can produce different truth values. The interpretation I′ = {p ↦ true, q ↦ true, r ↦ true, s ↦ false} makes ¬p false, so the left conjunction is false; r → s is still false; and the whole formula evaluates to false under I′. Whether a formula is true depends on the interpretation.
Inference Rules
The same rules can be written more formally as inference rules. We introduce one piece of notation: the double turnstile ⊨. For an interpretation I and a formula f, we write I ⊨ f to mean "f is true under I" (read "I models f"). The crossed-out form I ⊭ f means "f is false under I."
These two forms are duals of each other. Because interpretations are total, every formula is either true or false under I, never neither and never both. So I ⊭ f is exactly the same as saying "I ⊨ f does not hold," and vice versa. The two ways of writing a negative judgment mean the same thing. The duality matters when you read the rules below: any time a rule uses I ⊭ f in a premise or conclusion, you can read it as "f is false under I" without losing anything.
An inference rule has a horizontal bar separating its premises (above) from its conclusion (below): if every premise holds, the conclusion holds. A rule with nothing above the bar is an axiom; its conclusion holds unconditionally. Some connectives need more than one rule because there is more than one way to make the compound formula true. The convention is that if no rule derives I ⊨ f, then f is false under I.
Each connective gets two kinds of rule. Introduction rules say how to conclude that a compound formula is true: how to build up a judgment about the compound from judgments about its parts. Elimination rules say how to use a judgment about a compound formula: what you can conclude about the parts when you know the compound holds. Intro rules build things up; elim rules take them apart.
The constants are fixed: ⊤ is always true and ⊥ is always false. These are axioms because there is no subformula to check. There is nothing to eliminate from a constant, so they have no elimination rules.
Introduction rules for ⊤ and ⊥:

$$\dfrac{}{I \vDash \top} \qquad\qquad \dfrac{}{I \nvDash \bot}$$
A variable is true under I exactly when the interpretation assigns it the value true. The interpretation itself is the only thing to consult, so both the intro and the elim rule are essentially one-step lookups.
Introduction rule for variables:

$$\dfrac{I(x) = \text{true}}{I \vDash x}$$
Elimination rule for variables:

$$\dfrac{I \vDash x}{I(x) = \text{true}}$$
A negation ¬a is true exactly when a is false. The intro rule turns a failure to hold into a negation judgment. The elim rule turns a negation judgment back into a failure to hold.
Introduction rule for ¬:

$$\dfrac{I \nvDash a}{I \vDash \lnot a}$$
Elimination rule for ¬:

$$\dfrac{I \vDash \lnot a}{I \nvDash a}$$
A conjunction is true when both conjuncts are true. The intro rule requires both subformulas to hold. The elim rules let us extract either conjunct from the compound: knowing the conjunction holds tells us each piece holds on its own.
Introduction rule for ∧:

$$\dfrac{I \vDash a \qquad I \vDash b}{I \vDash a \land b}$$
Elimination rules for ∧:

$$\dfrac{I \vDash a \land b}{I \vDash a} \qquad\qquad \dfrac{I \vDash a \land b}{I \vDash b}$$
A disjunction is true when at least one disjunct is true. The intro rules give us two ways to build a disjunction: either disjunct holding is enough. The elim rules capture disjunctive syllogism: if the disjunction holds and one side fails, the other side must hold.
Introduction rules for ∨:

$$\dfrac{I \vDash a}{I \vDash a \lor b} \qquad\qquad \dfrac{I \vDash b}{I \vDash a \lor b}$$
Elimination rules for ∨:

$$\dfrac{I \vDash a \lor b \qquad I \nvDash a}{I \vDash b} \qquad\qquad \dfrac{I \vDash a \lor b \qquad I \nvDash b}{I \vDash a}$$
An implication is true in two situations: when the premise is false (a vacuously true implication), or when the conclusion is true. The elim rules are the classical ways of using an implication: modus ponens takes you forward (premise true, so conclusion must be true) and modus tollens takes you backward (conclusion false, so premise must be false).
Introduction rules for →:

$$\dfrac{I \nvDash a}{I \vDash a \to b} \qquad\qquad \dfrac{I \vDash b}{I \vDash a \to b}$$
Elimination rules for →:

$$\dfrac{I \vDash a \to b \qquad I \vDash a}{I \vDash b} \qquad\qquad \dfrac{I \vDash a \to b \qquad I \nvDash b}{I \nvDash a}$$
A biconditional is true when the two sides agree: both true or both false. The intro rules build the biconditional from either agreement pattern. The elim rules let us propagate across a known biconditional: once we know the two sides agree, the truth value of one side determines the other.
Introduction rules for ↔:

$$\dfrac{I \vDash a \qquad I \vDash b}{I \vDash a \leftrightarrow b} \qquad\qquad \dfrac{I \nvDash a \qquad I \nvDash b}{I \vDash a \leftrightarrow b}$$
Elimination rules for ↔:

$$\dfrac{I \vDash a \leftrightarrow b \qquad I \vDash a}{I \vDash b} \qquad\qquad \dfrac{I \vDash a \leftrightarrow b \qquad I \nvDash a}{I \nvDash b}$$
The inference rules and the pseudocode are the same semantics expressed two ways. The inference rules are the declarative statement of what evaluation means; the pseudocode is an operational procedure for computing it. Both agree on the truth value of every formula under every interpretation.
Satisfiability, Validity, and Duality
With syntax and semantics in place, we can ask the questions a SAT solver actually answers. Every propositional formula falls into one of three categories depending on how it behaves across all possible interpretations.
A formula is satisfiable if at least one interpretation makes it true. An interpretation that makes a formula true is called a model of the formula. The running formula is satisfiable: the interpretation I from the Semantics trace is a model.
A formula is valid if every interpretation makes it true. Valid formulas are also called tautologies. An example: p ∨ ¬p. No matter what truth value you assign to p, one of p or ¬p is true, so the disjunction is true. The running formula is not valid: we showed an interpretation that makes it false.
A formula is unsatisfiable if no interpretation makes it true. An example: p ∧ ¬p. For any interpretation, either p is false (making the left conjunct false) or ¬p is false (making the right conjunct false), and either way the conjunction is false. Unsatisfiable formulas are also called contradictions.
Satisfiability and validity are connected by a simple duality:

$$f \text{ is valid} \iff \lnot f \text{ is unsatisfiable}$$

The reasoning: f is valid means every interpretation makes f true, which means no interpretation makes f false, which means no interpretation makes ¬f true, which means ¬f is unsatisfiable. The implication runs both directions.
This duality is not just mathematical bookkeeping. It means that a SAT solver is also a validity checker. Build one tool that decides satisfiability and you get validity for free: to check whether f is valid, ask the solver whether ¬f is satisfiable. If the solver says no, f is valid. If the solver says yes, the satisfying interpretation it returns is a counterexample to f's validity.
The same duality is how we check whether two formulas are equivalent. Two formulas f and g are equivalent when every interpretation gives them the same truth value. Equivalently, f ↔ g is valid. Equivalently, ¬(f ↔ g) is unsatisfiable. So to check whether two formulas are equivalent, ask the solver whether their biconditional's negation is satisfiable. If unsatisfiable, they are equivalent. If satisfiable, the model is an interpretation on which they disagree.
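The three categories, and the duality, can be checked by brute force on small formulas: enumerate every interpretation and evaluate. A sketch with assumed names (`classify` and `truth_fn` are ours, not from the text):

```python
from itertools import product

# Classify a formula as valid / satisfiable / unsatisfiable by brute
# force: evaluate it under every interpretation of its variables.
def classify(variables, truth_fn):
    values = [
        truth_fn(dict(zip(variables, bits)))
        for bits in product([True, False], repeat=len(variables))
    ]
    if all(values):
        return 'valid'
    if any(values):
        return 'satisfiable'
    return 'unsatisfiable'

print(classify(['p'], lambda I: I['p'] or not I['p']))   # valid (tautology)
print(classify(['p'], lambda I: I['p'] and not I['p']))  # unsatisfiable
print(classify(['p', 'q'], lambda I: I['p'] or I['q']))  # satisfiable
```

Note how validity of a formula and unsatisfiability of its negation walk exactly the same interpretations, which is the duality in miniature.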
This is exactly the trick from Practice. The bvudiv2 demo asked whether a fast bit-twiddling implementation and a straightforward reference implementation compute the same function on all 32-bit inputs. Rather than enumerating all inputs, we asserted that the two implementations produce different results and asked Z3 whether that assertion was satisfiable. Unsatisfiable meant the implementations agree on every input. Satisfiable would have returned a specific input on which they disagree. The "assert the negation and check satisfiability" move is the duality at work, and it is the foundation of how SAT and SMT solvers are used to verify programs.
Formulas as Data
To compute with formulas we need to represent them as data. We use nested Python tuples: each compound formula is a tuple whose first element is a string tag naming the connective, followed by the tuple representations of its subformulas. Variables are bare strings. The constants ⊤ and ⊥ are Python's True and False.
Here is the running formula as a Python tuple:
('or',
('and', ('not', 'p'), 'q'),
('implies', 'r', 's'))
The indentation is just for readability: Python sees one nested tuple. Read the outermost tuple head first: 'or' says the whole formula is a disjunction. Its two remaining elements are the left and right disjuncts, each a tuple in its own right. The left disjunct ('and', ('not', 'p'), 'q') is a conjunction whose conjuncts are ('not', 'p') (a unary negation of the variable 'p') and 'q' (a bare variable). The right disjunct ('implies', 'r', 's') is an implication between the variables 'r' and 's'. Compare this to the tree diagram from Formulas as Trees: the tuple is the tree, walked from the outside in, with one tuple per node.
graph TD
accTitle: Tree of the running formula labeled with tuple tags
accDescr: Same tree as before, but each node is labeled with the Python tuple tag that names it. The root is tagged 'or'. Its left child is tagged 'and' and its right child is tagged 'implies'. The left subtree contains a 'not' node with child 'p' and a leaf 'q'. The right subtree has leaves 'r' and 's'.
root["'or'"]
L["'and'"]
R["'implies'"]
N["'not'"]
p["'p'"]
q["'q'"]
r["'r'"]
s["'s'"]
root --- L
root --- R
L --- N
L --- q
N --- p
R --- r
R --- s

Each internal node's tuple head is exactly the string tag shown in the diagram. The children of the node are the remaining elements of the tuple, in order. A leaf's label is the bare variable string at that position in the tuple. Walking the tuple from the outside in visits the nodes of the tree from the root down. This is what makes "formulas as trees" and "formulas as data" the same thing viewed from two angles.
The Python implementation of the evaluation pseudocode falls out of this representation. Each case in the match statement corresponds to one rule, and Python's structural pattern matching unpacks the tuples for us:
def eval(f, I):
match f:
case True: return True
case False: return False
case str(x): return I[x]
case ('not', a): return not eval(a, I)
case ('and', a, b): return eval(a, I) and eval(b, I)
case ('or', a, b): return eval(a, I) or eval(b, I)
case ('implies', a, b): return (not eval(a, I)) or eval(b, I)
case ('iff', a, b): return eval(a, I) == eval(b, I)
The match f: statement dispatches on the structure of f. The constant cases True and False catch and . The str(x) case catches bare-variable atoms and looks them up in the interpretation dict I. The compound cases destructure the tagged tuple, bind the subformulas to names, and recurse. Every case corresponds to one rule from the Semantics subsection, and the structure of the function mirrors the structure of the formula.
This is the formulas-as-data pattern. The same recursive structure that defines the syntax also drives the computation. Every transformation in the rest of this section follows the same shape: match on the formula's structure, recurse on subformulas, rebuild the result from the pieces.
Normal Forms
SAT solvers do not work directly on arbitrary formulas. They require a specific shape: Conjunctive Normal Form (CNF).
Core Form
Propositional logic has five connectives (¬, ∧, ∨, →, ↔), but only three are needed. Implication and biconditional can be eliminated:

$$a \to b \;\equiv\; \lnot a \lor b \qquad\qquad a \leftrightarrow b \;\equiv\; (\lnot a \lor b) \land (\lnot b \lor a)$$
A formula is in core form if it uses only ¬, ∧, and ∨.
The conversion applies the two equivalences above recursively:
to_core(f):
match f:
# atoms pass through
case ⊤: ⊤
case ⊥: ⊥
case x: x
case ¬a:
¬to_core(a)
case a ∧ b:
to_core(a) ∧ to_core(b)
case a ∨ b:
to_core(a) ∨ to_core(b)
# eliminate →
case a → b:
to_core(¬a ∨ b)
# eliminate ↔ (duplicates a and b)
case a ↔ b:
to_core((¬a ∨ b) ∧ (¬b ∨ a))
The ↔ case duplicates its arguments: a ↔ b becomes two copies of a and b. For a single ↔ this is a constant doubling. But nested ↔ can compound: each level doubles, so n nested biconditionals can produce output exponential in n. Everything else is linear. In practice deeply nested ↔ chains are uncommon, and Tseitin (below) avoids this entirely.
Negation Normal Form (NNF)
A formula in core form may still have negations deep inside: ¬(a ∧ b), ¬¬a, etc. Negation Normal Form (NNF) pushes all negations down to the leaves so they appear only directly on atoms.
The conversion uses two mutually recursive functions. to_nnf(f) converts f normally. to_nnf_neg(f) converts f as if there were a negation above it. When to_nnf encounters a ¬, it switches to to_nnf_neg. When to_nnf_neg encounters a ¬, the double negation cancels and it switches back to to_nnf. De Morgan's laws happen naturally: negating a conjunction produces a disjunction (and vice versa).
# input: a formula in core form
to_nnf(f):
match f:
# atoms are already NNF
case ⊤: ⊤
case ⊥: ⊥
case x: x
# negation: switch to to_nnf_neg
case ¬a:
to_nnf_neg(a)
case a ∧ b:
to_nnf(a) ∧ to_nnf(b)
case a ∨ b:
to_nnf(a) ∨ to_nnf(b)
# convert ¬f to NNF — a negation is pending from above
to_nnf_neg(f):
match f:
# the pending negation attaches to an atom
case ⊤: ⊥
case ⊥: ⊤
case x: ¬x
# double negation: pending ¬ meets another ¬, they cancel
case ¬a:
to_nnf(a)
# De Morgan: ¬(a ∧ b) = ¬a ∨ ¬b
case a ∧ b:
to_nnf_neg(a) ∨ to_nnf_neg(b)
# De Morgan: ¬(a ∨ b) = ¬a ∧ ¬b
case a ∨ b:
to_nnf_neg(a) ∧ to_nnf_neg(b)
NNF conversion is linear in the size of the core form. Each node is visited once. The only blowup in the full pipeline is from ↔ elimination in to_core (see above).
Disjunctive Normal Form (DNF)
A formula is in Disjunctive Normal Form (DNF) if it is a disjunction of terms, where each term is a conjunction of literals.
The two levels are: terms group literals with AND, then DNF groups terms with OR. To convert an NNF formula to DNF, distribute ∧ over ∨:
# input: a formula in NNF
to_dnf(f):
match f:
# base case: a literal is already a one-literal term
case l where is_literal(l):
l
# OR is the top-level connective in DNF, passes through
case a ∨ b:
to_dnf(a) ∨ to_dnf(b)
# AND: recursively convert both sides, then distribute
case a ∧ b:
distribute_and(to_dnf(a), to_dnf(b))
# distribute AND over a pair of DNF formulas
distribute_and(f, g):
match f, g:
# f has OR at top: distribute into both branches
case (a ∨ b), g:
distribute_and(a, g) ∨ distribute_and(b, g)
# g has OR at top: distribute into both branches
case f, (a ∨ b):
distribute_and(f, a) ∨ distribute_and(f, b)
# both are terms (no OR remains): just conjoin
case f, g:
f ∧ g
The blowup happens in distribute_and: every OR in either argument gets split, doubling the formula. Each application can double the formula size. Trace one step on (a ∨ b) ∧ (c ∨ d):

$$(a \lor b) \land (c \lor d) \;=\; (a \land c) \lor (a \land d) \lor (b \land c) \lor (b \land d)$$
Two clauses of two literals each produce four terms. Each additional clause doubles the count. The formula $(a_1 \lor b_1) \land (a_2 \lor b_2) \land \cdots \land (a_n \lor b_n)$ has $n$ clauses in CNF but $2^n$ terms in DNF:
| $n$ | CNF size | DNF size |
|---|---|---|
| 1 | 1 clause | 2 terms |
| 2 | 2 clauses | 4 terms |
| 5 | 5 clauses | 32 terms |
| 10 | 10 clauses | 1024 terms |
Checking whether a DNF formula is satisfiable is trivially easy: pick any term whose literals are consistent (no variable appears both positive and negative) and set those literals to true. If every term is inconsistent, the formula is unsatisfiable, but that is rare. DNF-SAT is solvable in linear time.
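The check is a few lines. A sketch assuming a DNF formula is represented as a list of terms, each term a list of literals ('p' or ('not', 'p')); the names `consistent` and `dnf_sat` are ours:

```python
def consistent(term):
    # a term is satisfiable iff no variable appears both positive and negative
    pos = {lit for lit in term if isinstance(lit, str)}
    neg = {lit[1] for lit in term if isinstance(lit, tuple)}
    return not (pos & neg)

def dnf_sat(terms):
    # linear scan: satisfiable iff any single term is consistent
    return any(consistent(t) for t in terms)

print(dnf_sat([['p', ('not', 'p')], ['q', 'r']]))  # True: second term works
print(dnf_sat([['p', ('not', 'p')]]))              # False: the only term clashes
```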
Conjunctive Normal Form (CNF)
A formula is in Conjunctive Normal Form (CNF) if it is a conjunction of clauses, where each clause is a disjunction of literals.
The two levels are the mirror of DNF: clauses group literals with OR, then CNF groups clauses with AND. To convert an NNF formula to CNF, distribute ∨ over ∧. The mechanics are symmetric to DNF:
# input: a formula in NNF
to_cnf(f):
match f:
# base case: a literal is already a one-literal clause
case l where is_literal(l):
l
# AND is the top-level connective in CNF, passes through
case a ∧ b:
to_cnf(a) ∧ to_cnf(b)
# OR: recursively convert both sides, then distribute
case a ∨ b:
distribute_or(to_cnf(a), to_cnf(b))
# distribute OR over a pair of CNF formulas
distribute_or(f, g):
match f, g:
# f has AND at top: distribute into both branches
case (a ∧ b), g:
distribute_or(a, g) ∧ distribute_or(b, g)
# g has AND at top: distribute into both branches
case f, (a ∧ b):
distribute_or(f, a) ∧ distribute_or(f, b)
# both are clauses (no AND remains): just disjoin
case f, g:
f ∨ g
Same structure, symmetric rules. The blowup happens in distribute_or. Trace one step on (a ∧ b) ∨ (c ∧ d):

$$(a \land b) \lor (c \land d) \;=\; (a \lor c) \land (a \lor d) \land (b \lor c) \land (b \lor d)$$
Two terms of two literals each produce four clauses. The formula $(a_1 \land b_1) \lor (a_2 \land b_2) \lor \cdots \lor (a_n \land b_n)$ has $n$ terms in DNF but $2^n$ clauses in CNF:
| $n$ | DNF size | CNF size |
|---|---|---|
| 1 | 1 term | 2 clauses |
| 2 | 2 terms | 4 clauses |
| 5 | 5 terms | 32 clauses |
| 10 | 10 terms | 1024 clauses |
Naive CNF conversion blows up just as badly as naive DNF conversion. Both directions are exponential in general.
CNF is what SAT solvers use. Checking whether a CNF formula is satisfiable is hard (this is the NP-complete SAT problem). The next two sections explain why CNF is the right choice despite the blowup.
def cnf(f):
"""Distribute OR over AND — can blow up exponentially."""
return _flatten(_distribute('or', 'and', nnf(f)))
Why CNF?
If both conversions blow up, why do SAT solvers use CNF instead of DNF? Two reasons.
First, there is a trick (the Tseitin transformation, next section) that gives compact CNF by introducing auxiliary variables. The trick works because clauses in CNF are ANDed together: each clause is a constraint that limits what values the variables can take. Auxiliary variables can be pinned to exact meanings by adding constraints. DNF terms are ORed together: each term offers an independent way to satisfy the formula. Adding auxiliary variables to DNF terms does not constrain them in the same way. Tseitin does not work for DNF.
Second, checking whether a DNF is satisfiable is trivially easy. Pick any term whose literals are consistent (no variable appears both positive and negative), set those literals to true, and you are done. If every term is inconsistent, the formula is unsatisfiable, but that is rare. This means DNF cannot compactly encode hard problems. CNF-SAT is hard precisely because Tseitin lets you pack a huge number of problems into compact CNF formulas. We want the hard representation because that is where the interesting problems live.
Tseitin Transformation
Naive CNF conversion fails at scale. The Core Form section already showed that nested biconditionals can blow up exponentially because to_core duplicates both sides of every ↔. The Conjunctive Normal Form section showed the same exponential blowup for to_cnf distributing ∨ over ∧. Either way, the conversion is unusable on real inputs.
The Tseitin transformation fixes this with a different approach. Instead of distributing connectives across each other (which copies subformulas), Tseitin gives every compound subformula a fresh name and adds a small constraint tying the name to its meaning. The result is a CNF that grows linearly with the size of the input rather than exponentially. The catch: the new CNF is not logically equivalent to the original. It is equisatisfiable, which is enough for SAT.
Tseitin handles arbitrary propositional formulas directly. There is no need to preprocess to core form, NNF, or anything else. Every connective gets its own constant-size encoding, including ↔, the connective that hurts naive conversion most.
Name every subtree
The core idea, in one sentence: walk the formula tree, give every internal node a fresh auxiliary variable, and write down a biconditional that says "this auxiliary is true exactly when the subformula at this node is true." Then the original formula is true exactly when the auxiliary at the root is true.
Take the running formula (¬p ∧ q) ∨ (r → s). Three of its subformulas use a binary connective: the conjunction ¬p ∧ q, the implication r → s, and the root disjunction. (Negation is unary, not binary, so ¬p stays as the literal ¬p and gets no auxiliary.) Introduce three fresh variables, one for each subformula above:

$$a_1 \leftrightarrow (\lnot p \land q) \qquad\quad a_2 \leftrightarrow (r \to s) \qquad\quad a_3 \leftrightarrow (a_1 \lor a_2)$$
Notice that a₃'s definition references a₁ and a₂ instead of mentioning the original subformulas again. Once a subformula has a name, every reference to it uses the name.
The Tseitin output is the conjunction of three biconditionals plus an assertion that the root holds:

$$a_3 \;\land\; \bigl(a_1 \leftrightarrow (\lnot p \land q)\bigr) \;\land\; \bigl(a_2 \leftrightarrow (r \to s)\bigr) \;\land\; \bigl(a_3 \leftrightarrow (a_1 \lor a_2)\bigr)$$

The leading a₃ says "the root subformula is true," which is what we wanted to assert about the original formula. The three biconditionals constrain the auxiliaries to mean the right things. The whole conjunction is satisfiable exactly when the original formula is satisfiable (we will see why below).
Here is the running formula's tree with each internal node tagged by its auxiliary variable:
graph TD
accTitle: Tseitin auxiliary assignment for the running formula
accDescr: The tree of (not p and q) or (r implies s) with each binary connective node labeled by an auxiliary variable. The root disjunction is a3. Its left child, the conjunction of not p and q, is a1. Its right child, the implication from r to s, is a2. The negation node and all leaf nodes have no auxiliary.
root["`∨
*a_3*`"]
L["`∧
*a_1*`"]
R["`⟹
*a_2*`"]
N["¬"]
p["p"]
q["q"]
r["r"]
s["s"]
root --- L
root --- R
L --- N
L --- q
N --- p
R --- r
R --- s

Two analogies
Tseitin is let-binding for formulas. Just as let x = e in ... names a subexpression once and reuses the name, Tseitin names each subformula once and references the name everywhere that subformula appears. The biconditionals are the let-bindings; the auxiliary variables are the bound names.
It is also a form of hash-consing. Hash-consing means storing each distinct value exactly once and referring to it by a stable name. Java programmers know the same idea as string interning: "hello".intern() returns a canonical reference shared by every other interned "hello" in the program. Tseitin gives each compound subformula one auxiliary variable and reuses it everywhere that subformula appears. When the same subformula occurs twice in the original, Tseitin assigns it one auxiliary and the second occurrence uses that same auxiliary by reference. The size of the output is proportional to the number of distinct subformulas, not the number of textual occurrences.
Biconditionals as CNF clauses
The Tseitin output above is not yet CNF. It is a conjunction of biconditionals, and biconditionals are not clauses. To finish the transformation we need to expand each biconditional into CNF clauses.
Take the conjunction template x ↔ (a ∧ b). Expand the biconditional as two implications:

$$\bigl(x \to (a \land b)\bigr) \land \bigl((a \land b) \to x\bigr)$$

Eliminate the implications using a → b ≡ ¬a ∨ b:

$$\bigl(\lnot x \lor (a \land b)\bigr) \land \bigl(\lnot(a \land b) \lor x\bigr)$$

Push the negation in the right conjunct using De Morgan, then distribute ∨ over ∧ in the left conjunct:

$$(\lnot x \lor a) \land (\lnot x \lor b) \land (\lnot a \lor \lnot b \lor x)$$
Three short clauses: two binary clauses and one ternary clause. The biconditional is now in CNF.
The same procedure works for every binary connective. The results:
| Biconditional | CNF clauses |
|---|---|
| x ↔ (a ∧ b) | (¬x ∨ a), (¬x ∨ b), (¬a ∨ ¬b ∨ x) |
| x ↔ (a ∨ b) | (x ∨ ¬a), (x ∨ ¬b), (¬x ∨ a ∨ b) |
| x ↔ (a → b) | (x ∨ a), (x ∨ ¬b), (¬x ∨ ¬a ∨ b) |
| x ↔ (a ↔ b) | (¬x ∨ ¬a ∨ b), (¬x ∨ a ∨ ¬b), (x ∨ a ∨ b), (x ∨ ¬a ∨ ¬b) |
Each template produces at most four clauses, each of length at most three. The size is constant per binary connective, regardless of how that connective is nested in the original formula.
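The templates transcribe into a small helper that emits the clauses for one auxiliary definition. A sketch (`neg` and `clauses_for` are our names; a literal is 'x' or ('not', 'x'), and each clause is a list of literals):

```python
def neg(lit):
    # flip a literal; double negation cancels
    return lit[1] if isinstance(lit, tuple) else ('not', lit)

def clauses_for(x, op, a, b):
    # CNF clauses for the biconditional  x <-> (a op b)
    if op == 'and':
        return [[neg(x), a], [neg(x), b], [neg(a), neg(b), x]]
    if op == 'or':
        return [[x, neg(a)], [x, neg(b)], [neg(x), a, b]]
    if op == 'implies':
        return [[x, a], [x, neg(b)], [neg(x), neg(a), b]]
    if op == 'iff':
        return [[neg(x), neg(a), b], [neg(x), a, neg(b)],
                [x, a, b], [x, neg(a), neg(b)]]

# a1 <-> (not p and q): the double negation cancels to bare p
print(clauses_for('a1', 'and', ('not', 'p'), 'q'))
```

Because neg cancels a double negation instead of stacking a second 'not', the third conjunction clause comes out with the bare variable, as the worked example below expects.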
The last row is the punchline. The Core Form section warned that nested ↔ chains blow up because to_core duplicates both sides of every biconditional, and each duplication compounds with the next. Tseitin handles ↔ with a single biconditional template that expands to four constant-size clauses. The connective that hurts naive conversion most is the connective Tseitin handles cleanest.
Worked example: the running formula end to end
We have established the high-level idea, the diagram, and the templates. Now follow the running formula all the way through to the actual CNF clauses Tseitin produces.
The recursive walk visits each node bottom-up. Literals propagate as themselves; binary connectives get fresh auxiliaries:
- The leaves p, q, r, s are literals. Each returns itself.
- ¬p is a unary compound. Negation propagates as a literal: it returns ¬p.
- ¬p ∧ q is the first binary connective visited. Allocate a₁ and record a₁ ↔ (¬p ∧ q).
- r → s is the second. Allocate a₂ and record a₂ ↔ (r → s).
- The root disjunction is the third. Its children, after naming, are a₁ and a₂. Allocate a₃ and record a₃ ↔ (a₁ ∨ a₂).
The walk is done. We have three biconditionals plus the assertion that a₃ holds. Now apply the templates from the previous subsection to convert each biconditional into clauses.
For a₁ ↔ (¬p ∧ q) the conjunction template gives:

$$(\lnot a_1 \lor \lnot p) \land (\lnot a_1 \lor q) \land (p \lor \lnot q \lor a_1)$$

(The third clause has p instead of ¬¬p because double negation cancels.)
For a₂ ↔ (r → s) the implication template gives:

$$(a_2 \lor r) \land (a_2 \lor \lnot s) \land (\lnot a_2 \lor \lnot r \lor s)$$

For a₃ ↔ (a₁ ∨ a₂) the disjunction template gives:

$$(a_3 \lor \lnot a_1) \land (a_3 \lor \lnot a_2) \land (\lnot a_3 \lor a_1 \lor a_2)$$

Plus the root assertion as a single unit clause: (a₃). The complete Tseitin CNF for the running formula is the conjunction of these ten clauses:

$$a_3 \land (\lnot a_1 \lor \lnot p) \land (\lnot a_1 \lor q) \land (p \lor \lnot q \lor a_1) \land (a_2 \lor r) \land (a_2 \lor \lnot s) \land (\lnot a_2 \lor \lnot r \lor s) \land (a_3 \lor \lnot a_1) \land (a_3 \lor \lnot a_2) \land (\lnot a_3 \lor a_1 \lor a_2)$$
Ten clauses over seven variables (the four originals plus three auxiliaries). The original formula is small enough that naive CNF would also be modest, so Tseitin's overhead doesn't pay for itself on this example. The win shows up at scale, which the comparison table at the end of this section makes concrete.
Why Tseitin is linear
For a formula with $n$ binary connectives, Tseitin introduces at most $n$ auxiliary variables and at most $n$ biconditionals. (It can introduce fewer when the same subformula appears more than once: the memoization in the algorithm reuses one auxiliary across every occurrence.) Each biconditional expands to at most four constant-size clauses, plus the single unit clause asserting the root auxiliary. Total: at most $4n + 1$ clauses, each of length at most three. That is $O(n)$, linear in the size of the input.
This may seem suspicious because the algorithm calls to_cnf on each biconditional, and the CNF section showed that to_cnf is exponential in the worst case. The resolution: each biconditional has constant size (at most three variables), so to_cnf on it is constant work and produces a constant number of clauses. The exponential blowup in to_cnf only happens on deeply nested formulas. The biconditionals Tseitin produces are shallow by construction, so the blowup never fires.
Equisatisfiable, not equivalent
The Tseitin formula has variables that the original does not: the auxiliaries. So the two formulas have different signatures and are not logically equivalent. They are equisatisfiable: the Tseitin formula has a satisfying assignment if and only if the original formula does.
The forward direction is easy. Given any satisfying assignment of the original formula, set each auxiliary to whatever truth value its subformula evaluates to under that assignment. Every biconditional is satisfied by construction, and the root auxiliary gets the value of the root subformula, which is true. So the assignment extends to a satisfying assignment of the Tseitin formula.
The reverse direction is what we actually use. Given any satisfying assignment of the Tseitin formula, restrict it to the original variables. Because every biconditional holds, the value at every internal node of the tree agrees with what evaluating the subformula bottom-up would give. The root assertion forces the original formula to evaluate to true under the restricted assignment.
For SAT this is enough. We are looking for some assignment that makes the original formula true; we don't care that the Tseitin formula has extra variables. When the solver returns a model, we throw away the auxiliary values and keep the original variables.
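Both directions of the argument can be checked exhaustively on the running formula. The sketch below is our own code (the integer numbering 1=p, 2=q, 3=r, 4=s, 5=x₁, 6=x₂, 7=x₃ and the helper names are assumptions, not from the demo files): for every assignment of the original four variables, the original formula is true exactly when some extension to the auxiliaries satisfies the Tseitin CNF.

```python
from itertools import product

# The ten Tseitin clauses for (¬p ∧ q) ∨ (r → s), as integer literals.
# Numbering is ours: 1=p, 2=q, 3=r, 4=s, 5=x1, 6=x2, 7=x3.
TSEITIN = [
    [-5, -1], [-5, 2], [5, 1, -2],   # x1 <-> (¬p ∧ q)
    [-6, -3, 4], [6, 3], [6, -4],    # x2 <-> (r → s)
    [-7, 5, 6], [7, -5], [7, -6],    # x3 <-> (x1 ∨ x2)
    [7],                             # root assertion
]

def satisfies(val, clauses):
    """True if the total assignment val satisfies every clause."""
    return all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses)

def original(p, q, r, s):
    return (not p and q) or (not r or s)   # r → s rendered as ¬r ∨ s

# For every assignment of p,q,r,s: the original formula holds iff some
# extension to x1,x2,x3 satisfies the Tseitin CNF (equisatisfiability,
# checked on all 16 base assignments).
for bits in product([False, True], repeat=4):
    val = {i + 1: b for i, b in enumerate(bits)}
    extends = any(
        satisfies({**val, 5: e[0], 6: e[1], 7: e[2]}, TSEITIN)
        for e in product([False, True], repeat=3)
    )
    assert original(*bits) == extends
print("equisatisfiability verified on all 16 assignments")
```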
Pseudocode
Tseitin walks the formula tree and assigns a fresh auxiliary variable to each conjunction, disjunction, implication, and biconditional, collecting the corresponding constraints as it goes. The result is conjoined with an assertion that the root auxiliary is true. The function uses a memo so that repeated subformulas share an auxiliary.
tseitin(f):
aux = {} # subformula -> auxiliary variable
bicond = [] # collected biconditionals
name(f):
match f:
# literals propagate as themselves
case l where is_literal(l):
l
# negations are pushed onto the named child as a literal
case ¬a:
¬name(a)
# binary connectives get a fresh aux, memoized by subformula;
# `op` below stands for whichever binary connective matched
case a ∧ b | a ∨ b | a → b | a ↔ b:
if f not in aux:
a' = name(a)
b' = name(b)
v = fresh()
aux[f] = v
bicond.append(v ↔ (a' op b'))
aux[f]
top = name(f)
to_cnf(top ∧ ⋀ bicond)
The match arm covers all four binary connectives, so this version of Tseitin handles arbitrary propositional formulas without any preprocessing. The to_cnf call at the end operates on each small biconditional individually; each biconditional is constant size, so the total work is linear.
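The pseudocode translates almost directly into Python. Below is a sketch of the naming walk only — our own code, not the course's tseitin.py; the nested-tuple encoding matches the pipeline section and the `_aux_` naming scheme is an assumption:

```python
from itertools import count

def tseitin_name(f):
    """Naming walk: returns (root literal, collected biconditionals).

    Formulas are nested tuples as in the pipeline section:
    ('and', a, b), ('or', a, b), ('implies', a, b), ('iff', a, b),
    ('not', a), or a variable name string.
    """
    aux = {}        # subformula -> auxiliary name (memo)
    biconds = []    # (aux, connective, named-left, named-right)
    fresh = count()

    def name(g):
        if isinstance(g, str):              # variable: propagates as itself
            return g
        if g[0] == 'not':                   # push negation onto named child
            child = name(g[1])
            if isinstance(child, tuple):    # ¬¬a cancels
                return child[1]
            return ('not', child)
        if g not in aux:                    # binary connective: fresh aux, memoized
            a, b = name(g[1]), name(g[2])
            v = f'_aux_{next(fresh)}'
            aux[g] = v
            biconds.append((v, g[0], a, b))
        return aux[g]

    return name(f), biconds

# Running formula: (¬p ∧ q) ∨ (r → s)
root, bic = tseitin_name(('or', ('and', ('not', 'p'), 'q'),
                          ('implies', 'r', 's')))
print(root)                    # _aux_2
print([b[:2] for b in bic])    # [('_aux_0', 'and'), ('_aux_1', 'implies'), ('_aux_2', 'or')]
```

Expanding each collected biconditional into clauses with the templates above, plus a unit clause for `root`, completes the transformation.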
Compared to naive CNF
For a formula with n binary connectives, naive CNF can grow to exponentially many clauses in the worst case while Tseitin stays linear. A concrete parameterized example: the disjunction of n conjunctions

(a₁ ∧ b₁) ∨ (a₂ ∧ b₂) ∨ ⋯ ∨ (aₙ ∧ bₙ)

This formula has 2n original variables, n conjunctions, and n − 1 disjunctions, so 2n − 1 binary connectives in total. Naive CNF distributes ∨ over ∧ and produces 2ⁿ clauses, each of length n. Tseitin introduces one auxiliary per binary connective (2n − 1 auxiliaries), produces three clauses per biconditional, and adds one root unit clause: 6n − 2 clauses, each of length at most three.
| n | Naive CNF clauses | Tseitin clauses |
|---|---|---|
| 1 | 2 | 4 |
| 2 | 4 | 10 |
| 5 | 32 | 28 |
| 10 | 1024 | 58 |
| 20 | 1,048,576 | 118 |
Notice that for the smallest cases, naive CNF is actually smaller than Tseitin: Tseitin has constant per-connective overhead from the auxiliaries and biconditionals, and that overhead doesn't pay for itself until the formula is big enough. The crossover here is around n = 5. After that, naive CNF doubles for each added conjunction while Tseitin adds a constant. By n = 20 the gap is four orders of magnitude. Real SAT problems routinely involve thousands or millions of variables, so naive conversion is not even a starting point. Tseitin or one of its variants is what every production solver uses.
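The table values follow directly from the two closed forms, 2ⁿ versus 6n − 2. A two-line throwaway script (our own) reproduces every row:

```python
# Columns: n, naive CNF clauses (2^n), Tseitin clauses (6n - 2)
for n in (1, 2, 5, 10, 20):
    print(n, 2 ** n, 6 * n - 2)
```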
Boolean Constraint Propagation
A unit clause is a clause with exactly one literal, such as (p) or (¬q). It has only one way to be satisfied: that literal must be true. This is not a guess. It is forced.
Boolean Constraint Propagation (BCP) applies the unit rule to fixpoint: find a unit clause, force its literal, simplify the formula (remove satisfied clauses, remove falsified literals), repeat. Only when no unit clauses remain do we have to make a branching decision.
def unit_propagate(cnf, lit):
"""Simplify CNF after assigning lit to true."""
result = []
for clause in cnf:
if lit in clause:
continue # clause satisfied — remove it
reduced = [l for l in clause if l != -lit]
result.append(reduced) # remove falsified literal
return result
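A quick usage check (the function is condensed to a comprehension here so the snippet runs standalone; same logic as above): assigning literal 1 true removes the satisfied unit clause and strips the falsified -1 from the second clause.

```python
def unit_propagate(cnf, lit):
    """Simplify CNF after assigning lit true: drop satisfied clauses,
    strip the falsified literal from the rest."""
    return [[l for l in c if l != -lit] for c in cnf if lit not in c]

print(unit_propagate([[1], [-1, 2], [-2, 3], [-3]], 1))
# [[2], [-2, 3], [-3]]
```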
bcp calls unit_propagate in a loop until no unit clauses remain:
def bcp(cnf, assignment):
"""Apply unit rule to fixpoint. Returns (cnf, assignment) or (None, _) on conflict."""
changed = True
while changed:
changed = False
for clause in cnf:
if len(clause) == 1:
lit = clause[0]
if -lit in assignment:
return None, assignment # conflict
if lit not in assignment:
assignment = assignment | {lit}
cnf = unit_propagate(cnf, lit)
changed = True
break
return cnf, assignment
BCP is deduction: conclusions forced by the current partial assignment. The DPLL algorithm structures search around this deduction step.
DPLL
DPLL (Davis-Putnam-Logemann-Loveland) combines BCP with recursive branching and backtracking.
Compare two implementations on the same input:
def search_sat(cnf):
"""Try all 2^n assignments. Correct but exponential."""
def search(vars_left, assignment):
if evaluate(cnf, assignment): return assignment
if not vars_left: return None
v, *rest = vars_left
return search(rest, assignment | {v}) or search(rest, assignment | {-v})
return search(sorted(variables(cnf)), set())
def dpll(cnf):
"""BCP + branching + backtracking."""
def solve(cnf, assignment):
cnf, assignment = bcp(cnf, assignment)
if cnf is None: return None # conflict
if not cnf: return assignment
if any(len(c) == 0 for c in cnf): return None # empty clause
lit = cnf[0][0]
return solve(cnf + [[lit]], assignment) or solve(cnf + [[-lit]], assignment)
return solve(cnf, set())
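A quick end-to-end run. The definitions are repeated (in condensed form) so the snippet is standalone; the two input instances are our own toy examples, not from the demo files.

```python
def unit_propagate(cnf, lit):
    """Drop satisfied clauses, strip the falsified literal from the rest."""
    return [[l for l in c if l != -lit] for c in cnf if lit not in c]

def bcp(cnf, assignment):
    """Apply the unit rule to fixpoint; (None, _) signals a conflict."""
    changed = True
    while changed:
        changed = False
        for clause in cnf:
            if len(clause) == 1:
                lit = clause[0]
                if -lit in assignment:
                    return None, assignment
                if lit not in assignment:
                    assignment = assignment | {lit}
                    cnf = unit_propagate(cnf, lit)
                    changed = True
                    break
    return cnf, assignment

def dpll(cnf):
    def solve(cnf, assignment):
        cnf, assignment = bcp(cnf, assignment)
        if cnf is None: return None           # conflict
        if not cnf: return assignment         # all clauses satisfied
        if any(len(c) == 0 for c in cnf): return None
        lit = cnf[0][0]
        return solve(cnf + [[lit]], assignment) or solve(cnf + [[-lit]], assignment)
    return solve(cnf, set())

print(dpll([[1], [-1, 2], [-2, 3], [-3]]))   # None: the chain is UNSAT
print(dpll([[1, 2], [-1, 3]]))               # a model, e.g. {1, 3}
```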
The branching trick: instead of directly assigning a variable, DPLL adds a unit clause [lit]. Then BCP immediately propagates it. The "branch" is a guess rendered as a forced deduction on the next recursive call.
Tracing BCP
Consider p ∧ (p → q) ∧ (q → r) ∧ ¬r. Converting to CNF: (p) ∧ (¬p ∨ q) ∧ (¬q ∨ r) ∧ (¬r).
In DIMACS integers (1=p, 2=q, 3=r): [[1], [-1,2], [-2,3], [-3]].
DPLL calls bcp first. Watch what happens — no branching decisions are made at all:
Start: [[1], [-1,2], [-2,3], [-3]] assignment = {}
unit [1]: force p
[-1,2] loses -1 → [2]
[[2], [-2,3], [-3]] assignment = {p}
unit [2]: force q
[-2,3] loses -2 → [3]
[[3], [-3]] assignment = {p, q}
unit [3]: force r
[-3] loses -3 → [] ← empty clause: CONFLICT
UNSAT
BCP derived a contradiction from the unit clauses alone. search_sat tries all complete assignments before concluding UNSAT. dpll finds the contradiction in a single call to solve, with BCP doing all the work.
The gap in practice
Consider a chain of implications x₁ → x₂, x₂ → x₃, ..., xₙ₋₁ → xₙ, with x₁ forced true and xₙ forced false. BCP resolves the contradiction in one sweep: every variable is forced by propagation before any branching decision is made. search_sat must explore the entire tree. Run dpll-toy.py to see the call counts.
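The chain instance is easy to generate programmatically. A sketch with our own numbering convention (integer i stands for xᵢ):

```python
def chain_cnf(n):
    """Clauses for x1, x1 -> x2, ..., x_{n-1} -> x_n, and not x_n."""
    return [[1]] + [[-i, i + 1] for i in range(1, n)] + [[-n]]

print(chain_cnf(4))   # [[1], [-1, 2], [-2, 3], [-3, 4], [-4]]
```

Feeding chain_cnf(n) to the dpll above resolves it with zero branching decisions for any n, while search_sat's work grows exponentially.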
This is the difference between search and deduction. Real SAT solvers add conflict clause learning on top of DPLL (that is Week 2), but the deduction step is where most of the work happens.
End-to-end pipeline
The three files form a pipeline. Here is the formula p → (q ∧ r) flowing through each stage:
formula ('implies', 'p', ('and', 'q', 'r'))
|
| normal-forms.py: to_nnf()
v
nnf ('or', ('not', 'p'), ('and', 'q', 'r'))
|
| tseitin.py: auxiliaries() + cnf()
v
tseitin cnf ('and', '_aux_1',
('iff', '_aux_0', ('and', 'q', 'r')),
('iff', '_aux_1', ('or', ('not', 'p'), '_aux_0')))
|
| tseitin.py: to_dimacs()
v
dimacs [[3], [-1,-2,3], [1,-3], [2,-3], [-4,3], [4,-3], ...]
var_map: {'q':1, 'r':2, '_aux_0':3, '_aux_1':4, 'p':5}
|
| dpll-toy.py: dpll()
v
assignment {3, 4, -5, ...} (raw literals)
|
| decode via var_map
v
model {'p': False, 'q': True, 'r': True} (or any satisfying assignment)
The DIMACS integer format is the interface between the "produce CNF" side and the "solve CNF" side. Positive integer means variable is true; negative means false. This is the same format real SAT solvers accept on disk.
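Serializing the integer clauses to the on-disk DIMACS format takes only a few lines. A sketch of our own (the standard format also allows comment lines beginning with `c`, omitted here):

```python
def to_dimacs_text(clauses, n_vars):
    """Render clauses in DIMACS CNF: a 'p cnf <vars> <clauses>' header,
    then one zero-terminated clause per line."""
    lines = [f"p cnf {n_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

print(to_dimacs_text([[1], [-1, 2], [-2, 3], [-3]], 3))
# p cnf 3 4
# 1 0
# -1 2 0
# -2 3 0
# -3 0
```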
Demo Code
The runnable Python files for this phase implement all the transformations above: normal-forms.py (evaluation, NNF, DNF, CNF), tseitin.py (Tseitin transformation and DIMACS output), and dpll-toy.py (brute-force search vs. DPLL with BCP). They will be available in the course code repo under lectures/l01/.