CSE 311 Lecture 26:
Limitations of DFAs, NFAs, and Regular Expressions

Emina Torlak and Kevin Zatloukal

Topics

From NFAs to DFAs
A quick wrap-up of Lecture 25.
DFAs $\equiv$ NFAs $\equiv$ regular expressions
They are all the same and represent regular languages.
Languages and representations
Regular, context-free, and other languages.
Proving irregularity
A proof template for showing that a language is not regular.

From NFAs to DFAs

A quick wrap-up of Lecture 25.

NFAs and DFAs

Every DFA is an NFA.
A DFA is an NFA that satisfies more constraints.
Theorem
For every NFA there is a DFA that recognizes exactly the same language.
Proof (and algorithm) idea:
The DFA constructed for an NFA keeps track of all the states that a prefix of an input string can reach in the NFA.
So there will be one state in the DFA for each subset of the states of the NFA that can be reached by some string.
We’ll see how to construct the start state, remaining states and transitions, and the final states of the DFA.

NFAs to DFAs: the start state

The start state of the DFA represents the following set of states in the NFA:
All states reachable from the start state of the NFA using only $\varepsilon$ edges.

NFA

DFA

NFAs to DFAs: states and transitions

Repeat until fixed point:
Let $D_Q$ be a state of the DFA corresponding to a set $Q$ of the NFA states.
Let $a\in\Sigma$ be a symbol for which $D_Q$ has no outgoing edge.
Let $T$ be the (possibly empty) set of NFA states reachable from some state in $Q$ by following one $a$ edge and zero or more $\varepsilon$ edges.
Add a state $D_T$ to the DFA, if not included, that represents the set $T$.
Add an edge labeled $a$ from $D_Q$ to $D_T$.

NFA

DFA

NFAs to DFAs: final states

The final states of the DFA:
Every DFA state that represents a set of NFA states containing a final state.

NFA

DFA

Example: NFA to DFA conversion

NFA

DFA

Exponential blow-up in simulating nondeterminism

In general the DFA might need a state for every subset of states of the NFA.
Power set of the set of states of the NFA.
$n$-state NFA yields DFA with up to $2^n$ states.
We saw an example of this worst case outcome.

The famous “P=NP?” question asks whether a similar blow-up is always necessary to get rid of nondeterminism for polynomial-time algorithms.

DFAs $\equiv$ NFAs $\equiv$ regular expressions

They are all the same and represent regular languages.

Equivalence of DFAs, NFAs, and regular expressions

We have shown how to build an optimal DFA for every regular expression.
Build an NFA.
Convert the NFA to a DFA using the subset construction.
Minimize the resulting DFA.
Theorem
A language is recognized by a DFA (or NFA) if and only if it has a regular expression.

You need to know this fact but we won’t ask you anything about the “only if” direction from DFAs/NFAs to regular expressions.

Languages represented by NFAs, DFAs, and regular expressions are called regular languages.

Languages and representations

Regular, context-free, and other languages.

A hierarchy of languages and representations

In HW8, you’ll (almost :) prove that regular languages are context-free. How do we prove that all finite languages are regular? By showing how to construct a DFA/NFA/RegEx for every finite language!

Converting a finite language to a regular expression

Convert each string in the language $L$ to a regular expression.
This is just each string itself.
Then put these regular expressions together using the $\cup$ operator.
The resulting regular expression accepts exactly the strings in $L$.
Example
$\{010, 11, 21\} \longrightarrow 010\cup11\cup21$

Back to languages and representations …

We saw in Lecture 21 that the language $B$ of all binary palindromes can be represented by a CFG. We also said that $B$ can’t be represented by any regular expression. How would you prove that?

Why isn’t $B$ (binary palindromes) a regular language?

If $B$ were regular, we could express it as a DFA/NFA/RegEx.
Let’s choose a DFA as our hypothetical representation.
Now, recall that $B$ consists of infinitely many strings $wv\overline{w}$
where $\overline{w}$ is the reverse of $w$ and $v\in \{\varepsilon, “0”, “1” \}$.
What would a DFA need to keep track of to decide $B$?
It would need to keep track of $w$ in order to check $\overline{w}$ against it.
But there are infinitely many possible $w$’s and finitely many DFA states!

This is the intuition for why $B$ is not regular. Let’s see how to turn this intuition into a formal proof.

A strategy for proving that $B$ is not regular

Proof by contradiction:
Assume that $B$ is regular.
Therefore, there is a DFA $M$ that recognizes $B$.
Show that $M$ accepts or rejects a string it shouldn’t.
Key Idea 1
If two string prefixes collide by reaching the same state, a DFA can no longer distinguish their suffixes.
Key Idea 2
The machine $M$ has finitely many states, and since the strings in $B$ have infinitely many distinct prefixes, two of them must collide!

We choose an infinite set $S$ of prefixes. This choice must ensure that for every pair of prefixes in $s_a\neq s_b\in S$, there is a suffix $t$ such that one of of $s_at, s_bt$ is in $B$ but not the other.

Proving that $B$ is not regular

Suppose that some DFA $M$ recognizes $B$. We show $M$ accepts or rejects a string it shouldn’t. Consider the set $S = \{0^n1 : n \geq 0 \}$.

Since there are finitely many states in $M$ and infinitely many strings in $S$, there exist strings $0^a1 \in S$ and $0^b1 \in S$ with $a\neq b$ that end in the same state of $M$.

Important: We don’t get to choose $a$ and $b$! We just know they exist.

Now, consider appending $0^a$ to both strings.

Since $0^a1$ and $0^b1$ end in the same state, so do $0^a10^a$ and $0^b10^a$, call it $q$. But then $M$ must make a mistake: $q$ needs to be an accept state since $0^a10^a \in B$, but then $M$ would accept $0^b10^a \not\in B$, which is an error. $\square$

Proving irregularity

A proof template for showing that a language is not regular.

A template for proving that a language $L$ is not regular

① Suppose for contradiction that some DFA $M$ recognizes $L$.

② Consider the set $S = $ {$\ldots$}.

③ Since $S$ is infinite and $M$ has finitely many states, there must be two strings $s_a, s_b$ $\in S$ such that $s_a \neq s_b$ and both end in the same state of $M$.

④ Consider appending $t$ to both $s_a$ and $s_b$.

⑤ Since $s_a$ and $s_b$ end in the same state of $M$, then $s_at$ and $s_bt$ also end in the same state $q$ of $M$. Since $s_at\in L$ and $s_bt\not\in L$, $M$ does not recognize $L$.

⑥ Since $M$ was arbitrary, no DFA recognizes $L$.

$S$ must be infinite, and for every pair of prefixes $s_a\neq s_b\in S$, there is a suffix $t$ such that one of of $s_at, s_bt$ is in $L$ but not the other.

We don’t get to choose $s_a$ and $s_b$!

$t$ should be an accept suffix for $s_a$ but not $s_b$.

Example: prove that $L = \{0^n1^n : n \geq 0\}$ is not regular

Suppose for contradiction that some DFA $M$ recognizes $L$.

Consider the set $S = $ $\{0^n : n\geq 0 \}$.

Since $S$ is infinite and $M$ has finitely many states, there must be two strings $0^a, 0^b$ $\in S$ with $a \neq b$ that both end in the same state of $M$.

Consider appending $1^a$ to both $0^a$ and $0^b$.

Since $0^a$ and $0^b$ end in the same state of $M$, then $0^a1^a$ and $0^b1^a$ also end in the same state $q$ of $M$. Since $0^a1^a\in L$ and $0^b1^a\not\in L$, $M$ does not recognize $L$.

Since $M$ was arbitrary, no DFA recognizes $L$.

Example: prove that $L = \{(^n)^n : n \geq 0\}$ is not regular

Suppose for contradiction that some DFA $M$ recognizes $L$.

Consider the set $S = $ $\{(^n : n\geq 0 \}$.

Since $S$ is infinite and $M$ has finitely many states, there must be two strings $(^a, (^b$ $\in S$ with $a \neq b$ that both end in the same state of $M$.

Consider appending $)^a$ to both $(^a$ and $(^b$.

Since $(^a$ and $(^b$ end in the same state of $M$, then $(^a)^a$ and $(^b)^a$ also end in the same state $q$ of $M$. Since $(^a)^a\in L$ and $(^b)^a\not\in L$, $M$ does not recognize $L$.

Since $M$ was arbitrary, no DFA recognizes $L$.

A fun fact about this proof method

Suppose that for a language $L$, the set $S$ is a largest set of prefix strings with the property that for every pair $s_a \neq s_b \in S$, there is some string $t$ such that one of $s_at, s_bt$ is in $L$ but the other isn’t.

If $S$ is infinite, then $L$ is not regular.

If $S$ is finite, then the minimal DFA for $L$ has precisely $|S|$ states, one reached by each member of $S$.

Summary

DFAs $\equiv$ NFAs $\equiv$ regular expressions.
We’ve shown how to go from a regular expression to an NFA to a DFA.
(And going from an NFA to a DFA can lead to exponential blow-up.)
But we won’t show how to go from an NFA/DFA to a regular expression.
Finite $\subset$ regular $\subset$ context-free languages.
To show that a language is regular, construct a DFA/NFA/RegEx for it.
To show that it is not regular, use the proof method from this lecture.