Limitations of DFAs, NFAs, and Regular Expressions

Emina Torlak and Kevin Zatloukal

- From NFAs to DFAs
- A quick wrap-up of Lecture 25.
- DFAs $\equiv$ NFAs $\equiv$ regular expressions
- They are all the same and represent *regular languages*.
- Languages and representations
- Regular, context-free, and other languages.
- Proving irregularity
- A proof template for showing that a language is not regular.

A quick wrap-up of Lecture 25.

- Every DFA is an NFA.
- A DFA is an NFA that satisfies more constraints.

- Theorem
- For every NFA there is a DFA that recognizes exactly the same language.

- Proof (and algorithm) idea:
- The DFA constructed for an NFA keeps track of *all* the states that a prefix of an input string can reach in the NFA.
- So there will be one state in the DFA for each *subset* of the states of the NFA that can be reached by some string.
- We’ll see how to construct the start state, remaining states and transitions, and the final states of the DFA.

- The start state of the DFA represents the following set of states in the NFA:
- All states reachable from the start state of the NFA using only $\varepsilon$ edges.

[Figure: NFA and the corresponding DFA]

- Repeat until fixed point:
- Let $D_Q$ be a state of the DFA corresponding to a set $Q$ of the NFA states.
- Let $a\in\Sigma$ be a symbol for which $D_Q$ has no outgoing edge.
- Let $T$ be the (possibly empty) set of NFA states reachable from some state in $Q$ by following one $a$ edge and zero or more $\varepsilon$ edges.
- Add a state $D_T$ to the DFA, if not included, that represents the set $T$.
- Add an edge labeled $a$ from $D_Q$ to $D_T$.

[Figure: NFA and the corresponding DFA]

- The final states of the DFA:
- Every DFA state that represents a set of NFA states containing a final state.
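The three steps above (start state, transitions until fixed point, final states) can be sketched in Python. This is a minimal sketch, not the lecture's own code: the NFA encoding (a dict from `(state, symbol)` to a set of states, with the empty string `""` standing for $\varepsilon$) and all function names are assumptions made for illustration.

```python
EPS = ""  # label used for epsilon edges (an assumption of this sketch)

def eps_closure(states, delta):
    """All NFA states reachable from `states` using only epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(alphabet, delta, start, finals):
    """Return (dfa_states, dfa_delta, dfa_start, dfa_finals)."""
    # Start state: everything reachable from the NFA start via epsilon edges.
    dfa_start = eps_closure({start}, delta)
    dfa_states, dfa_delta = {dfa_start}, {}
    worklist = [dfa_start]
    while worklist:  # repeat until fixed point
        Q = worklist.pop()
        for a in alphabet:
            # T: follow one `a` edge, then zero or more epsilon edges.
            step = set()
            for q in Q:
                step |= set(delta.get((q, a), ()))
            T = eps_closure(step, delta)  # may be the empty set
            dfa_delta[(Q, a)] = T
            if T not in dfa_states:
                dfa_states.add(T)
                worklist.append(T)
    # Final states: DFA states whose set contains an NFA final state.
    dfa_finals = {Q for Q in dfa_states if Q & set(finals)}
    return dfa_states, dfa_delta, dfa_start, dfa_finals

def dfa_accepts(dfa, w):
    """Run the constructed DFA on the string w."""
    _, dfa_delta, q, dfa_finals = dfa
    for c in w:
        q = dfa_delta[(q, c)]
    return q in dfa_finals

# Hypothetical example NFA: binary strings that end in "01".
nfa_delta = {(0, "0"): {0, 1}, (0, "1"): {0}, (1, "1"): {2}}
dfa = subset_construction("01", nfa_delta, start=0, finals={2})
print(len(dfa[0]), "DFA states")
```

Only subsets that some string can actually reach are created, which is why the example produces far fewer than $2^3$ states.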

[Figure: NFA and the corresponding DFA]

- In general the DFA might need a state for every subset of states of the NFA.
- Power set of the set of states of the NFA.
- $n$-state NFA yields DFA with up to $2^n$ states.
- We saw an example of this worst case outcome.
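To see the exponential growth concretely, here is a small experiment. It does not reproduce the lecture's own worst-case example; it assumes a standard textbook family instead: an NFA with $n+1$ states for "the $n$-th symbol from the end is 1", whose subset construction reaches $2^n$ DFA states. Function names are hypothetical.

```python
def nth_from_end_nfa(n):
    """NFA with states 0..n for binary strings whose n-th symbol from the end is 1."""
    delta = {(0, "0"): {0}, (0, "1"): {0, 1}}  # state 0 guesses where the 1 is
    for i in range(1, n):
        delta[(i, "0")] = {i + 1}              # then count n-1 more symbols
        delta[(i, "1")] = {i + 1}
    return delta, 0, {n}

def count_reachable_subsets(delta, start, alphabet="01"):
    """Number of DFA states the subset construction creates (no epsilon edges here)."""
    start_set = frozenset({start})
    seen, worklist = {start_set}, [start_set]
    while worklist:
        Q = worklist.pop()
        for a in alphabet:
            T = frozenset(r for q in Q for r in delta.get((q, a), ()))
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    return len(seen)

for n in range(1, 6):
    delta, start, _ = nth_from_end_nfa(n)
    print(n + 1, "NFA states ->", count_reachable_subsets(delta, start), "DFA states")
```

Intuitively, the DFA has to remember the last $n$ symbols it read, and there are $2^n$ possibilities for those.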

The famous “P=NP?” question asks whether a similar blow-up is always necessary to get rid of nondeterminism for polynomial-time algorithms.

They are all the same and represent *regular languages*.

- We have shown how to build an optimal DFA for every regular expression.
- Build an NFA.
- Convert the NFA to a DFA using the subset construction.
- Minimize the resulting DFA.

- Theorem
- A language is recognized by a DFA (or NFA) if and only if it has a regular expression.

You need to know this fact but we won’t ask you anything about the “only if” direction from DFAs/NFAs to regular expressions.

Languages represented by NFAs, DFAs, and regular expressions are called *regular languages*.

Regular, context-free, and other languages.

In HW8, you’ll (almost :) prove that regular languages are context-free. How do we prove that all finite languages are regular? By showing how to construct a DFA/NFA/RegEx for every finite language!

- Convert each string in the language $L$ to a regular expression.
- This is just each string itself.
- Then put these regular expressions together using the $\cup$ operator.
- The resulting regular expression matches exactly the strings in $L$.
- Example
- $\{010, 11, 21\} \longrightarrow 010\cup11\cup21$
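The construction above can be tried out with Python's `re` module, which writes union as `|` rather than $\cup$. This is a minimal sketch; the function names are hypothetical, and `re.escape` is used only so that strings containing special characters still match themselves literally.

```python
import re

def finite_language_to_regex(strings):
    """Union of the strings in a finite language, each matching only itself."""
    if not strings:
        return r"(?!)"  # empty language: a pattern that never matches
    return "|".join(re.escape(s) for s in sorted(strings))

L = {"010", "11", "21"}
pattern = re.compile(finite_language_to_regex(L))

def in_language(w):
    """True iff the whole string w is matched, i.e. w is in L."""
    return pattern.fullmatch(w) is not None

print(pattern.pattern)
print([w for w in ["010", "11", "21", "0", "0101", ""] if in_language(w)])
```

`fullmatch` matters here: a regular expression describes a language only when the entire string has to match, not just some substring.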

We saw in Lecture 21 that the language $B$ of all binary palindromes can be represented by a CFG. We also said that $B$ can’t be represented by any regular expression. How would you prove that?

- If $B$ were regular, we could express it as a DFA/NFA/RegEx.
- Let’s choose a DFA as our hypothetical representation.

- Now, recall that $B$ consists of infinitely many strings $wv\overline{w}$
- where $\overline{w}$ is the reverse of $w$ and $v\in \{\varepsilon, 0, 1\}$.

- What would a DFA need to keep track of to decide $B$?
- It would need to keep track of $w$ in order to check $\overline{w}$ against it.
- But there are infinitely many possible $w$’s and finitely many DFA states!

This is the intuition for why $B$ is not regular. Let’s see how to turn this intuition into a formal proof.

- Proof by contradiction:
- Assume that $B$ is regular.
- Therefore, there is a DFA $M$ that recognizes $B$.
- Show that $M$ accepts or rejects a string it shouldn’t.

- Key Idea 1
- If two string prefixes collide by reaching the same state, a DFA can no longer distinguish their suffixes.

- Key Idea 2
- The machine $M$ has finitely many states, and since the strings in $B$ have infinitely many distinct prefixes, two of them must collide!

We choose an infinite set $S$ of prefixes.
This choice must ensure that for *every pair* of prefixes $s_a\neq s_b\in S$,
there is a suffix $t$ such that one of $s_at, s_bt$
is in $B$ but not the other.

Suppose that some DFA $M$ recognizes $B$. We show $M$ accepts or rejects a string it shouldn’t. Consider the set $S = \{0^n1 : n \geq 0 \}$.

Since there are finitely many states in $M$ and infinitely many strings in $S$, there exist strings $0^a1 \in S$ and $0^b1 \in S$ with $a\neq b$ that end in the same state of $M$.

**Important:** We don’t get to choose $a$ and $b$! We just know they exist.

Now, consider appending $0^a$ to both strings.

Since $0^a1$ and $0^b1$ end in the same state, so do $0^a10^a$ and $0^b10^a$; call that state $q$. But then $M$ must make a mistake: $q$ needs to be an accept state since $0^a10^a \in B$, but then $M$ would also accept $0^b10^a \not\in B$ (not a palindrome, because $a \neq b$), which is an error. $\square$
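The proof is constructive enough to run against any concrete DFA. The sketch below follows it step by step: it feeds the prefixes $0^n1$ to a DFA until two collide, appends $0^a$, and returns a string on which the DFA must be wrong. The DFA encoding (a `(delta, start, finals)` tuple) and the sample machine `first_last`, which accepts strings whose first and last symbols agree, are assumptions made for illustration.

```python
def run(dfa, w):
    """State the DFA is in after reading w."""
    delta, start, _ = dfa
    q = start
    for c in w:
        q = delta[(q, c)]
    return q

def accepts(dfa, w):
    return run(dfa, w) in dfa[2]

def is_palindrome(w):
    return w == w[::-1]

def find_mistake(dfa, num_states):
    """Return a string that the DFA classifies incorrectly as (non-)palindrome."""
    seen = {}  # state -> exponent n of the prefix 0^n 1 that reached it
    for n in range(num_states + 1):  # pigeonhole: more prefixes than states
        q = run(dfa, "0" * n + "1")
        if q in seen:
            a, b = seen[q], n        # we do not choose a and b, we find them
            good = "0" * a + "1" + "0" * a  # a palindrome
            bad = "0" * b + "1" + "0" * a   # not a palindrome since a != b
            # Both strings end in the same state, so the verdict is the same.
            return bad if accepts(dfa, good) else good
        seen[q] = n
    raise AssertionError("unreachable: some state must repeat")

# Hypothetical 5-state DFA; state "xy" means: first symbol x, last symbol y.
first_last = (
    {
        ("start", "0"): "00", ("start", "1"): "11",
        ("00", "0"): "00", ("00", "1"): "01",
        ("01", "0"): "00", ("01", "1"): "01",
        ("11", "0"): "10", ("11", "1"): "11",
        ("10", "0"): "10", ("10", "1"): "11",
    },
    "start",
    {"start", "00", "11"},
)

w = find_mistake(first_last, 5)
print(w, accepts(first_last, w), is_palindrome(w))
```

Whatever DFA is passed in, the returned string is one where `accepts` and `is_palindrome` disagree, which is exactly the contradiction in the proof.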

A proof template for showing that a language is not regular.

① Suppose for contradiction that some DFA $M$ recognizes $L$.

② Consider the set $S = \{\ldots\}$.

③ Since $S$ is infinite and $M$ has finitely many states, there must be two strings $s_a, s_b \in S$ such that $s_a \neq s_b$ and both end in the same state of $M$.

④ Consider appending $t$ to both $s_a$ and $s_b$.

⑤ Since $s_a$ and $s_b$ end in the same state of $M$, then $s_at$ and $s_bt$ also end in the same state $q$ of $M$. Since $s_at\in L$ and $s_bt\not\in L$, the state $q$ would have to be both accepting and non-accepting, so $M$ does not recognize $L$.

⑥ Since $M$ was arbitrary, no DFA recognizes $L$.

$S$ must be infinite,
and for *every pair* of prefixes $s_a\neq s_b\in S$,
there is a suffix $t$ such that one of $s_at, s_bt$
is in $L$ but not the other.

We don’t get to choose $s_a$ and $s_b$!

$t$ should be an accepting suffix for $s_a$ but not for $s_b$: $s_at\in L$ while $s_bt\not\in L$.

Suppose for contradiction that some DFA $M$ recognizes $L = \{0^n1^n : n\geq 0\}$.

Consider the set $S = \{0^n : n\geq 0 \}$.

Since $S$ is infinite and $M$ has finitely many states, there must be two strings $0^a, 0^b \in S$ with $a \neq b$ that both end in the same state of $M$.

Consider appending $1^a$ to both $0^a$ and $0^b$.

Since $0^a$ and $0^b$ end in the same state of $M$, then $0^a1^a$ and $0^b1^a$ also end in the same state $q$ of $M$. Since $0^a1^a\in L$ and $0^b1^a\not\in L$ (because $a \neq b$), the state $q$ would have to be both accepting and non-accepting, so $M$ does not recognize $L$.

Since $M$ was arbitrary, no DFA recognizes $L$.

Suppose for contradiction that some DFA $M$ recognizes the language $L$ of balanced parentheses.

Consider the set $S = \{(^n : n\geq 0 \}$.

Since $S$ is infinite and $M$ has finitely many states, there must be two strings $(^a, (^b \in S$ with $a \neq b$ that both end in the same state of $M$.

Consider appending $)^a$ to both $(^a$ and $(^b$.

Since $(^a$ and $(^b$ end in the same state of $M$, then $(^a)^a$ and $(^b)^a$ also end in the same state $q$ of $M$. Since $(^a)^a\in L$ and $(^b)^a\not\in L$ (because $a \neq b$), the state $q$ would have to be both accepting and non-accepting, so $M$ does not recognize $L$.

Since $M$ was arbitrary, no DFA recognizes $L$.
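Before writing such a proof, it can help to sanity-check the choice of $S$ and $t$ on small cases. The sketch below tests, for all pairs $a \neq b$ below a bound, that the suffix chosen for $s_a$ really separates $s_a$ from $s_b$. It assumes the two languages in the proofs above are $\{0^n1^n : n \geq 0\}$ and the balanced-parentheses language; the membership tests and function names are hypothetical helpers. A passing check is evidence, not a proof.

```python
from itertools import combinations

def in_zeros_ones(w):
    """Membership in {0^n 1^n : n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def balanced(w):
    """Membership in the balanced-parentheses language (w contains only '(' and ')')."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:          # a ')' with no matching '('
            return False
    return depth == 0

def distinguishes_all_pairs(in_L, prefix, suffix, bound):
    """Check that suffix(a) separates prefix(a) from prefix(b) for all a < b < bound."""
    for a, b in combinations(range(bound), 2):
        t = suffix(a)
        if in_L(prefix(a) + t) == in_L(prefix(b) + t):
            return False       # t fails to tell this pair apart
    return True

print(distinguishes_all_pairs(in_zeros_ones, lambda n: "0" * n, lambda a: "1" * a, 30))
print(distinguishes_all_pairs(balanced, lambda n: "(" * n, lambda a: ")" * a, 30))
# A poor choice of suffix (always the empty string) does not separate every pair:
print(distinguishes_all_pairs(in_zeros_ones, lambda n: "0" * n, lambda a: "", 30))
```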

Suppose that for a language $L$, the set $S$ is a *largest* set of prefix strings
with the property that for every pair $s_a \neq s_b \in S$,
there is some string $t$ such that one of $s_at, s_bt$ is in $L$ but the other isn’t.

If $S$ is infinite, then $L$ is not regular.

If $S$ is finite, then the minimal DFA for $L$ has precisely $|S|$ states, one reached by each member of $S$.
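The two cases can be explored experimentally by grouping prefixes according to which suffixes they accept. The sketch below is only a bounded approximation: it looks at prefixes and suffixes up to a fixed length, so it can suggest, but never prove, whether the number of pairwise distinguishable prefixes stays finite. The example languages and all names are assumptions for illustration.

```python
from itertools import product

def words(alphabet, max_len):
    """All strings over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def count_classes(in_L, alphabet, max_prefix, max_suffix):
    """Number of prefixes that are pairwise distinguishable by some short suffix."""
    suffixes = list(words(alphabet, max_suffix))
    # Two prefixes with the same signature cannot be told apart by these suffixes.
    signatures = {
        tuple(in_L(s + t) for t in suffixes)
        for s in words(alphabet, max_prefix)
    }
    return len(signatures)

def even_ones(w):
    """A regular language: binary strings with an even number of 1s."""
    return w.count("1") % 2 == 0

def zeros_then_ones(w):
    """A non-regular language: {0^n 1^n : n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

for k in (2, 4, 6):
    print(k, count_classes(even_ones, "01", k, k), count_classes(zeros_then_ones, "01", k, k))
```

For `even_ones` the count stays at 2, matching its two-state minimal DFA, while for `zeros_then_ones` it keeps growing as the bound increases.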

- DFAs $\equiv$ NFAs $\equiv$ regular expressions.
- We’ve shown how to go from a regular expression to an NFA to a DFA.
- (And going from an NFA to a DFA can lead to exponential blow-up.)
- But we won’t show how to go from an NFA/DFA to a regular expression.
- Finite $\subset$ regular $\subset$ context-free languages.
- To show that a language is regular, construct a DFA/NFA/RegEx for it.
- To show that it is not regular, use the proof method from this lecture.