For every NFA there is a DFA that recognizes exactly the same language.
Proof (and algorithm) idea:
The DFA constructed for an NFA keeps track of all the states that a prefix of an input string can reach in the NFA.
So there will be one state in the DFA for each subset of the states of the NFA that can be reached by some string.
We’ll see how to construct the start state, remaining states and transitions, and the final states of the DFA.
NFAs to DFAs: the start state
The start state of the DFA represents the following set of states in the NFA:
All states reachable from the start state of the NFA using only $\varepsilon$ edges.
NFA
DFA
NFAs to DFAs: states and transitions
Repeat until fixed point:
Let $D_Q$ be a state of the DFA corresponding to a set $Q$ of the NFA states.
Let $a\in\Sigma$ be a symbol for which $D_Q$ has no outgoing edge.
Let $T$ be the (possibly empty) set of NFA states reachable from some state in $Q$ by following one $a$ edge and zero or more $\varepsilon$ edges.
Add a state $D_T$ to the DFA, if not included, that represents the set $T$.
Add an edge labeled $a$ from $D_Q$ to $D_T$.
NFA
DFA
NFAs to DFAs: final states
The final states of the DFA:
Every DFA state that represents a set of NFA states containing a final state.
NFA
DFA
Example: NFA to DFA conversion
NFA
DFA
Exponential blow-up in simulating nondeterminism
In general the DFA might need a state for every subset of states of the NFA.
Power set of the set of states of the NFA.
$n$-state NFA yields DFA with up to $2^n$ states.
We saw an example of this worst case outcome.
The famous “P=NP?” question asks whether a similar blow-up is always necessary to get rid of nondeterminism for polynomial-time algorithms.
DFAs $\equiv$ NFAs $\equiv$ regular expressions
They are all the same and represent regular languages.
Equivalence of DFAs, NFAs, and regular expressions
We have shown how to build an optimal DFA for every regular expression.
Build an NFA.
Convert the NFA to a DFA using the subset construction.
Minimize the resulting DFA.
Theorem
A language is recognized by a DFA (or NFA) if and only if it has a regular expression.
You need to know this fact but we won’t ask you anything about the “only if” direction from DFAs/NFAs to regular expressions.
Languages represented by NFAs, DFAs, and regular expressions are called regular languages.
Languages and representations
Regular, context-free, and other languages.
A hierarchy of languages and representations
In HW8, you’ll (almost :) prove that regular languages are context-free.
How do we prove that all finite languages are regular?
By showing how to construct a DFA/NFA/RegEx for every finite language!
Converting a finite language to a regular expression
Convert each string in the language $L$ to a regular expression.
This is just each string itself.
Then put these regular expressions together using the $\cup$ operator.
The resulting regular expression accepts exactly the strings in $L$.
Example
$\{010, 11, 21\} \longrightarrow 010\cup11\cup21$
Back to languages and representations …
We saw in Lecture 21 that the language $B$ of all binary palindromes can be represented by a CFG.
We also said that $B$ can’t be represented by any regular expression.
How would you prove that?
Why isn’t $B$ (binary palindromes) a regular language?
If $B$ were regular, we could express it as a DFA/NFA/RegEx.
Let’s choose a DFA as our hypothetical representation.
Now, recall that $B$ consists of infinitely many strings $wv\overline{w}$
where $\overline{w}$ is the reverse of $w$ and $v\in \{\varepsilon, “0”, “1” \}$.
What would a DFA need to keep track of to decide $B$?
It would need to keep track of $w$ in order to check $\overline{w}$ against it.
But there are infinitely many possible $w$’s and finitely many DFA states!
This is the intuition for why $B$ is not regular.
Let’s see how to turn this intuition into a formal proof.
A strategy for proving that $B$ is not regular
Proof by contradiction:
Assume that $B$ is regular.
Therefore, there is a DFA $M$ that recognizes $B$.
Show that $M$ accepts or rejects a string it shouldn’t.
Key Idea 1
If two string prefixes collide by reaching the same state, a DFA can no longer distinguish their suffixes.
Key Idea 2
The machine $M$ has finitely many states, and since the strings in $B$ have infinitely many distinct prefixes, two of them must collide!
We choose an infinite set $S$ of prefixes.
This choice must ensure that for every pair of prefixes in $s_a\neq s_b\in S$,
there is a suffix $t$ such that one of of $s_at, s_bt$
is in $B$ but not the other.
Proving that $B$ is not regular
Suppose that some DFA $M$ recognizes $B$.
We show $M$ accepts or rejects a string it shouldn’t.
Consider the set $S = \{0^n1 : n \geq 0 \}$.
Since there are finitely many states in $M$ and infinitely many strings in $S$,
there exist strings $0^a1 \in S$ and $0^b1 \in S$ with $a\neq b$ that end in the same state of $M$.
Important: We don’t get to choose $a$ and $b$! We just know they exist.
Now, consider appending $0^a$ to both strings.
Since $0^a1$ and $0^b1$ end in the same state, so do $0^a10^a$ and $0^b10^a$, call it $q$.
But then $M$ must make a mistake: $q$ needs to be an accept state since
$0^a10^a \in B$, but then $M$ would accept $0^b10^a \not\in B$, which is an error. $\square$
Proving irregularity
A proof template for showing that a language is not regular.
A template for proving that a language $L$ is not regular
① Suppose for contradiction that some DFA $M$ recognizes $L$.
② Consider the set $S = $ {$\ldots$}.
③ Since $S$ is infinite and $M$ has finitely many states,
there must be two strings $s_a, s_b$ $\in S$ such that $s_a \neq s_b$ and both end in the same state of $M$.
④ Consider appending $t$ to both $s_a$ and $s_b$.
⑤ Since $s_a$ and $s_b$ end in the same state of $M$,
then $s_at$ and $s_bt$ also end in the same state $q$ of $M$.
Since $s_at\in L$ and $s_bt\not\in L$, $M$ does not recognize $L$.
⑥ Since $M$ was arbitrary, no DFA recognizes $L$.
$S$ must be infinite,
and for every pair of prefixes $s_a\neq s_b\in S$,
there is a suffix $t$ such that one of of $s_at, s_bt$
is in $L$ but not the other.
We don’t get to choose $s_a$ and $s_b$!
$t$ should be an accept suffix for $s_a$ but not $s_b$.
Example: prove that $L = \{0^n1^n : n \geq 0\}$ is not regular
Suppose for contradiction that some DFA $M$ recognizes $L$.
Consider the set $S = $ $\{0^n : n\geq 0 \}$.
Since $S$ is infinite and $M$ has finitely many states,
there must be two strings $0^a, 0^b$ $\in S$
with $a \neq b$ that both end in the same state of $M$.
Consider appending $1^a$ to both $0^a$ and $0^b$.
Since $0^a$ and $0^b$ end in the same state of $M$,
then $0^a1^a$ and $0^b1^a$ also end in the same state $q$ of $M$.
Since $0^a1^a\in L$ and $0^b1^a\not\in L$, $M$ does not recognize $L$.
Since $M$ was arbitrary, no DFA recognizes $L$.
Example: prove that $L = \{(^n)^n : n \geq 0\}$ is not regular
Suppose for contradiction that some DFA $M$ recognizes $L$.
Consider the set $S = $ $\{(^n : n\geq 0 \}$.
Since $S$ is infinite and $M$ has finitely many states,
there must be two strings $(^a, (^b$ $\in S$
with $a \neq b$ that both end in the same state of $M$.
Consider appending $)^a$ to both $(^a$ and $(^b$.
Since $(^a$ and $(^b$ end in the same state of $M$,
then $(^a)^a$ and $(^b)^a$ also end in the same state $q$ of $M$.
Since $(^a)^a\in L$ and $(^b)^a\not\in L$, $M$ does not recognize $L$.
Since $M$ was arbitrary, no DFA recognizes $L$.
A fun fact about this proof method
Suppose that for a language $L$, the set $S$ is a largest set of prefix strings
with the property that for every pair $s_a \neq s_b \in S$,
there is some string $t$ such that one of $s_at, s_bt$ is in $L$ but the other isn’t.
If $S$ is infinite, then $L$ is not regular.
If $S$ is finite, then the minimal DFA for $L$ has precisely
$|S|$ states, one reached by each member of $S$.
Summary
DFAs $\equiv$ NFAs $\equiv$ regular expressions.
We’ve shown how to go from a regular expression to an NFA to a DFA.
(And going from an NFA to a DFA can lead to exponential blow-up.)
But we won’t show how to go from an NFA/DFA to a regular expression.