CSE 311 Lecture 26: Limitations of DFAs, NFAs, and Regular Expressions

Emina Torlak and Sami Davies

$\newcommand{\nt}[1]{\mathbf{#1}}$ $\newcommand{\S}{\nt{S}}$ $\newcommand{\OR}{\,\vert\,}$

Topics

DFAs $\equiv$ NFAs $\equiv$ regular expressions: A quick review of Lecture 25.
Languages and representations: Regular, context-free, and other languages.
Proving irregularity: A proof template for showing that a language is not regular.

DFAs $\equiv$ NFAs $\equiv$ regular expressions

A quick review of Lecture 25.

Equivalence of DFAs, NFAs, and regular expressions

We have shown how to build an optimal DFA for every regular expression: build an NFA, convert the NFA to a DFA using the subset construction, and then minimize the resulting DFA.
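For concreteness, here is a minimal Python sketch of the middle step, the subset construction (minimization is not shown). The NFA encoding (a dict mapping `(state, symbol)` to a set of states, with ε-moves stored under the symbol `""`) and the helper names `eps_closure`, `subset_construction`, and `dfa_accepts` are assumptions of this sketch, not course notation. Each DFA state is the set of NFA states the NFA could be in, and only reachable sets are built.

```python
from collections import deque

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves (stored under key "")."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in nfa.get((q, ""), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states (reachable ones only)."""
    dfa_start = eps_closure(nfa, {start})
    delta, seen, todo = {}, {dfa_start}, deque([dfa_start])
    while todo:
        S = todo.popleft()
        for a in alphabet:
            # Follow every a-move out of S, then close under ε-moves.
            T = eps_closure(nfa, {r for q in S for r in nfa.get((q, a), set())})
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_finals = {S for S in seen if S & finals}
    return delta, dfa_start, dfa_finals

def dfa_accepts(delta, start, finals, w):
    state = start
    for c in w:
        state = delta[(state, c)]
    return state in finals

# Hypothetical example NFA: binary strings whose second-to-last symbol is 1.
nfa = {(0, "0"): {0}, (0, "1"): {0, 1}, (1, "0"): {2}, (1, "1"): {2}}
delta, q0, F = subset_construction(nfa, 0, {2}, "01")
print(len({S for (S, _) in delta}))       # 4 reachable DFA states
print(dfa_accepts(delta, q0, F, "0110"))  # True
print(dfa_accepts(delta, q0, F, "0101"))  # False
```

On this example the DFA has 4 states. For "the $k$-th symbol from the end is 1", the same construction reaches $2^k$ subsets, one source of the exponential blow-up.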

Theorem: A language is recognized by a DFA (or NFA) if and only if it has a regular expression.

You need to know this fact, but we won’t ask you anything about the “only if” direction, from DFAs/NFAs to regular expressions.

Languages represented by NFAs, DFAs, and regular expressions are called regular languages.

Languages and representations

Regular, context-free, and other languages.

A hierarchy of languages and representations

How do we prove that all finite languages are regular? By showing how to construct a DFA/NFA/RegEx for every finite language!

Converting a finite language to a regular expression

Convert each string in the finite language $L$ to a regular expression. This is just the string itself.
Then combine these regular expressions using the $\cup$ operator. The resulting regular expression matches exactly the strings in $L$. (If $L = \emptyset$, the regular expression $\emptyset$ does the job.)
Example: $\{010, 11, 21\} \longrightarrow 010\cup11\cup21$
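The same recipe can be sketched with Python's `re` module, where `|` plays the role of $\cup$. The function name `finite_language_regex` is hypothetical, and Python has no literal for $\emptyset$, so the sketch assumes the never-matching pattern `(?!)` as a stand-in.

```python
import re

def finite_language_regex(words):
    """Union of the words themselves; the empty language gets a never-matching pattern."""
    if not words:
        return r"(?!)"  # stand-in for ∅: an empty negative lookahead always fails
    # Each word is its own regular expression; re.escape protects special characters.
    return "|".join(re.escape(w) for w in sorted(words))

L = {"010", "11", "21"}
pattern = finite_language_regex(L)
print(pattern)  # 010|11|21
for s in ["010", "11", "21", "01", "0101", ""]:
    print(repr(s), re.fullmatch(pattern, s) is not None)
```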

Back to languages and representations …

How do we prove that all regular languages are context-free? By showing how to construct a CFG for every regular language!

Converting a regular expression to a CFG

Use the following recursive function on regular expressions, where $S_R$ denotes the start symbol of $\mathsf{cfg}(R)$:

$\mathsf{cfg}(\emptyset) =$ the CFG with no productions.

$\mathsf{cfg}(\varepsilon) =$ the CFG with just the production $S_\varepsilon \to \varepsilon$.

$\mathsf{cfg}(a) =$ the CFG with just the production $S_a \to a$, for every $a\in\Sigma$.

$\mathsf{cfg}(AB) =$ the CFG with the productions of $\mathsf{cfg}(A)$ and $\mathsf{cfg}(B)$, plus $S_{AB} \to S_A S_B$.

$\mathsf{cfg}(A\cup B) =$ the CFG with the productions of $\mathsf{cfg}(A)$ and $\mathsf{cfg}(B)$, plus $S_{A\cup B} \to S_A \OR S_B$.

$\mathsf{cfg}(A^*) =$ the CFG with the productions of $\mathsf{cfg}(A)$, plus $S_{A^*} \to \varepsilon \OR S_A S_{A^*}$.
Example: $\mathsf{cfg}((1\cup0)^*0)$ has the productions
$$\begin{aligned} S_{(1\cup0)^*0} &\to S_{(1\cup0)^*}S_0\\ S_{(1\cup0)^*} &\to \varepsilon \OR S_{1\cup0}S_{(1\cup0)^*}\\ S_{1\cup0} &\to S_1 \OR S_0\\ S_1 &\to 1\\ S_0 &\to 0 \end{aligned}$$
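The recursive definition of $\mathsf{cfg}$ translates almost line by line into code. A minimal Python sketch, assuming a hypothetical tuple encoding of regular expressions; `show`, `cfg`, and `derivable` are illustrative names. To sanity-check the output, `derivable` enumerates the short strings the grammar generates (as a least fixed point) and compares them with Python's `re` on the same regular expression.

```python
import itertools
import re

# Hypothetical AST for regular expressions (an assumption of this sketch):
# ("empty",) is ∅, ("eps",) is ε, ("sym", a) is a symbol a,
# ("cat", A, B) is AB, ("or", A, B) is A ∪ B, ("star", A) is A*.

def show(r):
    """Render r as text, used to name its start symbol S_r."""
    kind = r[0]
    if kind == "empty":
        return "∅"
    if kind == "eps":
        return "ε"
    if kind == "sym":
        return r[1]
    if kind == "cat":
        return show(r[1]) + show(r[2])
    if kind == "or":
        return "(" + show(r[1]) + "∪" + show(r[2]) + ")"
    inner = show(r[1])  # kind == "star"
    return (inner if len(inner) == 1 or r[1][0] == "or" else "(" + inner + ")") + "*"

def cfg(r, grammar):
    """Add the productions of cfg(r) to `grammar` and return r's start symbol.

    `grammar` maps each nonterminal to a list of right-hand sides; a right-hand
    side is a list of symbols, and [] stands for ε."""
    name = "S_" + show(r)
    if name in grammar:  # same subexpression already translated
        return name
    kind = r[0]
    if kind == "empty":
        rules = []                                             # no productions
    elif kind == "eps":
        rules = [[]]                                           # S_ε -> ε
    elif kind == "sym":
        rules = [[r[1]]]                                       # S_a -> a
    elif kind == "cat":
        rules = [[cfg(r[1], grammar), cfg(r[2], grammar)]]     # S_AB -> S_A S_B
    elif kind == "or":
        rules = [[cfg(r[1], grammar)], [cfg(r[2], grammar)]]   # S_{A∪B} -> S_A | S_B
    else:
        rules = [[], [cfg(r[1], grammar), name]]               # S_{A*} -> ε | S_A S_{A*}
    grammar[name] = rules
    return name

def derivable(grammar, start, n):
    """Strings of length <= n derivable from `start`, computed as a least fixed point.

    A symbol counts as a nonterminal exactly when it is a key of `grammar`."""
    derived = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for x, rules in grammar.items():
            for rhs in rules:
                partial = {""}
                for sym in rhs:
                    options = derived[sym] if sym in grammar else {sym}
                    partial = {p + o for p in partial for o in options if len(p + o) <= n}
                if not partial <= derived[x]:
                    derived[x] |= partial
                    changed = True
    return derived[start]

r = ("cat", ("star", ("or", ("sym", "1"), ("sym", "0"))), ("sym", "0"))  # (1∪0)*0
grammar = {}
start = cfg(r, grammar)
for lhs, rules in grammar.items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in rules))

# Cross-check against Python's re on every binary string of length <= 6.
every = ["".join(p) for k in range(7) for p in itertools.product("01", repeat=k)]
expected = {w for w in every if re.fullmatch("(1|0)*0", w)}
print(derivable(grammar, start, 6) == expected)  # True
```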

Back again to languages and representations …

We saw in Lecture 21 that the language $B$ of all binary palindromes can be represented by a CFG. We also said that $B$ can’t be represented by any regular expression. How would you prove that?

Why isn’t $B$ (binary palindromes) a regular language?


If $B$ were regular, we could represent it with a DFA, an NFA, or a regular expression. Let’s choose a DFA as our hypothetical representation.

Recall that $B$ consists of the infinitely many strings of the form $wv\overline{w}$, where $w \in \{0,1\}^*$, $\overline{w}$ is the reverse of $w$, and $v\in \{\varepsilon, 0, 1\}$.

What would a DFA need to keep track of to decide $B$? It would need to keep track of $w$ in order to check $\overline{w}$ against it. But there are infinitely many possible $w$’s and only finitely many DFA states!

This is the intuition for why $B$ is not regular. Let’s see how to turn this intuition into a formal proof.

A strategy for proving that $B$ is not regular

Proof by contradiction: Assume that $B$ is regular. Then there is a DFA $M$ that recognizes $B$. Show that $M$ accepts or rejects a string it shouldn’t.

Key Idea 1: If two string prefixes collide by reaching the same state, the DFA can no longer tell them apart: after any common suffix $t$, both strings end in the same state, so the DFA accepts both or rejects both.

Key Idea 2: The machine $M$ has finitely many states, and since the strings in $B$ have infinitely many distinct prefixes, two of them must collide!
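Both key ideas can be checked on a small machine. A minimal sketch, assuming a hypothetical 3-state DFA (binary numbers divisible by 3, unrelated to $B$) encoded as a transition dict: four prefixes of the form $0^n1$ must produce a collision, and after the collision the two strings receive the same verdict on every suffix we try (here, all suffixes of length at most 5, a finite sample).

```python
from itertools import product

# Hypothetical example DFA (not from the lecture): it reads a binary number and
# tracks its value mod 3, accepting exactly the multiples of 3.
STATES = [0, 1, 2]
delta = {(q, c): (2 * q + int(c)) % 3 for q in STATES for c in "01"}
START, ACCEPT = 0, {0}

def run(w, q=START):
    """State reached after reading w."""
    for c in w:
        q = delta[(q, c)]
    return q

def find_collision(prefixes):
    """Key Idea 2 (pigeonhole): return two distinct prefixes ending in the same state."""
    first_seen = {}
    for s in prefixes:
        q = run(s)
        if q in first_seen:
            return first_seen[q], s
        first_seen[q] = s
    return None  # only possible with no more prefixes than states

# Four prefixes 0^n 1 but only three states, so two of them must collide.
s_a, s_b = find_collision(["0" * n + "1" for n in range(len(STATES) + 1)])
print(s_a, s_b)  # 1 01

# Key Idea 1: after the collision, every sampled suffix t gets the same verdict.
suffixes = ["".join(p) for k in range(6) for p in product("01", repeat=k)]
print(all((run(s_a + t) in ACCEPT) == (run(s_b + t) in ACCEPT) for t in suffixes))  # True
```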

We choose an infinite set $S$ of prefixes. The choice must ensure that for every pair of distinct prefixes $s_a\neq s_b$ in $S$, there is a suffix $t$ such that exactly one of $s_at$ and $s_bt$ is in $B$.

$$\begin{aligned} S &= \{1,01,001,0001, \ldots\} \\ &= \{0^n1 : n \geq 0 \} \end{aligned}$$

Proving that $B$ is not regular

Suppose that some DFA $M$ recognizes $B$. We show $M$ accepts or rejects a string it shouldn’t. Consider the set $S = \{0^n1 : n \geq 0 \}$.

Since there are finitely many states in $M$ and infinitely many strings in $S$, there exist strings $0^a1 \in S$ and $0^b1 \in S$ with $a\neq b$ that end in the same state of $M$.

Important: We don’t get to choose $a$ and $b$! We just know they exist.

Now, consider appending $0^a$ to both strings.

Since $0^a1$ and $0^b1$ end in the same state, so do $0^a10^a$ and $0^b10^a$; call this state $q$. But then $M$ must make a mistake. On one hand, $q$ must be an accept state, since $0^a10^a \in B$. On the other hand, $0^b10^a \not\in B$, because its reverse is $0^a10^b$ and $a \neq b$; yet $M$ accepts it, which is an error. $\square$
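The proof is constructive: given any DFA that claims to recognize $B$, it tells us how to find a string the DFA gets wrong. A minimal sketch, where `refute_palindrome_dfa` and the example DFA (a plausible but wrong attempt that accepts strings whose first and last symbols agree) are hypothetical illustrations. The caller must supply the number of states so that the pigeonhole step is guaranteed to find a collision.

```python
def is_palindrome(w):
    return w == w[::-1]

def refute_palindrome_dfa(delta, start, accept, num_states):
    """Follow the proof: return a string the given DFA misclassifies for B."""
    def run(w):
        q = start
        for c in w:
            q = delta[(q, c)]
        return q

    # Pigeonhole: among 0^n 1 for n = 0..num_states, two strings share a state.
    first_seen = {}
    for n in range(num_states + 1):
        q = run("0" * n + "1")
        if q in first_seen:
            a, b = first_seen[q], n
            break
        first_seen[q] = n
    in_b = "0" * a + "1" + "0" * a       # 0^a 1 0^a, a palindrome
    not_in_b = "0" * b + "1" + "0" * a   # 0^b 1 0^a with b != a, not a palindrome
    # Same state after the prefixes, so same state (and verdict) after appending 0^a.
    assert run(in_b) == run(not_in_b)
    return not_in_b if run(not_in_b) in accept else in_b

# Hypothetical wrong attempt: accept strings whose first and last symbols agree.
# A state is "s" (nothing read yet) or the pair (first symbol, last symbol).
delta = {("s", c): (c, c) for c in "01"}
for first in "01":
    for last in "01":
        for c in "01":
            delta[((first, last), c)] = (first, c)
accept = {"s", ("0", "0"), ("1", "1")}
num_states = 5

w = refute_palindrome_dfa(delta, "s", accept, num_states)
print(w, is_palindrome(w))  # 0010 False  (not a palindrome, but the DFA accepts it)
```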

Proving irregularity

A proof template for showing that a language is not regular.

A template for proving that a language $L$ is not regular

① Suppose for contradiction that some DFA $M$ recognizes $L$.

② Consider the set $S = \{\ldots\}$.

③ Since $S$ is infinite and $M$ has finitely many states, there must be two strings $s_a, s_b \in S$ such that $s_a \neq s_b$ and both end in the same state of $M$.

④ Consider appending $t$ to both $s_a$ and $s_b$.

⑤ Since $s_a$ and $s_b$ end in the same state of $M$, the strings $s_at$ and $s_bt$ also end in the same state $q$ of $M$, so $M$ either accepts both or rejects both. But $s_at\in L$ and $s_bt\not\in L$, so $M$ does not recognize $L$.

⑥ Since $M$ was arbitrary, no DFA recognizes $L$.

Notes on ②: $S$ must be infinite, and for every pair of distinct prefixes $s_a\neq s_b$ in $S$, there must be a suffix $t$ such that exactly one of $s_at$ and $s_bt$ is in $L$.

Notes on ③: We don’t get to choose $s_a$ and $s_b$!

Notes on ④: Choose $t$ so that $s_at \in L$ but $s_bt \not\in L$.

Example: prove that $L = \{0^n1^n : n \geq 0\}$ is not regular

Suppose for contradiction that some DFA $M$ recognizes $L$.

Consider the set $S = \{0^n : n\geq 0 \}$.

Since $S$ is infinite and $M$ has finitely many states, there must be two strings $0^a, 0^b \in S$ with $a \neq b$ that both end in the same state of $M$.

Consider appending $1^a$ to both $0^a$ and $0^b$.

Since $0^a$ and $0^b$ end in the same state of $M$, the strings $0^a1^a$ and $0^b1^a$ also end in the same state $q$ of $M$, so $M$ either accepts both or rejects both. But $0^a1^a\in L$ and $0^b1^a\not\in L$ (since $a \neq b$), so $M$ does not recognize $L$.

Since $M$ was arbitrary, no DFA recognizes $L$.

Example: prove that $L = \{(^n)^n : n \geq 0\}$ is not regular

Suppose for contradiction that some DFA $M$ recognizes $L$.

Consider the set $S = \{(^n : n\geq 0 \}$.

Since $S$ is infinite and $M$ has finitely many states, there must be two strings $(^a, (^b \in S$ with $a \neq b$ that both end in the same state of $M$.

Consider appending $)^a$ to both $(^a$ and $(^b$.

Since $(^a$ and $(^b$ end in the same state of $M$, the strings $(^a)^a$ and $(^b)^a$ also end in the same state $q$ of $M$, so $M$ either accepts both or rejects both. But $(^a)^a\in L$ and $(^b)^a\not\in L$ (since $a \neq b$), so $M$ does not recognize $L$.

Since $M$ was arbitrary, no DFA recognizes $L$.

A fun fact about this proof method

Suppose that for a language $L$, the set $S$ is a largest set of prefix strings with the property that for every pair $s_a \neq s_b \in S$, there is some string $t$ such that one of $s_at, s_bt$ is in $L$ but the other isn’t.

If $S$ is infinite, then $L$ is not regular.

If $S$ is finite, then the minimal DFA for $L$ has precisely $|S|$ states, one reached by each member of $S$.
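This fact is the Myhill–Nerode theorem. A brute-force sketch can hint at it: group sampled prefixes by which sampled suffixes complete them to strings in $L$, and count the groups. Because only prefixes and suffixes up to a fixed length are tried, the count is a lower bound on the number of DFA states, not a proof. The function names are hypothetical.

```python
from itertools import product

def strings(max_len, alphabet="01"):
    """Every string over `alphabet` of length at most max_len."""
    return ["".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k)]

def distinguishable_prefixes(in_L, prefix_len, suffix_len):
    """Count groups of sampled prefixes (length <= prefix_len) that are pairwise
    distinguished by sampled suffixes (length <= suffix_len). A prefix's
    'signature' records which sampled suffixes complete it to a string in L."""
    suffixes = strings(suffix_len)
    signatures = {tuple(in_L(s + t) for t in suffixes) for s in strings(prefix_len)}
    return len(signatures)

def ones_divisible_by_3(w):
    return w.count("1") % 3 == 0

def zeros_then_ones(w):  # L = {0^n 1^n : n >= 0}
    half = len(w) // 2
    return w == "0" * half + "1" * half

print(distinguishable_prefixes(ones_divisible_by_3, 5, 5))  # 3
print([distinguishable_prefixes(zeros_then_ones, m, m) for m in (2, 4, 6)])
```

For strings whose number of 1s is divisible by 3, the count settles at 3, matching the 3-state minimal DFA. For $\{0^n1^n\}$ it keeps growing as longer strings are sampled, consistent with (but not a proof of) the irregularity shown above.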

Summary

DFAs $\equiv$ NFAs $\equiv$ regular expressions. We’ve shown how to go from a regular expression to an NFA to a DFA (and going from an NFA to a DFA can lead to exponential blow-up). But we won’t show how to go from an NFA/DFA to a regular expression.
Finite $\subset$ regular $\subset$ context-free languages. To show that a language is regular, construct a DFA/NFA/RegEx for it. To show that it is not regular, use the proof method from this lecture.