CSE 311 Lecture 20:
Regular Expressions

Emina Torlak and Kevin Zatloukal

Topics

Structural induction
A brief review of Lecture 19.
Using structural induction
Example proofs about recursively defined numbers, strings, and trees.
Regular expressions
Definition, examples, applications.

Structural induction

A brief review of Lecture 19.

Structural induction proof template

① Let $P(x)$ be [ definition of $P(x)$ ].
We will show that $P(x)$ is true for every $x\in S$ by structural induction.
② Base cases:
[ Proof of $P(s_0), \ldots, P(s_m)$. ]
③ Inductive hypothesis:
Assume that $P(y_0), \ldots, P(y_k)$ are true for some arbitrary $y_0, \ldots, y_k \in S$.
④ Inductive step:
We want to prove that $P(y)$ is true.
[ Proof of $P(y)$. The proof must invoke the structural inductive hypothesis. ]
⑤ The result follows for all $x\in S$ by structural induction.
 
Recursive definition of $S$
Basis step: $s_0\in S, \ldots, s_m\in S$.
Recursive step:
if $y_0, \ldots, y_k\in S$, then $y\in S$.

If the recursive step of $S$ includes multiple rules for constructing new elements from existing elements, then
assume $P$ for the existing elements in every rule, and
prove $P$ for the new element in every rule.

Structural induction works just like ordinary induction

① Let $P(x)$ be [ definition of $P(x)$ ].
We will show that $P(x)$ is true for every $x\in \N$ by structural induction.
② Base case:
[ Proof of $P(0)$. ]
③ Inductive hypothesis:
Assume that $P(n)$ is true for some arbitrary $n \in \N$.
 
④ Inductive step:
We want to prove that $P(n+1)$ is true.
[ Proof of $P(n+1)$. The proof must invoke the structural inductive hypothesis. ]
⑤ The result follows for all $x\in \N$ by structural induction.
 
Recursive definition of $\N$
Basis step: $0 \in \N$.
Recursive step:
if $n\in \N$, then $n+1\in \N$.

Ordinary induction is just structural induction applied to the recursively defined set of natural numbers!

Understanding structural induction

$\rule{P(\Node); \forall L, R\in S. (P(L)\wedge P(R))\rightarrow P(\Tree(\Node,L,R))}{\forall x \in S. P(x)}$

How do we get $P(\Tree(\Node,\Node,\Tree(\Node,\Node,\Node)))$ from $P(\Node)$ and $\forall L,R\in S. (P(L)\wedge P(R))\rightarrow P(\Tree(\Node,L,R))$?

Define $S$ by
Basis: $\Node \in S$.
Recursive:
if $L, R\in S$, then
$\Tree(\Node,L,R)\in S$
1. First, we have $\forall L,R\in S. (P(L)\wedge P(R))\rightarrow P(\Tree(\Node,L,R))$.
2. Next, we have $P(\Node)$.
3. Intro $\wedge$ on 2 gives us $P(\Node)\wedge P(\Node)$.
4. Elim $\forall$ on 1 gives us $(P(\Node)\wedge P(\Node))\rightarrow P(\Tree(\Node, \Node, \Node))$.
5. Modus Ponens on 3 and 4 gives us $P(\Tree(\Node, \Node,\Node))$.
6. Intro $\wedge$ on 2 and 5 gives us $P(\Node)\wedge P(\Tree(\Node, \Node,\Node))$.
7. Elim $\forall$ on 1 gives us $(P(\Node)\wedge P(\Tree(\Node, \Node,\Node)))\rightarrow P(\Tree(\Node,\Node,\Tree(\Node, \Node, \Node)))$.
8. Modus Ponens on 6 and 7 gives us $P(\Tree(\Node,\Node,\Tree(\Node, \Node, \Node)))$.
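The same pattern reaches any element of $S$: to get $P$ for a tree, first get it for both subtrees, then apply the recursive rule. A minimal Python sketch of that recursion, assuming a hypothetical encoding in which the string `"Node"` stands for $\Node$ and a tuple `("Tree", L, R)` stands for $\Tree(\Node,L,R)$; `show` and `derive` are hypothetical helper names, and `derive` only lists the facts in the order a proof would establish them.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical encoding: the string "Node" is the base element Node,
# and ("Tree", L, R) stands for Tree(Node, L, R).
TreeT = Union[str, Tuple[str, "TreeT", "TreeT"]]

def show(t: TreeT) -> str:
    """Render a tree in the slides' notation."""
    if t == "Node":
        return "Node"
    _, left, right = t
    return f"Tree(Node,{show(left)},{show(right)})"

def derive(t: TreeT, facts: Optional[List[str]] = None) -> List[str]:
    """List the facts P(...) established, in order, on the way to P(t).

    Base case: P(Node) is given. Recursive case: first establish P(L)
    and P(R), then instantiate the rule (Elim forall), combine the two
    facts (Intro and), and apply Modus Ponens to get P(Tree(Node,L,R)).
    """
    if facts is None:
        facts = []
    if t != "Node":
        _, left, right = t
        derive(left, facts)
        derive(right, facts)
    fact = f"P({show(t)})"
    if fact not in facts:  # a fact already derived is simply reused
        facts.append(fact)
    return facts

target = ("Tree", "Node", ("Tree", "Node", "Node"))
for fact in derive(target):
    print(fact)
```

The recursion always terminates because each call is on a strictly smaller subtree, which is exactly why structural induction covers every element of $S$.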

Using structural induction

Example proofs about recursively defined numbers, strings, and trees.

Prove that every $x\in S$ is divisible by 3

① Let $P(x)$ be $3 \vert x$.
We will show that $P(x)$ is true for every $x\in S$ by structural induction.
② Base cases ($x=6$, $x=15$):
$3 \vert 6$ so $P(6)$ holds, and $3 \vert 15$ so $P(15)$ holds.
③ Inductive hypothesis:
Assume that $P(x), P(y)$ are true for some arbitrary $x,y \in S$.
④ Inductive step:
We want to prove that $P(x+y)$ is true.
By the inductive hypothesis, $3\vert x$ and $3\vert y$, so $x = 3i$ and $y = 3j$ for some $i,j\in\Z$. Therefore, $x + y = 3i + 3j = 3(i+j)$ so $3\vert (x+y)$. Hence, $P(x+y)$ is true.
⑤ The result follows for all $x\in S$ by structural induction.
 
Define $S$ by
Basis: $6\in S$, $15\in S$.
Recursive: if $x,y\in S$, then $x+y\in S$.
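A bounded enumeration of $S$ can sanity-check the claim on small elements; it is not a substitute for the proof, which covers all of $S$. A minimal Python sketch; the helper name `elements_up_to` and the bound `limit` are hypothetical.

```python
def elements_up_to(limit: int) -> set[int]:
    """Return every element of S that is <= limit.

    S is built from the basis {6, 15} by the rule: x, y in S => x + y in S.
    Applying the rule never makes elements smaller, so sums above
    limit can be ignored and the loop stops once nothing new appears.
    """
    s = {x for x in (6, 15) if x <= limit}
    changed = True
    while changed:
        changed = False
        for x in list(s):
            for y in list(s):
                if x + y <= limit and x + y not in s:
                    s.add(x + y)
                    changed = True
    return s

elems = elements_up_to(60)
print(sorted(elems))
assert all(x % 3 == 0 for x in elems)  # P(x) holds for every element found
```

Note that 9 never appears: being divisible by 3 is necessary for membership in $S$, not sufficient.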

Prove $\op{len}(x\bullet y) = \op{len}(x) + \op{len}(y)$ for all $x,y\in\Sigma^* $

Which object ($x$ or $y$) should we do structural induction on? We induct on $y$, because $\bullet$ is defined by recursion on its second argument.

① Let $P(y)$ be $\forall x\in\Sigma^* . \op{len}(x\bullet y) = \op{len}(x) + \op{len}(y)$.
We will show that $P(y)$ is true for every $y\in \Sigma^* $ by structural induction.
② Base case ($y=\varepsilon$):
Let $x$ in $\Sigma^* $ be arbitrary. Then, $\op{len}(x\bullet \varepsilon)$ $=$ $\op{len}(x)$ $=$ $\op{len}(x) + \op{len}(\varepsilon)$ since $\op{len}(\varepsilon) = 0$. So $P(\varepsilon)$ is true.
③ Inductive hypothesis:
Assume that $P(w)$ is true for some arbitrary $w \in \Sigma^* $.
④ Inductive step:
We want to prove that $P(wa)$ is true for every $a\in\Sigma$.
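Let $a\in\Sigma$ and $x\in\Sigma^* $ be arbitrary. Then
$$\begin{aligned}
\op{len}(x\bullet wa) &= \op{len}((x\bullet w)a) && \text{by definition of } \bullet\\
&= \op{len}(x\bullet w) + 1 && \text{by definition of } \op{len}\\
&= \op{len}(x) + \op{len}(w) + 1 && \text{by the inductive hypothesis}\\
&= \op{len}(x) + \op{len}(wa) && \text{by definition of } \op{len}
\end{aligned}$$
So $\op{len}(x\bullet wa)=\op{len}(x) + \op{len}(wa)$ for all $x\in\Sigma^* $, and $P(wa)$ is true.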
⑤ The result follows for all $y\in \Sigma^* $ by structural induction.
 
Define $\Sigma^* $ by
Basis: $\varepsilon \in \Sigma^* $.
Recursive:
if $w\in\Sigma^* $ and $a\in\Sigma$,
then $wa\in\Sigma^* $
Length
$\op{len}(\varepsilon) = 0$
$\op{len}(wa) = \op{len}(w) + 1$
Concatenation
$x\bullet \varepsilon = x$
$x\bullet (wa) = (x\bullet w)a$
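The recursive definitions of $\op{len}$ and $\bullet$ translate directly into recursive functions. A minimal Python sketch, assuming Python strings stand in for $\Sigma^* $ with `""` as $\varepsilon$, and peeling off the last character to match the $wa$ case; the names `length` and `concat` are hypothetical. The loop only spot-checks the identity on a few inputs.

```python
def length(s: str) -> int:
    """len(eps) = 0; len(wa) = len(w) + 1."""
    if s == "":
        return 0
    w = s[:-1]  # s = wa, where a = s[-1]
    return length(w) + 1

def concat(x: str, y: str) -> str:
    """x . eps = x; x . (wa) = (x . w)a, i.e. recursion on the second argument."""
    if y == "":
        return x
    w, a = y[:-1], y[-1]
    return concat(x, w) + a  # append the single symbol a to x . w

# Spot-check len(x . y) = len(x) + len(y) on a few binary strings.
samples = ["", "0", "1", "01", "110", "0101"]
for x in samples:
    for y in samples:
        assert length(concat(x, y)) == length(x) + length(y)
```

Because `concat` recurses on `y`, an inductive proof about it naturally inducts on $y$ as well.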

Prove $\Size{t}\leq 2^{\Height{t}+1}-1$ for every rooted binary tree $t$

① Let $P(t)$ be $\Size{t}\leq 2^{\Height{t}+1}-1$.
We will show that $P(t)$ is true for every $t\in S $ by structural induction.
② Base case ($t=\Node$):
$\Size{\Node} = 1 = 2^1 - 1 = 2^{0+1}-1 = 2^{\Height{\Node}+1}-1$ so $P(\Node)$ is true.
③ Inductive hypothesis:
Assume that $P(L)$ and $P(R)$ are true for some arbitrary $L, R \in S$.
④ Inductive step:
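We want to prove that $P(\Tree(\Node,L,R))$ is true.
$$\begin{aligned}
\Size{\Tree(\Node,L,R)} &= 1 + \Size{L} + \Size{R} && \text{by definition of size}\\
&\leq 1 + (2^{\Height{L}+1}-1) + (2^{\Height{R}+1}-1) && \text{by the inductive hypothesis}\\
&\leq 1 + (2^{\max(\Height{L},\Height{R})+1}-1) + (2^{\max(\Height{L},\Height{R})+1}-1) && \text{since } 2^k \text{ grows with } k\\
&= 2^{\max(\Height{L},\Height{R})+2}-1 && \text{simplifying}\\
&= 2^{\Height{\Tree(\Node,L,R)}+1}-1 && \text{by definition of height}
\end{aligned}$$
So $P(\Tree(\Node,L,R))$ is true.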
⑤ The result follows for all $t\in S$ by structural induction.
 
Define $S$ by
Basis: $\Node \in S$.
Recursive:
if $L, R\in S$, then
$\Tree(\Node,L,R)\in S$
Size
$\Size{\Node} = 1$
$\Size{\Tree(\Node,L,R)} = $
$\quad 1 + \Size{L} + \Size{R}$
Height
$\Height{\Node} = 0$
$\Height{\Tree(\Node,L,R)} = $
$\quad 1 + \max(\Height{L}, \Height{R})$
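Size and height are structural recursions too, so the bound can be spot-checked on every small tree. A minimal Python sketch, assuming a hypothetical encoding in which `None` stands for $\Node$ and a pair `(L, R)` stands for $\Tree(\Node,L,R)$; `all_trees` is a hypothetical helper that enumerates every tree up to a given height.

```python
from itertools import product

# Hypothetical encoding: None is Node; (L, R) is Tree(Node, L, R).

def size(t) -> int:
    """size(Node) = 1; size(Tree(Node,L,R)) = 1 + size(L) + size(R)."""
    if t is None:
        return 1
    left, right = t
    return 1 + size(left) + size(right)

def height(t) -> int:
    """height(Node) = 0; height(Tree(Node,L,R)) = 1 + max(height(L), height(R))."""
    if t is None:
        return 0
    left, right = t
    return 1 + max(height(left), height(right))

def all_trees(max_height: int) -> list:
    """Every tree whose height is at most max_height, each listed once."""
    if max_height == 0:
        return [None]
    smaller = all_trees(max_height - 1)
    return [None] + [(l, r) for l, r in product(smaller, repeat=2)]

for t in all_trees(3):
    assert size(t) <= 2 ** (height(t) + 1) - 1
```

The bound is tight: the complete tree `((None, None), (None, None))` has height 2 and size $7 = 2^3 - 1$.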

Regular expressions

Definition, examples, applications.

Sets of strings as languages

A language is a set of strings with specific syntax, e.g.:
Syntactically correct Java/C/C++ programs.
The set $\Sigma^* $ of all strings over the alphabet $\Sigma$.
Palindromes over $\Sigma$.
Binary strings with no 1’s before 0’s.
Regular expressions let us specify regular languages, e.g.:
All binary strings.
The strings $\{0000, 0010, 1000, 1010\}$.
All strings that contain the string “CSE311”.

Regular expressions over $\Sigma $: syntax

Basis step:
$\emptyset, \varepsilon$ are regular expressions.
$a$ is a regular expression for any $a\in\Sigma$.
Recursive step:
If $A$ and $B$ are regular expressions, then so are
$AB$, $A\cup B$, and $A^* $.
Examples: regular expressions over $\Sigma = \{0, 1\}$
Basis: $\emptyset$, $\varepsilon$, $0$, $1$.
Recursive: $01011$, $0^* 1^* $, $(0\cup 1)0(0\cup 1)0$, etc.
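The syntax of regular expressions is itself a recursively defined set, so checking whether something is a regular expression is a structural recursion. A minimal Python sketch, assuming a hypothetical encoding of regular expressions as nested tuples; `is_regex` and `sym` are hypothetical helper names.

```python
# Hypothetical tuple encoding of regular expressions:
#   ("empty",) is ∅, ("eps",) is ε, ("sym", a) is a,
#   ("cat", A, B) is AB, ("union", A, B) is A ∪ B, ("star", A) is A*.

def is_regex(r, sigma: frozenset) -> bool:
    """Check r against the basis and recursive steps of the syntax."""
    if not isinstance(r, tuple) or len(r) == 0:
        return False
    tag = r[0]
    # Basis step: ∅, ε, and a for every a in Σ.
    if tag in ("empty", "eps"):
        return len(r) == 1
    if tag == "sym":
        return len(r) == 2 and r[1] in sigma
    # Recursive step: AB, A ∪ B, and A* for regular expressions A, B.
    if tag in ("cat", "union"):
        return len(r) == 3 and is_regex(r[1], sigma) and is_regex(r[2], sigma)
    if tag == "star":
        return len(r) == 2 and is_regex(r[1], sigma)
    return False

def sym(a: str):
    return ("sym", a)

BITS = frozenset({"0", "1"})
zero_or_one = ("union", sym("0"), sym("1"))
# (0 ∪ 1)0(0 ∪ 1)0, built left to right with binary concatenation
example = ("cat", ("cat", ("cat", zero_or_one, sym("0")), zero_or_one), sym("0"))
print(is_regex(example, BITS))       # True
print(is_regex(("sym", "2"), BITS))  # False: 2 is not in Σ
```

Writing $01011$ as a single expression relies on concatenation being associative; the tuple encoding makes the grouping explicit.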

Regular expressions over $\Sigma $: semantics

A regular expression over $\Sigma $ represents a set of strings over $\Sigma $.
$\emptyset$ represents the set with no strings.
$\varepsilon$ represents the set $\{\varepsilon\}$.
$a$ represents the set $\{a\}$.
$AB$ represents the concatenation of the sets represented by $A$ and $B$: $\{ a\bullet b \ \vert\ a\in A, b\in B\}$.
$A\cup B$ represents the union of the sets represented by $A$ and $B$: $A\cup B$.
$A^* $ represents the concatenation of the set represented by $A$ with itself zero or more times: $A^* = \{\varepsilon\} \cup A \cup AA \cup AAA \cup AAAA \cup \ldots$

This is just a recursive definition of a function that computes the meaning of a regular expression.
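One way to see this function in action: because $A^* $ is usually infinite, a minimal Python sketch can only compute the strings up to some length bound `n`. The bound, the hypothetical tuple encoding (`("empty",)`, `("eps",)`, `("sym", a)`, `("cat", A, B)`, `("union", A, B)`, `("star", A)`), and the names `meaning` and `sym` are assumptions for illustration.

```python
def meaning(r, n: int) -> set:
    """Strings of length <= n in the set represented by regular expression r."""
    tag = r[0]
    if tag == "empty":
        return set()
    if tag == "eps":
        return {""}
    if tag == "sym":
        return {r[1]} if len(r[1]) <= n else set()
    if tag == "union":
        return meaning(r[1], n) | meaning(r[2], n)
    if tag == "cat":
        # {a . b | a in A, b in B}, keeping only strings within the bound
        return {a + b for a in meaning(r[1], n) for b in meaning(r[2], n)
                if len(a + b) <= n}
    if tag == "star":
        # {ε} ∪ A ∪ AA ∪ ...: keep appending one more piece from A
        # until no new strings of length <= n appear.
        pieces = meaning(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:
            new = {s + p for s in frontier for p in pieces
                   if len(s + p) <= n} - result
            result |= new
            frontier = new
        return result
    raise ValueError(f"not a regular expression: {r!r}")

def sym(a: str):
    return ("sym", a)

zero_or_one = ("union", sym("0"), sym("1"))
example = ("cat", ("cat", ("cat", zero_or_one, sym("0")), zero_or_one), sym("0"))
print(sorted(meaning(example, 4)))  # ['0000', '0010', '1000', '1010']
```

Each case of `meaning` calls itself only on the immediate subexpressions, mirroring the recursive step of the syntax.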

Examples of regular expressions

$001^* $
The string “00” followed by any number of 1s.
$0^* 1^* $
Binary strings with any number of 0s followed by any number of 1s.
$(0\cup 1)0(0\cup 1)0$
$\{0000, 0010, 1000, 1010\}$
$(0^* 1^* )^* $
All binary strings.
$(0 \cup 1)^* 0110 (0 \cup 1)^* $
Binary strings that contain “0110”.
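These examples can be tried with a practical regex engine. A sketch using Python's standard `re` module, where `|` plays the role of $\cup$ and `re.fullmatch` requires the whole string to match, as in the slides' semantics; the helper name `language` is hypothetical.

```python
import re
from itertools import product

# The slides' ∪ is written | in Python's re syntax.
patterns = {
    "001*": r"001*",
    "0*1*": r"0*1*",
    "(0∪1)0(0∪1)0": r"(0|1)0(0|1)0",
    "(0*1*)*": r"(0*1*)*",
    "(0∪1)*0110(0∪1)*": r"(0|1)*0110(0|1)*",
}

def language(pattern: str, max_len: int) -> list[str]:
    """All binary strings of length <= max_len that fully match pattern."""
    return [s for k in range(max_len + 1)
            for s in ("".join(bits) for bits in product("01", repeat=k))
            if re.fullmatch(pattern, s)]

for name, pattern in patterns.items():
    print(name, language(pattern, 4))
```

Using `re.search` instead of `re.fullmatch` would accept any string that merely contains a match, which is a different language.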

Regular expressions in practice

Used to define the tokens in a programming language.
Legal variable names, keywords, etc.
Used in grep, a Unix program that searches for patterns in a set of files.
For example, grep "311" *.md searches for the string “311” in all Markdown files in the current directory.
Used in programs to process strings.
These slides are generated with the help of regular expressions :)
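A rough Python analogue of `grep "311" *.md`, using the standard `glob` and `re` modules. This is a sketch, not a reimplementation of grep: it ignores grep's options and prints each matching line prefixed with its file name, as grep does by default when it searches more than one file. The function name `search_files` is hypothetical.

```python
import glob
import re

def search_files(pattern: str, file_glob: str) -> list[tuple[str, str]]:
    """Return (filename, line) pairs for lines containing a match of pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(glob.glob(file_glob)):
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                if regex.search(line):  # search: a match anywhere in the line
                    hits.append((path, line.rstrip("\n")))
    return hits

if __name__ == "__main__":
    for path, line in search_files(r"311", "*.md"):
        print(f"{path}:{line}")
```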

Summary

Use structural induction to prove properties of recursive structures.
Follows from ordinary induction but is easier to use.
As powerful as ordinary induction.
To prove $\forall x\in S. P(x)$ using structural induction:
Show that $P$ holds for the elements in the basis step of $S$.
Assume $P$ for every existing element of $S$ named in the recursive step.
Prove $P$ for every new element of $S$ created in the recursive step.
A regular expression defines a set of strings over an alphabet $\Sigma $.
$\emptyset$, $\varepsilon$, and $a\in\Sigma$ are regular expressions.
If $A$ and $B$ are regular expressions, then so are $(AB), (A\cup B), A^* $.
Many practical applications, from grep to everyday programming.