Regular Expressions

Emina Torlak and Kevin Zatloukal

- Structural induction
- A brief review of Lecture 19.
- Using structural induction
- Example proofs about recursively defined numbers, strings, and trees.
- Regular expressions
- Definition, examples, applications.

A brief review of Lecture 19.

- ① Let $P(x)$ be *[ definition of $P(x)$ ]*.
- We will show that $P(x)$ is true for every $x\in S$ by structural induction.
- ② Base cases:
- *[ Proof of $P(s_0), \ldots, P(s_m)$. ]*
- ③ Inductive hypothesis:
- Assume that $P(y_0), \ldots, P(y_k)$ are true for some arbitrary $y_0, \ldots, y_k \in S$.
- ④ Inductive step:
- We want to prove that $P(y)$ is true.
- *[ Proof of $P(y)$. The proof **must** invoke the structural inductive hypothesis. ]*
- ⑤ The result follows for all $x\in S$ by structural induction.

- Recursive definition of $S$
- **Basis step**: $s_0\in S, \ldots, s_m\in S$.
- **Recursive step**: if $y_0, \ldots, y_k\in S$, then $y\in S$.

If the **recursive step** of $S$ includes multiple rules for constructing new elements from existing elements, then

③ **assume** $P$ for the existing elements in every rule, and

④ **prove** $P$ for the new element in every rule.

- ① Let $P(x)$ be *[ definition of $P(x)$ ]*.
- We will show that $P(x)$ is true for every $x\in \N$ by structural induction.
- ② Base cases:
- *[ Proof of $P(0)$. ]*
- ③ Inductive hypothesis:
- Assume that $P(n)$ is true for some arbitrary $n \in \N$.
- ④ Inductive step:
- We want to prove that $P(n+1)$ is true.
- *[ Proof of $P(n+1)$. The proof **must** invoke the structural inductive hypothesis. ]*
- ⑤ The result follows for all $x\in \N$ by structural induction.

- Recursive definition of $\N$
- **Basis step**: $0 \in \N$.
- **Recursive step**: if $n\in \N$, then $n+1\in \N$.

Ordinary induction is just structural induction applied to the recursively defined set of natural numbers!

$\rule{P(\Node); \forall L, R\in S. (P(L)\wedge P(R))\rightarrow P(\Tree(\Node,L,R))}{\forall x \in S. P(x)}$

How do we get $P(\Tree(\Node,\Node,\Tree(\Node,\Node,\Node)))$ from $P(\Node)$ and $\forall L,R\in S. (P(L)\wedge P(R))\rightarrow P(\Tree(\Node,L,R))$?

- Define $S$ by
- **Basis**: $\Node \in S$.
- **Recursive**: if $L, R\in S$, then $\Tree(\Node,L,R)\in S$.

1. First, we have $\forall L,R\in S. (P(L)\wedge P(R))\rightarrow P(\Tree(\Node,L,R))$.
2. Next, we have $P(\Node)$.
3. Intro $\wedge$ on 2 gives us $P(\Node)\wedge P(\Node)$.
4. Elim $\forall$ on 1 (with $L = \Node$ and $R = \Node$) gives us $(P(\Node)\wedge P(\Node))\rightarrow P(\Tree(\Node,\Node,\Node))$.
5. Modus Ponens on 3 and 4 gives us $P(\Tree(\Node,\Node,\Node))$.
6. Intro $\wedge$ on 2 and 5 gives us $P(\Node)\wedge P(\Tree(\Node,\Node,\Node))$.
7. Elim $\forall$ on 1 (with $L = \Node$ and $R = \Tree(\Node,\Node,\Node)$) gives us $(P(\Node)\wedge P(\Tree(\Node,\Node,\Node)))\rightarrow P(\Tree(\Node,\Node,\Tree(\Node,\Node,\Node)))$.
8. Modus Ponens on 6 and 7 gives us $P(\Tree(\Node,\Node,\Tree(\Node,\Node,\Node)))$.

Example proofs about recursively defined numbers, strings, and trees.

- ① Let $P(x)$ be $3 \vert x$.
- We will show that $P(x)$ is true for every $x\in S$ by structural induction.
- ② Base cases ($x=6$, $x=15$):
- $3 \vert 6$ so $P(6)$ holds, and $3 \vert 15$ so $P(15)$ holds.
- ③ Inductive hypothesis:
- Assume that $P(x), P(y)$ are true for some arbitrary $x,y \in S$.
- ④ Inductive step:
- We want to prove that $P(x+y)$ is true.
- By the inductive hypothesis, $3\vert x$ and $3\vert y$, so $x = 3i$ and $y = 3j$ for some $i,j\in\Z$. Therefore, $x + y = 3i + 3j = 3(i+j)$ so $3\vert (x+y)$. Hence, $P(x+y)$ is true.
- ⑤ The result follows for all $x\in S$ by structural induction.

- Define $S$ by
- **Basis:** $6\in S$, $15\in S$.
- **Recursive:** if $x,y\in S$, then $x+y\in S$.
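The claim that every element of $S$ is divisible by 3 can be spot-checked by enumerating $S$ up to a bound. The sketch below is only an illustration, not a substitute for the proof; the function name `elements_of_s` and the bound are assumptions made for this example.

```python
def elements_of_s(limit):
    """Return every element of S that is at most `limit`.

    S is built from the basis {6, 15} by repeatedly applying the
    recursive rule "if x and y are in S, then x + y is in S".
    All elements are positive, so any sum <= limit only needs
    summands that are themselves <= limit.
    """
    found = {6, 15}  # basis step
    changed = True
    while changed:
        changed = False
        for x in list(found):
            for y in list(found):
                total = x + y  # recursive step
                if total <= limit and total not in found:
                    found.add(total)
                    changed = True
    return found


members = elements_of_s(60)
# Every generated element satisfies P(x): 3 divides x.
assert all(m % 3 == 0 for m in members)
print(sorted(members))
```

A finite check like this can only fail to find a counterexample; the structural induction proof is what covers all of $S$.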

Which object ($x$ or $y$) should we do structural induction on?

- ① Let $P(y)$ be $\forall x\in\Sigma^* . \op{len}(x\bullet y) = \op{len}(x) + \op{len}(y)$.
- We will show that $P(y)$ is true for every $y\in \Sigma^* $ by structural induction.
- ② Base case ($y=\varepsilon$):
- Let $x$ in $\Sigma^* $ be arbitrary. Then, $\op{len}(x\bullet \varepsilon)$ $=$ $\op{len}(x)$ $=$ $\op{len}(x) + \op{len}(\varepsilon)$ since $\op{len}(\varepsilon) = 0$. So $P(\varepsilon)$ is true.
- ③ Inductive hypothesis:
- Assume that $P(w)$ is true for some arbitrary $w \in \Sigma^* $.
- ④ Inductive step:
- We want to prove that $P(wa)$ is true for every $a\in\Sigma$.
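- Let $a\in\Sigma$ and $x\in\Sigma^* $ be arbitrary. Then $\op{len}(x\bullet wa) = \op{len}((x\bullet w)a) = \op{len}(x\bullet w) + 1 = \op{len}(x) + \op{len}(w) + 1 = \op{len}(x) + \op{len}(wa)$, using (in order) the definition of $\bullet$, the definition of $\op{len}$, the inductive hypothesis, and the definition of $\op{len}$ again.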
- So $\op{len}(x\bullet wa)=\op{len}(x) + \op{len}(wa)$ for all $x\in\Sigma^* $, and $P(wa)$ is true.
- ⑤ The result follows for all $y\in \Sigma^* $ by structural induction.

- Define $\Sigma^* $ by
- **Basis**: $\varepsilon \in \Sigma^* $.
- **Recursive**: if $w\in\Sigma^* $ and $a\in\Sigma$, then $wa\in\Sigma^* $.
- Length
- $\op{len}(\varepsilon) = 0$
- $\op{len}(wa) = \op{len}(w) + 1$
- Concatenation
- $x\bullet \varepsilon = x$
- $x\bullet (wa) = (x\bullet w)a$
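The recursive definitions of length and concatenation translate almost line for line into code. The following is a minimal sketch under an assumed encoding: the empty string $\varepsilon$ is `None` and the string $wa$ is the pair `(w, a)`. The names `length`, `concat`, and `from_text` are hypothetical helpers introduced for this example.

```python
from itertools import product

EPSILON = None  # assumed encoding of the empty string


def length(s):
    """len(epsilon) = 0; len(wa) = len(w) + 1."""
    if s is EPSILON:
        return 0
    w, _a = s
    return length(w) + 1


def concat(x, y):
    """x . epsilon = x; x . (wa) = (x . w)a."""
    if y is EPSILON:
        return x
    w, a = y
    return (concat(x, w), a)


def from_text(text):
    """Build the recursive encoding of an ordinary Python string."""
    s = EPSILON
    for ch in text:
        s = (s, ch)  # append one character: w becomes wa
    return s


# Spot-check len(x . y) = len(x) + len(y) on all binary strings of length < 4.
words = ["".join(bits) for n in range(4) for bits in product("01", repeat=n)]
for x_text in words:
    for y_text in words:
        x, y = from_text(x_text), from_text(y_text)
        assert length(concat(x, y)) == length(x) + length(y)
```

Note that both functions recurse on their last argument, which mirrors why the proof does induction on $y$ rather than $x$.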

- ① Let $P(t)$ be $\Size{t}\leq 2^{\Height{t}+1}-1$.
- We will show that $P(t)$ is true for every $t\in S $ by structural induction.
- ② Base case ($t=\Node$):
- $\Size{\Node} = 1 = 2^1 - 1 = 2^{0+1}-1 = 2^{\Height{\Node}+1}-1$ so $P(\Node)$ is true.
- ③ Inductive hypothesis:
- Assume that $P(L)$ and $P(R)$ are true for some arbitrary $L, R \in S$.
- ④ Inductive step:
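- We want to prove that $P(\Tree(\Node,L,R))$ is true.
- By the definition of size and the inductive hypothesis, $\Size{\Tree(\Node,L,R)} = 1 + \Size{L} + \Size{R} \leq 1 + (2^{\Height{L}+1}-1) + (2^{\Height{R}+1}-1) = 2^{\Height{L}+1} + 2^{\Height{R}+1} - 1$.
- Since $\Height{L}$ and $\Height{R}$ are both at most $\max(\Height{L}, \Height{R})$, this is at most $2\cdot 2^{\max(\Height{L}, \Height{R})+1} - 1 = 2^{(1+\max(\Height{L}, \Height{R}))+1} - 1 = 2^{\Height{\Tree(\Node,L,R)}+1}-1$. Hence, $P(\Tree(\Node,L,R))$ is true.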
- ⑤ The result follows for all $t\in S$ by structural induction.

- Define $S$ by
- **Basis**: $\Node \in S$.
- **Recursive**: if $L, R\in S$, then $\Tree(\Node,L,R)\in S$.
- Size
- $\Size{\Node} = 1$
- $\Size{\Tree(\Node,L,R)} = 1 + \Size{L} + \Size{R}$
- Height
- $\Height{\Node} = 0$
- $\Height{\Tree(\Node,L,R)} = 1 + \max(\Height{L}, \Height{R})$
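The size and height definitions can be written as recursive functions, and the bound $\Size{t}\leq 2^{\Height{t}+1}-1$ can then be checked on small trees. This is a sketch under an assumed encoding (a leaf is the string `"Node"`, an internal node is a tuple `("Tree", L, R)`); the helper `trees_up_to` is hypothetical and exists only to generate test inputs.

```python
NODE = "Node"  # assumed encoding of the single-node tree


def tree(left, right):
    """Build Tree(Node, L, R) from two existing trees."""
    return ("Tree", left, right)


def size(t):
    """size(Node) = 1; size(Tree(Node, L, R)) = 1 + size(L) + size(R)."""
    if t == NODE:
        return 1
    _tag, left, right = t
    return 1 + size(left) + size(right)


def height(t):
    """height(Node) = 0; height(Tree(Node, L, R)) = 1 + max(height(L), height(R))."""
    if t == NODE:
        return 0
    _tag, left, right = t
    return 1 + max(height(left), height(right))


def trees_up_to(max_height):
    """Return all trees in S whose height is at most max_height."""
    trees = [NODE]
    for _ in range(max_height):
        trees = [NODE] + [tree(l, r) for l in trees for r in trees]
    return trees


# Spot-check P(t) on every tree of height at most 3.
for t in trees_up_to(3):
    assert size(t) <= 2 ** (height(t) + 1) - 1
```

The bound is tight for complete trees, for example `tree(NODE, NODE)` has size 3 and height 1.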

Definition, examples, applications.

- A *language* is a set of strings with a specific syntax, e.g.:
- Syntactically correct Java/C/C++ programs.
- The set $\Sigma^* $ of all strings over the alphabet $\Sigma$.
- Palindromes over $\Sigma$.
- Binary strings with no 1’s before 0’s.

- **Regular expressions** let us specify *regular languages*, e.g.:
- All binary strings.
- The strings $\{0000, 0010, 1000, 1010\}$.
- All strings that contain the string “CSE311”.

- Basis step:
- $\emptyset, \varepsilon$ are regular expressions.
- $a$ is a regular expression for any $a\in\Sigma$.
- Recursive step:
- If $A$ and $B$ are regular expressions, then so are
- $AB$, $A\cup B$, and $A^* $.

- Examples: regular expressions over $\Sigma = \{0, 1\}$
- Basis: $\emptyset$, $\varepsilon$, $0$, $1$.
- Recursive: $01011$, $0^* 1^* $, $(0\cup 1)0(0\cup 1)0$, etc.

- A regular expression over $\Sigma $ represents a set of strings over $\Sigma $.
- $\emptyset$ represents the set with no strings.
- $\varepsilon$ represents the set $\{\varepsilon\}$.
- $a$ represents the set $\{a\}$.
- $AB$ represents the concatenation of the sets represented by $A$ and $B$: $\{ a\bullet b \ \vert\ a\in A, b\in B\}$.
- $A\cup B$ represents the union of the sets represented by $A$ and $B$: $A\cup B$.
- $A^* $ represents the concatenation of the set represented by $A$ with itself zero or more times: $A^* = \{\varepsilon\} \cup A \cup AA \cup AAA \cup AAAA \cup \ldots$

This is just a recursive definition of a function that computes
the meaning of a regular expression.
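One way to make that recursive function concrete is the sketch below. Because $A^* $ can denote an infinite set, this version computes only the strings of length at most `max_len`. The tuple encoding of regular expressions and the name `meaning` are assumptions made for this illustration.

```python
def meaning(r, max_len):
    """Return the strings of length <= max_len in the set represented by r.

    Assumed encoding: ("empty",), ("eps",), ("char", a),
    ("cat", A, B), ("union", A, B), ("star", A).
    """
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "char":
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":
        return meaning(r[1], max_len) | meaning(r[2], max_len)
    if kind == "cat":
        left = meaning(r[1], max_len)
        right = meaning(r[2], max_len)
        return {a + b for a in left for b in right if len(a + b) <= max_len}
    if kind == "star":
        base = meaning(r[1], max_len)
        result = {""}      # zero copies of A
        frontier = {""}
        while frontier:    # add one more copy of A until nothing new fits
            frontier = {w + a for w in frontier for a in base
                        if len(w + a) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression: {r!r}")


zero, one = ("char", "0"), ("char", "1")
bit = ("union", zero, one)
# (0 ∪ 1)0(0 ∪ 1)0
example = ("cat", ("cat", ("cat", bit, zero), bit), zero)
print(sorted(meaning(example, 4)))
```

Each branch of `meaning` corresponds to one rule in the list above, so the function is defined by recursion on the structure of the regular expression.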

- $001^* $
- Binary strings with “00” followed by any number of 1s.
- $0^* 1^* $
- Binary strings with any number of 0s followed by any number of 1s.
- $(0\cup 1)0(0\cup 1)0$
- $\{0000, 0010, 1000, 1010\}$
- $(0^* 1^* )^* $
- All binary strings.
- $(0 \cup 1)^* 0110 (0 \cup 1)^* $
- Binary strings that contain “0110”.
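These examples can be tried out with Python's `re` module, as a rough sketch. Two assumptions apply: Python writes union as `|` rather than $\cup$, and `re.fullmatch` is used so that the whole string must belong to the language. The helper name `matches` is hypothetical.

```python
import re
from itertools import product


def matches(pattern, text):
    """True if the entire text is in the language of the pattern."""
    return re.fullmatch(pattern, text) is not None


# 001*: "00" followed by any number of 1s.
assert matches(r"001*", "00111")
assert not matches(r"001*", "0101")

# 0*1*: any number of 0s followed by any number of 1s.
assert matches(r"0*1*", "0011")
assert not matches(r"0*1*", "10")

# (0 ∪ 1)0(0 ∪ 1)0: filter all 4-bit strings.
four_bit = ["".join(bits) for bits in product("01", repeat=4)]
selected = [s for s in four_bit if matches(r"(0|1)0(0|1)0", s)]
assert selected == ["0000", "0010", "1000", "1010"]

# (0*1*)*: all binary strings.
assert all(matches(r"(0*1*)*", s) for s in four_bit)

# (0 ∪ 1)*0110(0 ∪ 1)*: binary strings that contain "0110".
assert matches(r"(0|1)*0110(0|1)*", "1101101")
assert not matches(r"(0|1)*0110(0|1)*", "0101")
```

Practical regex engines add many extensions beyond the three operations defined here, so this correspondence holds only for patterns restricted to concatenation, `|`, and `*`.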

- Used to define the *tokens* in a programming language.
- Legal variable names, keywords, etc.
- Used in `grep`, a Unix program that searches for patterns in a set of files.
- For example, `grep "311" *.md` searches for the string “311” in all Markdown files in the current directory.
- Used in programs to process strings.
- These slides are generated with the help of regular expressions :)
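As a small illustration of the first and third applications, the sketch below uses Python's `re` module to split a line of text into tokens and to filter lines in the style of `grep`. The token categories, the pattern, and the function names are assumptions chosen for this example, not the rules of any particular language or tool.

```python
import re

# One alternative per token kind; named groups record which one matched.
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<OP>[+\-*/=()])"
    r"|(?P<SKIP>\s+)"
)


def tokenize(text):
    """Return (kind, lexeme) pairs, dropping whitespace.

    Characters that match no alternative are silently skipped,
    which a real lexer would report as an error.
    """
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens


def grep_lines(pattern, lines):
    """Keep the lines that contain a match for the pattern."""
    return [line for line in lines if re.search(pattern, line)]


print(tokenize("x1 = 42 + y"))
print(grep_lines("311", ["CSE311 notes", "CSE332 notes"]))
```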

- Use structural induction to prove properties of recursive structures.
- Follows from ordinary induction but is easier to use.
- As powerful as ordinary induction.
- To prove $\forall x\in S. P(x)$ using structural induction:
- Show that $P$ holds for the elements in the basis step of $S$.
- Assume $P$ for every existing element of $S$ named in the recursive step.
- Prove $P$ for every new element of $S$ created in the recursive step.
- A regular expression defines a set of strings over an alphabet $\Sigma $.
- $\emptyset$, $\varepsilon$, and $a\in\Sigma$ are regular expressions.
- If $A$ and $B$ are regular expressions, then so are $(AB), (A\cup B), A^* $.
- Many practical applications, from `grep` to everyday programming.