Lecture 6 Discussion Questions

We highly recommend downloading the files for this Q&A and following along with us:

wget https://courses.cs.washington.edu/courses/cse391/22wi/lectures/6/questions6.zip
unzip questions6.zip

Suppose that we have a file named words.txt with the following contents:
```
These are some words
a11 0f the5e w0rds c0n7ain number5
S0me 0f these w0rds do
```
- Write a command that identifies all words which are exactly four characters long.
- Write a command that identifies all words which are exactly four characters long and contain only letters (both upper and lowercase).
- Write a command that identifies all words which are at least four characters long and contain only letters (both upper and lowercase).
Solutions
```
grep -E "\<\w{4}\>" words.txt
```
```
grep -E "\<[a-zA-Z]{4}\>" words.txt
```
```
grep -E "\<[a-zA-Z]{4,}\>" words.txt
```
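The commands above print every line that contains a match. To see the matching words themselves, one option is grep's `-o` flag, which prints each match on its own line. The sketch below assumes GNU grep (for `\<`, `\>`, and `\w`) and recreates words.txt in a scratch directory; the `tmpdir` path and the variable names are ours, not part of the course files. It uses `\w` for the first question so that a space cannot count as one of the four characters.

```shell
# Recreate the sample file in a scratch directory (hypothetical location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/words.txt" <<'EOF'
These are some words
a11 0f the5e w0rds c0n7ain number5
S0me 0f these w0rds do
EOF

# -o prints only the matched text, one match per line.
four_chars=$(grep -oE "\<\w{4}\>" "$tmpdir/words.txt")
four_letters=$(grep -oE "\<[a-zA-Z]{4}\>" "$tmpdir/words.txt")
at_least_four=$(grep -oE "\<[a-zA-Z]{4,}\>" "$tmpdir/words.txt")

echo "$four_chars"      # some, S0me (one per line)
echo "$four_letters"    # some
echo "$at_least_four"   # These, some, words, these (one per line)
```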
Suppose that we have the following file named vegetables.txt 🥦
```
broccoli
asparagus
potato
lettuce
zucchini
brocccccccoli
```
- Come up with a grep command that correctly identifies all vegetables that have two or more consecutive c's in their name.
- Come up with a grep command that correctly identifies all vegetables that have two or more c's anywhere in their name.
Solutions
```
grep -E "cc+" vegetables.txt
# or
grep -E "c{2,}" vegetables.txt
```
```
grep -E ".*c.*c.*" vegetables.txt
```
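On vegetables.txt as given, both questions select the same lines, because every vegetable with two c's happens to have them side by side. The sketch below adds a hypothetical entry, cucumber, which is not in the course file, to show where the two patterns differ. Since grep matches anywhere in a line, the shorter `c.*c` selects the same lines as `.*c.*c.*`.

```shell
# Small sample in a scratch directory; "cucumber" is a hypothetical extra entry.
tmpdir=$(mktemp -d)
cat > "$tmpdir/vegetables.txt" <<'EOF'
broccoli
lettuce
zucchini
cucumber
EOF

# Two or more c's in a row.
consecutive=$(grep -E "c{2,}" "$tmpdir/vegetables.txt")
# Two or more c's anywhere, with anything (or nothing) between them.
anywhere=$(grep -E "c.*c" "$tmpdir/vegetables.txt")

echo "$consecutive"   # broccoli, zucchini (one per line)
echo "$anywhere"      # broccoli, zucchini, cucumber (one per line)
```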
Using vegetables.txt from the previous question: Come up with a grep command that correctly identifies all vegetables that have the same letter repeated two or more times in a row in their name.
Solutions
```
grep -E "([a-z])\1+" vegetables.txt
```
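The backreference `\1` is what makes this work: it must match the exact text that the first group captured, so `([a-z])\1+` needs the same letter again. A tempting alternative, `[a-z]{2,}`, only asks for any two letters in a row. The sketch below contrasts the two on a two-line sample of our own; it assumes GNU grep, which supports backreferences with `-E`.

```shell
# Two-line sample in a scratch directory (hypothetical file name).
tmpdir=$(mktemp -d)
printf 'potato\nlettuce\n' > "$tmpdir/sample.txt"

# Any two lowercase letters in a row: matches both lines.
any_two=$(grep -E "[a-z]{2,}" "$tmpdir/sample.txt")
# A letter followed by one or more copies of itself: only "lettuce" (tt).
same_twice=$(grep -E "([a-z])\1+" "$tmpdir/sample.txt")

echo "$any_two"      # potato, lettuce (one per line)
echo "$same_twice"   # lettuce
```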

Suppose we have a file kitkats.txt with the following contents:

kit kat
kat kit
my favorite part of the kit is the kat
cats do not like kit kats
this line only has kit
this line only has kat

Write a command that finds all lines which contain kit and kat in any order.

Solutions

grep -E "kit" kitkats.txt | grep -E "kat"
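The pipeline filters twice: the first grep keeps lines containing kit, and the second keeps those that also contain kat. A single-pattern alternative is to spell out both orders with alternation. The sketch below recreates kitkats.txt in a scratch directory (the path and variable names are ours) and checks that both approaches select the same lines.

```shell
# Recreate the sample file in a scratch directory (hypothetical location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/kitkats.txt" <<'EOF'
kit kat
kat kit
my favorite part of the kit is the kat
cats do not like kit kats
this line only has kit
this line only has kat
EOF

# Two filters chained with a pipe.
piped=$(grep -E "kit" "$tmpdir/kitkats.txt" | grep -E "kat")
# One pattern that lists both possible orders.
single=$(grep -E "kit.*kat|kat.*kit" "$tmpdir/kitkats.txt")

echo "$piped"    # the first four lines of the file
echo "$single"   # the same four lines
```

The piped version scales better: adding a third required word means one more `| grep`, whereas the alternation would need all six orderings.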

Suppose that we have the following file named emails.txt. Each line contains a user’s first and last name, followed by a comma, and then their email address. What is a grep command that determines which users have exactly their last name as the username of their email address (the part before the @)?
```
larry ruzzo, ruzzo@cs.washington.edu
zorah fung, zfung@yahoo.com
hunter schafer, hschafer@uw.edu
bennet goeckner, goeckner@math.uw.edu
ruth anderson, andersonr@gmail.com
```
In other words, our command should correctly identify that Larry Ruzzo and Bennet Goeckner use exactly their last name as the username of their email address.
Solutions
```
grep -E "[a-z]+ ([a-z]+), \1@[a-z]+\.[a-z]+" emails.txt
```
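Here the group `([a-z]+)` captures the last name, and `\1@` requires that same text to appear immediately before the @, which is why zfung and andersonr are rejected. As a sketch, the matching lines can be piped to `cut` to print only the names. This assumes GNU grep (backreferences with `-E`) and recreates emails.txt in a scratch directory; the path and variable name are ours.

```shell
# Recreate the sample file in a scratch directory (hypothetical location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/emails.txt" <<'EOF'
larry ruzzo, ruzzo@cs.washington.edu
zorah fung, zfung@yahoo.com
hunter schafer, hschafer@uw.edu
bennet goeckner, goeckner@math.uw.edu
ruth anderson, andersonr@gmail.com
EOF

# Keep the matching lines, then keep only the field before the comma.
names=$(grep -E "[a-z]+ ([a-z]+), \1@[a-z]+\.[a-z]+" "$tmpdir/emails.txt" | cut -d, -f1)

echo "$names"   # larry ruzzo, bennet goeckner (one per line)
```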
The backend team at FAANG needs your help: we have lots of new products, and they’re flying off the shelves like crazy (apparently you can sell happiness). To track all these transactions, each sale is assigned a unique ticket id. A ticket id is defined by the following properties:
- It must contain exactly 16 alphanumeric characters (upper- or lowercase letters and digits).
- To improve readability, the characters may optionally be grouped into segments whose lengths are multiples of four, delimited by dashes. However, the string may not end with a dash.
The following are valid ticket ids:
```
1234567891011112
1234-4567-8910-1112
aBcD-Ef79-8122-fd01
aBcDEf798122-fd01
```
The following are not valid ticket ids:
```
12345                            #too short
1233333333333333333333333333     #too long
1234-4567-8910-11?2              #illegal character
1234567891011112-                #ends with dash
```
- Come up with a grep command that identifies valid ticket ids in the file tickets.txt.
- Write a command that counts how many unique valid ticket ids are in the file tickets.txt.
- Challenge: Come up with a grep command that identifies valid ticket ids with the added constraint that if any dash is present, every group of four must be separated by a dash (i.e., aBcDEf798122-fd01 is no longer a valid ticket id).
Solutions
```
grep -E "^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$" tickets.txt
```
```
grep -E "^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$" tickets.txt | sort | uniq | wc -l
```
```
grep -E "^[a-zA-Z0-9]{4}(-?)[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}$" tickets.txt
```
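To check the three solutions, the sketch below builds a small tickets.txt from the valid and invalid examples above, without the trailing comments and with one valid id duplicated on purpose. The contents of the real tickets.txt are not shown here, so this sample file, the scratch path, and the variable names are our assumptions. It also assumes GNU grep, since the challenge solution relies on a backreference: `(-?)` captures either a dash or nothing, and each `\1` must then repeat that same choice.

```shell
# Sample file in a scratch directory (hypothetical contents and location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/tickets.txt" <<'EOF'
1234567891011112
1234-4567-8910-1112
aBcD-Ef79-8122-fd01
aBcDEf798122-fd01
aBcDEf798122-fd01
12345
1233333333333333333333333333
1234-4567-8910-11?2
1234567891011112-
EOF

# -c counts matching lines; the duplicated id is counted twice here.
valid=$(grep -cE "^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$" "$tmpdir/tickets.txt")
# sort -u drops duplicate lines, like sort | uniq.
unique=$(grep -E "^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$" "$tmpdir/tickets.txt" | sort -u | wc -l)
# Challenge: all three separators must agree, so the partly dashed id is rejected.
strict=$(grep -cE "^[a-zA-Z0-9]{4}(-?)[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}$" "$tmpdir/tickets.txt")

echo "$valid"    # 5
echo "$unique"   # 4
echo "$strict"   # 3
```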