24sp ver.
Note: this is for the Spring 2024 iteration of CSE 391. Looking for a different quarter? Please visit https://courses.cs.washington.edu/courses/cse391/.
We highly recommend downloading the files for this Q&A and following along with us:
wget https://courses.cs.washington.edu/courses/cse391/24sp/lectures/6/questions6.zip
unzip questions6.zip
For many of these problems, you may find it helpful to refer to the Regex Syntax on our reference page.
-
Suppose that we have a file named words.txt with the following contents:
These are some words a11 0f the5e w0rds c0n7ain number5 S0me 0f these w0rds do
-
Write a command that identifies all words which are exactly four characters long.
-
Write a command that identifies all words which are exactly four characters long and contain only letters (both upper and lowercase).
-
Write a command that identifies all words which are at least four characters long and contain only letters (both upper and lowercase).
Solutions
grep -E "\<....\>" words.txt
grep -E "\<[a-zA-Z]{4}\>" words.txt
grep -E "\<[a-zA-Z]{4,}\>" words.txt
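One way to check the three solutions above is sketched below. It copies the sample text into a scratch file and adds grep's -o flag so that only the matching words are printed, not whole lines. It assumes GNU grep (for \< and \>). Because the original line breaks of words.txt are not shown here, it keeps the text on a single line. The scratch directory and variable names are illustrative only.

```shell
# Scratch copy of the sample text (assumption: one line; -o output does not depend on line breaks).
tmpdir=$(mktemp -d)
printf '%s\n' 'These are some words a11 0f the5e w0rds c0n7ain number5 S0me 0f these w0rds do' > "$tmpdir/words.txt"

# -o prints each match on its own line instead of the whole matching line.
four_any=$(grep -oE "\<....\>" "$tmpdir/words.txt")
four_letters=$(grep -oE "\<[a-zA-Z]{4}\>" "$tmpdir/words.txt")
four_plus_letters=$(grep -oE "\<[a-zA-Z]{4,}\>" "$tmpdir/words.txt")

echo "$four_any"            # some, S0me
echo "$four_letters"        # some
echo "$four_plus_letters"   # These, some, words, these

rm -rf "$tmpdir"
```

Note that . matches any character, including a space, so on other inputs the first pattern can also match a span such as "is a". The bracket expression in the second and third patterns avoids this.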
-
-
Suppose that we have the following file named vegetables.txt:
🥦broccoli
asparagus
potato
lettuce
zucchini
brocccccccoli
-
Come up with a grep command that correctly identifies all vegetables that have two or more consecutive c's in their name.
-
Come up with a grep command that correctly identifies all vegetables that have two instances of c anywhere in their name.
Solutions
grep -E "cc+" vegetables.txt # or grep -E "c{2,}" vegetables.txt
grep -E ".*c.*c.*" vegetables.txt
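On the sample file both solutions happen to select the same vegetables, so the sketch below adds one hypothetical entry, cucumber, whose two c's are not adjacent. It assumes one vegetable per line and uses a scratch file in place of the course's vegetables.txt.

```shell
tmpdir=$(mktemp -d)
# One vegetable per line (assumed layout); cucumber is a hypothetical extra entry.
printf '%s\n' broccoli asparagus potato lettuce zucchini brocccccccoli cucumber > "$tmpdir/vegetables.txt"

consecutive=$(grep -E "cc+" "$tmpdir/vegetables.txt")
anywhere=$(grep -E ".*c.*c.*" "$tmpdir/vegetables.txt")

echo "$consecutive"   # broccoli, zucchini, brocccccccoli
echo "$anywhere"      # the same three, plus cucumber

rm -rf "$tmpdir"
```

Since grep matches anywhere in the line, the leading and trailing .* in the second pattern are optional; "c.*c" selects the same lines.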
-
-
Using the file from Q2: Come up with a grep command that correctly identifies all vegetables that have two or more consecutive repeated letters in their name.
Solutions
grep -E "([a-z])\1+" vegetables.txt
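In this solution, \1 refers back to whatever the parenthesized group matched, so the pattern asks for a letter followed by one or more copies of the same letter. A small sketch, assuming GNU grep and one vegetable per line in a scratch copy of the file:

```shell
tmpdir=$(mktemp -d)
# Scratch stand-in for vegetables.txt (assumption: one vegetable per line).
printf '%s\n' broccoli asparagus potato lettuce zucchini brocccccccoli > "$tmpdir/vegetables.txt"

# ([a-z]) captures one lowercase letter; \1+ requires that same letter again, one or more times.
repeated=$(grep -E "([a-z])\1+" "$tmpdir/vegetables.txt")

echo "$repeated"   # broccoli, lettuce, zucchini, brocccccccoli

rm -rf "$tmpdir"
```

Unlike the Q2 patterns, this one also selects lettuce, because the repeated letter does not have to be c.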
-
Suppose we have a file kitkats.txt with the following contents:
kit kat
kat kit
my favorite part of the kit is the kat
cats do not like kit kats
this line only has kit
this line only has kat
Write a command that finds all lines which contain kit and kat in any order.
Solutions
grep -E "kit" kitkats.txt | grep -E "kat"
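The pipeline works because the second grep only sees lines that already contain kit. For comparison, the sketch below runs it next to a single-pattern alternative that spells out both orders with alternation. The alternative is an illustration, not the course's solution, and the line layout of the scratch file is an assumption.

```shell
tmpdir=$(mktemp -d)
# Scratch stand-in for kitkats.txt (assumed line breaks).
printf '%s\n' \
  'kit kat' \
  'kat kit' \
  'my favorite part of the kit is the kat' \
  'cats do not like kit kats' \
  'this line only has kit' \
  'this line only has kat' > "$tmpdir/kitkats.txt"

# Solution above: keep lines with kit, then keep those that also have kat.
piped=$(grep -E "kit" "$tmpdir/kitkats.txt" | grep -E "kat")

# Alternative: one pattern covering both orders.
single=$(grep -E "kit.*kat|kat.*kit" "$tmpdir/kitkats.txt")

echo "$piped"    # the first four lines of the file
echo "$single"   # the same four lines

rm -rf "$tmpdir"
```

The piped form scales better: requiring a third word takes one more grep, whereas the alternation would have to list every ordering.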
-
Suppose that we have the following file named emails.txt. This file contains a user's first and last name, followed by a comma, and then their email address.
larry ruzzo, ruzzo@cs.washington.edu
zorah fung, zfung@yahoo.com
hunter schafer, hschafer@uw.edu
bennet goeckner, goeckner@math.uw.edu
ruth anderson, andersonr@gmail.com
What is a grep command that determines which users have exactly their last name as the username of their email address (the part before the @)? In other words, our command should correctly identify that Larry Ruzzo and Bennet Goeckner have their last names as their email usernames.
Solutions
grep -E "[a-z]+ ([a-z]+), \1@[a-z]+\.[a-z]+" emails.txt
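To see the backreference at work, the sketch below runs the solution on a scratch copy of the sample data, assuming one user per line. The group captures the last name, and \1@ then requires that exact name to appear immediately before the @.

```shell
tmpdir=$(mktemp -d)
# Scratch stand-in for emails.txt (assumption: one user per line).
printf '%s\n' \
  'larry ruzzo, ruzzo@cs.washington.edu' \
  'zorah fung, zfung@yahoo.com' \
  'hunter schafer, hschafer@uw.edu' \
  'bennet goeckner, goeckner@math.uw.edu' \
  'ruth anderson, andersonr@gmail.com' > "$tmpdir/emails.txt"

matches=$(grep -E "[a-z]+ ([a-z]+), \1@[a-z]+\.[a-z]+" "$tmpdir/emails.txt")

echo "$matches"
# larry ruzzo, ruzzo@cs.washington.edu
# bennet goeckner, goeckner@math.uw.edu

rm -rf "$tmpdir"
```

zfung and hschafer fail because an extra letter comes before the last name, and andersonr fails because an extra letter sits between the last name and the @.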
-
The backend team at faang needs your help - we have lots of new products and they’re flying off the shelves like crazy (apparently you can sell happiness). In order to track all these transactions, each sale is assigned a unique ticket id. A ticket id is defined by the following properties:
- It must contain exactly 16 characters, each a letter (upper or lowercase) or a digit, not counting dashes
- To improve readability, the characters may optionally be grouped into segments whose lengths are multiples of four, delimited by dashes. However, the string may not end with a dash.
The following are valid ticket ids:
1234567891011112
1234-4567-8910-1112
aBcD-Ef79-8122-fd01
aBcDEf798122-fd01
The following are not valid ticket ids:
12345 #too short
1233333333333333333333333333 #too long
1234-4567-8910-11?2 #illegal character
1234567891011112- #ends with dash
-
Come up with a grep command that identifies valid ticket ids in the file tickets.txt.
-
Write a command that identifies how many unique valid ticket ids are in the file tickets.txt.
-
Challenge: Come up with a grep command that identifies valid ticket ids with the added constraint that if there is a single dash, all groups of four must be separated by a dash (i.e., aBcDEf798122-fd01 is no longer a valid ticket id).
Solutions
grep -E "^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$" tickets.txt
grep -E "^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$" tickets.txt | sort | uniq | wc -l
grep -E "^[a-zA-Z0-9]{4}(-?)[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}$" tickets.txt
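The three solutions can be tried on a scratch file built from the example ids above. This is a sketch under two assumptions: the real tickets.txt holds one id per line without the trailing # comments, and one valid id is repeated here on purpose so that the unique count differs from the line count. The variable names are illustrative.

```shell
tmpdir=$(mktemp -d)
# Four valid ids, one deliberate duplicate, then the four invalid ids.
printf '%s\n' \
  1234567891011112 1234-4567-8910-1112 aBcD-Ef79-8122-fd01 aBcDEf798122-fd01 \
  1234567891011112 \
  12345 1233333333333333333333333333 '1234-4567-8910-11?2' 1234567891011112- \
  > "$tmpdir/tickets.txt"

valid='^([a-zA-Z0-9]{4}-?){3}[a-zA-Z0-9]{4}$'
# (-?) captures either a dash or nothing; \1 forces every later separator to be the same.
strict='^[a-zA-Z0-9]{4}(-?)[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}\1[a-zA-Z0-9]{4}$'

valid_lines=$(grep -cE "$valid" "$tmpdir/tickets.txt")
unique_valid=$(grep -E "$valid" "$tmpdir/tickets.txt" | sort | uniq | wc -l)
unique_strict=$(grep -E "$strict" "$tmpdir/tickets.txt" | sort | uniq | wc -l)

# Expected counts: 5 valid lines, 4 unique valid ids, 3 unique ids under the stricter rule.
echo "valid lines: $valid_lines, unique: $unique_valid, unique strict: $unique_strict"

rm -rf "$tmpdir"
```

The sort before uniq matters because uniq only collapses adjacent duplicate lines.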