Note: this is for the Summer 2024 iteration of CSE 391. Looking for a different quarter? Please visit https://courses.cs.washington.edu/courses/cse391/.
sed
(2 points) Due Tuesday 08/06 at 10:00 am. No late submissions accepted.
Submission: Gradescope
Specification: Spec
This assignment is focused on using sed (and getting more practice with regular expressions). In particular: while there are many broad applications of sed, this assignment will focus on using sed to automate changes in a codebase and interact with a git repository.
To calculate your final score on this assignment, sum the individual scores from Gradescope.
- if your score is between [0, 1), you will get 0 points
- if your score is between [1.0, 1.5), you will get 1 point
- if your score is between [1.5, 2], you will get 2 points
Task 0: Setting Up
Like most assignments, you will submit a task1.sh file to Gradescope. Unlike other assignments, you will download this task1.sh file from the faang repository on GitLab. The files you will need are on the latest commit on the main branch of the repository.
To do this, either run git pull in the repository:
git checkout main
git pull
Or, re-clone the repository from scratch (particularly helpful if you ran into issues with HW5):
git clone git@gitlab.cs.washington.edu:cse391/24su/faang.git
cd faang
You know that you have an up-to-date version if task1.sh has the appropriate quarter in its header.
You are not expected (and are not able) to make changes to the main branch of the repository (e.g. pushing a change or merging in a merge request). In addition, you should not commit/push changes to task1.sh. Instead, you will submit this file to Gradescope.
For the rest of these problems, assume that you are in the root of the repository.
Note
This problem is still broken up into three tasks (to make things more digestible), but everything should go into the task1.sh file.
Task 1: Warming up with sed
Now that business is booming, FAANG is looking to expand to new ventures. They’ve heard that Seattle is a biotech hub, and biotech is getting pretty big: Amanda Seyfried (of Mean Girls fame) was even in an award-winning drama series on the industry, which has to be good news. For whatever reason, FAANG has decided to rebrand to 24AndMe, and address some tech debt along the way. You’ll use sed to help with this transition.
For the rest of these problems: feel free to (somewhat) match your answers to the actual set of files that you are given; we will not test your regexes against a more general case (though, it’d be helpful to do so)!
Tip
Take advantage of the fact that this homework is “within” a git repository! If you want to discard your changes and “reset” them back to your original state, run git checkout <file-you-want-to-reset>. Be careful not to reset your task1.sh!
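As a rough illustration of that reset workflow, here is a minimal sketch in a throwaway repository. The directory, the file name notes.txt, and the commit identity are all made up for the demo and have nothing to do with the faang repository.

```shell
# Hypothetical scratch repository; none of these names come from the assignment.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "original" > notes.txt
git add notes.txt
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Add notes"

echo "an accidental edit" > notes.txt   # simulate a change we want to throw away
git checkout notes.txt                  # restore the committed version of the file
cat notes.txt
```

Only the file named on the command line is restored, which is why naming task1.sh by accident would discard your work in it.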
These first few problems are straightforward applications of sed, similar to what you’ve practiced before.
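As a quick refresher, the sketch below contrasts printing a substitution with writing it back to the file. It assumes GNU sed (the -i option and the I flag behave differently or are missing in other implementations), and the scratch file and its contents are invented for the demo.

```shell
# Scratch file with made-up contents; nothing here comes from the assignment repository.
demo=$(mktemp)
printf 'Hello World\nhello again\n' > "$demo"

# Without -i, sed prints the result to standard output and leaves the file alone.
# The I flag (a GNU extension) makes the match case-insensitive.
preview=$(sed 's/hello/goodbye/gI' "$demo")
echo "$preview"

# With -i (GNU sed syntax), the same substitution is written back to the file.
sed -i 's/hello/goodbye/gI' "$demo"
cat "$demo"
```

Previewing without -i first is a cheap way to check a pattern before it changes anything on disk.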
Problem 1: Slinging Slogans
As part of the rebrand, the company wants to create a slogan (FAANG … never had one). Write a sed command that replaces the string Insert Catchy Slogan Here in GenerateSite.java with whatever slogan you want (as long as it’s a non-empty string). This command should update the file in-place.
Problem 2: Deleting Dirt
A biotech company can’t sell “dirt” - they’re scientific, so they sell “soil”. Write a sed command that replaces all occurrences of dirt (case-insensitively) in Products.java with the string soil. This command should update the file in-place.
Problem 3: Catching Credit Cards
Last week, the engineers also discovered that the website didn’t properly format credit cards from different users. Some numbers have spaces, some don’t, and the spacing isn’t always consistent.
Write a sed command that reformats each credit card number in cards.txt to have consistent spacing, where each group of 4 numbers is separated by a single space, and the last group of numbers contains 1-4 characters. This command should not modify cards.txt.
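One general way to approach this kind of reformatting is to strip the inconsistent separators first and then reinsert them at fixed intervals. The sketch below shows that idea on an unrelated, made-up input (hex byte pairs rather than card numbers); it is an illustration of the technique, not a solution to this problem.

```shell
# Made-up input: six hex byte values with inconsistent separators.
raw='aa bb-ccdd  ee:ff'

# 1) delete every character that is not a hex digit,
# 2) put ':' after every pair of characters (& stands for the matched text),
# 3) drop the ':' left over at the end of the line.
formatted=$(echo "$raw" | sed -E 's/[^0-9a-f]//g; s/../&:/g; s/:$//')
echo "$formatted"
```

Because there is no -i here, the result only goes to standard output and any input file would be left untouched.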
Task 2: Code Refactoring with sed
Now, we’ll apply sed to refactor code across multiple files at once. You might find some of these techniques helpful in future projects…
Problem 4: Privatization
The engineer who authored Product.java and Employee.java gave the classes public fields. Their teammates are not impressed.
Fix this mistake by writing a sed command that makes the fields for the Product and Employee classes private. In other words, replace all instances of public with private when used as a class field declaration. As a reminder, the syntax for class fields is public <type> <identifier>;. Take care to not overwrite the class declaration or method signatures.
This command should update Product.java and Employee.java in-place.
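The general idea of changing a keyword only on lines with a particular shape, across several files at once, can be sketched as follows. The files, the word draft, and the line shapes are invented for the demo, and -i is used in its GNU sed form.

```shell
# Two made-up files; the contents are illustrative only.
dir=$(mktemp -d)
printf 'draft title;\ndraft notes() {\n  draft summary;\n' > "$dir/a.txt"
printf 'draft report;\nclass draft {\n' > "$dir/b.txt"

# Replace a leading "draft" only on lines that end in ';' and contain no
# parentheses. The capture groups keep the indentation and the rest of the line.
sed -i -E 's/^([[:space:]]*)draft ([^()]*;)$/\1final \2/' "$dir/a.txt" "$dir/b.txt"
cat "$dir/a.txt" "$dir/b.txt"
```

Anchoring with ^ and $ and describing what the rest of the line may contain is what keeps the other lines (the ones with parentheses or a brace) unchanged.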
Problem 5: Consolidating Comments
Java programs can contain single-line comments (beginning with //) and multi-line comments (starting with /* and ending with */). However, some multi-line comments only occupy a single line, and some companies say that it’s “good practice” to use a single-line comment in these cases. 24AndMe is now one of these companies.
Write a sed command that finds multi-line comments in Products.java that occupy only a single line, and replaces them with a // comment, while keeping the rest of their content. For example, /* premium dirt */ would become // premium dirt (note the space after //).
Your command does not need to handle multi-line comments that span multiple lines (e.g. the /* is not on the same line as the */). In addition, you may assume that each line has at most one comment.
This command should update Products.java in-place.
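Rewriting delimiters while keeping the text between them usually comes down to a capture group plus careful escaping. Here is a minimal sketch on a different, made-up comment style, (* ... *), so it is an analogy rather than an answer.

```shell
# Made-up line using an OCaml-style comment; not taken from Products.java.
line='let x = 1 (* set x *)'

# \( \* \) match literal characters, while the unescaped ( ) form a capture group.
# Using | as the s-command delimiter means the slashes in the replacement
# do not need to be escaped.
converted=$(echo "$line" | sed -E 's|\(\* (.*) \*\)|// \1|')
echo "$converted"
```

Any character can serve as the delimiter of the s command, as long as it does not also appear unescaped in the pattern or the replacement.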
Problem 6: Tying Up Double Helixes
For the last part of the rebrand, all references to FAANG in the source code will need to be updated. Write a sed command that finds all occurrences of FAANG (case-insensitively) and replaces them with 24AndMe across all .java files in the repository. Your command should update all the files in-place.
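One possible way to apply a substitution to every file of a given type, including files in subdirectories, is to let find collect the file names and hand them to sed. This is only a sketch: the directory layout, the .txt extension, and the words being swapped are invented, and -i and I assume GNU sed. A shell glob may be enough when all the files sit in one directory.

```shell
# Made-up directory tree; only the .txt files should change.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'Acme rocks\n' > "$dir/one.txt"
printf 'ACME and acme\n' > "$dir/sub/two.txt"
printf 'acme untouched\n' > "$dir/skip.md"

# find passes every matching path to a single sed invocation ({} +).
find "$dir" -name '*.txt' -exec sed -i 's/acme/Globex/gI' {} +
cat "$dir/one.txt" "$dir/sub/two.txt" "$dir/skip.md"
```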
Task 3: Combining git and sed
In this last task, we’ll look at a different type of log analysis: looking at git log!
Problem 7: Employing Expressions
HR has lost the list of the employees at the company! And, not everybody put their real name in Staff.java (truly unfortunate). Your job is to reconstruct the list of employees by looking at all the engineers who have contributed to the repository.
Write a command (that uses sed, git log, and other tools in this class) to create a contributors.txt file which contains an alphabetically-sorted list of all the unique contributors to the repository.
For this problem, a unique “contributor” should be all of the text that appears after the text Author: when running git log. Authors with the same name but different emails should be considered distinct. Authors should be compared and sorted case-insensitively.
Hint: you do not need to pass any extra options to git log (and we promise that it’s easier this way)!
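The general shape of such a pipeline (keep only the lines with a given label, strip the label, then sort and de-duplicate without regard to case) can be sketched on made-up text. The Reviewer: label and the sample lines below are hypothetical and are not git log output.

```shell
# Made-up multi-line text standing in for some labeled log output.
log='Reviewer: bob <bob@example.com>
Note: unrelated line
Reviewer: Alice <alice@example.com>
Reviewer: bob <bob@example.com>'

# -n suppresses automatic printing; the p flag prints only lines where the
# substitution succeeded. sort -f folds case, uniq -i drops adjacent duplicates
# case-insensitively. LC_ALL=C pins the sort order for reproducibility.
reviewers=$(echo "$log" | sed -n 's/^Reviewer: //p' | LC_ALL=C sort -f | uniq -i)
echo "$reviewers"
```

Redirecting the end of a pipeline like this with > would write the result to a file instead of the terminal.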
Problem 8: Naming Names
Write a command that outputs a list of the unique names of contributors from your contributors.txt.
For this problem, the name of a contributor can be found by removing the email and the surrounding angle brackets (< and >). You should also remove any duplicates you encounter after doing this process.
Problem 9: Enumerating Emails
IT wants to generate a company email address for each of these employees, using their email usernames from their previous email accounts.
Write a command that outputs (to standard output) company emails of the form username@24AndMe.com, where username is the sequence of non-whitespace characters that comes before the @ symbol in that contributor’s email. A contributor’s email is defined as the text in between the < and > in contributors.txt (i.e., it does not include the angle brackets). Similar to the previous question, you should also remove any duplicates you encounter after doing this process.
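Pulling one piece out of a delimited field and reusing it in new text is a typical job for a capture group. The sketch below uses invented inventory lines with parentheses as the delimiters and a made-up item- prefix, so it shows the technique on different data rather than on contributors.txt.

```shell
# Made-up lines of the form "<label> (<id>@<location>)".
items='Widget (w-42@warehouse)
Gadget (g-7@depot)
Widget spare (w-42@backroom)'

# \( and \) match literal parentheses; the unescaped ( ) capture the id, which is
# every non-whitespace, non-@ character before the @. sort -u removes duplicates.
ids=$(echo "$items" | sed -E 's/.*\(([^@[:space:]]+)@[^)]*\).*/item-\1/' | LC_ALL=C sort -u)
echo "$ids"
```

Matching the whole line (with .* on both sides) means the replacement discards everything except the captured piece.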
General Hints
- build your solutions iteratively - don’t write it all at once!
- we’ve done parts of these problems before in previous homeworks
- the regular expression syntax hints from the grep assignment apply here with sed
- some special characters must be escaped by a \ to be used in a regex pattern
- putting a g at the end of your pattern, such as s/oldpattern/newtext/g, processes all matches on a line
- struggling to debug a regex? Tools like regex101 may help!
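To make the escaping and g hints concrete, here is a small sketch on a made-up string that compares three variants of the same substitution.

```shell
# Replace only the first literal dot on the line.
first_only=$(echo 'a.b.c' | sed 's/\./-/')

# With g, replace every literal dot on the line.
every_dot=$(echo 'a.b.c' | sed 's/\./-/g')

# Without the backslash, . matches any character, so everything is replaced.
unescaped=$(echo 'a.b.c' | sed 's/./-/g')

echo "$first_only"
echo "$every_dot"
echo "$unescaped"
```

Running the variants side by side like this is also a handy way to build a solution iteratively.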
For many of these problems, you may find it helpful to refer to the Regex Syntax on our reference page.