HW7 - More regex and sed (2 points)

Due Tuesday 11/19 at 1:00 pm. No late submissions accepted.

Submission: Gradescope

Specification: Spec


This assignment is focused on using sed (and getting more practice with regular expressions). In particular: while there are many broad applications of sed, this assignment will focus on using sed to automate changes in a codebase and interact with a git repository.

To calculate your final score on this assignment, sum the individual scores from Gradescope.

  • if your score is in [0, 1), you will get 0 points
  • if your score is in [1, 1.5), you will get 1 point
  • if your score is in [1.5, 2], you will get 2 points

Task 0: Setting Up

Like most assignments, you will submit a task1.sh file to Gradescope. Unlike other assignments, you will download this task1.sh file from the faang repository on GitLab. The files you will need are in the latest commit on the main branch of the repository.

To do this, either run git pull in the repository:

git checkout main
git pull

Or, re-clone the repository from scratch (particularly helpful if you ran into issues with HW5):

git clone git@gitlab.cs.washington.edu:cse391/24au/faang.git
cd faang

You know that you have an up-to-date version if task1.sh has the appropriate quarter in its header.

You are not expected (and are not able) to make changes to the main branch of the repository (e.g. pushing a change or merging in a merge request). In addition, you should not commit/push changes to task1.sh. Instead, you will submit this file to Gradescope.

For the rest of these problems, assume that you are in the root of the repository.

Note

This assignment is still broken up into three tasks (to make things more digestible), but everything should go into the task1.sh file.

Task 1: Warming up with sed

Now that business is booming, FAANG is looking to expand to new ventures. They’ve heard that Seattle is a biotech hub, and biotech is getting pretty big — Amanda Seyfried (of Mean Girls fame) was even in an award-winning drama series on the industry, which has to be good news. For whatever reason, FAANG has decided to rebrand to 24AndMe, and address some tech debt along the way. You’ll use sed to help with this transition.

For the rest of these problems: feel free to (somewhat) match your answers to the actual set of files that you are given; we will not test your regexes against a more general case (though, it’d be helpful to do so)!

Tip

Take advantage of the fact that this homework is “within” a git repository! If you want to discard your changes and “reset” them back to your original state, run git checkout <file-you-want-to-reset>. Be careful not to reset your task1.sh!
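
As a rough illustration of the reset workflow described in this tip, the sketch below builds a throwaway repository in a temporary directory, damages a file with sed, and restores it with git checkout. The file name notes.txt, the commit message, and the author identity are made up for the example, and sed -i is written in GNU sed syntax.

```shell
# Scratch repository in a temporary directory (hypothetical names throughout).
repo=$(mktemp -d)
git -C "$repo" init -q
echo "original text" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Add notes"

# Simulate a sed experiment that went wrong (GNU sed in-place syntax).
sed -i 's/original/mangled/' "$repo/notes.txt"
git -C "$repo" status --short    # lists notes.txt as modified

# Discard the working-tree change and restore the committed version.
git -C "$repo" checkout -- notes.txt
cat "$repo/notes.txt"
```

Only the file named on the git checkout line is reset, which is why naming task1.sh there would throw away your answers.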

These first few problems are straightforward applications of sed, similar to what you’ve practiced before.

Problem 1: Slinging Slogans

As part of the rebrand, the company wants to create a slogan (FAANG … never had one). Write a sed command that replaces the string Insert Catchy Slogan Here in GenerateSite.java with whatever slogan you want (as long as it’s a non-empty string). This command should update the file in-place.
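
For reference, here is a minimal sketch of an in-place substitution on a scratch file. The file contents and the replacement text are invented for illustration and are unrelated to GenerateSite.java; the -i form shown assumes GNU sed (BSD/macOS sed expects -i '').

```shell
# Hypothetical scratch file; not part of the faang repository.
f=$(mktemp)
printf 'title = "TODO: pick a title"\nauthor = "staff"\n' > "$f"

# -i rewrites the file itself instead of printing the result.
sed -i 's/TODO: pick a title/Quarterly Report/' "$f"

cat "$f"
```

Without -i, the same command would print the edited text to standard output and leave the file unchanged.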

Problem 2: Deleting Dirt

A biotech company can’t sell “dirt” - they’re scientific, so they sell “soil”. Write a sed command that replaces all occurrences of dirt (case-insensitively) in Products.java with the string soil. This command should update the file in-place.
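
The sketch below shows one way to combine case-insensitive matching with replacing every match on a line, using the I and g flags of the s command. The I flag is a GNU sed extension, and the sample text is hypothetical.

```shell
# Hypothetical scratch file with mixed capitalization.
f=$(mktemp)
printf 'Color: red\nthe COLOR of the colored box\n' > "$f"

# I = match case-insensitively (GNU extension), g = replace every match on a line.
sed -i 's/color/colour/Ig' "$f"

cat "$f"
```

Note that the replacement text is inserted literally, so the original capitalization of each match is not preserved.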

Problem 3: Catching Credit Cards

Last week, the engineers also discovered that the website didn’t properly format credit cards from different users. Some numbers have spaces, some don’t, and the spacing isn’t always consistent.

Write a sed command that reformats each credit card number in cards.txt to have consistent spacing, where each group of 4 numbers is separated by a single space, and the last group of numbers contains 1-4 characters. This command should not modify cards.txt.
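
One general technique for normalizing inconsistent separators is to strip them all first and then regroup. The sketch below applies that idea to made-up serial codes (lowercase letters, groups of three, hyphen separators), so it is an analogy rather than a solution to this problem. Because there is no -i, the input file is left untouched.

```shell
# Hypothetical serial codes with inconsistent hyphenation.
f=$(mktemp)
printf 'ab-cde-f\nabcdef\na-b-c-d\n' > "$f"
before=$(cat "$f")

# 1) delete every hyphen, 2) put a hyphen after each full group of 3,
# 3) trim the trailing hyphen left behind when the length is a multiple of 3.
out=$(sed -E 's/-//g; s/[a-z]{3}/&-/g; s/-$//' "$f")
echo "$out"
```

The & in the replacement stands for the whole matched text, which is what lets step 2 re-emit each group followed by a separator.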

Task 2: Code Refactoring with sed

Now, we’ll apply sed to refactor code across multiple files at once. You might find some of these techniques helpful in future projects…

Problem 4: Privatization

The engineer who authored Product.java and Employee.java gave the classes public fields. Their teammates are not impressed.

Fix this mistake by writing a sed command that makes the fields for the Product and Employee classes private. In other words, replace all instances of public with private when used as a class field declaration. As a reminder, the syntax for class fields is public <type> <identifier>;. Take care to not overwrite the class declaration or method signatures.

This command should update Product.java and Employee.java in-place.
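
To restrict a substitution to one kind of line, it can help to anchor the pattern at both ends and describe the whole line shape. The sketch below uses a hypothetical Box class and a different keyword (static) to show the idea; the simple type and identifier patterns are assumptions that only fit this sample.

```shell
# Hypothetical Java-like file; not one of the assignment files.
f=$(mktemp)
cat > "$f" <<'EOF'
static class Box {
    static int count;
    static String label;
    static int next() { return count + 1; }
}
EOF

# Drop 'static' only on lines shaped like: indentation, static, type, name, semicolon.
# \1 keeps the indentation; \2 keeps the "type name;" part.
sed -E -i 's/^([[:space:]]*)static ([A-Za-z]+ [A-Za-z]+;)$/\1\2/' "$f"

cat "$f"
```

The class declaration and the method keep their keyword because neither line ends in the "type name;" shape that the anchored pattern requires.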

Problem 5: Consolidating Comments

Java programs can contain single-line comments (beginning with //) and multi-line comments (starting with /* and ending with */). However, some multi-line comments only occupy a single line, and some companies say that it’s “good practice” to use a single-line comment in these cases. 24AndMe is now one of these companies.

Write a sed command that finds multi-line comments in Products.java that occupy only a single line, and replaces them with a // comment — while keeping the rest of their content. For example, /* premium dirt */ would become // premium dirt (note the space after //).

Your command does not need to handle multi-line comments that span multiple lines (e.g. the /* is not on the same line as the */). In addition, you may assume that each line has at most one comment.

This command should update Products.java in-place.
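
When a pattern is full of slashes or other punctuation, two sed features are handy: a capture group to carry the inner text into the replacement, and an alternate delimiter such as | in place of /. The sketch below demonstrates both on a made-up input that uses HTML-style comments.

```shell
# Rewrite "<!-- text -->" as "# text", keeping the inner text via \1.
# The s command uses | as its delimiter instead of /.
out=$(printf 'x = 1 <!-- start value -->\ny = 2\n' | sed -E 's|<!-- (.*) -->|# \1|')
echo "$out"
```

Lines without a match, like the second one here, pass through unchanged.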

Problem 6: Tying Up Double Helixes

For the last part of the rebrand, all references to FAANG in the source code will need to be updated. Write a sed command that finds and replaces all occurrences of FAANG (case-insensitively) with 24AndMe across all .java files in the repository. Your command should update all the files in-place.
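
sed accepts several files at once, and find can supply files from nested directories. The sketch below is one possible way to edit every matching file under a directory; the directory layout, the .txt extension, and the names Acme and Globex are hypothetical, and -i and the I flag assume GNU sed.

```shell
# Hypothetical directory tree with two .txt files and one file that should be skipped.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'Acme widgets\n'        > "$dir/a.txt"
printf 'about ACME and acme\n' > "$dir/sub/b.txt"
printf 'Acme stays here\n'     > "$dir/c.log"

# find passes every *.txt file (including those in subdirectories) to a single sed call.
find "$dir" -name '*.txt' -exec sed -i 's/acme/Globex/Ig' {} +

cat "$dir/a.txt" "$dir/sub/b.txt" "$dir/c.log"
```

If all the files sit in one directory, a shell glob such as *.txt on the sed command line does the same job without find.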

Task 3: Combining git and sed

In this last task, we’ll look at a different type of log analysis: looking at git log!

Problem 7: Employing Expressions

HR has lost the list of the employees at the company! And, not everybody put their real name in Staff.java — truly unfortunate. Your job is to reconstruct the list of employees by looking at all the engineers who have contributed to the repository.

Write a command (that uses sed, git log, and other tools from this class) to create a contributors.txt file which contains an alphabetically-sorted list of all the unique contributors to the repository.

For this problem, a unique “contributor” should be all of the text that appears after the text Author: when running git log. Authors with the same name but different emails should be considered distinct. Authors should be compared and sorted case-insensitively.

Hint: you do not need to pass any extra options to git log (and we promise that it’s easier this way)!
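
The general shape of such a pipeline is: keep only the lines with a known prefix, strip the prefix, then sort and de-duplicate. The sketch below shows that shape on a made-up mail-style log with a From: prefix, so the prefix, names, and addresses are all hypothetical.

```shell
# Hypothetical log; only the "From: " lines matter.
log=$(mktemp)
cat > "$log" <<'EOF'
Message 1
From: carol <carol@example.org>
Subject: hello
Message 2
From: Bob <bob@example.org>
Message 3
From: alice <alice@example.org>
Message 4
From: Bob <bob@example.org>
EOF

# -n suppresses default output; the p flag prints only lines where the substitution happened.
# sort -f sorts ignoring case; uniq -i drops adjacent duplicates ignoring case.
result=$(mktemp)
sed -n 's/^From: //p' "$log" | sort -f | uniq -i > "$result"
cat "$result"
```

Combining -n with the p flag lets one sed command both filter and rewrite, which avoids a separate grep step.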

Problem 8: Naming Names

Write a command that outputs a list of the unique names of contributors from your contributors.txt.

For this problem, the name of a contributor can be found by removing the email and the surrounding angle brackets (< and >). You should also remove any duplicates you encounter after doing this process.
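
Removing a trailing delimited chunk is a common sed pattern: match the opening delimiter, any run of characters that are not the closing delimiter, and the closing delimiter at the end of the line. The sketch below does this for a made-up "name (department)" format with parentheses rather than angle brackets.

```shell
# Strip a trailing " (…)" from each line, then drop the duplicates this creates.
out=$(printf 'Ada (math)\nAda (physics)\nGrace (cs)\n' | sed -E 's/ \([^)]*\)$//' | sort -u)
echo "$out"
```

Two lines that were distinct before the substitution can become identical afterwards, which is why the de-duplication step comes last.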

Problem 9: Enumerating Emails

IT wants to generate a company email address for each of these employees, using their email usernames from their previous email accounts.

Write a command that outputs (to standard output) company emails of the form username@24AndMe.com, where username is the sequence of non-whitespace characters that comes before the @ symbol in that contributor’s email. A contributor’s email is defined as being in-between the < and > in contributors.txt (i.e., it does not include the angle brackets). Similar to the previous question, you should also remove any duplicates you encounter after doing this process.
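
A capture group can also pull one piece out of a line and rebuild a new string around it. The sketch below extracts the key from a made-up "[key=value]" field and appends a fixed suffix; the input format and the .example.org suffix are hypothetical and only illustrate the technique.

```shell
# Capture the non-whitespace text between "[" and "=", discard the rest of the line,
# and emit the captured text followed by a fixed suffix.
out=$(printf 'node-a [disk=40G]\nnode-b [disk=40G]\nnode-c [mem=8G]\n' \
      | sed -E 's/.*\[([^[:space:]=]+)=.*\].*/\1.example.org/' \
      | sort -u)
echo "$out"
```

The leading and trailing .* make the pattern cover the entire line, so everything outside the capture group is dropped from the output.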

General Hints

  • build your solutions iteratively - don’t write it all at once!
  • we’ve done parts of these problems before in previous homeworks
  • the regular expression syntax hints from the grep assignment apply here with sed
  • some special characters must be escaped by a \ to be used in a regex pattern
  • putting a g at the end of your substitution, such as s/oldpattern/newtext/g, processes all matches on a line
  • struggling to debug a regex? Tools like regex101 may help!
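
The escaping and g hints above can be seen side by side in this small sketch, which runs three variations of the same substitution on a made-up input string.

```shell
# Escaped dot, no g: only the first literal "." on the line is replaced.
first=$(echo 'a.b.c' | sed 's/\./-/')
# Escaped dot with g: every literal "." is replaced.
all=$(echo 'a.b.c' | sed 's/\./-/g')
# Unescaped dot with g: "." means "any character", so every character is replaced.
any=$(echo 'a.b.c' | sed 's/./-/g')

echo "$first"   # a-b.c
echo "$all"     # a-b-c
echo "$any"     # -----
```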

For many of these problems, you may find it helpful to refer to the Regex Syntax on our reference page.