24su ver.
Note: this is for the Summer 2024 iteration of CSE 391. Looking for a different quarter? Please visit https://courses.cs.washington.edu/courses/cse391/.
This is a set of notes that cover CSE 391’s pre-lecture content. It should be used as reference material, but does not replace watching the videos or actively engaging with the content. All the notes are intentionally kept on one page to make searching for content easier.
These notes were originally written by Antonio Ballesteros and Kirupa Gunaseelan, two TAs during the 24wi offering of 391. They have since been refined by TAs and course staff over each offering. Thank you Antonio and Kirupa!
Table of Contents:
- Week 1: Introduction to Linux
- Week 2: More shell commands, streams, I/O redirection
- Week 3: More shell operators, xargs, find, cut
- Week 4: Introduction to git
- Week 5: More git and GitLab
- Week 6: Regular Expressions and grep
- Week 7: sed
- Week 8: Introduction to shell scripting
Week 1¶
What is Unix and Linux?¶
Unix and Linux refer to operating systems. You’ve used operating systems before - you might have a PC that runs Windows or a Macbook that runs macOS. We have a “third genre”, which we’ll refer to as Unix/Linux (or sometimes, just Unix or Linux).
It turns out that Unix/Linux is by far the most widely used family of operating systems today - powering everything from mobile phones and micro-devices to cloud computing and supercomputers.
While learning Linux, we’ll also learn another way for you to interact with your computer.
Accessing Linux¶
Check out the Working at Home page on the class website, which will talk about different ways to access Linux in this class. You’ll need to do this for your homework!
ssh and attu¶
The “default” way we’ll access Linux is by using ssh, which lets you remotely log in to other computers. For example, the attu servers are actually physically located in the basement of the CSE department! Rather than having to physically go to them, we can use ssh to connect to them remotely and do work on them.
The terminal and the shell¶
The terminal is a text-based user interface for interacting with your computer. Inside the terminal is the shell, which is often denoted by the $. Essentially, the shell is a program that allows the user to interact with the operating system and applications.
vim and emacs¶
When editing code on a terminal, we’ll need a code editor. We recommend two in this class - vim and emacs. You only need to pick one of them and stick with it throughout the course. You’ll learn more about these editors in your homework; the editors section of the resources page includes some guides to get started.
Basic Shell Commands¶
command | description |
---|---|
pwd | Print current working directory |
cd | Change working directory |
ls | List files in working directory |
man | Bring up manual for a command |
exit | Log out of shell |
Flags and Arguments¶
Flags are prepended with - and change programs’ behavior slightly. For example,
$ ls -l
-l is a flag for ls. In this case, the -l flag for ls lists the content of the current directory in “long listing format”.
Commands can also take arguments. For example,
$ ls dir1
dir1 is an argument. In this case, this lists the contents of the directory named dir1.
Commands can take multiple arguments and flags. For example,
$ ls -a -l dir1 dir2
lists the contents of dir1 and dir2 in long listing format, showing hidden files (-a).
Some programs, like ls, let you combine flags into one -:
$ ls -al dir1 dir2
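If you'd like to convince yourself that the separate and combined forms behave the same, here is a small sketch you can paste into a terminal. The throwaway directory and the file names (notes.txt, .hidden) are made up for illustration; mktemp -d and touch are used only to set them up.

```shell
# Create a throwaway directory with one regular file and one hidden file.
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt" "$demo_dir/.hidden"

ls "$demo_dir"         # shows only notes.txt
ls -a "$demo_dir"      # also shows .hidden (plus . and ..)
ls -a -l "$demo_dir"   # long listing, flags given separately
ls -al "$demo_dir"     # the same long listing, flags combined
```

The last two commands should print identical listings.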
man pages (documentation)¶
Documentation for Linux is built-in and can be accessed using the man (manual) command.
For example,
$ man ls
provides the documentation for ls.
Some helpful notes for reading man pages:
- most man pages have specific components:
  - a summary or synopsis that shows the structure of the command and important flags and arguments
  - a longer description that explains each flag and argument
  - examples of how to use the command in common use-cases
- you can search for words with /: e.g. typing /reverse searches for "reverse" exactly
- if you type in h, you can see all the commands you can use to navigate a man page
The synopsis has some important syntax. For example, man ls has:
SYNOPSIS
ls [OPTION]... [FILE]...
- items within [] are optional; in this case, this means that ls can optionally take an OPTION or a FILE, but neither is required.
- the ... means that the preceding item can be repeated; in this case, this means that ls can take any number of options and any number of files (including none, since both are optional)
Compare this synopsis with the next few commands; how are they the same, and how are they different?
Directory Commands¶
command | description |
---|---|
ls | List files in working directory |
pwd | Print current working directory |
cd | Change working directory |
mkdir | Make a new directory |
rmdir | Remove the given directory (must be empty) |
Relative Directories¶
directory | description |
---|---|
. | References the working directory |
.. | References the parent of working directory |
~ | Refers to the **home** directory |
~/Desktop | Your desktop |
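Here is a short sketch of how these relative directories behave in practice. The directory names parent and child are hypothetical, and mktemp -d is used only to create a throwaway place to experiment in.

```shell
# Build a small directory tree in a throwaway location.
root=$(mktemp -d)
mkdir "$root/parent"
mkdir "$root/parent/child"

cd "$root/parent/child"
pwd            # ends in parent/child
cd ..          # .. is the parent of the working directory
pwd            # ends in parent
cd ./child     # . is the working directory, so this is the same as: cd child
pwd            # ends in parent/child again
cd ../..       # relative paths can be chained: up two levels
pwd            # back in the throwaway root
echo ~         # the shell replaces ~ with the path of your home directory
```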
Week 2¶
File Examination Commands¶
command | description | example(s) |
---|---|---|
cat | “Print” out files to the console | cat file.txt |
less | Browse a file, with search, scroll, and other features | less file.txt |
more | An alternative to less with different keybinds and features | more file.txt |
head | “Print” out the first 10 lines of a file; use flags to change this behaviour | head file.txt , head -n 5 file.txt |
tail | “Print” out the last 10 lines of a file; use flags to change this behaviour | tail file.txt , tail -n 5 file.txt |
wc | “Print” out the number of lines, words, and bytes in a file; use flags to change this behaviour | wc file.txt , wc -l file.txt |
Searching and Sorting Commands¶
command | description | example(s) |
---|---|---|
grep | “Print” out the lines of the input file(s) that contain a specific string. | grep "berry" fruits.txt veggies.txt shows the lines in both fruits.txt and veggies.txt that contain the string "berry" . |
sort | “Print” out the contents of the input file, but with lines sorted lexicographically. | sort file.txt |
uniq | “Print” out the contents of the input file, but remove (adjacent) repeated lines. Often used with sort . | uniq file.txt |
find | Searches the filesystem for file(s) that match a pattern. | find -name "*.java" finds all files in the current directory and its subdirectories that end in .java . Note that this is more powerful than ls *.java ! |
All of these commands have many options. You’ll have to look at the man pages for sort, uniq, and find for your homework!
Compiling and Running Java programs¶
- javac HelloWorld.java compiles the contents in HelloWorld.java (can replace this with any other .java file)
- java HelloWorld runs the compiled HelloWorld class
- java HelloWorld.java compiles and runs HelloWorld.java in one step (Java 11 and later)
Standard Streams¶
Processes (~ programs) in Unix have 3 standard streams, which are “abstract” locations that tell a process where to read input from and write output to. They are:
stream | Java equivalent | description |
---|---|---|
Standard Input (stdin ) | System.in | where the program gets user input; defaults to terminal input |
Standard Output (stdout ) | System.out | where the program sends “normal” output; defaults to printing to terminal |
Standard Error (stderr ) | System.err | where the program sends error output; defaults to printing to terminal |
Many commands will default to using stdin for input when some arguments aren’t provided; for example, try running grep "a" and then typing many sentences into your terminal.
As a programmer, you don’t have to worry about exactly how these work; the shell and operating system manage them for you! However, we’ll often “redirect” these streams elsewhere.
Input and Output Redirection¶
>: Standard Output Redirection¶
The > operator allows you to execute a command and redirect its standard output to the given file, instead of printing it to the console.
For example,
$ grep "berry" fruits.txt > berries.txt
finds all lines which contain "berry" in fruits.txt, and writes them to berries.txt instead of printing them to the console.
The > operator overwrites the file. In contrast, the >> operator appends to a file.
$ grep "berry" fruits.txt >> berries.txt
The left-hand side of > and >> should be a command. The right-hand side of > and >> should be the name of a file.
<: Standard Input Redirection¶
The < operator allows you to use the contents of a file as the contents of standard input, instead of what the user types in to the console.
For example, the following command finds lines containing "berry" within fruits.txt:
$ grep "berry" < fruits.txt
This looks similar to using grep "berry" fruits.txt, but there is a slight nuance; when using <, grep is now reading from standard input (not directly from a file). This difference becomes more important in later sections!
The left-hand side of < should be a command. The right-hand side of < should be the name of a file.
2>: Standard Error Redirection¶
The 2> operator allows you to execute a command and redirect its standard error to the given file, instead of printing it to the console.
For example,
$ javac HasSomeErrors.java 2> errors.txt
would compile HasSomeErrors.java, and write any error output to errors.txt instead of printing it to the console.
Note that > and 2> can point to different files; this is helpful in splitting up logs and debugging.
$ javac *.java > output.txt 2> errors.txt
The left-hand side of 2> should be a command. The right-hand side of 2> should be the name of a file.
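To see all of the redirection operators side by side, here is a small sketch you can run in a throwaway directory. The file contents are made up, and ls with a deliberately missing file (no_such_file.txt) stands in for a program that writes to both standard output and standard error.

```shell
# Work in a throwaway directory so no real files are touched.
work=$(mktemp -d)
cd "$work"
printf 'strawberry\nbanana\nblueberry\n' > fruits.txt

grep "berry" fruits.txt > berries.txt     # > overwrites berries.txt
grep "berry" fruits.txt >> berries.txt    # >> appends, so the matches are now there twice
wc -l < berries.txt                       # < feeds berries.txt to wc as standard input

# ls prints the file that exists to stdout and an error about the other to stderr.
ls fruits.txt no_such_file.txt > output.txt 2> errors.txt
cat output.txt                            # only the normal output
cat errors.txt                            # only the error message
```

Nothing from the ls command reaches the terminal directly: the normal output and the error output each land in their own file.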
Pipes¶
The | operator is called a pipe. You use pipes to “link” two commands together. Consider:
$ command1 | command2
In order, the |:
- executes command1
- then, executes command2, using the standard output of command1 as the standard input to command2
Conceptually, it is shorthand for the following sequence of commands:
command1 > filename
command2 < filename
rm filename
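As a concrete sketch of that equivalence, the following compares a pipe with its "temporary file" version, using sort and uniq from the table above. The file names fruits.txt and sorted.txt and their contents are made up for illustration.

```shell
work=$(mktemp -d)
cd "$work"
printf 'pear\napple\npear\nfig\napple\npear\n' > fruits.txt

# With a pipe: the output of sort becomes the input of uniq.
sort fruits.txt | uniq

# The same thing spelled out with an intermediate file.
sort fruits.txt > sorted.txt
uniq < sorted.txt
rm sorted.txt

# Pipes can be chained: count the distinct lines.
sort fruits.txt | uniq | wc -l
```

Both versions print apple, fig, and pear (one per line), and the last command prints 3.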
Week 3¶
More command line operators¶
And (&&)¶
The and operator (double ampersand) is put between two commands, e.g. command1 && command2:
- if command1 succeeds, it then runs command2
- if command1 fails (e.g. when running javac CompilerErrors.java and getting a compilation error), then command2 is not run
- useful when command2 depends on command1 succeeding
(this behaviour comes from short-circuiting in Boolean expressions)
Or (||)¶
The or operator (double pipe) is put between two commands, e.g. command1 || command2:
- if command1 succeeds, then command2 is not run
- if command1 fails (e.g. when running javac CompilerErrors.java and getting a compilation error), it then runs command2
- useful when command2 is a fallback for command1
(this behaviour comes from short-circuiting in Boolean expressions)
Then (;)¶
The then operator (semicolon) is put between two commands, e.g. command1 ; command2. It runs command1 and then command2, regardless of whether command1 succeeded or failed.
echo¶
echo is a command that prints out its argument(s) to standard output, followed by a newline. It’s the shell equivalent of System.out.println() in Java or print() in Python.
$ echo "Hello, world"
Hello, world
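With echo available, the &&, ||, and ; operators are easy to try out. This sketch uses the standard commands true (always succeeds) and false (always fails) as stand-ins for a real command1; the messages are made up.

```shell
true && echo "&& ran: true succeeded"
false && echo "&& skipped: this is never printed"
true || echo "|| skipped: this is never printed"
false || echo "|| ran: false failed"
false ; echo "; ran: it does not care how false ended"
```

Only three of the five messages should appear.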
xargs: convert stdin to arguments¶
In Week 2, we talked about standard input (stdin) and command-line arguments being different concepts. Frequently, you want to convert standard input to an argument (often as part of chaining many | commands). xargs is a command that lets you do just that!
For example, consider the following command:
$ ls *.java | xargs javac
- ls *.java outputs lines to standard output
- | takes the standard output of ls *.java and sends it to the command to its right as standard input
- but, javac only takes in files to compile as arguments - not standard input
- so, we use xargs to convert the output of ls *.java to arguments for javac
- you can think of this as a short form for these three commands:
  1. ls *.java > toCompile.txt
  2. xargs javac < toCompile.txt
  3. rm toCompile.txt
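To make the stdin-versus-arguments difference visible without needing a Java compiler, here is a sketch that uses cat and wc in place of javac. The files a.txt and b.txt are made up for illustration.

```shell
work=$(mktemp -d)
cd "$work"
printf 'one\ntwo\n' > a.txt
printf 'three\n' > b.txt

# Without xargs, wc reads the *listing* from stdin: it counts 2 file names.
ls *.txt | wc -l

# With xargs, the file names become arguments: this runs "cat a.txt b.txt".
ls *.txt | xargs cat

# Same idea with wc: now it counts the lines inside each file.
ls *.txt | xargs wc -l
```

One caveat of this simple form: file names containing spaces are split into several arguments by xargs.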
find: recursively search directories¶
Using ls *.java only gives you the files in the current directory that have the .java extension. The find command lets you search within subdirectories as well.
For example, the following command finds all Java source files in the current directory and its subdirectories.
$ find -name "*.java"
You will often pair find with xargs (to apply an operation to all files that match a pattern). For example, the following compiles all Java files in the current directory and its subdirectories.
$ find -name "*.java" | xargs javac
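Here is a sketch that contrasts ls with find on a small, made-up directory tree. It writes the starting directory explicitly (find . -name ...), which should behave the same as the shorter form above, and uses ls -l as a stand-in for javac so it runs without a compiler.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p src/util
touch Main.java src/App.java src/util/Helper.java README.md

ls *.java                                 # only Main.java
find . -name "*.java" | sort              # all three .java files, including subdirectories
find . -name "*.java" | xargs ls -l       # run a command on every match
find . -name "*.java" | wc -l             # 3
```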
cut: simplify complex strings¶
The cut command lets you manipulate strings from standard input.
The -c (character) flag lets you get characters at certain (1-indexed) indices or ranges. Here are a few examples:
$ echo "abcdef" | cut -c2
b
$ echo "abcdef" | cut -c2-5
bcde
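$ echo "abcdef" | cut -c2,1,4
abd
Note that cut always outputs the selected characters in the order they appear in the input, no matter what order you list them in.
The -d (delimiter) flag lets you split up input into (1-indexed) fields that are separated by the delimiter, which you can then access with the -f (field) flag. A common example is to parse CSVs (comma-separated values):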
$ echo "a,b,c,d,e,f" | cut -d, -f1
a
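The -f flag accepts the same kinds of lists and ranges as -c. A few more examples to try (the input strings are made up):

```shell
echo "a,b,c,d,e,f" | cut -d, -f1,3      # a,c
echo "a,b,c,d,e,f" | cut -d, -f2-4      # b,c,d
echo "usr:local:bin" | cut -d: -f2      # local
```

When more than one field is selected, cut joins them in the output using the same delimiter.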
Application: parsing logs¶
In the video, we showed one application of cut: parsing complicated log files. On attu, /cse/web/courses/logs/common_log contains a log of all requests the CSE courses websites get. However, just using tail on this gives us too much information - it’s hard to focus!
(try running tail -f /cse/web/courses/logs/common_log yourself first)
Instead, we can use the power of pipes and cut to pull out a specific field of the log, like the requested page.
$ tail -f /cse/web/courses/logs/common_log | cut -d\" -f4
Here, we’re escaping the " as it has a special meaning in the shell, and we’re grabbing the 4th field.
We can also use the stdbuf command to make cut’s output line-buffered (so each line shows up right away instead of waiting for a buffer to fill), and look for requests just to 391:
$ tail -f /cse/web/courses/logs/common_log | stdbuf -oL cut -d\" -f4 | grep "391"
Week 4¶
Definitions¶
- version control: software that keeps track of changes to a set of files
- repository (repo): a location that stores a copy of all of the files for a project
- git: a command-line tool and version control system for software; used everywhere
- GitHub and GitLab: platforms (and companies) that help host git repositories
Repositories and Remote¶
What goes in a repository and what doesn’t? Rule of thumb: everything you need to build your project from source, and nothing more!
- include: source code files (e.g. .java, .c files), build and config files (e.g. Makefile), assets (e.g. images), documentation
- not include: object files (e.g. .class or .o files), executables (e.g. .exe or .app files)
- depending on your situation: library and dependency files
Git is a distributed version control system:
- every user has a copy of the entire repository (including all the files and a history of changes)
- users make changes on their own local repositories, and share them with others by “pushing” these changes (or “pull” changes from other users’ repositories)
Frequently, you will have a “special” repository called remote:
- the remote repository is the “main” copy of the code or the “source of truth”
- developers will push/pull changes to/from remote
- remote is often hosted remotely (i.e. not on a developer’s computer) by services like GitHub or GitLab
Conceptual: Four Phases of Git¶
Fundamentally, Git stores all data as a set of changes to files. Changes can be in one of four phases:
- Working Directory (working changes, what’s on your computer)
  - You can move these changes/files to the 2nd phase (Staging Area) by staging your files using git add or git stage. This is essentially getting your changes ready, prepared, and in draft mode (preparing them for a commit later on). This is relatively easy to undo (git restore --staged <file>).
- Staging Area/Index (changes you’re preparing to commit)
  - You can move your staged files to your local repo by committing them using git commit. This saves your changes to your local repo, and is more difficult to undo.
- Local Repository (a local copy of the repo with your committed changes)
  - You can move your changes to the 4th phase (remote repository) by pushing them, using git push. This is the hardest to reverse.
- Remote Repository (remote shared repository)
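To watch a change move through the first three phases, here is a sketch in a brand-new throwaway repository. The name and email are hypothetical and are set only for this repository; there is no remote here, so the fourth phase (git push) is left out. git init and the --short flag of git status (a compact status format) are not covered above.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical identity, stored only in this throwaway repository.
git config user.name "Example Student"
git config user.email "student@example.com"

echo "hello" > notes.txt
git status --short          # "?? notes.txt": the file only exists in the working directory

git add notes.txt
git status --short          # "A  notes.txt": the change is in the staging area

git commit -q -m "Add notes"
git status --short          # prints nothing: the change is now in the local repository
git log --oneline           # one commit, with its short hash and message
```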
Basic Git Commands¶
command | description |
---|---|
git clone <url> [dir] | Make a local copy of the git repository at <url> |
git add <file> ... | Add the changes made to each <file> to the staging area |
git commit | Create a “commit” that captures all the changes in the staging area; requires a commit message |
git push | Push changes from your local repository to the remote repository |
git status | View status of files in the working directory and staging area |
git log | View history of commits in reverse chronological order (use --graph --oneline for a concise visualization) |
git diff | Show differences for changes between working directory and staging (or use --staged for staging and last commit) |
git revert <commit> | Reverts the given commit by adding a new commit that undoes the changes |
Anatomy of a git log (commits and hashes)¶
Each entry of git log is a single commit. For example, here is one commit:
commit 8669021427dfff099b25adae3616e4cca9461cf4
Author: Matt Wang <mxw@cs.washington.edu>
Date: Tue Jul 2 01:32:36 2024 -0700
Create "Using CSE GitLab" page
There are a lot of things going on here!
- the hash is the long string that appears after commit (in this case, 8669021427dfff099b25adae3616e4cca9461cf4)
  - a hash uniquely (*) identifies a commit; we will use these in the next section
  - we can often refer to a hash by its first seven characters, e.g. 8669021
- each commit also has an author (a name + email, configured by git config) and a timestamp
- each commit has a commit message (this is what you wrote in git commit -m)
Note
While this is not the focus of this class, “hashes” are a fascinating part of computer science with deep connections to cryptography, computer security, and math. Roughly speaking, these “hashes” are similar to the “hash” you use in a HashMap. It is not strictly true that all commit hashes are unique; see the “SHAttered” paper for more. See also “Hash function” on Wikipedia.
Commits, Branches, and History¶
In git, a commit is a group of changes. Each commit builds on top of a previous commit, similar to a linked list.
In contrast, a branch is a pointer (or reference) to a specific commit.
This diagram shows a simple git history:
gitGraph commit id: "A" commit id: "B" commit id: "C" tag: "HEAD"
In this example, we have three commits: A, B, and C. The HEAD tells us where our local copy of the repository is (at commit C). This means that our local repository has, in order:
- the changes from A
- then, the changes from B
- finally, the changes from C
The main branch refers to the commit C here; though, as C builds on top of B and A, we can also think of the branch as containing the history of the project up to and including C.
Branching and Merging¶
When working on a new feature or bugfix, you will often create a new branch to work on your changes. That way, your changes won’t affect others who are working off of the main branch (or their own branches).
Once you’re ready to add your changes to the main branch, you will need to merge your feature branch in. To do so,
- go to the branch that you want to receive the changes (typically, main)
- run git merge feature (where feature is the name of your branch)
- if necessary, resolve any merge conflicts
  - occurs when git can’t automatically merge commits (usually due to conflicting changes)
  - you need to edit each file to have the “correct” behaviour after the merge (often editing the lines between the <<<<<<< HEAD, =======, and >>>>>>> markers)
  - once they are all complete, git add your changes and run git commit
- finish the merge commit (by using the default commit message and/or editing it)
- if necessary, run git push to update remote with the change
Branching and Merging Commands¶
command | description |
---|---|
git branch <name> | Creates a new branch with the provided name. |
git checkout <branch> | Switches your local repository to a different branch (i.e., moves the HEAD ) |
git switch <branch> | Same as git checkout (for the purposes of this class) |
git checkout -b <branch> | Like git checkout , but creates the branch if it doesn’t exist |
git merge <other-branch> | merges the “other branch” into your current branch, updating your current branch. Can cause a merge conflict!! |
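Putting these commands together, here is a sketch of a conflict-free merge in a throwaway repository. The file names, commit messages, and identity are hypothetical. git branch -M main (not covered above) just makes sure the starting branch is named main, and --no-edit accepts the default merge commit message without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "A"
git branch -M main                 # make sure the starting branch is called main

git checkout -q -b feature         # create the feature branch and switch to it
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "X"

git checkout -q main               # go to the branch that receives the changes
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Y"

git merge -q --no-edit feature     # no conflict: the two branches touched different files
ls                                 # app.txt, feature.txt, and fix.txt are all present
git log --graph --oneline          # shows the two branches joining at the merge commit
```

If both branches had edited the same lines of the same file, the merge would stop and ask you to resolve a conflict instead.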
Week 5¶
Remote, origin, and syncing¶
As a reminder: the remote repository is the central source of truth for code. In teams, everybody typically syncs their changes with remote (rather than directly with each other).
Branches on the remote repository are often prefixed with origin/, e.g. origin/main is the main branch on the remote repository. Other than this prefix, you should think of them as “normal” branches - pointing to a specific commit.
To sync changes with remote, you’ll run the git push and git pull commands. These will sync your local branch (e.g. main) with the remote version (e.g. origin/main). In this model,
- git push updates origin/main with the changes you’ve made in main
- git pull updates main with the changes from origin/main
With only one person making changes, this is pretty straightforward. But, things get harder when multiple people’s changes get involved!
Example: resolving conflicts with remote¶
Imagine that you have a local and remote repository, both with a main branch containing the commits A and B:
--- title: Local --- gitGraph commit id: "A" commit id: "origin/main -> B" tag: "HEAD"
--- title: Remote --- gitGraph commit id: "A" commit id: "B"
Next, imagine that you make a change to your local repository called C, while your coworker adds a different change D. The graph would look like this:
--- title: Local --- gitGraph commit id: "A" commit id: "origin/main -> B" commit id: "C" tag: "HEAD"
--- title: Remote --- gitGraph commit id: "A" commit id: "B" commit id: "D"
Note that at this point, origin/main still points at B: your local repository doesn’t know about these changes yet.
Running git push here will give you an error (usually something like "error: failed to push some refs to REMOTE"). This is because git doesn’t know how to resolve the history: C and D both point at B, so it’s not clear how to “combine” them. This can be complicated by the commits touching the same file.
To fix this, you’ll:
- first, run git fetch, which updates your local repository’s origin/main to point at D
--- title: Local (after git fetch) --- gitGraph commit id: "A" commit id: "B" branch origin/main commit id: "origin/main -> D" checkout main commit id: "C"
- then, run git merge origin/main, which merges origin/main into main - fixing the issue with C and D locally
  - if necessary, address any merge conflicts here
--- title: Local (after the merge) --- gitGraph commit id: "A" commit id: "B" branch origin/main commit id: "origin/main -> D" checkout main commit id: "C" merge origin/main id: "M" tag: "HEAD"
- finally, run git push, which will push C and the merge commit M to the remote - fixing the issue with C and D remotely
--- title: Remote (after the merge) --- gitGraph commit id: "A" commit id: "B" commit id: "D" commit id: "C" commit id: "M" tag: "HEAD"
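The whole scenario can be rehearsed on one machine. In this sketch, a "bare" repository in a throwaway directory stands in for GitLab/GitHub, and two clones named you and coworker play the two developers; all names, emails, and file names are hypothetical. It assumes each commit touches a different file, so the merge needs no conflict resolution.

```shell
base=$(mktemp -d)
cd "$base"
git init -q --bare remote.git                 # stands in for the remote repository

# You: create commits A and B and push them.
git clone -q remote.git you 2> /dev/null      # (cloning an empty repository prints a warning)
cd you
git config user.name "You" ; git config user.email "you@example.com"
echo "a" > a.txt ; git add a.txt ; git commit -q -m "A"
echo "b" > b.txt ; git add b.txt ; git commit -q -m "B"
git branch -M main
git push -q origin main

# Coworker: clone (getting A and B), add D, and push it.
cd "$base"
git clone -q -b main remote.git coworker
cd coworker
git config user.name "Coworker" ; git config user.email "coworker@example.com"
echo "d" > d.txt ; git add d.txt ; git commit -q -m "D"
git push -q origin main

# You: add C. Pushing is rejected because the remote has D and you do not.
cd "$base/you"
echo "c" > c.txt ; git add c.txt ; git commit -q -m "C"
git push -q origin main 2> /dev/null || echo "push rejected"

git fetch -q origin                           # origin/main now points at D
git merge -q --no-edit origin/main            # creates the merge commit M locally
git push -q origin main                       # succeeds: the remote now has C and M
git log --graph --oneline                     # A, B, C, D, and M
```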
Pull and Merge Requests¶
You will often not use git merge directly with remote - instead, you’ll use a platform like GitHub or GitLab. This simplifies some of the process and lets you work with others!
In particular, you will use a tool called a merge request (GitLab’s term) or pull request (GitHub’s term). This lets you “prepare” a merge, but also lets you get feedback from others on code (a big part of software engineering).
The sketch of how to use a pull or merge request is:
- create a local branch and make some commits
- push those commits (and that local branch) to remote
- open a pull or merge request on GitHub/GitLab: here, you’re “requesting” to merge your code
- work with others (e.g. they leave comments on your code, run some tests, …)
- hit “merge” on the request, which merges into main (or other branch)
Example: creating a merge request¶
Continuing from our previous example, let’s make a new branch called feature with git checkout -b feature; we’ll add a new commit called X:
--- title: Local --- gitGraph commit id: "A" commit id: "B" branch origin/main commit id: "D" checkout main commit id: "C" merge origin/main id: "origin/main -> M" branch feature commit id: "X" tag: "HEAD"
Now, assume that someone else has added a change Y to your remote’s main. We can still git push our change to the remote - since we’re using a different branch than main, we don’t encounter a conflict just yet.
--- title: Remote --- gitGraph commit id: "A" commit id: "B" branch origin/main commit id: "D" checkout main commit id: "C" merge origin/main id: "M" branch feature commit id: "X" checkout main commit id: "Y"
However, if we want to merge origin/main and feature, we’ll once again have some sort of conflict! Instead of merging locally, what you’ll typically do is:
- open a merge/pull request on GitLab/GitHub
- after the request is created, you can fix the conflict on GitLab or GitHub; this will create a commit on your feature branch (not on main) - M1 in the diagram
- after getting approval, you can hit “merge”: this will merge feature into main, creating a merge commit on main (not on feature) - M2 in the diagram
--- title: Remote --- gitGraph commit id: "A" commit id: "B" branch remote-fetch commit id: "D" checkout main commit id: "C" merge remote-fetch id: "M" branch feature commit id: "X" checkout main commit id: "Y" checkout feature merge main id: "M1" checkout main merge feature id: "feature -> M2" tag: "main"
Now, the remote’s main branch has both the changes Y and X, as well as the merge commits M1 (main -> feature) and M2 (feature -> main). You can safely delete feature.
More on reverting commits¶
There are different ways to revert a commit.
- git revert <commit> reverts a specific commit by adding a new commit to the history that undoes the changes
  - you can pass a commit hash, or a relative term like HEAD (most recent commit)
  - good because you preserve the previous commit history - easy to work with others, can “undo the undo”
  - bad because secrets (e.g. passwords, API keys, embarrassing typos) still remain in the total history
- git reset <commit> moves your HEAD (and, with the --hard flag, your working directory) to a commit; you can pair this with “force-pushing” to truly remove commits from history
  - but, this changes the remote’s git history, which can break collaboration with others
  - generally not recommended unless you know what you’re doing
General advice: use git revert, and don’t alter the history of the main branch (unless you know what you’re doing)!
.gitignore¶
.gitignore is a special file you can create in a git repository. This file tells git to ignore certain patterns of files.
For example, you almost always don’t want to commit .class files in Java projects. To tell git to not “look” at these files, you can create a file called .gitignore and add the following:
*.class
The .gitignore file uses a “glob” syntax similar to find -name, so many of the patterns you already know carry over.
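You can see the effect of a .gitignore with git status. This is a sketch in a throwaway repository; the file names Hello.java and Hello.class are made up, and --short is just a compact output format for git status.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
touch Hello.java Hello.class

git status --short               # both Hello.class and Hello.java are listed as untracked (??)

echo "*.class" > .gitignore
git status --short               # Hello.class is no longer listed; .gitignore itself now is
```

The .gitignore file is usually committed along with the rest of the project, so everyone working on the repository ignores the same files.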
Week 6¶
Introduction to Regular Expressions¶
Regular expressions (or regexes) are a concise way to describe patterns of text. While they have a more precise meaning in theoretical CS, software engineers tend to use the term quite broadly.
In 391, we’ll be focusing on regular expressions in the context of two commands: grep (this week) and sed (next week).
Basic grep flags and syntax¶
To use regular expressions in grep, we’ll want to use the -E flag. You can still use other flags (e.g. -i for case-insensitive matches).
Many characters in grep have special meanings - these are called metacharacters. One that we’ve already learned about is ., which matches any character; this will highlight all characters in a file:
$ grep -E "." file.txt
In contrast, adding more characters will only match parts of lines that match the entire pattern. For example, this regular expression matches parts of lines that start with hello, have any character, then end with world.
$ grep -E "hello.world" file.txt
You can escape metacharacters with \; to match a literal period (followed by com), try:
$ grep -E "\.com" file.txt
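To see the difference between . and \. for yourself, here is a sketch that builds a small, made-up file.txt in a throwaway directory and runs the patterns above against it. The -c flag (count matching lines instead of printing them) is an extra not covered above.

```shell
work=$(mktemp -d)
cd "$work"
printf 'hello world\nhello-world\nhelloworld\nexample.com\nexamplecom\n' > file.txt

grep -E "hello.world" file.txt   # hello world, hello-world (helloworld has nothing in between)
grep -E "\.com" file.txt         # example.com only
grep -E ".com" file.txt          # example.com AND examplecom: the unescaped . matches the e
grep -E -c ".com" file.txt       # 2
```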
More Metacharacters: Anchors¶
Next, we learned about “anchors”: special characters that match the beginning or end of a line (^ and $) or a word (\< and \>).
Anchor | What it matches | Example |
---|---|---|
^ | Start of line | ^cat matches cat and caterpillar , but not orange cat or (cat) |
$ | End of line (not including newline) | cat$ matches cat and tomcat , but not cat. or cat! |
\< | Start of word | \<cat matches caterpillar fur and here cat here , but not tomcats |
\> | End of word | cat\> matches brown tomcat and muscat , but not tomcats rock |
You can combine anchors together; using ^ and $ together is helpful for matching entire lines.
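Here is a sketch that runs each anchor against a small, made-up animals.txt. It uses single quotes around the patterns so the shell leaves $ and \ alone, and it assumes GNU grep (as on attu) for the \< and \> word anchors.

```shell
work=$(mktemp -d)
cd "$work"
printf 'cat\ncaterpillar\norange cat\ntomcats rock\n' > animals.txt

grep -E '^cat' animals.txt       # cat, caterpillar
grep -E 'cat$' animals.txt       # cat, orange cat
grep -E '^cat$' animals.txt      # cat (the entire line must be exactly cat)
grep -E '\<cat\>' animals.txt    # cat, orange cat (cat as a whole word)
```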
Alternating and repeating characters¶
Syntax | What it matches | Example |
---|---|---|
| | Either pattern (to left or right) | com|edu matches either com or edu |
* | 0 or more copies of the character before it | 0* matches the empty string, 0 , 00 , 000 , … |
+ | 1 or more copies of the character before it | 1+ matches 1 , 11 , 111 , … |
? | 0 or 1 copies of the character before it | 2? matches the empty string or 2 |
() | Group characters together as one character (capture group) | (01)+ matches 01 , 0101 , … |
Note that using * is dangerous: since it also matches zero copies, a pattern like 0* matches every line (including things you may not want to match).
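The following sketch tries each piece of syntax from the table on a made-up words.txt. The patterns combine the new syntax with the ^ and $ anchors so that each one has to match the entire line.

```shell
work=$(mktemp -d)
cd "$work"
printf 'uw.edu\nuw.com\nuw.org\ncolor\ncolour\ncolouur\nhaha\nha\n' > words.txt

grep -E 'com|edu' words.txt        # uw.edu, uw.com
grep -E '^colou?r$' words.txt      # color, colour (zero or one u)
grep -E '^colou+r$' words.txt      # colour, colouur (one or more u)
grep -E '^(ha)+$' words.txt        # haha, ha (the group ha repeated)
grep -E -c 'z*' words.txt          # 8: every line matches, since z* accepts zero z's
```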
Character sets¶
The [] syntax creates a character set, which matches one of any of the characters between the [ and ]. For example, the following two commands are equivalent:
$ grep -E "(a|b|c|d|e)"
$ grep -E "[abcde]"
Character sets support special syntax with - (“ranges”) and ^ (negation):
Character set | Description |
---|---|
[A-Z] | All uppercase alphabet characters |
[a-z] | All lowercase alphabet characters |
[0-9] | All digits |
[A-Za-z] | All uppercase or lowercase alphabet characters |
[^a] | All characters that are not a |
[^a-z] | All characters that are not lowercase alphabet characters |
Note that ^ has a different meaning here than the start anchor ^. In addition, outside of ^ and -, regex metacharacters do not have their special meanings inside []; for example, [.?!] matches one of ., ?, and !, not any character.
Using ^ and - in character sets is a bit tricky. To quote grep’s man page:
To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.
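Here is a sketch that tries a range, a set of punctuation, and a negated set on a made-up lines.txt.

```shell
work=$(mktemp -d)
cd "$work"
printf 'room 101\nroom A\nwhat?\nstop.\nplain\n' > lines.txt

grep -E '[0-9]' lines.txt          # room 101 (the only line containing a digit)
grep -E '[.?!]' lines.txt          # what?, stop. (inside [] the . is a literal period)
grep -E 'room [^0-9]' lines.txt    # room A (room, a space, then a non-digit)
```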
Occurrence ranges¶
The {} syntax matches the previous character a specific number of times.
Syntax | Description |
---|---|
{n} | Matches the previous character exactly n times |
{,n} | Matches the previous character up to n times, inclusive |
{a,b} | Matches the previous character between a and b times, inclusive |
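Occurrence ranges pair naturally with character sets. This sketch matches fixed-length runs of digits in a made-up numbers.txt.

```shell
work=$(mktemp -d)
cd "$work"
printf '98195\n9819\n981950\n206-555-0100\n' > numbers.txt

grep -E '^[0-9]{5}$' numbers.txt                 # 98195 (exactly five digits)
grep -E '^[0-9]{4,5}$' numbers.txt               # 98195, 9819 (four or five digits)
grep -E '[0-9]{3}-[0-9]{3}-[0-9]{4}' numbers.txt # 206-555-0100
```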
Backreferences¶
Backreferences let you capture patterns and look for them later. They work with capture groups (()) and are one-indexed.
For example, if we wanted to match lines containing a three-letter word, a space, and then the same three-letter word, we would do:
$ grep -E "(...) \1"
Backreferences only match the exact same characters as before; the above example does not match any three-letter word followed by a space and then another three-letter word.
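To check this behaviour, here is a sketch that runs the pattern above on a made-up pairs.txt, assuming GNU grep (as on attu).

```shell
work=$(mktemp -d)
cd "$work"
printf 'the the end\nthe cat sat\nbye bye now\n' > pairs.txt

grep -E '(...) \1' pairs.txt     # the the end, bye bye now
```

The line the cat sat is not printed: it has three-letter words separated by spaces, but none of them is immediately repeated.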
Addendum: reference sheet¶
Check out the reference sheet’s section on regex for a complete table of all the syntax.
Week 7¶
Introduction to sed
¶
If grep
is a fancy “find” of the command line, sed
(stands for stream editor) is the “find-and-replace” of the command line.
We will always use sed
with the -r
flag. The general syntax looks like
sed -r 's/REGEX/TEXT/'
- REGEX is a pattern or regular expression that we want to match
  - this is the same syntax as grep, which makes grep helpful to test with!
- TEXT stands for the text that we want to replace the matched text with
  - outside of backreferences, special characters here are interpreted literally: they do not have their regular expression meaning
- the -r flag turns on extended regular expressions (the same syntax that grep -E uses)
- sed takes its input from one or more files or from standard input
For example,
sed -r 's/UW/University of Washington/' schools.txt
Would replace the first instance of UW
with University of Washington
for each line in schools.txt
.
You can add a g
after the last /
to do a “global” replace, which replaces every instance - not just the first one per line.
sed -r 's/UW/University of Washington/g' schools.txt
Since the /
has a special meaning in sed
, you can escape /
with \
.
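For instance, a sketch of replacing one path with another might look like the following. The sample line and the paths are made up for this example.

```shell
# Made-up input line containing a path.
path_line="bin: /usr/bin"

# Every literal / in the pattern and in the replacement is written as \/
updated=$(printf '%s\n' "$path_line" | sed -r 's/\/usr\/bin/\/usr\/local\/bin/')

echo "$updated"
```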
The -i
flag¶
By default, sed
outputs changes to standard output but does not edit the original file.
You can change this behaviour by using the -i
flag, which changes the file in place. -i
requires an argument that is a file extension; sed
will create a backup file with this extension, before your changes.
For example,
sed -ri.bak 's/cats/dogs/' best_animals.txt
- first, make a backup file called best_animals.txt.bak
- then, replace the first instance of cats with dogs in each line of best_animals.txt
- will not output anything to standard output
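If you want to try this safely, the sketch below runs the same command inside a throwaway directory. The use of mktemp and the file contents are assumptions for this example.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
printf '%s\n' 'cats are great' 'cats and more cats' > "$workdir/best_animals.txt"

# Edit in place, keeping the original contents in best_animals.txt.bak
sed -ri.bak 's/cats/dogs/' "$workdir/best_animals.txt"

edited=$(cat "$workdir/best_animals.txt")
backup=$(cat "$workdir/best_animals.txt.bak")

echo "$edited"
echo "$backup"
```

Only the first cats on each line changes, since there is no g at the end of the command; the .bak file still holds the original text.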
sed
and backreferences¶
sed
becomes particularly powerful with backreferences: we can now edit lines depending on what we captured. In the pre-lecture, we saw the example artists.txt
:
Duckworth, Kendrick Lamar
Swift, Taylor Alison
Grande-Butera, Ariana
Ma, Yo-Yo
Bryan, Zachary Lane
Cottrill, Claire Elizabeth
Graham, Aubrey Drake
Amstutz, Kayleigh Rose
Jónsdóttir, Laufey Lín Bing
We can reformat this to put each artist’s first and middle names before their last name with:
sed -r 's/^(.*), (.*)$/\2 \1/' artists.txt
Giving us:
Kendrick Lamar Duckworth
Taylor Alison Swift
Ariana Grande-Butera
Yo-Yo Ma
Zachary Lane Bryan
Claire Elizabeth Cottrill
Aubrey Drake Graham
Kayleigh Rose Amstutz
Laufey Lín Bing Jónsdóttir
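The same idea works with more than two capture groups. As a sketch on made-up data (the dates and variable names are assumptions, not from the pre-lecture), this reorders year-month-day dates into month-day-year and leaves non-matching lines alone:

```shell
# Made-up sample input.
sample=$(printf '%s\n' '2024-06-17' '2024-08-16' 'not a date')

# \1 is the year, \2 the month, \3 the day.
reordered=$(printf '%s\n' "$sample" | sed -r 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/\2-\3-\1/')

echo "$reordered"
```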
Week 8¶
Linux permissions (briefly)¶
We briefly talked about Linux permissions (we’ll return to this next week). For now, the tl;dr is that:
- by default, files are not executable
- to make a file executable, you need to use
chmod +x FILE
- you can then execute the file with
./FILE
You can check the permissions for a file with ls -l; if an x appears in the leftmost column, then the file is executable.
Our first script¶
A shell script is a series of shell commands run from top to bottom. The most basic one looks like this:
#!/bin/bash
echo "Hello World!"
The #!/bin/bash line is called a “shebang”. This tells your computer which “shell” should run the program; for this class, we’ll use bash.
Subsequent lines are run one after another. Note that if your lines cd
to new directories or create new variables, that doesn’t carry over to the “parent shell”.
#!/bin/bash
mkdir bestfolder
cd bestfolder
touch bestfile
For example, the above shell script:
- will create a new folder called
bestfolder
- will create a new file called
bestfile
- will not move the shell that ran this script to
bestfolder
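You can check this yourself with a sketch like the one below. The script name make_best.sh and the temporary directory are assumptions for this example, and the script is run with bash make_best.sh so that no chmod is needed.

```shell
# Start in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Write the script from above into a file.
cat > make_best.sh <<'EOF'
#!/bin/bash
mkdir bestfolder
cd bestfolder
touch bestfile
EOF

before=$(pwd)
bash make_best.sh
after=$(pwd)

# The folder and file exist, but our own shell has not moved.
echo "before: $before"
echo "after:  $after"
ls bestfolder
```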
Variables¶
To create variables, use the =
:
color="red"
- don’t include spaces around the = symbol; color = "red" will not work
- by default, we use ". They are technically optional, but helpful when the value has spaces (e.g. color="sky blue" works while color=sky blue doesn’t)
- variables are case-sensitive (color is not the same as COLOR)
To reference a variable, use $
:
color="red"
echo $color
echo "my fave color is $color"
- within double quotes, variables are “expanded” to their value
- within single quotes, strings are interpreted literally; echo 'My favorite color is $color' would print My favorite color is $color
- if a variable name is not found, bash does not throw an error; it just uses an empty string
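The three rules above can be seen side by side in a short sketch (the variable names here, including not_defined_anywhere, are made up for this example):

```shell
color="sky blue"

# Double quotes: $color is expanded to its value.
double="my fave color is $color"

# Single quotes: the text is kept exactly as written.
single='my fave color is $color'

# A variable that was never set expands to an empty string.
missing="[$not_defined_anywhere]"

echo "$double"
echo "$single"
echo "$missing"
```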
Special Variables: Arguments¶
Bash “pre-fills” some variables for you that are related to the script’s arguments:
- $1 is the first argument, $2 is the second, …
- $0 is the filename of the script
- $# is the number of arguments provided (not counting the filename)
- $@ is all of the arguments at once
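As a sketch of how these get filled in, the example below writes a small script and runs it with three arguments. The script name args.sh, the temporary directory, and the arguments are assumptions for this example.

```shell
# Write a small script that prints its special variables.
workdir=$(mktemp -d)
cat > "$workdir/args.sh" <<'EOF'
#!/bin/bash
echo "script: $0"
echo "first: $1"
echo "second: $2"
echo "count: $#"
echo "all: $@"
EOF

# Run it with three arguments and capture what it prints.
output=$(bash "$workdir/args.sh" apple banana cherry)
echo "$output"
```

The script: line shows whatever path was used to run the script, so it will differ from machine to machine.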
Exit Codes¶
The $?
variable is also pre-filled by bash and tells you the exit code for the previous command.
Usually, a 0
indicates that the command was successful, while a non-zero code means there was a failure.
You can make a script exit with exit
, e.g. exit 1
.
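Here is a small sketch of reading $? after a few commands. It relies on grep exiting with 0 when it finds a match and 1 when it does not; the variable names and the use of bash -c to stand in for a script that calls exit 3 are assumptions for this example.

```shell
# grep succeeds (exit code 0) when it finds a match.
printf 'hello\n' | grep -q "hello"
found=$?

# grep exits with 1 when nothing matches.
printf 'hello\n' | grep -q "goodbye"
not_found=$?

# A stand-in for a script that ends with: exit 3
bash -c 'exit 3'
custom=$?

echo "$found $not_found $custom"
```

Note that $? is saved into a variable right away, since running any other command (even echo) overwrites it.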
For Loops¶
Bash has a special syntax for for
loops, with the for
, in
, do
, and done
keywords. Here are some examples iterating over numbers, files, and arguments - all in one syntax style. Note how the first two use command substitution - the $()
- to “capture” the output of a command.
#!/bin/bash
for i in $(seq 1 4); do
  echo $i
done

#!/bin/bash
for file in $(ls); do
  echo $file
done

#!/bin/bash
for arg in $@; do
  echo $arg
done
You can also put the do
on a separate line:
#!/bin/bash
for arg in $@
do
  echo $arg
done
Arithmetic¶
To do arithmetic, we can either use let
or $(())
:
a=1
let b="$a + 2"
c=$(( $a + $b ))
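Arithmetic pairs naturally with loops. As a minimal sketch (the variable name total is made up for this example), this adds up the numbers 1 through 4 using $(( )):

```shell
total=0
for i in $(seq 1 4); do
  # Add the current number to the running total.
  total=$(( $total + $i ))
done

echo "$total"
```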
If Statements¶
Bash has a very specific syntax for if
statements - be very mindful of the spacing with the [
and ]
!
Here are some examples:
if [ -n "$NAME" ]; then
  echo 'Variable $NAME exists'
fi

if [ $a -lt 10 ] && [ $a -gt 5 ]; then
  echo "variable a is between 5 and 10"
fi

if [ $1 -lt $2 ]; then
  echo "arg1 is less than arg2"
elif [ $1 -eq $2 ]; then
  echo "arg1 equals arg2"
else
  echo "arg1 is greater than arg2"
fi
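Here is one more sketch of the if / elif / else shape that you can paste straight into a shell, using a variable instead of script arguments. The variable names and the cutoffs are made up for this example.

```shell
count=7

# Note the spaces after [ and before ] in every test.
if [ $count -lt 5 ]; then
  size="small"
elif [ $count -lt 10 ]; then
  size="medium"
else
  size="large"
fi

echo "$size"
```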
Arithmetic Operators¶
Operator | Description |
---|---|
-gt | greater than |
-ge | greater than or equal to |
-lt | less than |
-le | less than or equal to |
-eq | equal to |
-ne | not equal to |
Boolean Operators¶
Operator | Description |
---|---|
if [ expr1 ] && [ expr2 ]; then | boolean AND |
if [ expr1 -a expr2 ]; then | boolean AND |
if [ expr1 ] || [ expr2 ]; then | boolean OR |
if [ expr1 -o expr2 ]; then | boolean OR |
if [ ! expr1 ]; then | boolean NOT |
String Operators¶
Operator | Description |
---|---|
= | equal to |
!= | not equal to |
-z | is empty |
-n | is non-empty |
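A short sketch of the string operators, using made-up variables. Quoting the variables inside [ ] is a choice made for this example; it keeps the test well-formed even when a value is empty or has spaces.

```shell
name=""
color="red"

# -z is true for an empty string.
if [ -z "$name" ]; then
  name_status="empty"
else
  name_status="non-empty"
fi

# = and != compare strings.
if [ "$color" = "red" ] && [ "$color" != "blue" ]; then
  color_status="red, not blue"
else
  color_status="something else"
fi

echo "$name_status"
echo "$color_status"
```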
File and Directory Operators¶
Operator | Description |
---|---|
-f | file exists (and is a regular file, not a directory) |
-d | directory exists |
-r | file is readable |
-w | file is writable |
-x | file is executable |
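As a sketch of these operators in use (the temporary directory and the file name notes.txt are assumptions for this example):

```shell
# Make a throwaway directory with one ordinary file in it.
workdir=$(mktemp -d)
touch "$workdir/notes.txt"

if [ -f "$workdir/notes.txt" ]; then
  file_check="file exists"
else
  file_check="no such file"
fi

if [ -d "$workdir" ]; then
  dir_check="directory exists"
else
  dir_check="no such directory"
fi

# Freshly created files are not executable by default.
if [ -x "$workdir/notes.txt" ]; then
  exec_check="executable"
else
  exec_check="not executable"
fi

echo "$file_check"
echo "$dir_check"
echo "$exec_check"
```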