This is a set of notes that cover CSE 391’s pre-lecture content. It should be used as reference material, but does not replace watching the videos or actively engaging with the content. All the notes are intentionally kept on one page to make searching for content easier.

These notes were originally written by Antonio Ballesteros and Kirupa Gunaseelan, two TAs during the 24wi offering of 391. They have since been refined by TAs and course staff over each offering. Thank you Antonio and Kirupa!

Table of Contents:

  1. Week 1: Introduction to Linux
  2. Week 2: More shell commands, streams, I/O redirection
  3. Week 3: More shell operators, xargs, find, cut

Week 1

What is Unix and Linux?

Unix and Linux refer to operating systems. You’ve used operating systems before - you might have a PC that runs Windows or a MacBook that runs macOS. We have a “third genre”, which we’ll refer to as Unix/Linux (or sometimes, just Unix or Linux).

It turns out that Unix/Linux is by far the most widely used family of operating systems today - powering everything from mobile phones and micro-devices to cloud computing and supercomputers.

While learning Linux, we’ll also learn another way for you to interact with your computer.

Accessing Linux

Check out the Working at Home page on the class website, which will talk about different ways to access Linux in this class. You’ll need to do this for your homework!

ssh and attu

The “default” way we’ll access Linux is by using ssh, which lets you remotely log in to other computers. For example, the attu servers are actually physically located in the basement of the CSE department! Rather than having to physically go to them, we can use ssh to connect to them remotely and do work on them.

The terminal and the shell

The terminal is a text-based user interface for interacting with your computer. Inside the terminal is the shell, whose prompt is often denoted by $. Essentially, the shell is a program that lets the user interact with the operating system and its applications.

vim and emacs

When editing code on a terminal, we’ll need a code editor. We recommend two in this class - vim and emacs. You only need to pick one of them and stick with it throughout the course. You’ll learn more about these editors in your homework; the editors section of the resources page includes some guides to get started.

Basic Shell Commands

| command | description |
| --- | --- |
| pwd | Print current working directory |
| cd | Change working directory |
| ls | List files in working directory |
| man | Bring up manual for a command |
| exit | Log out of shell |

Flags and Arguments

Flags are prepended with - and change programs’ behavior slightly. For example,

$ ls -l

-l is a flag for ls. In this case, the -l flag for ls lists the content of the current directory in “long listing format”.

Commands can also take arguments. For example,

$ ls dir1

dir1 is an argument. In this case, this lists the contents of the directory named dir1.

Commands can take multiple arguments and flags. For example,

$ ls -a -l dir1 dir2

lists the contents of dir1 and dir2 in long listing format, showing hidden files (-a).

Some programs, like ls, let you combine flags into one -:

$ ls -al dir1 dir2
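
To see flags and arguments side by side, here is a small sketch you can paste into a shell. It works inside a throwaway directory; the file names visible.txt and .hidden.txt are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"
mkdir dir1
touch dir1/visible.txt dir1/.hidden.txt

# No flags: hidden files (names starting with .) are skipped.
plain=$(ls dir1)
echo "$plain"

# -a shows hidden files too, along with the special entries . and ..
all=$(ls -a dir1)
echo "$all"

# Separate flags and combined flags produce the same listing.
separate=$(ls -a -l dir1)
combined=$(ls -al dir1)
echo "$combined"
```

The argument (dir1) says what to list; the flags only change how the listing looks.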

man pages (documentation)

Documentation for Linux is built-in and can be accessed using the man (manual) command.

For example,

$ man ls

provides the documentation for ls.

Some helpful notes for reading man pages:

  • most man pages have specific components:
    • a summary or synopsis that shows the structure of the command and important flags and arguments
    • a longer description that explains each flag and argument
    • examples of how to use the command in common use-cases
  • you can search for words with /: e.g. typing /reverse searches for "reverse" exactly
  • if you type in h, you can see all the commands you can use to navigate a man page

The synopsis has some important syntax. For example, man ls has:

SYNOPSIS
       ls [OPTION]... [FILE]...
  • items within [] are optional; in this case, ls can take an OPTION or a FILE, but neither is required.
  • the ... means that the preceding item can be repeated; in this case, ls can take any number of options and any number of files

Compare this synopsis with the next few commands; how are they the same, and how are they different?

Directory Commands

| command | description |
| --- | --- |
| ls | List files in working directory |
| pwd | Print current working directory |
| cd | Change working directory |
| mkdir | Make a new directory |
| rmdir | Remove the given directory (must be empty) |

Relative Directories

| directory | description |
| --- | --- |
| . | References the working directory |
| .. | References the parent of working directory |
| ~ | Refers to the home directory |
| ~/Desktop | Your desktop |

Week 2

File Examination Commands

| command | description | example(s) |
| --- | --- | --- |
| cat | “Print” out files to the console | cat file.txt |
| less | Browse a file, with search, scroll, and other features | less file.txt |
| more | An alternative to less with different keybinds and features | more file.txt |
| head | “Print” out the first 10 lines of a file; use flags to change this behaviour | head file.txt, head -n 5 file.txt |
| tail | “Print” out the last 10 lines of a file; use flags to change this behaviour | tail file.txt, tail -n 5 file.txt |
| wc | “Print” out the number of lines, words, and characters in a file; use flags to change this behaviour | wc file.txt, wc -l file.txt |
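
As a quick sketch of head, tail, and wc, the snippet below builds a 20-line file of numbers and then looks at parts of it. The name file.txt matches the examples above; the > used to create the file is explained in the redirection section later in these notes.

```shell
cd "$(mktemp -d)"
seq 1 20 > file.txt          # the numbers 1 to 20, one per line

first=$(head -n 3 file.txt)  # first three lines: 1, 2, 3
last=$(tail -n 2 file.txt)   # last two lines: 19, 20
echo "$first"
echo "$last"

wc -l file.txt               # the line count (20), followed by the file name
```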

Searching and Sorting Commands

| command | description | example(s) |
| --- | --- | --- |
| grep | “Print” out the lines of the input file(s) that contain a specific string. | grep "berry" fruits.txt veggies.txt shows the lines in both fruits.txt and veggies.txt that contain the string "berry". |
| sort | “Print” out the contents of the input file, but with lines sorted lexicographically. | sort file.txt |
| uniq | “Print” out the contents of the input file, but remove (adjacent) repeated lines. Often used with sort. | uniq file.txt |
| find | Searches the filesystem for file(s) that match a pattern. | find -name "*.java" finds all files in the current directory and its subdirectories that end in .java. Note that this is more powerful than ls *.java! |

All of these commands have many options. You’ll have to look at the man pages for sort, uniq, and find for your homework!
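
Here is a small sketch of why uniq is “often used with sort”: uniq only removes repeats that sit next to each other. The contents of fruits.txt are made up for illustration.

```shell
cd "$(mktemp -d)"
printf 'cherry\napple\ncherry\nblueberry\napple\n' > fruits.txt

grep "berry" fruits.txt   # only blueberry contains the string "berry"
uniq fruits.txt           # removes nothing: no repeated lines are adjacent
sort fruits.txt           # after sorting, the repeats sit next to each other
sort -u fruits.txt        # sort and keep a single copy of each line
```

The -u flag is one of the options you will find in the man page for sort.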

Compiling and Running Java programs

  • javac HelloWorld.java compiles HelloWorld.java into HelloWorld.class (can replace this with any other .java file)
  • java HelloWorld runs the compiled HelloWorld class
  • java HelloWorld.java compiles and runs HelloWorld.java in one step (Java 11 and later)

Standard Streams

Processes (~ programs) in Unix have 3 standard streams, which are “abstract” locations that tell a process where to read input from and write output to. They are:

| stream | Java equivalent | description |
| --- | --- | --- |
| Standard Input (stdin) | System.in | where the program gets user input; defaults to terminal input |
| Standard Output (stdout) | System.out | where the program sends “normal” output; defaults to printing to terminal |
| Standard Error (stderr) | System.err | where the program sends error output; defaults to printing to terminal |

Many commands will default to using stdin for input when some arguments aren’t provided; for example, try running grep "a" and then typing many sentences into your terminal.

As a programmer, you don’t have to worry about exactly how these work; the shell and operating system manage them for you! However, we’ll often “redirect” these streams elsewhere.

Input and Output Redirection

>: Standard Output Redirection

The > operator allows you to execute a command and redirect its standard output to the given file, instead of printing it to the console.

For example,

$ grep "berry" fruits.txt > berries.txt

finds all lines which contain "berry" in fruits.txt, and writes them to berries.txt instead of printing them to the console.

The > operator overwrites the file. In contrast, the >> operator appends to a file.

$ grep "berry" fruits.txt >> berries.txt

The left-hand side of > and >> should be a command. The right-hand side of > and >> should be the name of a file.
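
The difference between overwriting and appending is easy to check for yourself. This sketch uses a made-up fruits.txt with two "berry" lines:

```shell
cd "$(mktemp -d)"
printf 'strawberry\nbanana\nblueberry\n' > fruits.txt

grep "berry" fruits.txt > berries.txt    # berries.txt has 2 lines
grep "berry" fruits.txt > berries.txt    # overwritten: still 2 lines
grep "berry" fruits.txt >> berries.txt   # appended: now 4 lines
cat berries.txt
```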

<: Standard Input Redirection

The < operator allows you to use the contents of a file as the contents of standard input, instead of what the user types in to the console.

For example, the following command finds lines containing "berry" within fruits.txt:

$ grep "berry" < fruits.txt

This looks similar to using grep "berry" fruits.txt, but there is a slight nuance; when using <, grep is now reading from standard input (not directly from a file). This difference becomes more important in later sections!

The left-hand side of < should be a command. The right-hand side of < should be the name of a file.
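
One way to see the nuance is with wc instead of grep, since wc prints the file name only when it knows it. This is a sketch with a made-up fruits.txt:

```shell
cd "$(mktemp -d)"
printf 'strawberry\nbanana\nblueberry\n' > fruits.txt

# Given as an argument, wc knows the file's name and prints it.
with_arg=$(wc -l fruits.txt)

# Fed through standard input, wc only sees text, so there is no name to print.
with_stdin=$(wc -l < fruits.txt)

echo "$with_arg"
echo "$with_stdin"
```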

2>: Standard Error Redirection

The 2> operator allows you to execute a command and redirect its standard error to the given file, instead of printing it to the console.

For example,

$ javac HasSomeErrors.java 2> errors.txt

would compile HasSomeErrors.java, and write any error output to errors.txt instead of printing it to the console.

Note that > and 2> can point to different files; this is helpful in splitting up logs and debugging.

$ javac *.java > output.txt 2> errors.txt

The left-hand side of 2> should be a command. The right-hand side of 2> should be the name of a file.
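
If you don’t have a broken Java file handy, ls can stand in for javac: asking it for a file that doesn’t exist produces error output. This sketch assumes two made-up names, exists.txt and missing.txt.

```shell
cd "$(mktemp -d)"
touch exists.txt

# ls prints the file it finds to stdout and complains about the other on stderr.
# "|| true" keeps a script going even though ls reports a failure.
ls exists.txt missing.txt > output.txt 2> errors.txt || true

cat output.txt   # exists.txt
cat errors.txt   # a "No such file" message (exact wording varies by system)
```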

Pipes

The | operator is called a pipe. You use pipes to “link” two commands together. Consider:

$ command1 | command2

In order, the |:

  1. executes command1
  2. then, executes command2, using the standard output of command1 as the standard input to command2

Conceptually, it is shorthand for the following sequence of commands:

  1. command1 > filename
  2. command2 < filename
  3. rm filename
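
As a sketch of that equivalence, the snippet below counts the "berry" lines twice: once with a pipe, and once the long way through a temporary file. The names fruits.txt and tmp.txt are made up for illustration.

```shell
cd "$(mktemp -d)"
printf 'strawberry\nbanana\nblueberry\n' > fruits.txt

# With a pipe: grep's output becomes wc's input.
piped=$(grep "berry" fruits.txt | wc -l)

# The long way: save the output, read it back in, then clean up.
grep "berry" fruits.txt > tmp.txt
manual=$(wc -l < tmp.txt)
rm tmp.txt

echo "$piped" "$manual"   # both counts are 2
```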

Week 3

More command line operators

And (&&)

The and operator (double ampersand) is put between two commands, e.g. command1 && command2:

  • if command1 succeeds, it then runs command2
  • if command1 fails (e.g. when running javac CompilerErrors.java and getting a compilation error), then command2 is not run
  • useful when command2 depends on command1 succeeding

(this behaviour comes from short-circuiting in Boolean expressions)

Or (||)

The or operator (double pipe) is put between two commands, e.g. command1 || command2:

  • if command1 succeeds, then command2 is not run
  • if command1 fails (e.g. when running javac CompilerErrors.java and getting a compilation error), it then runs command2
  • useful when command2 is a fallback for command1

(this behaviour comes from short-circuiting in Boolean expressions)

Then (;)

The then operator (semicolon) is put between two commands, e.g. command1 ; command2. It runs command1 and then command2, regardless of whether or not command1 succeeded or failed.
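
To try all three operators without needing a Java file, you can use the built-in commands true (always succeeds) and false (always fails). This is only a sketch; log.txt is a made-up file name.

```shell
set +e                      # make sure a failing command does not stop the script
cd "$(mktemp -d)"

true  && echo "and: ran after success"  >> log.txt   # runs
false && echo "and: ran after failure"  >> log.txt   # skipped
true  || echo "or: ran after success"   >> log.txt   # skipped
false || echo "or: ran after failure"   >> log.txt   # runs
false ;  echo "then: ran after failure" >> log.txt   # runs regardless

cat log.txt
```

Only three of the five messages end up in log.txt.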

echo

echo is a command that prints out its argument(s) to standard output, followed by a newline. It’s the shell equivalent of System.out.println() in Java or print() in Python.

$ echo "Hello, world"
Hello, world

xargs: convert stdin to arguments

In Week 2, we talked about standard input (stdin) and command-line arguments being different concepts. Frequently, you want to convert standard input to an argument (often as part of chaining many | commands). xargs is a command that lets you do just that!

For example, consider the following command:

$ ls *.java | xargs javac
  • ls *.java outputs lines to standard output
  • | takes the standard output of ls *.java and sends it to the command to its right as standard input
  • but, javac only takes in files to compile as arguments - not standard input
  • so, we use xargs to convert the output of ls *.java to arguments for javac
  • you can think of this as a short form for these three commands:
    1. ls *.java > toCompile.txt
    2. xargs javac < toCompile.txt
    3. rm toCompile.txt
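
The same idea can be sketched without javac by using cat, which treats standard input and arguments very differently. The files a.txt and b.txt are made up for illustration.

```shell
cd "$(mktemp -d)"
printf 'one\ntwo\n' > a.txt
printf 'three\n' > b.txt

# cat reads standard input, so piping file NAMES into it just prints the names.
names=$(ls *.txt | cat)

# xargs turns those names into arguments: this runs "cat a.txt b.txt".
contents=$(ls *.txt | xargs cat)

echo "$names"      # a.txt and b.txt
echo "$contents"   # one, two, three
```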

find: recursively search directories

Using ls *.java only gives you the files in the current directory that have the .java extension. The find command lets you search within subdirectories as well.

For example, the following command finds all Java source files in the current directory and its subdirectories.

$ find -name "*.java"

You will often pair find with xargs (to apply an operation to all files that match a pattern). For example, the following compiles all Java files in the current directory and its subdirectories.

$ find -name "*.java" | xargs javac
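
Here is a sketch that contrasts ls with find on a small, made-up directory tree. It uses wc -l rather than javac so that it runs without Java installed. Note the explicit . (start in the current directory): GNU find on Linux lets you leave it out, but other versions of find, such as the one on macOS, require it.

```shell
cd "$(mktemp -d)"
mkdir -p src/util
touch Main.java src/App.java src/util/Helper.java notes.txt

# ls with a glob only sees the current directory.
ls *.java

# find walks every subdirectory as well.
find . -name "*.java"

# Pair it with xargs to run one command over every match.
find . -name "*.java" | xargs wc -l
```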

cut: simplify complex strings

The cut command lets you manipulate strings from standard input.

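The -c (character) flag lets you get characters at certain (1-indexed) indices or ranges. Note that cut always prints the selected characters in their original order, no matter what order you list them in. Here are a few examples:

$ echo "abcdef" | cut -c2
b
$ echo "abcdef" | cut -c2-5
bcde
$ echo "abcdef" | cut -c2,1,4
abd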

The -d (delimiter) flag lets you split up input into (1-indexed) fields that are separated by the delimiter, which you can then access with the -f (field) flag. A common example is to parse CSVs (comma-separated values):

$ echo "a,b,c,d,e,f" | cut -d, -f1
a
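
The -f flag accepts the same lists and ranges as -c. As a sketch, here are a few more selections on a made-up CSV row (name, age, city, major):

```shell
line="alice,23,seattle,cse"    # hypothetical CSV row

echo "$line" | cut -d, -f1     # alice
echo "$line" | cut -d, -f2-3   # 23,seattle
echo "$line" | cut -d, -f1,4   # alice,cse
echo "$line" | cut -d, -f3-    # seattle,cse
```

When you select more than one field, cut joins them back together with the same delimiter.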

Application: parsing logs

In the video, we showed one application of cut: parsing complicated log files. On attu, /cse/web/courses/logs/common_log is a log of all the requests that the CSE course websites receive. However, just using tail on this gives us too much information - it’s hard to focus!

(try running tail -f /cse/web/courses/logs/common_log yourself first)

Instead, we can use the power of pipes and cut to pull out a specific field of the log, like the requested page.

$ tail -f /cse/web/courses/logs/common_log | cut -d\" -f4

Here, we’re escaping the " as it has a special meaning in the shell, and we’re grabbing the 4th field.

We can also use the stdbuf command to make cut write its output one line at a time (line-buffered) so that results show up immediately, and look for requests just to 391:

$ tail -f /cse/web/courses/logs/common_log | stdbuf -oL cut -d\" -f4 | grep "391"

Week 4

Definitions

  • version control: software that keeps track of changes to a set of files
  • repository (repo): a location that stores a copy of all of the files for a project
  • git: a command-line tool and version control system for software; used everywhere
  • GitHub and GitLab: platforms (and companies) that help host git repositories

Repositories and Remote

What goes in a repository and what doesn’t? Rule of thumb: everything you need to build your project from source, and nothing more!

  • include: source code files (e.g. .java, .c files), build and config files (e.g. Makefile), assets (e.g. images), documentation
  • don’t include: object files (e.g. .class or .o files), executables (e.g. .exe or .app files)
  • depending on your situation: library and dependency files

Git is a distributed version control system:

  • every user has a copy of the entire repository (including all the files and a history of changes)
  • users make changes on their own local repositories, and share them with others by “pushing” these changes (or “pull” changes from other users’ repositories)

Frequently, you will have a “special” repository called remote:

  • the remote repository is the “main” copy of the code or the “source of truth”
  • developers will push/pull changes to/from remote
  • remote is often hosted remotely (i.e. not on a developer’s computer) by services like GitHub or GitLab

Conceptual: Four Phases of Git

Fundamentally, Git stores all data as a set of changes to files. Changes can be in one of four phases:

  1. Working Directory (Working changes, what’s on your computer)
    1. You can move these changes/files to the 2nd phase (Staging Area) by staging your files using git add or git stage. This gets your changes ready for a commit later on, like a draft. This is relatively easy to undo (git restore --staged <file>).
  2. Staging Area/Index (changes you’re preparing to commit)
    1. You can move your staged files to your local repo by committing them using git commit. This saves your changes to your local repo, and is more difficult to undo.
  3. Local Repository (a local copy of the repo with your committed changes)
    1. You can move your changes to the 4th phase (remote repository) by pushing them, using git push. This is the hardest to reverse.
  4. Remote Repository (remote shared repository)
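
The first three phases can be sketched in a throwaway repository (the fourth needs a remote, so it is left out here). The file notes.txt and the name and email are placeholders; git restore needs git 2.23 or newer.

```shell
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "first draft" > notes.txt     # 1. working directory
git add notes.txt                  # 2. staging area
git commit -q -m "Add notes"       # 3. local repository

echo "second draft" > notes.txt    # a new working change
git add notes.txt                  # staged...
git restore --staged notes.txt     # ...and unstaged again
git status --short                 # " M notes.txt": modified, but not staged
```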

Basic Git Commands

| command | description |
| --- | --- |
| git clone <url> [dir] | Make a local copy of the git repository at <url> |
| git add <file> ... | Add the changes made to each <file> to the staging area |
| git commit | Create a “commit” that captures all the changes in the staging area; requires a commit message |
| git push | Push changes from your local repository to the remote repository |
| git status | View status of files in the working directory and staging area |
| git log | View history of commits in reverse chronological order (use --graph --oneline for a concise visualization) |
| git diff | Show differences for changes between working directory and staging (or use --staged for staging and last commit) |
| git revert <commit> | Reverts the given commit by adding a new commit that undoes the changes |

Anatomy of a git log (commits and hashes)

Each entry of git log is a single commit. For example, here is one commit:

commit 8669021427dfff099b25adae3616e4cca9461cf4
Author: Matt Wang <mxw@cs.washington.edu>
Date:   Tue Jul 2 01:32:36 2024 -0700

    Create "Using CSE GitLab" page

There are a lot of things going on here!

  • the hash is the long string that appears after commit (in this case, 8669021427dfff099b25adae3616e4cca9461cf4)
    • a hash uniquely (*) identifies a commit; we will use these in the next section
    • we can often refer to a hash by its first seven characters, e.g. 8669021
  • each commit also has an author (a name + email, configured by git config) and a timestamp
  • each commit has a commit message (this is what you wrote in git commit -m)
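
To look at hashes directly, this sketch makes one commit in a throwaway repository and asks git for its full and abbreviated hash. The file name and identity are placeholders, and the 40-character length assumes git’s default hash format.

```shell
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"
echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

full=$(git rev-parse HEAD)            # the full hash (40 characters by default)
short=$(git rev-parse --short HEAD)   # an abbreviated prefix of the same hash
echo "$full"
echo "$short"

# Either form names the same commit.
git log -1 --format="%s" "$short"     # prints the message: Add notes
```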

Note

While this is not the focus of this class, “hashes” are a fascinating part of computer science with deep connections to cryptography, computer security, and math. Roughly speaking, these “hashes” are similar to the “hash” you use in a HashMap. It is not strictly true that all commit hashes are unique; see the “SHAttered” paper for more. See also “Hash function” on Wikipedia.

Commits, Branches, and History

In git, commits are a group of changes. Each commit builds on top of a previous commit, similar to a linked list.

In contrast, a branch is a pointer (or reference) to a specific commit.

This diagram shows a simple git history:

gitGraph
    commit id: "A"
    commit id: "B"
    commit id: "C" tag: "HEAD"

In this example, we have three commits: A, B, and C. The HEAD tells us where our local copy of the repository is (at commit C). This means that our local repository has, in order:

  1. the changes from A
  2. then, the changes from B
  3. finally, the changes from C

The main branch refers to the commit C here; though, as C builds on top of B and A, we can also think of the branch as containing the history of the project up and until C.

Branching and Merging

When working on a new feature or bugfix, you will often create a new branch to work on your changes. That way, your changes won’t affect others who are working off of the main branch (or their own branches).

Once you’re ready to add your changes to the main branch, you will need to merge your feature branch in. To do so,

  1. go to the branch that you want to receive the changes (typically, main)
  2. run git merge feature (where feature is the name of your branch)
  3. if necessary, resolve any merge conflicts
    • occurs when git can’t automatically merge commits (usually due to conflicting changes)
    • you need to edit each file to have the “correct” behaviour after the merge (often editing the lines between the conflict markers <<<<<<< HEAD, =======, and >>>>>>>)
    • once they are all complete, git add your changes and run git commit
  4. finish the merge commit (by using the default commit message and/or editing it)
  5. if necessary, run git push to update remote with the change
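
The steps above, including a conflict, can be sketched in a throwaway repository. Everything here (style.txt, the colors, the commit messages) is made up for illustration, and the push in step 5 is skipped because there is no remote.

```shell
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main          # make sure the branch is called main
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"

echo "color: red" > style.txt
git add style.txt && git commit -q -m "A"

git checkout -q -b feature                     # feature changes the line...
echo "color: blue" > style.txt
git commit -q -a -m "X"

git checkout -q main                           # ...and so does main (step 1)
echo "color: green" > style.txt
git commit -q -a -m "Y"

# Step 2: both branches changed the same line, so git stops and asks for help.
git merge feature > /dev/null || echo "merge conflict!"
conflicted=$(cat style.txt)
echo "$conflicted"                             # both versions, between the markers

# Steps 3 and 4: write the version you want, then add and commit.
echo "color: teal" > style.txt
git add style.txt
git commit -q -m "Merge feature, resolving style.txt"
```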

Branching and Merging Commands

| command | description |
| --- | --- |
| git branch <name> | Creates a new branch with the provided name. |
| git checkout <branch> | Switches your local repository to a different branch (i.e., moves the HEAD) |
| git switch <branch> | Same as git checkout (for the purposes of this class) |
| git checkout -b <branch> | Like git checkout, but creates the branch if it doesn’t exist |
| git merge <other-branch> | Merges the “other branch” into your current branch, updating your current branch. Can cause a merge conflict!! |
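
Here is a sketch of these commands working together when there is no conflict: feature and main each add a different file, so git can merge them on its own. The file names and commit messages are made up.

```shell
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main          # make sure the branch is called main
git config user.name "Example Student"         # placeholder identity
git config user.email "student@example.com"

echo "base" > base.txt
git add base.txt && git commit -q -m "A"

git checkout -q -b feature                     # create and switch to feature
echo "feature work" > feature.txt
git add feature.txt && git commit -q -m "X"

git checkout -q main                           # back to main
echo "main work" > main.txt
git add main.txt && git commit -q -m "Y"

git merge -q -m "Merge feature" feature        # main now has X and Y
git log --oneline --graph
ls                                             # base.txt feature.txt main.txt
```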

Week 5

Remote, origin, and syncing

As a reminder: the remote repository is the central source of truth for code. In teams, everybody typically syncs their changes with remote (rather than directly with each other).

Branches on the remote repository are often prefixed with origin/, e.g. origin/main is the main branch on the remote repository. Other than this prefix, you should think of them as “normal” branches - pointing to a specific commit.

To sync changes with remote, you’ll run the git push and git pull commands. These will sync your local branch (e.g. main) with the remote version (e.g. origin/main). In this model,

  • git push updates origin/main with the changes you’ve made in main
  • git pull updates main with the changes from origin/main

With only one person making changes, this is pretty straightforward. But, things get harder when multiple people get involved!

Example: resolving conflicts with remote

Imagine that you have a local and a remote repository, both with a main branch containing the commits A and B:

---
title: Local
---
gitGraph
    commit id: "A"
    commit id: "origin/main -> B" tag: "HEAD"
---
title: Remote
---
gitGraph
    commit id: "A"
    commit id: "B"

Next, imagine that you make a change to your local repository called C, while your coworker adds a different change D. The graph would look like this:

---
title: Local
---
gitGraph
    commit id: "A"
    commit id: "origin/main -> B"
    commit id: "C" tag: "HEAD"
---
title: Remote
---
gitGraph
    commit id: "A"
    commit id: "B"
    commit id: "D"

Note that at this point, origin/main still points at B: your local repository doesn’t know about these changes yet.

Running git push here will give you an error (usually something like "error: failed to push some refs to REMOTE"). This is because git doesn’t know how to resolve the history: C and D both point at B, so it’s not clear how to “combine” them. This can be complicated by the commits touching the same file.

To fix this, you’ll:

  1. first, run git fetch, which updates your local repository’s origin/main to point at D
---
title: Local (after git fetch)
---
gitGraph
    commit id: "A"
    commit id: "B"
    branch origin/main
    commit id: "origin/main -> D"
    checkout main
    commit id: "C"
  2. then, run git merge origin/main, which merges origin/main into main – fixing the issue with C and D locally
    • if necessary, address any merge conflicts here
---
title: Local (after the merge)
---
gitGraph
    commit id: "A"
    commit id: "B"
    branch origin/main
    commit id: "origin/main -> D"
    checkout main
    commit id: "C"
    merge origin/main id: "M" tag: "HEAD"
  3. finally, run git push, which will push C and the merge commit M to the remote – fixing the issue with C and D remotely
---
title: Remote (after the merge)
---
gitGraph
    commit id: "A"
    commit id: "B"
    commit id: "D"
    commit id: "C"
    commit id: "M" tag: "HEAD"
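
You can rehearse this whole scenario on one machine. In the sketch below, a local “bare” repository (remote.git) stands in for GitHub or GitLab, and two clones (you and coworker) play the two developers. All names, files, and identities are made up.

```shell
cd "$(mktemp -d)"
git init -q --bare remote.git                        # stand-in for the remote
git -C remote.git symbolic-ref HEAD refs/heads/main

# You: create commits A and B and push them.
git clone -q remote.git you 2> /dev/null
cd you
git symbolic-ref HEAD refs/heads/main
git config user.name "You" && git config user.email "you@example.com"
echo a > a.txt && git add a.txt && git commit -q -m "A"
echo b > b.txt && git add b.txt && git commit -q -m "B"
git push -q origin main
cd ..

# Coworker: clone, add D, and push it.
git clone -q remote.git coworker
cd coworker
git config user.name "Coworker" && git config user.email "coworker@example.com"
echo d > d.txt && git add d.txt && git commit -q -m "D"
git push -q origin main
cd ..

# You: add C. A plain push is rejected because the remote also has D.
cd you
echo c > c.txt && git add c.txt && git commit -q -m "C"
git push -q origin main 2> /dev/null || echo "push rejected"

git fetch -q origin                      # 1. origin/main now points at D
git merge -q -m "M" origin/main          # 2. merge D into main, creating M
git push -q origin main                  # 3. push C and M to the remote
git log --oneline --graph
```

Because C and D touch different files, the merge here needs no manual conflict resolution.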

Pull and Merge Requests

You will often not use git merge directly with remote - instead, you’ll use a platform like GitHub or GitLab. This simplifies some of the process and lets you work with others!

In particular, you will use a tool called a merge request (GitLab’s term) or pull request (GitHub’s term). This lets you “prepare” a merge, but also lets you get feedback from others on code (a big part of software engineering).

The sketch of how to use a pull or merge request is:

  1. create a local branch and make some commits
  2. push those commits (and that local branch) to remote
  3. open a pull or merge request on GitHub/GitLab: here, you’re “requesting” to merge your code
  4. work with others (e.g. they leave comments on your code, run some tests, …)
  5. hit “merge” on the request, which merges into main (or other branch)

Example: creating a merge request

Continuing from our previous example, let’s make a new branch called feature with git checkout -b feature; we’ll add a new commit called X:

---
title: Local
---
gitGraph
    commit id: "A"
    commit id: "B"
    branch origin/main
    commit id: "D"
    checkout main
    commit id: "C"
    merge origin/main id: "origin/main -> M"
    branch feature
    commit id: "X" tag: "HEAD"

Now, assume that someone else has added a change Y to your remote’s main. We can still git push our change to the remote - since we’re using a different branch than main, we don’t encounter a conflict just yet.

---
title: Remote
---
gitGraph
    commit id: "A"
    commit id: "B"
    branch origin/main
    commit id: "D"
    checkout main
    commit id: "C"
    merge origin/main id: "M"
    branch feature
    commit id: "X"
    checkout main
    commit id: "Y"

If we want to merge origin/main and feature, we’ll once again have some sort of conflict! Instead of merging locally, though, what you’ll typically do is:

  1. open a merge/pull request on GitLab/GitHub
  2. after the request is created, you can fix the conflict on GitLab or GitHub; this will create a commit on your feature branch (not on main) - M1 in the diagram
  3. after getting approval, you can hit “merge”: this will merge feature into main, creating a merge commit on main (not on feature) - M2 in the diagram
---
title: Remote
---
gitGraph
    commit id: "A"
    commit id: "B"
    branch remote-fetch
    commit id: "D"
    checkout main
    commit id: "C"
    merge remote-fetch id: "M"
    branch feature
    commit id: "X"
    checkout main
    commit id: "Y"
    checkout feature
    merge main id: "M1"
    checkout main
    merge feature id: "feature -> M2" tag: "main"

Now, the remote’s main branch has both the changes Y and X, as well as the merge commits M1 (main -> feature) and M2 (feature -> main). You can safely delete feature.

More on reverting commits

There are different ways to revert a commit.

  • git revert <commit> reverts a specific commit by adding a new commit to the history that undoes the changes
    • you can pass a commit hash, or a relative term like HEAD (most recent commit)
    • good because you preserve the previous commit history – easy to work with others, can “undo the undo”
    • bad because secrets (e.g. passwords, API keys, embarrassing typos) still remain in the total history
  • git reset <commit> moves your current branch (and HEAD) back to a commit; add --hard to also reset your working directory. You can pair this with “force-pushing” to truly remove commits from history
    • but, this changes the remote’s git history, which can break collaboration with others
    • generally not recommended unless you know what you’re doing

General advice: use git revert, and don’t alter the history of the main branch (unless you know what you’re doing)!
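
As a sketch of the recommended approach, the snippet below commits a mistake in a throwaway repository and then undoes it with git revert. The file name, messages, and identity are placeholders.

```shell
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"

echo "good" > notes.txt
git add notes.txt && git commit -q -m "Add notes"
echo "oops" >> notes.txt
git commit -q -a -m "Add a mistake"

# Undo the most recent commit by adding a new commit on top of it.
git revert --no-edit HEAD > /dev/null

cat notes.txt        # good
git log --oneline    # three commits: the mistake and its undo are both recorded
```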

.gitignore

.gitignore is a special file you can create in a git repository. This file tells git to ignore certain patterns of files.

For example, you almost always don’t want to commit .class files in Java projects. To tell git to not “look” at these files, you can create a file called .gitignore and add the following:

*.class

The .gitignore uses a similar “glob” syntax to find -name, so you can use these ideas interchangeably.
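
To check that a pattern does what you expect, you can try it in a throwaway repository. This sketch uses made-up HelloWorld files; git check-ignore reports whether a given path is ignored.

```shell
cd "$(mktemp -d)"
git init -q .
echo "*.class" > .gitignore
touch HelloWorld.java HelloWorld.class

git status --short                   # lists .gitignore and HelloWorld.java only
git check-ignore HelloWorld.class    # prints the path, because it is ignored
```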