Git
Git is a command-line tool and version control system for software. Version control helps keep track of changes to a set of files. GitHub and GitLab are platforms (companies) that help host git repositories.
Repositories and Remote
A repository (repo) is a location that stores a copy of all of the files for a project. What goes in a repository and what doesn’t? Rule of thumb: everything you need to build your project from source, and nothing more!
- include: source code files (e.g. .java, .c files), build and config files (e.g. Makefile), assets (e.g. images), documentation
- do not include: object files (e.g. .class or .o files), executables (e.g. .exe or .app files)
- depending on your situation: library and dependency files
Git is a distributed version control system:
- every user has a copy of the entire repository (including all the files and a history of changes)
- users make changes on their own local repositories, and share them with others by “pushing” these changes (or by “pulling” changes from other users’ repositories)
Frequently, you will have a “special” repository called remote:
- the remote repository is the “main” copy of the code or the “source of truth”
- developers will push/pull changes to/from remote
- remote is often hosted remotely (i.e. not on a developer’s computer) by services like GitHub or GitLab
Four Phases of Git
Conceptually, you can think of Git as tracking a set of changes to files (internally, each commit stores a snapshot of the project). Changes can be in one of four phases:
- Working Directory (working changes, what’s on your computer)
    - You can move these changes/files to the 2nd phase (Staging Area) by staging your files using git add or git stage. This gets your changes ready, like a draft that you are preparing for a later commit. This is relatively easy to undo (git restore --staged <file>).
- Staging Area/Index (changes you’re preparing to commit)
    - You can move your staged files to the 3rd phase (your local repo) by committing them using git commit. This saves your changes to your local repo, and is more difficult to undo.
- Local Repository (a local copy of the repo with your committed changes)
    - You can move your changes to the 4th phase (remote repository) by pushing them, using git push. This is the hardest to reverse.
- Remote Repository (remote shared repository)
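To make the four phases concrete, here is a minimal sketch that walks one file through all of them. It is only an illustration under a few assumptions: the file name notes.txt, the demo identity, and the use of a bare repository in a temporary directory as a stand-in for a hosted remote are all made up for this example.

```shell
set -eu

# Hypothetical identity for the demo commits; any name and email would do.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"

# A bare repository on disk stands in for the remote (normally on GitHub/GitLab).
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# The local repository, with a branch named main and a remote named origin.
git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$sandbox/remote.git"

# Phase 1: the change exists only in the working directory.
echo "hello" > notes.txt
phase1="$(git status --porcelain)"   # untracked file

# Phase 2: the change is staged.
git add notes.txt
phase2="$(git status --porcelain)"   # staged, not yet committed

# Phase 3: the change is committed to the local repository.
git commit -q -m "Add notes"
phase3="$(git status --porcelain)"   # empty: nothing left to stage or commit

# Phase 4: the commit is pushed to the remote repository.
git push -q origin main
local_head="$(git rev-parse main)"
remote_head="$(git --git-dir="$sandbox/remote.git" rev-parse main)"

printf 'working: %s\nstaged: %s\ncommitted: %s\n' "$phase1" "$phase2" "${phase3:-clean}"
```

After the push, the local and remote main branches point at the same commit, which is what "in sync" means in the rest of these notes.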
Basic Git Commands
command | description |
---|---|
git clone <url> [dir] | Make a local copy of the git repository at <url> |
git add <file> ... | Add the changes made to each <file> to the staging area |
git commit | Create a “commit” that captures all the changes in the staging area; requires a commit message |
git push | Push changes from your local repository to the remote repository |
git status | View status of files in the working directory and staging area |
git log | View history of commits in reverse chronological order (use --graph --oneline for a concise visualization) |
git diff | Show differences for changes between working directory and staging (or use --staged for staging and last commit) |
git revert <commit> | Reverts the given commit by adding a new commit that undoes the changes |
Anatomy of a git log (commits and hashes)
Each entry of git log is a single commit. For example, here is one commit:
commit 8669021427dfff099b25adae3616e4cca9461cf4
Author: Matt Wang <mxw@cs.washington.edu>
Date: Tue Jul 2 01:32:36 2024 -0700
Create "Using CSE GitLab" page
There are a lot of things going on here!
- the hash is the long string that appears after commit (in this case, 8669021427dfff099b25adae3616e4cca9461cf4)
    - a hash uniquely (*) identifies a commit; we will use these in the next section
    - we can often refer to a hash by its first seven characters, e.g. 8669021
- each commit also has an author (a name + email, configured by git config) and a timestamp
- each commit has a commit message (this is what you wrote in git commit -m)
!!! note While this is not the focus of this class, “hashes” are a fascinating part of computer science with deep connections to cryptography, computer security, and math. Roughly speaking, these “hashes” are similar to the “hash” you use in a HashMap. It is not strictly true that all commit hashes are unique; see the “SHAttered” paper for more. See also “Hash function” on Wikipedia.
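If you want to pull these pieces out of a log entry yourself, the sketch below may help. It creates a throwaway repository with one commit and prints the hash, author, and message separately using git log format strings. The repository, file name, and identity are hypothetical and exist only for this example.

```shell
set -eu

# Hypothetical identity for the demo commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/main

echo "# Demo" > README.md
git add README.md
git commit -q -m "Create README"

# Each field of the log entry can be printed on its own with a format string.
full_hash="$(git log -1 --format=%H)"        # the full hash
short_hash="$(git log -1 --format=%h)"       # an abbreviated hash
author="$(git log -1 --format='%an <%ae>')"  # author name and email
message="$(git log -1 --format=%s)"          # first line of the commit message

# The short hash resolves to the same commit as the full hash.
resolved="$(git rev-parse "$short_hash")"

echo "commit $full_hash (short: $short_hash)"
echo "Author: $author"
echo "    $message"
```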
Commits, Branches, and History
In git, a commit is a group of changes. Each commit builds on top of a previous commit, similar to a linked list.
In contrast, a branch is a pointer (or reference) to a specific commit.
This diagram shows a simple git history:
gitGraph
commit id: "A"
commit id: "B"
commit id: "C" tag: "HEAD"
In this example, we have three commits: A, B, and C. The HEAD tells us where our local copy of the repository is (at commit C). This means that our local repository has, in order:
- the changes from A
- then, the changes from B
- finally, the changes from C
The main branch refers to the commit C here; though, as C builds on top of B and A, we can also think of the branch as containing the history of the project up to and including C.
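The same three-commit history can be rebuilt in a scratch repository to check the "branch is a pointer" idea. This is a sketch, assuming a made-up file history.txt and a demo identity. It shows that HEAD and main resolve to the same commit, and that the commit before C is B.

```shell
set -eu

# Hypothetical identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/main

# Create three commits A, B, and C, each building on the previous one.
for id in A B C; do
  echo "change $id" >> history.txt
  git add history.txt
  git commit -q -m "$id"
done

head_commit="$(git rev-parse HEAD)"                 # where the local copy is
main_commit="$(git rev-parse main)"                 # the commit main points at
current_branch="$(git symbolic-ref --short HEAD)"   # the branch HEAD is attached to
parent_of_c="$(git log -1 --format=%s HEAD~1)"      # the commit that C builds on
commit_count="$(git rev-list --count HEAD)"         # commits reachable from HEAD

git log --oneline
```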
Branching and Merging
When working on a new feature or bugfix, you will often create a new branch to work on your changes. That way, your changes won’t affect others who are working off of the main branch (or their own branches).
Once you’re ready to add your changes to the main branch, you will need to merge your feature branch in. To do so,
- go to the branch that you want to receive the changes (typically, main)
- run git merge feature (where feature is the name of your branch)
- if necessary, resolve any merge conflicts
    - these occur when git can’t automatically merge commits (usually due to conflicting changes)
    - you need to edit each file to have the “correct” behaviour after the merge (often editing the lines between the <<<<<<< HEAD, =======, and >>>>>>> markers)
    - once they are all complete, git add your changes and run git commit
- finish the merge commit (by using the default commit message and/or editing it)
- if necessary, run git push to update remote with the change
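The merge steps above can be sketched end to end in a scratch repository. In this example, main and feature deliberately change the same line so that a conflict occurs. The file config.txt, its contents, and the chosen resolution are all hypothetical. In a real conflict you would edit the file by hand instead of overwriting it.

```shell
set -eu

# Hypothetical identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/main

echo "greeting = hello" > config.txt
git add config.txt
git commit -q -m "Add config"

# The feature branch and main each change the same line differently.
git checkout -q -b feature
echo "greeting = hi" > config.txt
git commit -q -a -m "Shorten greeting"

git checkout -q main
echo "greeting = hello there" > config.txt
git commit -q -a -m "Lengthen greeting"

# Merge feature into main; git stops because it cannot combine the two edits.
if git merge feature > /dev/null 2>&1; then
  merge_result="clean"
else
  merge_result="conflict"
fi

# Count the conflict marker lines git wrote into the file.
marker_lines="$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' config.txt)"

# Resolve by writing the intended final content, then stage and commit.
echo "greeting = hi there" > config.txt
git add config.txt
git commit -q --no-edit      # keeps the default merge commit message

parent_count="$(git log -1 --format=%P | wc -w | tr -d ' ')"
echo "merge was a $merge_result; merge commit has $parent_count parents"
```

The merge commit has two parents (the previous tip of main and the tip of feature), which is how git records that two lines of history were joined.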
Branching and Merging Commands
command | description |
---|---|
git branch <name> | Creates a new branch with the provided name. |
git checkout <branch> | Switches your local repository to a different branch (i.e., moves the HEAD) |
git switch <branch> | Same as git checkout (for the purposes of this class) |
git checkout -b <branch> | Creates a new branch with the provided name and switches to it (fails if the branch already exists) |
git merge <other-branch> | Merges the “other branch” into your current branch, updating your current branch. Can cause a merge conflict! |
Remote, origin, and syncing
As a reminder: the remote repository is the central source of truth for code. In teams, everybody typically syncs their changes with remote (rather than directly with each other).
Branches on the remote repository are often prefixed with origin/, e.g. origin/main is the main branch on the remote repository. Other than this prefix, you should think of them as “normal” branches - pointing to a specific commit.
To sync changes with remote, you’ll run the git push and git pull commands. These will sync your local branch (e.g. main) with the remote version (e.g. origin/main). In this model,
- git push updates origin/main with the changes you’ve made in main
- git pull updates main with the changes from origin/main
With only one person making changes, this is pretty straightforward. But, things get harder when multiple people are making changes!
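Here is a rough sketch of the simple case, where changes flow in one direction at a time. Two local repositories share one remote. The developer names alice and bob, the file app.txt, and the bare repository standing in for GitHub/GitLab are assumptions made for the example.

```shell
set -eu

# Hypothetical identity shared by both demo developers.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"

# Bare repository standing in for the hosted remote.
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# First local repository ("alice" is a hypothetical developer).
git init -q "$sandbox/alice"
cd "$sandbox/alice"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$sandbox/remote.git"
echo "v1" > app.txt
git add app.txt
git commit -q -m "A"
git push -q -u origin main        # -u links local main to origin/main

# Second local repository ("bob"), cloned from the remote.
git clone -q "$sandbox/remote.git" "$sandbox/bob"
cd "$sandbox/bob"
echo "v2" > app.txt
git commit -q -a -m "B"
git push -q origin main           # the remote's main now has B

# Back in the first repository: main is still at A until we pull.
cd "$sandbox/alice"
before_pull="$(git log -1 --format=%s)"
git pull -q --ff-only origin main
after_pull="$(git log -1 --format=%s)"
echo "before pull: $before_pull, after pull: $after_pull"
```

The --ff-only flag is a cautious choice for this sketch: it makes git pull refuse to do anything except move main forward, so it would stop rather than create a merge if the histories had diverged.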
Resolving conflicts with remote
Imagine that you have a local and remote repository, both with a main branch containing the commits A and B:
---
title: Local
---
gitGraph
commit id: "A"
commit id: "origin/main - B" tag: "HEAD"
---
title: Remote
---
gitGraph
commit id: "A"
commit id: "B"
Next, imagine that you make a change to your local repository called C, while your coworker adds a different change D. The graph would look like this:
---
title: Local
---
gitGraph
commit id: "A"
commit id: "origin/main - B"
commit id: "C" tag: "HEAD"
---
title: Remote
---
gitGraph
commit id: "A"
commit id: "B"
commit id: "D"
Note that at this point, origin/main still points at B: your local repository doesn’t know about these changes yet.
Running git push here will give you an error (usually something like "error: failed to push some refs to REMOTE"). This is because git doesn’t know how to resolve the history: C and D both point at B, so it’s not clear how to “combine” them. This can be complicated by the commits touching the same file.
To fix this, you’ll:
- first, run git fetch, which updates your local repository’s origin/main to point at D
    ---
    title: Local (after git fetch)
    ---
    gitGraph
    commit id: "A"
    commit id: "B"
    branch origin/main
    commit id: "origin/main - D"
    checkout main
    commit id: "C"
- then, run git merge origin/main, which merges origin/main into main, fixing the issue with C and D locally
    - if necessary, address any merge conflicts here
    ---
    title: Local (after the merge)
    ---
    gitGraph
    commit id: "A"
    commit id: "B"
    branch origin/main
    commit id: "origin/main - D"
    checkout main
    commit id: "C"
    merge origin/main id: "M" tag: "HEAD"
- finally, run git push, which will push C and the merge commit M to the remote, fixing the issue with C and D remotely
---
title: Remote (after the merge)
---
gitGraph
commit id: "A"
commit id: "B"
commit id: "D"
commit id: "C"
commit id: "M" tag: "HEAD"
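The whole scenario (rejected push, then fetch, merge, and push) can be reproduced locally. The sketch below assumes a bare repository as the remote, hypothetical directories named you and coworker, and one small file per commit so that the merge has no conflicts.

```shell
set -eu

# Hypothetical identity shared by both demo developers.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

# Your repository: commits A and B, pushed to the remote.
git init -q "$sandbox/you"
cd "$sandbox/you"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$sandbox/remote.git"
echo "a" > a.txt; git add a.txt; git commit -q -m "A"
echo "b" > b.txt; git add b.txt; git commit -q -m "B"
git push -q -u origin main

# Your coworker clones, adds D, and pushes first.
git clone -q "$sandbox/remote.git" "$sandbox/coworker"
cd "$sandbox/coworker"
echo "d" > d.txt; git add d.txt; git commit -q -m "D"
git push -q origin main

# Meanwhile, you add C locally.
cd "$sandbox/you"
echo "c" > c.txt; git add c.txt; git commit -q -m "C"

# The push is rejected: the remote has D, which your main does not contain.
if git push -q origin main 2> /dev/null; then
  first_push="accepted"
else
  first_push="rejected"
fi

git fetch -q origin                   # origin/main now points at D
git merge -q --no-edit origin/main    # creates the merge commit M
git push -q origin main               # pushes C and M

local_head="$(git rev-parse main)"
remote_head="$(git --git-dir="$sandbox/remote.git" rev-parse main)"
echo "first push: $first_push; local and remote now at $local_head"
```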
Pull and Merge Requests
You will often not use git merge directly with remote - instead, you’ll use a platform like GitHub or GitLab. This simplifies some of the process and lets you work with others!
In particular, you will use a tool called a merge request (GitLab’s term) or pull request (GitHub’s term). This lets you “prepare” a merge, but also lets you get feedback from others on code (a big part of software engineering).
The sketch of how to use a pull or merge request is:
- create a local branch and make some commits
- push those commits (and that local branch) to remote
- open a pull or merge request on GitHub/GitLab: here, you’re “requesting” to merge your code
- work with others (e.g. they leave comments on your code, run some tests, …)
- hit “merge” on the request, which merges into main (or another branch)
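Only the first two steps of this sketch happen on the command line; the rest happen in the GitHub/GitLab web interface. As a rough illustration of those first two steps, the example below creates a branch, commits to it, and pushes it to a stand-in remote. The branch name feature, the file names, and the bare repository are assumptions for the example.

```shell
set -eu

# Hypothetical identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$sandbox/local"
cd "$sandbox/local"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$sandbox/remote.git"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q -u origin main

# Step 1: create a local branch and make a commit on it.
git checkout -q -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "X"

# Step 2: push the branch (and its commits) to the remote.
git push -q -u origin feature

local_feature="$(git rev-parse feature)"
remote_feature="$(git --git-dir="$sandbox/remote.git" rev-parse feature)"
remote_main="$(git --git-dir="$sandbox/remote.git" rev-parse main)"

# Steps 3 to 5 (opening the request, review, and merging) happen on the platform.
echo "remote feature: $remote_feature, remote main: $remote_main"
```

At this point the remote has a feature branch that is one commit ahead of main, which is the state a pull or merge request starts from.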
Creating a merge request
Continuing from our previous example, let’s make a new branch called feature with git checkout -b feature; we’ll add a new commit called X:
---
title: Local
---
gitGraph
commit id: "A"
commit id: "B"
branch origin/main
commit id: "D"
checkout main
commit id: "C"
merge origin/main id: "origin/main - M"
branch feature
commit id: "X" tag: "HEAD"
Now, assume that someone else has added a change Y to your remote’s main. We can still git push our change to the remote - since we’re using a different branch than main, we don’t encounter a conflict just yet.
---
title: Remote
---
gitGraph
commit id: "A"
commit id: "B"
branch origin/main
commit id: "D"
checkout main
commit id: "C"
merge origin/main id: "M"
branch feature
commit id: "X"
checkout main
commit id: "Y"
If we want to merge origin/main and feature, though, we’ll once again have some sort of conflict! What you’ll typically do instead of locally merging is:
- open a merge/pull request on GitLab/GitHub
- after the request is created, you can fix the conflict on GitLab or GitHub; this will create a commit on your feature branch (not on main) - M1 in the diagram
- after getting approval, you can hit “merge”: this will merge feature into main, creating a merge commit on main (not on feature) - M2 in the diagram
---
title: Remote
---
gitGraph
commit id: "A"
commit id: "B"
branch remote-fetch
commit id: "D"
checkout main
commit id: "C"
merge remote-fetch id: "M"
branch feature
commit id: "X"
checkout main
commit id: "Y"
checkout feature
merge main id: "M1"
checkout main
merge feature id: "feature - M2" tag: "main"
Now, the remote’s main branch has both the changes Y and X, as well as the merge commits M1 (main to feature) and M2 (feature to main). You can safely delete feature.
Reverting commits
There are different ways to revert a commit.
- git revert <commit> reverts a specific commit by adding a new commit to the history that undoes the changes
    - you can pass a commit hash, or a relative term like HEAD (most recent commit)
    - good because you preserve the previous commit history: easy to work with others, can “undo the undo”
    - bad because secrets (e.g. passwords, API keys, embarrassing typos) still remain in the total history
- git reset <commit> moves your current branch (and HEAD) to a commit, and with --hard it also resets your working directory to match; you can pair this with “force-pushing” to truly remove commits from history
    - but, this changes the remote’s git history, which can break collaboration with others
    - generally not recommended unless you know what you’re doing
General advice: use git revert, and don’t alter the history of the main branch (unless you know what you’re doing)!
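To see the difference between the two approaches, the sketch below applies both to a scratch repository and counts the commits left on the branch after each. The file name and commit messages are hypothetical. The force-push step is left out on purpose, since this example has no remote.

```shell
set -eu

# Hypothetical identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"
git symbolic-ref HEAD refs/heads/main

echo "one" > file.txt
git add file.txt
git commit -q -m "A"
echo "two" >> file.txt
git commit -q -a -m "B"

# git revert: add a new commit that undoes B; A and B stay in the history.
git revert --no-edit HEAD > /dev/null
count_after_revert="$(git rev-list --count HEAD)"
content_after_revert="$(cat file.txt)"

# git reset --hard: move main back to A, dropping B and the revert from the branch.
git reset -q --hard HEAD~2
count_after_reset="$(git rev-list --count HEAD)"
subject_after_reset="$(git log -1 --format=%s)"

echo "after revert: $count_after_revert commits; after reset: $count_after_reset commit"
```

Both approaches leave file.txt with the same content, but only git revert keeps a record that B ever happened.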
.gitignore
.gitignore is a special file you can create in a git repository. This file tells git to ignore files that match certain patterns.
For example, you almost never want to commit .class files in Java projects. To tell git to not “look” at these files, you can create a file called .gitignore and add the following:
*.class
The .gitignore uses a similar “glob” syntax to find -name, so most of the same ideas carry over.
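One way to check that an ignore pattern does what you expect is git check-ignore, which reports whether a path is ignored. The sketch below is a small illustration in a scratch repository; the file names Main.java and Main.class are placeholders.

```shell
set -eu

sandbox="$(mktemp -d)"
git init -q "$sandbox/demo"
cd "$sandbox/demo"

# Ignore compiled Java class files.
printf '%s\n' '*.class' > .gitignore
touch Main.java Main.class

# git check-ignore -q exits with status 0 when the path is ignored.
if git check-ignore -q Main.class; then class_ignored="yes"; else class_ignored="no"; fi
if git check-ignore -q Main.java; then java_ignored="yes"; else java_ignored="no"; fi

# Staging everything skips the ignored file.
git add .
staged_count="$(git ls-files | wc -l | tr -d ' ')"

echo "Main.class ignored: $class_ignored; Main.java ignored: $java_ignored"
git ls-files
```

Only .gitignore and Main.java end up staged; Main.class stays on disk but git does not track it.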