CSE 374, Lecture 17: git

We've been talking about tools that you can use in the terminal and for programming, and today we'll continue with version control.

Version Control

Programmers use a group of technologies called "version control" to make them more productive and effective. Version control was developed to address three problems:

  1. Backups. In order to prevent loss of data when your computer crashes or dies, you should back up your work somewhere else.
  2. Collaboration. If you're working on a large project with someone else, how do you collaborate? Do you email files back and forth? Do you save things in a common Google Drive folder? Neither of these is a very scalable solution if you have a large project with many files and many collaborators. How do you deal with conflicting changes from different people? How do you keep track of which version is the "final" version?
  3. Version log. Have you ever made a mistake and tried to undo it with CTRL-Z, but been unable to do so because Word or whatever program you were working with wouldn't go back far enough? Version control helps with this problem by keeping a log of all previous changes and allowing you to retrieve those versions whenever you like.

Version control solves these three problems by managing files and coordinating how they are shared across computers. We'll discuss the theory behind it and then how to use it.

There are many different programs that do version control: git, Subversion, Mercurial, Perforce, and others. Each of these works in a slightly different way, but the concepts are similar and carry over from one to another. We'll use git in this class.

None of these version control systems is language-specific or file-type-specific. Commonly people store source code in a version control system, but you can store whatever type of files you like! We use a git repository for storing course administrative files related to CSE 374, for example, which contains source code, HTML files, PowerPoint presentations, PDFs, Word documents, and text files. Some people use version control for everything they do. (Note, however, that version control systems were originally built and optimized for text files. While you can store photos and videos in them, this may be less efficient due to how version control stores changes.)

Finally, it's totally ok not to memorize all of the commands that we'll talk about! Know the concepts and the basics, and look up the rest as you need it.

Theory

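The most traditional type of version control system is called a "distributed version control system": every collaborator keeps their own copy of the repository and pulls changes directly from the other collaborators.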

The distributed version control system is powerful, but in large projects with a lot of collaborators, it can be infeasible for every person to pull changes from every other person. As an alternative to distributed version control, many projects use a "centralized version control" system:

We'll use a central repository model for CSE 374, using a service called "GitLab" to maintain the central shared repository.

           --------------------
          | Central repository |
          |      (GitLab)      |
           --------------------
         -------> | R | <-------
        |          ---          |
        |           ^           |
        |           |           |
        |           |           |
        v           v           v
       ---         ---         ---
      | R |       | R |       | R |
     -------     -------     -------
    | Alice |   |  Bob  |   | Carol |
     -------     -------     -------

       Centralized version control
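
The picture above can be tried out on a single machine. The following is a minimal sketch, not the course setup: a local "bare" repository stands in for the GitLab central repository, and the directory names (central.git, alice, bob, carol), the file notes.txt, and the demo identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: a local bare repository plays the role of the central repository.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical identity used for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Central repository: "bare" means it stores history but has no working files.
git init -q --bare "$work/central.git"
git -C "$work/central.git" symbolic-ref HEAD refs/heads/main

# Alice clones the (still empty) central repository and pushes the first commit.
git clone -q "$work/central.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"
echo "hello from alice" > notes.txt
git add notes.txt
git commit -q -m "add notes"
git push -q -u origin main

# Bob and Carol each clone their own full copy (the "R" boxes in the diagram).
git clone -q "$work/central.git" "$work/bob"
git clone -q "$work/central.git" "$work/carol"

# Bob commits locally and pushes to the central repository...
cd "$work/bob"
echo "hello from bob" >> notes.txt
git add notes.txt
git commit -q -m "bob adds a line"
git push -q

# ...and Carol receives Bob's change by pulling from the same place.
cd "$work/carol"
git pull -q --ff-only
carol_notes=$(cat notes.txt)
echo "$carol_notes"
```

Note that Bob and Carol never talk to each other directly; every change travels through the central repository.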

Typical terminology for tasks that you'll accomplish with version control:

git

In CSE 374, we will be using git as our version control system, with GitLab hosting our central repository (GitLab is very similar to GitHub).

There are three main steps to getting a repository set up in git:

  1. Create a repository. In GitLab this can be accomplished by selecting the "+ New Project" button in the GitLab UI.
  2. Set up authentication. In order to use git to collaborate, you need a way to prove that you are you and are allowed to access the repository (as opposed to some other member of the class). We use SSH keys to authenticate (the same technology we use to connect to klaatu via SSH), and you'll have to create your own keys to use with GitLab; instructions are linked on the course webpage.
  3. Clone the repository onto your local computer using the "git clone" command.
        $ cd where-you-want-to-put-it
        $ git clone git@gitlab.cs.washington.edu:path/to/repo
    

A typical workflow for working on code in a git repository is as follows:

    # Get the latest version of the code from the central repository.
    # Pull often to prevent merge conflicts (see below).
    $ git pull

    # Edit the files
    $ emacs main.c

    # Check the status - what files have changed?
    $ git status

    # Mark the file "main.c" as ready for the next commit.
    $ git add main.c

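    # View the line-by-line differences between the last commit and
    # the changes staged for the next commit. (Plain "git diff" shows
    # only changes that have not been staged with "git add" yet.)
    $ git diff --staged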

    # Actually commit the change to git - "save" it. Commit messages
    # should be descriptive of what you changed to help others understand
    # what changed.
    $ git commit -m "increased max line length from 100 to 200"

    # View the history of all commits in the repository
    $ git log

    # Push the new commit to the central repository.
    $ git push
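
To experiment with this workflow without touching a real project, the sketch below runs it end to end in a throwaway directory. It rests on some assumptions: a local bare repository (central.git) stands in for GitLab, the "edit" is done by overwriting main.c rather than in emacs, and the demo identity is hypothetical. It also records what "git status" and "git diff" report before and after "git add".

```shell
#!/bin/sh
# Sketch only: the pull / edit / add / commit / push cycle in a sandbox.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical identity used for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in central repository holding one initial commit of main.c.
git init -q --bare "$work/central.git"
git -C "$work/central.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/central.git" "$work/repo" 2>/dev/null
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
echo '#define MAX_LINE 100' > main.c
git add main.c
git commit -q -m "initial version"
git push -q -u origin main

# --- the workflow itself ---
git pull -q --ff-only                     # get the latest version
echo '#define MAX_LINE 200' > main.c      # "edit" the file
status_before=$(git status --short)       # main.c is modified, not yet staged
git add main.c                            # stage it for the next commit
unstaged=$(git diff)                      # nothing is left unstaged
staged=$(git diff --staged)               # the staged 100 -> 200 change
git commit -q -m "increased max line length from 100 to 200"
git push -q
commit_count=$(git rev-list --count HEAD)

echo "status before add: $status_before"
echo "commits in history: $commit_count"
```

After "git add", plain "git diff" reports nothing for main.c because the change has moved to the staging area; "git diff --staged" is the command that still shows it.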

Gotchas and more advanced things (use "man git" to learn more):

Merging

git works easily if there's only one person working on a repository, but whenever more than one person is working on code at once, you run the risk of "merge conflicts". A merge conflict can occur when two people commit changes in their own local repositories and then both try to push those changes to the central repository. The first person's push will succeed, but the second person's push will be rejected because their copy is out of date. When the second person runs "git pull" to bring in the first person's commits before pushing again, they may encounter a merge conflict.

When git pulls in commits that were made in parallel with your own, it will do its best to merge the two sets of changes on its own. As long as the commits changed different parts of a file, the merge should happen automatically. But if the commits touched the same line of code, git cannot tell which version you want, and you will have to fix the conflict manually.
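
The automatic case can be seen in a small experiment. This is a sketch under assumptions: a local bare repository (central.git) stands in for GitLab, alice and bob are two clones on the same machine, list.txt and the demo identity are invented, and "--no-rebase" is passed so that "git pull" merges regardless of how the local git is configured.

```shell
#!/bin/sh
# Sketch only: two parallel edits to different lines merge automatically.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical identity used for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in central repository with a five-line file, cloned by Alice and Bob.
git init -q --bare "$work/central.git"
git -C "$work/central.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/central.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
printf 'one\ntwo\nthree\nfour\nfive\n' > list.txt
git add list.txt
git commit -q -m "add list"
git push -q -u origin main
git clone -q "$work/central.git" "$work/bob"

# Alice changes the first line and pushes.
printf 'ONE\ntwo\nthree\nfour\nfive\n' > list.txt
git add list.txt
git commit -q -m "alice: capitalize first line"
git push -q

# Bob, working in parallel, changes the last line.
cd "$work/bob"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > list.txt
git add list.txt
git commit -q -m "bob: capitalize last line"

# Bob's push is rejected because the central repository has moved on.
if git push -q 2>/dev/null; then push_rejected=no; else push_rejected=yes; fi

# Pulling merges Alice's commit with Bob's. The edits do not overlap,
# so git creates the merge commit without asking for help.
git pull -q --no-rebase --no-edit
git push -q
merged=$(cat list.txt)
echo "$merged"
```

After the pull, Bob's list.txt contains both Alice's change to the first line and his own change to the last line.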

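git will tell you which files had merge conflicts (use "git status" to see conflicts), and the files will be edited to identify the conflict. The part between "<<<<<<<" and "=======" is your own version, and the part between "=======" and ">>>>>>>" is the version being merged in:

    <<<<<<< HEAD
    for (int i=0; i<10; i++)
    =======
    for (int i=0; i<=10; i++)
    >>>>>>> master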

You must edit the section so that it contains the code you want, deleting the "<<<<<<<", "=======", and ">>>>>>>" marker lines, then save the file, "git add" it, and commit the merge.
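
The whole cycle (conflict, manual fix, add, commit) can be reproduced safely in a scratch directory. The sketch below makes several assumptions: a local bare repository (central.git) stands in for GitLab, alice and bob are two clones on one machine, loop.c and the demo identity are invented, and the "manual fix" is simulated by overwriting the file with Bob's version.

```shell
#!/bin/sh
# Sketch only: provoke a merge conflict on one line, then resolve it.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical identity used for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in central repository with a one-line file, cloned by Alice and Bob.
git init -q --bare "$work/central.git"
git -C "$work/central.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/central.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
echo 'for (int i=0; i<5; i++)' > loop.c
git add loop.c
git commit -q -m "add loop"
git push -q -u origin main
git clone -q "$work/central.git" "$work/bob"

# Alice edits the line and pushes first.
echo 'for (int i=0; i<=10; i++)' > loop.c
git add loop.c
git commit -q -m "alice: loop up to and including 10"
git push -q

# Bob edits the SAME line in his own clone.
cd "$work/bob"
echo 'for (int i=0; i<10; i++)' > loop.c
git add loop.c
git commit -q -m "bob: loop up to 10"

# The pull cannot merge the two edits, so it stops and reports a conflict.
pull_failed=no
git pull -q --no-rebase --no-edit >/dev/null 2>&1 || pull_failed=yes
conflicted=$(git status --short)   # "UU" marks a file with unresolved conflicts
markers=$(cat loop.c)              # the file now contains the conflict markers
echo "$markers"

# Resolve: keep Bob's version (no marker lines), then add, commit, and push.
echo 'for (int i=0; i<10; i++)' > loop.c
git add loop.c
git commit -q -m "merge: keep i<10"
git push -q
```

In a real conflict you would open the file in an editor and decide line by line; overwriting the file here only keeps the sketch non-interactive.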

.gitignore

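You can store any files that you'd like in git, but there is a certain class of files that you should NOT store in git, because they are unnecessary and clutter the repository: editor backup files, operating system metadata files, and files that can be rebuilt from source, such as object files.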

Since it can be a pain to have to remember not to add these files to your commit, git allows you to create a file called ".gitignore", which is typically stored in the root directory of your git repository and contains a list of file name patterns (using "*" as a wildcard if you'd like) to ignore:

    # emacs backup files
    *~

    # OS X finder info files
    .DS_Store

    # built object files
    *.o
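
To check that a .gitignore does what you expect, you can try it in an empty scratch repository. The sketch below uses the three patterns from the example above; the file names (main.c, main.o, main.c~) are invented for illustration.

```shell
#!/bin/sh
# Sketch only: see which files git still reports once .gitignore is in place.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
git init -q "$work/repo"
cd "$work/repo"

# The same three patterns as in the example above.
printf '%s\n' '*~' '.DS_Store' '*.o' > .gitignore

# A source file plus three files that match the patterns.
touch main.c main.o 'main.c~' .DS_Store

# Only untracked files that are NOT ignored are listed.
visible=$(git status --short)
echo "$visible"

# "git check-ignore" reports whether a given path matches an ignore pattern.
if git check-ignore -q main.o; then object_ignored=yes; else object_ignored=no; fi
if git check-ignore -q main.c; then source_ignored=yes; else source_ignored=no; fi
echo "main.o ignored: $object_ignored, main.c ignored: $source_ignored"
```

The .gitignore file itself shows up as an untracked file; it is normally added and committed like any other file so that everyone working on the repository shares the same ignore list.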

Summary