Git Basics

The recommendation for this course is to fork this repo, take any notes/do the assignment in your fork, add the original as a remote, and regularly pull any changes that I upload to the main repo. If you understand what all that means and know how to do it, you can probably skip the rest of this notebook. If not, read on!

Introduction

Git is a popular form of version control. When used properly, Git allows you to track the evolution of your files over time and eliminates the need to keep multiple copies of the same code when experimenting with breaking changes. For example, you can see the history of how the materials for this workshop were prepared here. Git is also an effective way to coordinate the collaboration of multiple users working on the same code base. I've chosen to put the materials of this TensorFlow workshop in a GitHub repository to facilitate getting this material out to the Duke-Tsinghua MLSS class, while also making it easy to send out any updates to these lecture materials.

Installation

If you don't already use Git or have it installed, I highly recommend integrating it into your workflow, not just for this workshop, but also for your class work, work work, or research. Follow the instructions corresponding to your operating system.

Ubuntu

There's a good chance you already have Git installed. Check first by typing into your command line:

git --version

If you get a "git: command not found" error, Git can be installed with:

sudo apt-get install git

Once installed, Git commands can be typed directly into the command line.

Windows

If you're on Windows and you don't remember installing Git, you probably don't have it. Download Git for Windows here, run the set-up .exe, and follow the instructions. If you're unsure, stick to the default settings.

If you chose the default installation options, you should now have Git BASH installed, a Linux/Unix style BASH emulator. See the explanation of git status for a screenshot of what it looks like. Git commands can be typed directly into this shell.

Mac OS X

First, check if you have Git installed already by typing the following into a Terminal:

git --version

If that doesn't print a version number, download Git here and follow the instructions for installation. If you're unsure, stick to the default settings.

You can verify your installation by typing git --version again into your terminal.

GUI versions of Git (optional)

If command lines make you nervous, there are lots of free Git GUIs out there. You're free to download whichever you like, but I personally like using Atlassian's SourceTree.

The rest of this notebook is written for command line Git, but the concepts for Git with a GUI are the same, so if you're unfamiliar with Git, read on and try to find the matching buttons in your GUI.

Usage for this class

Git is capable of quite a bit, and learning all the commands available would both take too long and be total overkill for this workshop. If you want to understand the commands you're about to type, read the next section (Important Commands and Terminology to Know) first, but if you just want to get up and going:

Fork and clone

  1. Go to the class's repo and click "Fork" in the upper right corner:

  2. If you don't have an account, make one at the prompt. If you have one already, sign in.

  3. You should arrive at the page of your shiny new fork of the class repository. Click on the green "Clone or download" button on the right side of the page. Copy the HTTPS URL, and in your command line/Git Bash, enter the following:

    git clone [PASTE_HTTPS_HERE]
    

That's it! You now have a local copy of the class Git repo to play around with.

Committing changes and pushing

Throughout the class, feel free to modify any of the files however you want (e.g. to take notes); in fact, for the homework, you'll have to modify files. Any time you want to make a checkpoint of your files, you do that by committing and pushing your changes. To do that:

  1. Check your changes. This isn't a required step, but it's always recommended:
    git status
    
  2. Stage the changes you want to commit. For example, to add your changes for your homework:
    git add 01B_MLP_CNN_Assignment.ipynb
    
  3. Commit your changes, adding a commit message at the same time:
    git commit -m "Yay, finished the homework"
    
  4. Push your changes to your fork:
    git push
    
    You'll be asked for your username and password (GitHub now expects a personal access token in place of your account password when pushing over HTTPS). Assuming you type those in correctly, your changes should appear on your GitHub page.
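The four steps above can be tried out without touching GitHub. What follows is a minimal sketch, assuming Git is installed: the temporary directories, the placeholder identity, and the local bare repository standing in for your fork are all assumptions made for illustration, not part of the class setup.

```shell
set -e

# Placeholder identity so commits work on a machine with no Git config (assumption).
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

work=$(mktemp -d)

# Build a stand-in for your GitHub fork: a bare repo holding one starting commit.
git init --quiet "$work/seed"
cd "$work/seed"
echo "original assignment" > 01B_MLP_CNN_Assignment.ipynb
git add 01B_MLP_CNN_Assignment.ipynb
git commit --quiet -m "Initial materials"
git clone --quiet --bare "$work/seed" "$work/fork.git"

# Clone the stand-in fork, as you would clone your real fork.
git clone --quiet "$work/fork.git" "$work/local"
cd "$work/local"

# Edit a file, then run the four steps from the list above.
echo "my answers" >> 01B_MLP_CNN_Assignment.ipynb
git status --short                                   # 1. check your changes
git add 01B_MLP_CNN_Assignment.ipynb                 # 2. stage
git commit --quiet -m "Yay, finished the homework"   # 3. commit
git push --quiet                                     # 4. push to the stand-in fork

# Show the newest commit message stored in the stand-in fork.
git --git-dir="$work/fork.git" log -1 --format=%s
```

Because the stand-in fork is just a directory on disk, no username or password prompt appears here; with a real GitHub fork, step 4 is where you would be asked to authenticate.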

Pulling updates

I may occasionally make updates to the materials in this repository. You can sync your fork to mine by first adding the main repo as a remote (you only have to do this once), and then doing a pull:

  1. Add the main repo as a remote (only do this once):
    git remote add kevinjliang https://github.com/kevinjliang/Duke-Tsinghua-MLSS-2017
    
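  2. Pull from the main repo to sync. Because this remote is not the one you cloned from, Git needs to be told which branch to pull (master here, assuming that is the main repo's default branch):
    git pull kevinjliang master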
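To see the two steps above in action without a network connection, here is a rough sketch, assuming Git is installed. The bare repositories `main.git` and `fork.git`, the remote name `class_repo`, and the file `notes.md` are hypothetical stand-ins; the branch name is read from the clone rather than hard-coded, since pulling from a remote other than the one you cloned from needs an explicit branch.

```shell
set -e

# Placeholder identity so commits work on a machine with no Git config (assumption).
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

work=$(mktemp -d)

# Stand-in for the main class repo: a bare repo with one commit.
git init --quiet "$work/seed"
cd "$work/seed"
echo "lecture 1" > notes.md
git add notes.md
git commit --quiet -m "Add lecture 1"
git clone --quiet --bare "$work/seed" "$work/main.git"

# Stand-in for your fork, and your local clone of that fork.
git clone --quiet --bare "$work/main.git" "$work/fork.git"
git clone --quiet "$work/fork.git" "$work/local"

# The instructor publishes an update to the main repo.
cd "$work/seed"
echo "lecture 2" >> notes.md
git commit --quiet -am "Add lecture 2"
git push --quiet "$work/main.git" HEAD

# In your clone: add the main repo as a second remote (once), then pull its branch.
cd "$work/local"
branch=$(git symbolic-ref --short HEAD)   # name of the branch you are on
git remote add class_repo "$work/main.git"
git pull --quiet class_repo "$branch"

# The update from the main repo is now in your working copy.
cat notes.md
```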
    

Important Commands and Terminology to Know

My personal philosophy with Git commands is the same as with Linux/Unix commands: learn the basic commands that you use for 95% of your workflow, be generally aware of what other stuff is possible, and Google what you don't know (or don't remember). Here are some concepts to know:

Repository (Repo)

A repository is a data structure that holds all the metadata of which files to track, a historical record of changes, commit objects, etc. All files in the root directory and any subdirectories can be tracked by the repo.

The repository on your machine that you work in is your local repository. When working in a collaborative setting, we also keep a copy of our code on a remote server (for example, provided by GitHub). Usually, I treat this remote copy as the "official" or "master" version of the code, and the local copy as my working copy. We interact with this remote repo through pushing and pulling, which are explained below.

GitHub

A web-based hosting service for Git repositories that allows you to browse, search, and host them. If you don't already have an account, you should sign up for one; it's free, and you'll need it to fork this repo. GitHub provides a lot of really cool features, so I'd recommend exploring it on your own time when you get the chance. The materials for this workshop are hosted in this GitHub repository.

Fork

A way to create your own copy of someone else's repo, under your username. Unless you are the owner of a repo or listed as a collaborator, you can't push changes to it directly. Forking creates a personal copy where you are the owner; thus it's good for playing around with someone else's code, using an existing repository as a starting point for your own project, or for class settings like this.

Clone (git clone)

Creates a local copy of a Git repository on your machine. The repo's directory will be like any other directory on your computer, but you'll have the ability to use the various git commands. To clone:

git clone [REPOSITORY_LINK]
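As a small sketch of what a clone gives you, assuming Git is installed, the following clones a throwaway repository from a local path instead of a GitHub link. The directory names and placeholder identity are hypothetical. Git records where the clone came from as a remote named `origin`, which is what plain `git pull` and `git push` use later.

```shell
set -e

# Placeholder identity so the seed commit works without Git config (assumption).
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

work=$(mktemp -d)

# A tiny repository to clone from; a real link would point at GitHub instead.
git init --quiet "$work/seed"
cd "$work/seed"
echo "hello" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Clone it, then look at what the copy contains.
git clone --quiet "$work/seed" "$work/copy"
cd "$work/copy"
ls                  # the working files
git remote -v       # the source is recorded as the remote "origin"
git log --oneline   # the full history came along with the files
```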

Status (git status)

Shows which files have changed since the latest commit, which changes have been staged, your current branch, and other useful things.

git status

Produces something like:
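As a rough sketch of the kind of output to expect, the following uses the compact `--short` form of the command in a throwaway repository, assuming Git is installed. The directory and the file name `notes.txt` are hypothetical.

```shell
set -e

work=$(mktemp -d)
git init --quiet "$work/demo"
cd "$work/demo"

echo "draft" > notes.txt
git status --short    # prints "?? notes.txt": the file is untracked

git add notes.txt
git status --short    # prints "A  notes.txt": the new file is staged
```

The full `git status` (without `--short`) reports the same information in sentences, along with the current branch and hints about which command to run next.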

Stage (git add)

Marks a change as one to be included in a commit. This allows you to choose which files to include:

git add [FILES_TO_ADD_SEPARATED_BY_SPACES]

Why not add all files? Generally, you want your commit history to tell a story of your development. As such, each commit should aim to capture one major change or added feature, even if you're working on multiple things at once. Also, sometimes, you're not ready to push a certain change yet. For example, although I want to be able to test the homework solutions to make sure they work, I don't quite want to publish them yet.

If you do find yourself wanting to stage everything though (which does happen quite often), you can do that with the following additional flag:

git add -A
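To make selective staging concrete, here is a minimal sketch, assuming Git is installed. The two file names and the placeholder identity are hypothetical. Only the file passed to `git add` ends up in the commit; the other stays untouched in the working directory.

```shell
set -e

# Placeholder identity so commits work on a machine with no Git config (assumption).
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

work=$(mktemp -d)
git init --quiet "$work/demo"
cd "$work/demo"

echo "notes" > lecture_notes.txt
echo "answers" > homework_solutions.txt

# Stage and commit only the notes; the solutions stay out of the commit.
git add lecture_notes.txt
git commit --quiet -m "Add lecture notes"

git status --short    # homework_solutions.txt is still listed as untracked (??)
git ls-files          # only lecture_notes.txt is tracked
```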

Commit (git commit)

Saves your staged changes to the repository. This is how you add an entry to your Git commit history. Typically, it's useful to include a short summary of what is encapsulated in the commit. Typing the command git commit will get the job done, but it'll open your default text editor (often vim) to type your message, which can be jarring if you're unfamiliar with it. For larger projects with many collaborators, writing very descriptive commit messages is important, and you should absolutely learn to use the editor to write effective messages. For personal use though, it can be a bit much. Instead, I like to use the following shortcut:

git commit -m "[YOUR_COMMIT_MESSAGE_HERE]"

This commits your changes and saves your commit message in one line.
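If you want a slightly longer message without opening the editor, `-m` can be given more than once, and each one becomes its own paragraph. A minimal sketch, assuming Git is installed; the file `model.py`, the message text, and the placeholder identity are hypothetical.

```shell
set -e

# Placeholder identity so commits work on a machine with no Git config (assumption).
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

work=$(mktemp -d)
git init --quiet "$work/demo"
cd "$work/demo"

echo "x = 1" > model.py
git add model.py

# The first -m is the summary line; the second -m becomes the message body.
git commit --quiet -m "Add model stub" -m "Placeholder file for later experiments."

git log -1 --format=%s    # prints the summary line
git log -1 --format=%b    # prints the body
```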

Pull (git pull)

Fetches any new changes from the remote server and merges them with your local copy of the repo. This'll be helpful for receiving any of the changes I make to the course materials.

git pull

If you cloned your own repo from GitHub, git pull automatically points to that remote repository. You can add other remotes to pull from with the following:

git remote add [REMOTE_NAME] [REMOTE_REPOSITORY_LINK]

You can then pull from one of the other remotes, naming the branch you want to pull:

git pull [REMOTE_NAME] [BRANCH_NAME]

Warning: You should make sure your changes are not only saved but also committed before trying to pull. Accidentally clobbering all your hard work is a very painful and all-too-common mistake when it comes to version control systems.

Push (git push)

Shares your commits with a remote repository. Before you push, make sure your local repo is synced with the remote, and resolve any conflicts between the two.

git push

By default, if you cloned your own repo from GitHub, push should point to your remote repository.
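The requirement to sync before pushing is easiest to see with two clones of the same remote. The following is a sketch under stated assumptions: Git is installed, a local bare repository stands in for GitHub, and the clone names `alice` and `bob`, the file names, and the identity are hypothetical. The `--no-rebase` and `--no-edit` options just make the pull merge without prompting.

```shell
set -e

# Placeholder identity so commits work on a machine with no Git config (assumption).
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

work=$(mktemp -d)

# A bare repo standing in for the remote, seeded with one commit.
git init --quiet "$work/seed"
cd "$work/seed"
echo "start" > shared.txt
git add shared.txt
git commit --quiet -m "Initial commit"
git clone --quiet --bare "$work/seed" "$work/remote.git"

# Two independent clones of the same remote.
git clone --quiet "$work/remote.git" "$work/alice"
git clone --quiet "$work/remote.git" "$work/bob"

# Alice pushes first.
cd "$work/alice"
echo "alice" > alice.txt
git add alice.txt
git commit --quiet -m "Add alice.txt"
git push --quiet

# Bob commits without having pulled, so his push is rejected.
cd "$work/bob"
echo "bob" > bob.txt
git add bob.txt
git commit --quiet -m "Add bob.txt"
if git push --quiet 2>/dev/null; then
    echo "push unexpectedly succeeded"
else
    echo "push rejected: the remote has commits Bob lacks"
fi

# Bob pulls (merging Alice's commit into his branch), then pushes successfully.
git pull --quiet --no-rebase --no-edit
git push --quiet
```

Here the two commits touch different files, so the merge is automatic. If both had edited the same lines, the pull would stop and ask Bob to resolve the conflict before he could push.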

Other tutorials and resources