The recommendation for this course is to fork this repo, take any notes/do the assignment in your fork, add the original as a remote, and regularly pull any changes that I upload to the main repo. If you understand what all that means and know how to do it, you can probably skip the rest of this notebook. If not, read on!
Git is a popular form of version control. When used properly, Git allows you to track the evolution of your files over time and eliminates the need to keep multiple copies of the same code when experimenting with breaking changes. For example, you can see the history of how the materials for this workshop were prepared here. Git is also an effective way to coordinate the collaboration of multiple users working on the same code base. I've chosen to put the materials of this TensorFlow workshop in a GitHub repository to facilitate getting this material out to the Duke-Tsinghua MLSS class, while also making it easy to send out any updates to these lecture materials.
If you don't already use Git or have it installed, I highly recommend integrating it into your workflow, not just for this workshop, but for your class work, work work, or research. Follow the instructions corresponding to your operating system.
There's a good chance you already have Git installed. Check first by typing into your command line:
git --version
If you get a "git: command not found" error, Git can be easily installed with:
sudo apt-get install git
Once installed, Git commands can be typed directly into the command line.
If you're on Windows and you don't remember installing Git, you probably don't have it. Download Git for Windows here, run the set-up .exe, and follow the instructions. If you're unsure, stick to the default settings.
If you chose the default installation options, you should now have Git BASH installed, a Linux/Unix style BASH emulator. See the explanation of git status for a screenshot of what it looks like. Git commands can be typed directly into this shell.
First, check if you have Git installed already by typing the following into a Terminal:
git --version
Download Git here and follow the instructions for installation. If you're unsure, stick to the default settings.
You can verify your installation by typing git --version again into your terminal.
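The version check above can also be scripted so that it reports a result instead of an error on any of the three operating systems. This is only a minimal sketch for a POSIX-style shell (Git BASH counts); the variable name git_state is made up for illustration.

```shell
# Report whether Git is on the PATH without aborting when it is missing.
# git_state is a hypothetical variable name used only in this sketch.
if command -v git >/dev/null 2>&1; then
    git_state="installed"
    git --version    # prints the installed version string
else
    git_state="missing"
    echo "Git was not found; follow the install instructions for your operating system."
fi
echo "Git is $git_state"
```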
If command lines make you nervous, there are lots of free Git GUIs out there. If you'd rather use a GUI, you're free to download whichever you like, but as a recommendation, I personally like using Atlassian's SourceTree:
The rest of this notebook is written for command line Git, but the concepts for Git with a GUI are the same, so if you're unfamiliar with Git, read on and try to find the matching buttons in your GUI.
Git is capable of quite a bit, and learning all the commands available would both take too long and be total overkill for this workshop. If you want to understand the commands you're about to type, read the next section (Important Commands and Terminology to Know) first, but if you just want to get up and going:
Go to the class's repo and click "Fork" in the upper right corner:
If you don't have an account, make one at the prompt. If you have one already, sign in.
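You should arrive at the page of your shiny new fork of the class repository. Click on the green "Clone or download" button on the right side of the page and copy the HTTPS link it shows. Then, in your command line, navigate to the directory where you want the repo to live and run: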
git clone [PASTE_HTTPS_HERE]
That's it! You now have a local copy of the class Git repo to play around with.
Throughout the class, feel free to modify any of the files however you want (e.g. to take notes); in fact, for the homework, you'll have to modify files. Any time you want to make a checkpoint of your files, you do that by committing and pushing your changes. To do that:
git status
git add 01B_MLP_CNN_Assignment.ipynb
git commit -m "Yay, finished the homework"
git push
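If you'd like to rehearse this checkpoint routine before touching your real fork, here is a rough offline sketch. It assumes a throwaway directory in which a local bare repository stands in for your GitHub fork; the file name, identity, and commit message are all placeholders. The branch is named explicitly in the push because a clone of an empty repo may not have an upstream configured yet; in a clone of your actual fork, a plain git push should do.

```shell
# Offline rehearsal: a local bare repo plays the role of your GitHub fork (assumption).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/work"
cd "$sandbox/work"

# Identity for this demo repo only (placeholder values).
git config user.name "Demo Student"
git config user.email "student@example.com"

echo "my notes" > notes.txt              # stand-in for an edited notebook
git status --short                       # lists notes.txt as untracked (??)
git add notes.txt                        # stage just this file
git commit -q -m "Add notes"             # record the checkpoint locally
branch=$(git symbolic-ref --short HEAD)  # whatever your Git names the default branch
git push -q origin "$branch"             # publish the checkpoint to the stand-in remote
```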
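I may occasionally make updates to the materials in this repository. You can sync your fork to mine by first adding the main repo as a remote (you only have to do this once), and then pulling from it. Because this remote is not the default one for your branch, Git needs you to name the branch to pull as well (master here, assuming that is the main repo's default branch):
git remote add kevinjliang https://github.com/kevinjliang/Duke-Tsinghua-MLSS-2017
git pull kevinjliang master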
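To see the fork-syncing flow end to end without a network connection, here is a small sketch under stated assumptions: three throwaway local repositories stand in for the main repo, your fork, and your working copy, and the remote name instructor, the file names, and the identity are all invented for the demo. The pull names the branch explicitly, since the added remote is not the default one for the branch.

```shell
sandbox=$(mktemp -d)
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# 1. The main repo (stand-in for the instructor's), with one commit.
git init -q "$sandbox/main"
cd "$sandbox/main"
echo "lecture 1" > lecture.txt
git add lecture.txt
git commit -q -m "Add lecture 1"
branch=$(git symbolic-ref --short HEAD)

# 2. "Fork" it (a bare copy), then clone the fork as your working copy.
git clone -q --bare "$sandbox/main" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/work"

# 3. The main repo gets an update after you forked.
echo "lecture 2" >> lecture.txt
git commit -q -am "Add lecture 2"

# 4. In your working copy: add the main repo as a remote (once), then pull from it.
cd "$sandbox/work"
git remote add instructor "$sandbox/main"
git pull -q instructor "$branch"
```

After the pull, the working copy contains the second lecture line even though the fork itself was never updated; a later git push would bring the fork up to date too.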
My personal philosophy with Git commands is the same as with Linux/Unix commands: learn the basic commands that you use for 95% of your workflow, be generally aware of what other stuff is possible, and Google what you don't know (or don't remember). Here are some concepts to know:
A repository is a data structure that holds all the metadata of which files to track, a historical record of changes, commit objects, etc. All files in the root directory and any subdirectories can be tracked by the repo.
The repository on your machine that you work in is your local repository. When working in a collaborative setting, we also keep a copy of our code on a remote server (for example, provided by GitHub). Usually, I treat this remote copy as the "official" or "master" version of the code, and the local copy as my working copy. We interact with this remote repo through pushing and pulling, which are explained below.
A web-based hosting service for Git repositories that allows you to browse, search, and host them. If you don't already have an account, you should sign up for one; it's free, and you'll need it to fork this repo. GitHub provides a lot of really cool features, so I'd recommend exploring it on your own time when you get the chance. The materials for this workshop are hosted in this GitHub repository.
A way to create your own copy of someone else's repo, under your username. Unless you are the owner of a repo or listed as a collaborator on it, you won't be able to push any changes to it. Forking creates a personal copy where you are the owner; thus it's good for playing around with someone else's code, using an existing repository as a starting point for your own project, or for class settings like this.
git clone
Creates a local copy of a Git repository on your machine. The repo's directory will be like any other directory on your computer, but you'll have the ability to use the various git commands. To clone:
git clone [REPOSITORY_LINK]
git status
Shows which files have changed since the latest commit, which changes have been staged, your current branch, and other useful things.
git status
Produces something like:
git add
Marks a change as one to be included in a commit. This allows you to choose which files to include:
git add [FILES_TO_ADD_SEPARATED_BY_SPACES]
Why not add all files? Generally, you want your commit history to tell a story of your development. As such, each commit should aim to capture one major change or added feature, even if you're working on multiple things at once. Also, sometimes, you're not ready to push a certain change yet. For example, although I want to be able to test the homework solutions to make sure they work, I don't quite want to publish them yet.
If you do find yourself wanting to stage everything though (which does happen quite often), you can do that with the following additional flag:
git add -A
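The difference between staging selectively and staging everything is easy to check in a scratch repository. The sketch below assumes two made-up files, notes.txt and solutions.txt, and uses git diff --cached --name-only to list what is currently staged.

```shell
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
echo "notes" > notes.txt
echo "draft answers" > solutions.txt    # not ready to share yet

git add notes.txt                              # stage only the file that is ready
staged_one=$(git diff --cached --name-only)    # names of the staged files
echo "$staged_one"                             # prints notes.txt

git add -A                                     # now stage everything else as well
staged_all=$(git diff --cached --name-only)
echo "$staged_all"                             # now lists both files
```

Committing between the two git add calls would give you one commit containing only the notes, with the solutions left for a later commit.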
git commit
Saves your staged changes to the repository. This is how you add an entry to your Git commit history. Typically, it's useful to include a short summary of what is encapsulated in the commit. Typing the command git commit will get the job done, but it'll open a text editor (often vim) for you to type your message, which can be jarring if you're unfamiliar with vim. For larger projects with many collaborators, writing very descriptive commit messages is important, and you should absolutely learn to use the editor to write effective messages. For personal use though, it can be a bit much. Instead, I like to use the following shortcut:
git commit -m "[YOUR_COMMIT_MESSAGE_HERE]"
This commits your changes and saves your commit message in one line.
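If you want a slightly more descriptive message without opening the editor, one option is to pass -m more than once: Git treats the first as the summary line and each later one as a separate paragraph of the body. A small sketch in a scratch repository, with a placeholder file, identity, and message text:

```shell
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Demo Student"            # placeholder identity
git config user.email "student@example.com"

echo "answers" > homework.txt
git add homework.txt
# First -m: summary line. Second -m: a paragraph in the message body.
git commit -q -m "Finish homework 1" -m "Implements the MLP and CNN exercises."

subject=$(git log -1 --format=%s)    # summary line of the latest commit
body=$(git log -1 --format=%b)       # body of the latest commit
echo "$subject"
echo "$body"
```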
git pull
Fetches any new changes from the remote server and merges them with your local copy of the repo. This'll be helpful for receiving any of the changes I make to the course materials.
git pull
If you cloned your own repo from GitHub, this automatically points to that remote repository. You can add other remotes to pull from with the following:
git remote add [REMOTE_NAME] [REMOTE_REPOSITORY_LINK]
You can then pull from one of the other remotes. Since it isn't the default remote for your branch, name the branch you want as well:
git pull [REMOTE_NAME] [BRANCH_NAME]
Warning: You should make sure your changes are not only saved but also committed before trying to pull. Accidentally clobbering all your hard work is a very painful and all-too-common mistake when it comes to version control systems.
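One way to act on this warning is to ask Git whether anything is uncommitted before you pull. The sketch below runs in a scratch repository with placeholder file contents and identity, and relies on git status --porcelain, which prints one line per changed file and nothing at all when the working tree is clean.

```shell
sandbox=$(mktemp -d)
git init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Demo Student"            # placeholder identity
git config user.email "student@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "v2" >> notes.txt                         # an edit that is saved but not committed

# Empty output from --porcelain means there is nothing left to commit.
if [ -n "$(git status --porcelain)" ]; then
    before="dirty"
    git add -A
    git commit -q -m "Checkpoint before pulling"
fi
if [ -z "$(git status --porcelain)" ]; then
    after="clean"                              # a safe point to run git pull
fi
echo "before: $before, after: $after"
```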
git push
Shares your commits with a remote repository. Note that before you push, your local repo must be synced with the remote and any merge conflicts resolved.
git push
By default, if you cloned your own repo from GitHub, push should point to your remote repository.