In Lecture 2, you worked with the playground repository. You learned how to navigate the repository from the Git
point of view, make changes to the repo, and work with the remote repo.
One very important topic in Git involves the concept of the branch. You will work extensively with branches in any real project. In fact, branches are central to the Git workflow. In this portion of the lecture, we will discuss branches in Git.
For more details on branches in Git, see Chapter 3 of the Git Book: Git Branching - Branches in a Nutshell.
As you might have noticed by now, nearly everything you do in Git involves a branch. We have branches on remote (upstream) repositories, copies of remote branches in our local repository, and branches in our local repository which (so far) track remote branches (or, more precisely, the local copies of those remote branches).
Begin today's lecture by entering your playground repository from last lecture. Note that the following cell is not necessary for you. I have to re-clone the repo since I'm in a new notebook. You should just keep working like you were before.
In [1]:
%%bash
cd /tmp
rm -rf playground #remove if it exists
git clone https://github.com/dsondak/playground.git
Once you're in your playground repo, you can look at all the branches and print out a lot of information to the screen.
In [2]:
%%bash
cd /tmp/playground
git branch -avv
All of these branches are nothing but commit-streams in disguise, as can be seen above. It's a very simple model that leads to a lot of interesting version control patterns.
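To make the "commit-stream" idea concrete, here is a minimal sketch showing that a branch name is just a pointer to a commit. It assumes a throwaway repository created with mktemp, so nothing in playground is touched; the file name notes.md and the demo identity are placeholders, not part of the lecture repo.

```shell
# Scratch repository (assumption: we do not want to touch playground here).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master, as in the lecture
git config user.name "Demo User"          # placeholder identity for this scratch repo only
git config user.email "demo@example.com"

echo "hello" > notes.md
git add notes.md
git commit -q -m "Add notes"

# A freshly created branch points at the very same commit as master.
git branch mybranch1
master_commit=$(git rev-parse master)
branch_commit=$(git rev-parse mybranch1)
echo "master    -> $master_commit"
echo "mybranch1 -> $branch_commit"

# A new commit on master moves only the master pointer.
echo "more" >> notes.md
git commit -q -am "Extend notes"
master_after=$(git rev-parse master)
branch_after=$(git rev-parse mybranch1)
echo "master    -> $master_after"
echo "mybranch1 -> $branch_after"
```

The first pair of hashes should match and the second pair should differ, which is why creating a branch is so cheap: Git only records one more name for an existing commit.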
Since branches are so lightweight, the recommended way of working on software using Git is to create a new branch for each new feature you add, test it out, and, if it is good, merge it into master. Then you deploy the software from master. We have been using branches under the hood. Let's now lift the hood.
branch
Branches can also be created manually, and they are a useful way of organizing unfinished changes.
The branch command has two forms. The first:
git branch
simply lists all of the branches in your local repository. If you run it without having created any branches, it will list only one, called master. This is the default branch. You have also seen the use of git branch -avv to show all branches (even remote ones).
The other form of the branch command creates a branch with a given name:
git branch my-new-branch
It's important to note that this new branch is not active. If you make changes, those changes will still apply to the master branch, not my-new-branch. That is, after executing the git branch my-new-branch command you're still on the master branch and not the my-new-branch branch. To change this, you need the next command.
checkout
Checkout switches the active branch. Since branches can have different changes, checkout may make the working directory look very different. For instance, if you have added and committed new files on one branch, and then check another branch out, those files will no longer show up in the directory. They are still stored in the .git folder, but since they only exist in the other branch, they cannot be accessed until you check out the original branch.
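Here is a small sketch of that disappearing-file behavior. It assumes a scratch repository made with mktemp and uses placeholder names (README.md, draft.md, a demo identity); only the branch name my-new-branch comes from the text above.

```shell
# Scratch repository so the playground repo is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "shared" > README.md
git add README.md
git commit -q -m "Initial commit"

# Commit a file that exists only on my-new-branch.
git branch my-new-branch
git checkout -q my-new-branch
echo "draft" > draft.md
git add draft.md
git commit -q -m "Add draft"

# Back on master the file is gone from the working directory...
git checkout -q master
if [ -e draft.md ]; then on_master=present; else on_master=absent; fi

# ...and it comes back as soon as we check the branch out again.
git checkout -q my-new-branch
if [ -e draft.md ]; then on_branch=present; else on_branch=absent; fi

echo "draft.md on master:        $on_master"
echo "draft.md on my-new-branch: $on_branch"
```

Note that this applies to committed files. A file you have created but never added would simply stay in the working directory when you switch branches.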
You can combine creating a new branch and checking it out with the shortcut:
git checkout -b my-new-branch
OK, so let's try this out on the playground repository.
In [3]:
%%bash
cd /tmp/playground
git branch mybranch1
See what branches we have created.
In [4]:
%%bash
cd /tmp/playground
git branch
Notice that you have created the mybranch1 branch but you're still on the master branch.
Jump onto the mybranch1 branch...
In [5]:
%%bash
cd /tmp/playground
git checkout mybranch1
git branch
Notice that it is bootstrapped off the master branch and has the same files. You can check that with the ls command.
In [6]:
%%bash
cd /tmp/playground
ls
Note: You could have created this branch and switched to it all in one go by using git checkout -b mybranch1.
Now let's check the status of our repo.
In [7]:
%%bash
cd /tmp/playground
git status
Alright, so we're on our new branch but we haven't added or modified anything yet; there's nothing to commit.
Let's add a new file. Note that this file gets added on this branch only! Notice that I'm still using the echo command. Once again, this is only because Jupyter can't work with text editors. If I were you, I'd use vim.
In [8]:
%%bash
cd /tmp/playground
echo '# Things I wish G.R.R. Martin would say: Finally updating A Song of Ice and Fire.' > books.md
git status
We add the file to the index, and then commit it to the local repository on the mybranch1 branch.
In [9]:
%%bash
cd /tmp/playground
git add .
git status
In [10]:
%%bash
cd /tmp/playground
git commit -m "Added another test file to demonstrate git features" -a
git status
At this point, we have committed a new file (books.md) to our new branch in our local repo. Our remote repo is still not aware of this new file (or branch). In fact, our master branch is still not really aware of this file.
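Note: There are really two options at this point:
1. Push the new branch to the remote repo, so that the branch exists there as well.
2. Merge the new branch into master locally and then delete it, so that the branch stays short-lived.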
We'll continue with the first option for now and discuss the other option later.
OK, we have committed. Let's try to push!
In [11]:
%%bash
cd /tmp/playground
git push
Fail! Why? Because Git didn't know which branch on origin (the name of our remote repo) to push to, and didn't want to assume we wanted to call the branch mybranch1 on the remote. We need to tell Git that explicitly (just as its error message tells us to).
In [12]:
%%bash
cd /tmp/playground
git push --set-upstream origin mybranch1
Aha, now we have both a remote and a local for mybranch1. We can use the convenient arguments to branch in order to see the details of all the branches.
In [17]:
%%bash
cd /tmp/playground
git branch -avv
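If you want to see what --set-upstream actually records without scanning the full -avv listing, the following sketch may help. It assumes a local bare repository standing in for GitHub (so no network access or credentials are needed) and a scratch working repo; the paths, the demo identity, and the file contents are placeholders.

```shell
# A bare repository plays the role of the remote (assumption: stand-in for GitHub).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

git init -q "$work/playground"
cd "$work/playground"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo "playground" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q --set-upstream origin master

# New local branch with one commit; it has no upstream yet.
git checkout -q -b mybranch1
echo "# books" > books.md
git add books.md
git commit -q -m "Add books.md"
if git rev-parse --abbrev-ref 'mybranch1@{upstream}' >/dev/null 2>&1; then
  before=set
else
  before=none
fi

# Pushing with --set-upstream creates the remote branch and records the link.
git push -q --set-upstream origin mybranch1
after=$(git rev-parse --abbrev-ref 'mybranch1@{upstream}')

echo "upstream before push: $before"
echo "upstream after push:  $after"
```

Once the upstream is recorded, a plain git push or git pull on mybranch1 knows which remote branch to talk to.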
We make sure we are back on master.
In [18]:
%%bash
cd /tmp/playground
git checkout master
What have we done?
We created a new local branch, created a file on it, created that same branch on our remote repo, and pushed all the changes. Finally, we went back to our master branch to continue work there.
Now we'll look into option 2 above. Suppose we want to add a feature to our repo. We'll create a new branch to work on that feature, but we don't want this branch to be long-lived. Here's how we can accomplish that.
We'll go a little faster this time since you've seen all these commands before. Even though we're going a little faster this time, make sure you understand what you're doing! Don't just copy and paste!!
In [27]:
%%bash
cd /tmp/playground
git checkout -b feature-branch
In [28]:
%%bash
cd /tmp/playground
git branch
In [29]:
%%bash
cd /tmp/playground
echo '# The collected works of G.R.R. Martin.' > feature.txt
In [30]:
%%bash
cd /tmp/playground
git status
In [31]:
%%bash
cd /tmp/playground
git add feature.txt
git commit -m 'George finished his books!'
At this point, we've committed our new feature to our feature branch in our local repo. Presumably it's all tested and everything is working nicely. We'd like to merge it into our master branch now. First, we'll switch to the master branch.
In [32]:
%%bash
cd /tmp/playground
git checkout master
ls
The master branch doesn't have any idea about our new feature yet! We should merge the feature branch into the master branch.
In [33]:
%%bash
cd /tmp/playground
git merge feature-branch
In [34]:
%%bash
cd /tmp/playground
git status
ls
Now our master branch is up to date with our feature branch. We can now delete our feature branch since it is no longer relevant.
In [35]:
%%bash
cd /tmp/playground
git branch -d feature-branch
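One detail worth knowing about git branch -d: it only deletes branches whose commits have already been merged, which is why it was safe here. The sketch below illustrates this in a scratch repository; scratch-branch, the file names, and the demo identity are hypothetical and only serve the illustration.

```shell
# Scratch repository so playground is not modified.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "base" > README.md
git add README.md
git commit -q -m "Initial commit"

# A feature branch that we will merge.
git checkout -q -b feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# A second branch (hypothetical) that we will NOT merge.
git checkout -q master
git checkout -q -b scratch-branch
echo "experiment" > scratch.txt
git add scratch.txt
git commit -q -m "Try an experiment"

# Merge only feature-branch into master.
git checkout -q master
git merge -q feature-branch

# -d succeeds for the merged branch and refuses the unmerged one.
if git branch -d feature-branch >/dev/null 2>&1; then merged_delete=ok; else merged_delete=refused; fi
if git branch -d scratch-branch >/dev/null 2>&1; then unmerged_delete=ok; else unmerged_delete=refused; fi

echo "delete merged branch:   $merged_delete"
echo "delete unmerged branch: $unmerged_delete"
```

If you really do want to throw away unmerged work, git branch -D forces the deletion, so use it with care.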
Finally, let's push the changes to our remote repo.
In [36]:
%%bash
cd /tmp/playground
git push
Great, so now you have a basic understanding of how to work with branches. There is much more to learn, but these commands should get you going. You should really familiarize yourself with Chapter 3 of the Git book for more details and workflow ideas.
Commit early, commit often.
Git is more effective when used at a fine granularity. For starters, you can't undo what you haven't committed, so committing lots of small changes makes it easier to find the right rollback point. Also, merging becomes a lot easier when you only have to deal with a handful of conflicts.
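As a rough illustration of why small commits make rollback easier, the sketch below makes three tiny commits and then undoes only the middle one with git revert. The scratch repository, the file names a.txt, b.txt, c.txt, and the demo identity are all placeholders.

```shell
# Scratch repository for the illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

# Three small, independent commits.
for name in a b c; do
  echo "content of $name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "Add $name.txt"
done

# Undo only the middle commit; the other two are untouched.
git revert --no-edit HEAD~1 >/dev/null

commit_count=$(git rev-list --count HEAD)
git log --oneline
ls
```

Because each commit did exactly one thing, the revert removes b.txt and nothing else. The history keeps all three original commits plus a fourth that records the rollback.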
Commit unrelated changes separately.
Identifying the source of a bug or understanding the reason why a particular piece of code exists is much easier when commits focus on related changes. Some of this has to do with simplifying commit messages and making it easier to look through logs, but it has other related benefits: commits are smaller and simpler, and merge conflicts are confined to only the commits which actually have conflicting code.
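A minimal sketch of what this looks like in practice: two unrelated edits sit in the working directory at the same time, and we stage and commit them one file at a time. The scratch repository, the file names parser.txt and docs.txt, the commit messages, and the demo identity are hypothetical.

```shell
# Scratch repository for the illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"
echo "parser v1" > parser.txt
echo "docs v1" > docs.txt
git add parser.txt docs.txt
git commit -q -m "Initial commit"

# Two unrelated changes made in the same sitting.
echo "handle empty input" >> parser.txt
echo "describe install steps" >> docs.txt

# Stage and commit each change on its own.
git add parser.txt
git commit -q -m "Handle empty input in parser"
git add docs.txt
git commit -q -m "Describe install steps in docs"

# Each file's history now contains only the commits that concern it.
parser_commits=$(git rev-list --count HEAD -- parser.txt)
docs_commits=$(git rev-list --count HEAD -- docs.txt)
total_commits=$(git rev-list --count HEAD)
git log --oneline -- parser.txt
```

Naming the file in git add (rather than using git add . or commit -a) is what keeps the two changes apart.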
Do not commit binaries and other temporary files.
Git is meant for tracking changes. In nearly all cases, the only meaningful difference between the contents of two binaries is that they are different. If you change source files, compile, and commit the resulting binary, git sees an entirely different file. The end result is that the git repository (which contains a complete history, remember) begins to become bloated with the history of many dissimilar binaries. Worse, there's often little advantage to keeping those files in the history. An argument can be made for periodically snapshotting working binaries, but things like object files, compiled python files, and editor auto-saves are basically wasted space.
Ignore files which should not be committed
Git comes with a built-in mechanism for ignoring certain types of files. Placing filenames or wildcards in a .gitignore file placed in the top-level directory (where the .git directory is also located) will cause Git to ignore those files when checking file status. This is a good way to ensure you don't commit the wrong files accidentally, and it also makes the output of git status somewhat cleaner.
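Here is a small sketch of a .gitignore in action. The patterns cover the kinds of files mentioned above (object files, compiled Python files, editor auto-saves); the specific patterns and file names are illustrative choices, not a recommended standard, and the repository is a scratch one made with mktemp.

```shell
# Scratch repository for the illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q

# One pattern per line: object files, compiled Python, editor backups.
printf '%s\n' '*.o' '*.pyc' '__pycache__/' '*~' > .gitignore

# A mix of files we want to track and files we do not.
touch main.c main.o module.pyc 'notes.txt~'

# git status no longer mentions the ignored files.
status_output=$(git status --porcelain)
echo "$status_output"

# git check-ignore reports whether a given path matches an ignore rule.
if git check-ignore -q main.o; then object_file=ignored; else object_file=not-ignored; fi
if git check-ignore -q main.c; then source_file=ignored; else source_file=not-ignored; fi
echo "main.o: $object_file"
echo "main.c: $source_file"
```

The .gitignore file itself shows up as untracked, and it is usually worth committing so that everyone working on the repo shares the same rules.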
Always make a branch for new changes
While it's tempting to work on new code directly in the master branch, it's usually a good idea to create a new one instead, especially for team-based projects. The major advantage to this practice is that it keeps logically disparate change sets separate. This means that if two people are working on improvements in two different branches, when they merge, the actual workflow is reflected in the Git history. Plus, explicitly creating branches adds some semantic meaning to your branch structure. Moreover, working on a branch changes very little about how you use Git day to day.
Write good commit messages
I cannot overstate the importance of this.
Seriously. Write good commit messages.
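What counts as "good" is partly a matter of taste, but one widely used convention (an assumption here, not something this lecture prescribes) is a short summary line, a blank line, and then a body explaining why the change was made. The sketch below shows one way to write such a message from the command line; the repository, file, message text, and demo identity are placeholders.

```shell
# Scratch repository for the illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "# Reading wishlist" > books.md
git add books.md

# Each -m becomes its own paragraph: the first is the summary line,
# the second is the body that explains the motivation.
git commit -q \
  -m "Add books.md with reading wishlist" \
  -m "The repo had no place to keep book notes. Start a Markdown file so later commits can add one entry at a time."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "subject: $subject"
echo "body:    $body"
```

In day-to-day work you would more likely run git commit without -m and write the message in your editor, which makes it easier to write a proper body.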