Lecture 3: Branches with Git

In Lecture 2, you worked with the playground repository. You learned how to navigate the repository from the Git point of view, make changes to the repo, and work with the remote repo.

One very important topic in Git involves the concept of the branch. You will work extensively with branches in any real project. In fact, branches are central to the Git workflow. In this portion of the lecture, we will discuss branches with Git.

For more details on branches in Git see Chapter 3 of the Git Book: Git Branching - Branches in a Nutshell.


Branching

As you might have seen by now, nearly everything you do in Git happens on a branch. We have branches on remote (upstream) repositories, copies of remote branches in our local repository, and branches in our local repository which (so far) track remote branches (or, more precisely, local copies of remote branches).

Begin by entering your playground repository from last lecture. Note that the following cell is not necessary for you. I have to re-clone the repo since I'm in a new notebook. You should just keep working like you were before.


In [1]:
%%bash
cd /tmp
rm -rf playground #remove if it exists
git clone https://github.com/dsondak/playground.git


Cloning into 'playground'...

In [2]:
%%bash
cd /tmp/playground
git branch -avv


* master                b3bd0fa [origin/master] Merge remote-tracking branch 'course/master'
  remotes/origin/HEAD   -> origin/master
  remotes/origin/master b3bd0fa Merge remote-tracking branch 'course/master'

All of these branches are nothing but commit-streams in disguise, as can be seen above: each branch name simply points at the latest commit in its stream. It's a very simple model which leads to a lot of interesting version control patterns.

Since branches are so light-weight, the recommended way of working on software using git is to create a new branch for each new feature you add, test it out, and if good, merge it into master. Then you deploy the software from master. We have been using branches under the hood. Let's now lift the hood.


branch

Branches can also be created manually, and they are a useful way of organizing unfinished changes.

The branch command has two forms. The first:

git branch

simply lists all of the branches in your local repository. If you run it without having created any branches, it will list only one, called master. This is the default branch. You have also seen the use of git branch -avv to show all branches (even remote ones).

The other form creates a branch with a given name:

$ git branch my-new-branch

It's important to note that the new branch is not active. If you make changes, they will still apply to the master branch, not my-new-branch. That is, after executing the git branch my-new-branch command you're still on the master branch and not the my-new-branch branch. To change this, you need the next command.
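
To see this for yourself, here is a minimal sketch that uses a throwaway repository instead of the playground repo. The temporary directory, the file contents, and the "Demo User" identity are assumptions made only for illustration.

```shell
set -eu

# Throwaway repository (hypothetical; not the playground repo)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# Create a branch, but do not switch to it
git branch my-new-branch
current="$(git rev-parse --abbrev-ref HEAD)"
echo "Active branch: $current"

# This commit therefore lands on master only
echo "more" >> README.md
git commit -q -a -m "Second commit"

master_commits="$(git rev-list --count master)"
new_commits="$(git rev-list --count my-new-branch)"
echo "Commits on master: $master_commits, on my-new-branch: $new_commits"
```

The new branch stays where it was created while master moves ahead by one commit.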


checkout

Checkout switches the active branch. Since branches can have different changes, checkout may make the working directory look very different. For instance, if you have added new files to one branch, and then check another branch out, those files will no longer show up in the directory. They are still stored in the .git folder, but since they only exist in the other branch, they cannot be accessed until you check out the original branch.

# Example
$ git checkout my-new-branch

You can combine creating a new branch and checking it out with the shortcut:

# Example
$ git checkout -b my-new-branch
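
The effect of checkout on the working directory can be sketched in a throwaway repository. The file notes.md and the "Demo User" identity are made up for this illustration; the commands themselves are the ones described above.

```shell
set -eu

# Throwaway repository (hypothetical; not the playground repo)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# Create the branch and switch to it in one step, then add a file there
git checkout -q -b my-new-branch
echo "notes" > notes.md
git add notes.md
git commit -q -m "Add notes on my-new-branch"

# Back on master the file is gone from the working directory...
git checkout -q master
if [ -e notes.md ]; then on_master="present"; else on_master="absent"; fi

# ...and it comes back when we check out the branch that contains it
git checkout -q my-new-branch
if [ -e notes.md ]; then on_branch="present"; else on_branch="absent"; fi

echo "notes.md on master: $on_master, on my-new-branch: $on_branch"
```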

OK, so let's try this out on our repository.


In [3]:
%%bash
cd /tmp/playground
git branch mybranch1

See what branches we have created.


In [4]:
%%bash
cd /tmp/playground
git branch


* master
  mybranch1

Jump onto the mybranch1 branch...


In [5]:
%%bash
cd /tmp/playground
git checkout mybranch1
git branch


  master
* mybranch1
Switched to branch 'mybranch1'

Notice that it is bootstrapped off the master branch and has the same files.


In [6]:
%%bash
cd /tmp/playground
ls


README.md
world.md

Note: You could have created this branch and switched to it in one step using git checkout -b mybranch1.

Now let's check the status of our repo.


In [7]:
%%bash
cd /tmp/playground
git status


On branch mybranch1
nothing to commit, working tree clean

Alright, so we're on our new branch but we haven't added or modified anything yet; there's nothing to commit.

Adding a file on a new branch

Let's add a new file. Note that this file gets added on this branch only!


In [8]:
%%bash
cd /tmp/playground
echo '# Things I wish G.R.R. Martin would say:  Finally updating A Song of Ice and Fire.' > books.md
git status


On branch mybranch1
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	books.md

nothing added to commit but untracked files present (use "git add" to track)

We add the file to the index, and then commit the files to the local repository on the mybranch1 branch.


In [9]:
%%bash
cd /tmp/playground
git add .
git status


On branch mybranch1
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   books.md


In [10]:
%%bash
cd /tmp/playground
git commit -m "Added another test file to demonstrate git features" -a
git status


[mybranch1 63b0a11] Added another test file to demonstrate git features
 1 file changed, 1 insertion(+)
 create mode 100644 books.md
On branch mybranch1
nothing to commit, working tree clean

At this point, we have committed a new file (books.md) to our new branch in our local repo. Our remote repo is still not aware of this new file (or branch). In fact, our master branch is still not really aware of this file.

Note: There are really two options at this point:

  1. Push the current branch to our upstream repo. This would correspond to a "long-lived" branch. You may want to do this if you have a version of your code that you are maintaining.
  2. Merge the new branch into the local master branch. This will happen much more frequently than the first option. You'll be creating branches all the time for little bug fixes and features. You don't necessarily want such branches to be "long-lived". Once your feature is ready and committed on the feature branch, you'll merge the feature branch into the master branch and push master. Then you'll delete the "short-lived" feature branch.

We'll continue with the first option for now and discuss the other option later.

Long-lived branches

OK, we have committed. Let's try to push!


In [11]:
%%bash
cd /tmp/playground
git push


fatal: The current branch mybranch1 has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin mybranch1

Fail! Why? Because Git didn't know what to push to on origin (the name of our remote repo) and didn't want to assume we wanted to call the branch mybranch1 on the remote. We need to tell that to Git explicitly (just like it tells us to).


In [12]:
%%bash
cd /tmp/playground
git push --set-upstream origin mybranch1


Branch mybranch1 set up to track remote branch mybranch1 from origin.
To https://github.com/dsondak/playground.git
 * [new branch]      mybranch1 -> mybranch1

Aha, now we have both a remote and a local branch for mybranch1.


In [17]:
%%bash
cd /tmp/playground
git branch -avv


  master                   b3bd0fa [origin/master] Merge remote-tracking branch 'course/master'
* mybranch1                63b0a11 [origin/mybranch1] Added another test file to demonstrate git features
  remotes/origin/HEAD      -> origin/master
  remotes/origin/master    b3bd0fa Merge remote-tracking branch 'course/master'
  remotes/origin/mybranch1 63b0a11 Added another test file to demonstrate git features
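
If you want to practice setting an upstream without touching GitHub, the following sketch uses a local bare repository as a stand-in for origin. The paths, file contents, and "Demo User" identity are assumptions for illustration only. It asks Git for the upstream of the current branch before and after the push with --set-upstream.

```shell
set -eu

# A local bare repository plays the role of the remote (hypothetical setup)
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q --set-upstream origin master

git checkout -q -b mybranch1
echo "# Books" > books.md
git add books.md
git commit -q -m "Add books.md"

# Before the first push, the new branch has no upstream configured
if git rev-parse --abbrev-ref "@{upstream}" >/dev/null 2>&1; then
    before="set"
else
    before="none"
fi

# Push and record origin/mybranch1 as the upstream in one step
git push -q --set-upstream origin mybranch1
upstream="$(git rev-parse --abbrev-ref "@{upstream}")"

echo "Upstream before push: $before, after push: $upstream"
```

Once the upstream is recorded, a plain git push or git pull on mybranch1 knows which remote branch to talk to.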

We make sure we are back on master.


In [18]:
%%bash
cd /tmp/playground
git checkout master


Your branch is up-to-date with 'origin/master'.
Switched to branch 'master'

Short-lived branches

Now we'll look into option 2 above. Suppose we want to add a feature to our repo. We'll create a new branch to work on that feature, but we don't want this branch to be long-lived. Here's how we can accomplish that.

We'll go a little faster this time since you've seen all these commands before. Even though we're going a little faster this time, make sure you understand what you're doing! Don't just copy and paste!!


In [27]:
%%bash
cd /tmp/playground
git checkout -b feature-branch


Switched to a new branch 'feature-branch'

In [28]:
%%bash
cd /tmp/playground
git branch


* feature-branch
  master
  mybranch1

In [29]:
%%bash
cd /tmp/playground
echo '# The collected works of G.R.R. Martin.' > feature.txt

In [30]:
%%bash
cd /tmp/playground
git status


On branch feature-branch
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	feature.txt

nothing added to commit but untracked files present (use "git add" to track)

In [31]:
%%bash
cd /tmp/playground
git add feature.txt
git commit -m 'George finished his books!'


[feature-branch c9c46e7] George finished his books!
 1 file changed, 1 insertion(+)
 create mode 100644 feature.txt

At this point, we've committed our new feature to our feature branch in our local repo. Presumably it's all tested and everything is working nicely. We'd like to merge it into our master branch now. First, we'll switch to the master branch.


In [32]:
%%bash
cd /tmp/playground
git checkout master
ls


Your branch is up-to-date with 'origin/master'.
README.md
world.md
Switched to branch 'master'

The master branch doesn't have any idea about our new feature yet! We should merge the feature branch into the master branch.


In [33]:
%%bash
cd /tmp/playground
git merge feature-branch


Updating b3bd0fa..c9c46e7
Fast-forward
 feature.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 feature.txt

In [34]:
%%bash
cd /tmp/playground
git status
ls


On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
nothing to commit, working tree clean
README.md
feature.txt
world.md

Now our master branch is up to date with our feature branch. We can now delete our feature branch since it is no longer relevant.


In [35]:
%%bash
cd /tmp/playground
git branch -d feature-branch


Deleted branch feature-branch (was c9c46e7).
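
The merge output above says Fast-forward: master had no commits of its own since the branch point, so Git simply moved the master pointer forward to the tip of feature-branch. Here is a small sketch of the same short-lived branch cycle in a throwaway repository (the directory, file contents, and "Demo User" identity are illustrative assumptions).

```shell
set -eu

# Throwaway repository (hypothetical; not the playground repo)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# Do the work on a short-lived branch
git checkout -q -b feature-branch
echo "# A new feature" > feature.txt
git add feature.txt
git commit -q -m "Add the feature"

# Merge it into master; master has not moved, so this is a fast-forward
git checkout -q master
git merge -q feature-branch
master_tip="$(git rev-parse master)"
feature_tip="$(git rev-parse feature-branch)"

# The branch is fully merged, so -d is willing to delete it
git branch -d feature-branch >/dev/null
remaining="$(git branch --list feature-branch)"

echo "master:  $master_tip"
echo "feature: $feature_tip"
```

After a fast-forward, both names point at exactly the same commit, which is why deleting the feature branch loses nothing.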

Finally, let's push the changes to our remote repo.


In [36]:
%%bash
cd /tmp/playground
git push


To https://github.com/dsondak/playground.git
   b3bd0fa..c9c46e7  master -> master

Great, so now you have a basic understanding of how to work with branches. There is much more to learn, but these commands should get you going. You should really familiarize yourself with Chapter 3 of the Git book for more details and workflow ideas.

Merge Conflicts

Many of you have already experienced merge conflicts. The first hurdle to overcome is to learn how to use vim, which you did in the last lecture. Now we will discuss how to deal with the merge conflict.

First, let's pretend that there are two different developers, Sally and Joe.

If you are currently in your playground directory, please go up one directory (i.e. cd ..). You are going to pretend to be two different developers: Sally and Joe.

First, create two new directories: one for Joe and one for Sally.


In [30]:
%%bash
cd /tmp
mkdir Joe
mkdir Sally

Now, Joe and Sally both clone your playground repo.


In [31]:
%%bash
cd /tmp/Joe
git clone https://github.com/dsondak/playground.git


Cloning into 'playground'...

In [32]:
%%bash
cd /tmp/Sally
git clone https://github.com/dsondak/playground.git


Cloning into 'playground'...

At this point, Joe and Sally each have a clone of the playground project. They will now each make changes to the same file. We'll work with Sally first (since we're already in her directory).


In [33]:
%%bash
cd /tmp/Sally
cd playground
echo '# A Project by Sally' >> intro.md
cat intro.md


# A Project by Sally

Sally is happy with her changes and now decides to commit them to her local repo.


In [34]:
%%bash
cd /tmp/Sally/playground
git add intro.md
git commit -m 'Attributed the test file to Sally.'


[master 2658cab] Attributed the test file to Sally.
 1 file changed, 1 insertion(+)
 create mode 100644 intro.md

At the same time, Joe has made some changes as well to the same file.


In [35]:
%%bash
cd /tmp/Joe/playground
echo '# A Project by Joe' >> intro.md
cat intro.md
git add intro.md
git commit -m 'Attributed the test file to Joe.'


# A Project by Joe
[master 3b934ee] Attributed the test file to Joe.
 1 file changed, 1 insertion(+)
 create mode 100644 intro.md

Now the local repositories for Joe and Sally have different histories! Suppose Sally pushes her changes first.


In [36]:
%%bash
cd /tmp/Sally/playground
git push


To https://github.com/dsondak/playground.git
   999fd74..2658cab  master -> master

Everything worked splendidly. Sally goes home for the day.

Joe is a little bit slower and tries to push just after Sally.


In [37]:
%%bash
cd /tmp/Joe/playground
git push


To https://github.com/dsondak/playground.git
 ! [rejected]        master -> master (fetch first)
error: failed to push some refs to 'https://github.com/dsondak/playground.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Joe realizes that he's made a mistake. Always fetch and merge (or pull) from the remote repo before doing your work for the day or pushing your recent changes. He's a little nervous, since it only took him a minute to make his changes to intro.md, and he realizes that someone else probably did a push in the meantime. Nevertheless, he proceeds.


In [38]:
%%bash
cd /tmp/Joe/playground
git fetch
git merge origin/master


Auto-merging intro.md
CONFLICT (add/add): Merge conflict in intro.md
Automatic merge failed; fix conflicts and then commit the result.
From https://github.com/dsondak/playground
   999fd74..2658cab  master     -> origin/master

There is a conflict in intro.md and Git can't figure out how to resolve the conflict automatically. It doesn't know who's right. Instead, Git writes both versions into intro.md, separated by conflict markers.


In [39]:
%%bash
cd /tmp/Joe/playground
cat intro.md


<<<<<<< HEAD
# A Project by Joe
=======
# A Project by Sally
>>>>>>> origin/master

Joe knows that Sally is working on the same project as him (they're teammates) so he's not alarmed. He could contact her about the conflict, but in this case he knows exactly what to do.

Note: Joe will use Linux terminal commands but you should feel free to use the vim text editor (or some other text editor of your choice). Remember, Jupyter can't handle text editors.


In [40]:
%%bash
cd /tmp/Joe/playground
echo '# Project by Sally and Joe' > intro.md
cat intro.md


# Project by Sally and Joe

Now Joe needs to stage (add) and commit intro.md to fix the merge conflict.


In [41]:
%%bash
cd /tmp/Joe/playground
git commit -a -m 'Shared attribution between Joe and Sally.'


[master 51c6b05] Shared attribution between Joe and Sally.
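
The same conflict-and-resolve cycle can be reproduced without a remote. The sketch below stands in for the two developers with two local branches named sally and joe in a throwaway repository; those names, the directory, and the "Demo User" identity are assumptions for illustration. It also shows one way to list the files that are still in conflict.

```shell
set -eu

# Throwaway repository (hypothetical; not the playground repo)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# "Sally" and "Joe" each add their own version of intro.md
git checkout -q -b sally
echo '# A Project by Sally' > intro.md
git add intro.md
git commit -q -m "Attributed the test file to Sally."

git checkout -q master
git checkout -q -b joe
echo '# A Project by Joe' > intro.md
git add intro.md
git commit -q -m "Attributed the test file to Joe."

# Merging stops with a conflict, so git merge exits with a non-zero status
if git merge sally >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi

# List the files that still contain unresolved conflicts
conflicted="$(git diff --name-only --diff-filter=U)"
echo "Merge result: $merge_result, conflicted files: $conflicted"

# Resolve by writing the desired content, then stage and commit
echo '# Project by Sally and Joe' > intro.md
git add intro.md
git commit -q -m "Shared attribution between Joe and Sally."

# The resolution is a merge commit whose second parent is Sally's commit
second_parent="$(git rev-parse 'HEAD^2')"
sally_tip="$(git rev-parse sally)"
resolved="$(cat intro.md)"
```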

Finally, Joe is ready to push the changes back to the upstream repository.


In [42]:
%%bash
cd /tmp/Joe/playground
git push


To https://github.com/dsondak/playground.git
   2658cab..51c6b05  master -> master

The merge conflict has been resolved! Of course, Sally doesn't yet know about what just happened. She needs to fetch and merge to get the updates.


In [43]:
%%bash
cd /tmp/Sally/playground
git fetch
git merge origin/master


Updating 2658cab..51c6b05
Fast-forward
 intro.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
From https://github.com/dsondak/playground
   2658cab..51c6b05  master     -> origin/master

So what actually happened here?

As expected, Sally's merge goes through without any conflict. Git can do this, even though the two versions of intro.md differ on the same line, because it has the commit history, too. When Sally made her original commit, it was given a commit hash (starting with 2658cab). When Joe resolved the merge conflict, he created a new commit (51c6b05) which unified the changes in commit 2658cab (Sally's original commit) and commit 3b934ee (Joe's original commit). Then, when Joe pushed, all of this information was given to the upstream repository. So Git has a record stating that the merge resolution commit 51c6b05 is a subsequent commit to Sally's original changes in 2658cab. When Sally fetched from the upstream repo, she got this information, too. So when Sally executed a merge, she was merging a predecessor (2658cab) with its direct successor (51c6b05), which Git handles simply by using the successor (a fast-forward).

The tricky conflict resolution that Joe did was effectively a way of taking two separate branches and tying them together.
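
The predecessor/successor relationship can be checked directly with git merge-base --is-ancestor. The sketch below again uses two local branches named sally and joe in a throwaway repository as stand-ins for the two clones (the names and the "Demo User" identity are assumptions for illustration).

```shell
set -eu

# Throwaway repository (hypothetical; not the playground repo)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

git checkout -q -b sally
echo '# A Project by Sally' > intro.md
git add intro.md
git commit -q -m "Attributed the test file to Sally."

git checkout -q master
git checkout -q -b joe
echo '# A Project by Joe' > intro.md
git add intro.md
git commit -q -m "Attributed the test file to Joe."

# Before Joe merges, neither history contains the other
if git merge-base --is-ancestor sally joe; then before="ancestor"; else before="diverged"; fi

# Joe merges Sally's work and resolves the conflict by hand
git merge sally >/dev/null 2>&1 || true
echo '# Project by Sally and Joe' > intro.md
git add intro.md
git commit -q -m "Shared attribution between Joe and Sally."

# Now Sally's commit is a predecessor of Joe's merge commit
if git merge-base --is-ancestor sally joe; then after="ancestor"; else after="diverged"; fi

# So Sally's merge is a plain fast-forward with nothing left to resolve
git checkout -q sally
git merge -q joe
sally_tip="$(git rev-parse sally)"
joe_tip="$(git rev-parse joe)"
content="$(cat intro.md)"

echo "Before Joe's merge: $before, after: $after"
```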

One more note on binary files

A common problem that students ran into last week occurred when they tried to update their local repo from the main course repo after I had made a change to the .pdf lecture slides.

One of the big lessons here is that versioning binary files with Git requires some special tools. In this case, the binary file was a .pdf document. In another case it may be an executable file.

Binary files are difficult to version because Git must store the entire file again after each commit. This is essentially a consequence of the fact that there is no clear way to diff binary files. Hence, the merging operation has problems.

There is extensive information around for the special tools Git has for working with binary files. For the particular case that many of you ran into last week, you can use some special arguments to the git checkout command. We'll say more about git checkout later. A nice discussion can be found at https://stackoverflow.com/questions/278081/resolving-a-git-conflict-with-binary-files.
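
As a rough sketch of what those git checkout arguments look like, the example below creates a conflict on a fake binary file and resolves it by taking one side wholesale with --theirs (use --ours to keep your own version instead). The file slides.pdf here is just a few bytes containing a NUL character, and the branch name course and the "Demo User" identity are made up for illustration.

```shell
set -eu

# Throwaway repository (hypothetical; not the course repo)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# The NUL byte makes Git treat this file as binary
printf 'v0\000base' > slides.pdf
git add slides.pdf
git commit -q -m "Add slides"

# The "course" branch changes the slides...
git checkout -q -b course
printf 'v1\000course' > slides.pdf
git commit -q -a -m "Update slides upstream"

# ...and so does our local master
git checkout -q master
printf 'v1\000local' > slides.pdf
git commit -q -a -m "Local change to slides"

# Git cannot merge binary content, so the merge stops with a conflict
if git merge course >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi

# Take the incoming ("theirs") version of the whole file, then finish the merge
git checkout -q --theirs -- slides.pdf
git add slides.pdf
git commit -q -m "Take the course version of the slides"

ours_blob="$(git rev-parse HEAD:slides.pdf)"
course_blob="$(git rev-parse course:slides.pdf)"
echo "Merge result: $merge_result"
```

During a merge, "ours" is the branch you are on and "theirs" is the branch being merged in.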

My recommendation is that you try to stay away from versioning binary files. I put them up on Git because I will not be changing the lecture slides much (if at all) over the course of the semester and because the lecture slides do not take up much space (and will therefore not have much of an effect on the speed of Git).


Git habits

Commit early, commit often.

Git is more effective when used at a fine granularity. For starters, you can't undo what you haven't committed, so committing lots of small changes makes it easier to find the right rollback point. Also, merging becomes a lot easier when you only have to deal with a handful of conflicts.

Commit unrelated changes separately.

Identifying the source of a bug or understanding the reason why a particular piece of code exists is much easier when commits focus on related changes. Some of this has to do with simplifying commit messages and making it easier to look through logs, but it has other related benefits: commits are smaller and simpler, and merge conflicts are confined to only the commits which actually have conflicting code.
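
One way to follow this habit is to stage files selectively with git add before each commit. A minimal sketch, assuming a throwaway repository with two made-up files (app.py and README.md) that were both edited in the same sitting:

```shell
set -eu

# Throwaway repository (hypothetical files and identity)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'print("hi")' > app.py
echo 'Project notes' > README.md
git add app.py README.md
git commit -q -m "Initial commit"

# Two unrelated edits made in the same sitting
echo 'print("bye")' >> app.py
echo 'How to run: python app.py' >> README.md

# Commit the code change on its own...
git add app.py
git commit -q -m "Print a farewell message"
code_files="$(git diff-tree --no-commit-id --name-only -r HEAD)"

# ...and the documentation change separately
git add README.md
git commit -q -m "Document how to run the app"
doc_files="$(git diff-tree --no-commit-id --name-only -r HEAD)"

echo "First commit touched: $code_files"
echo "Second commit touched: $doc_files"
```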

Do not commit binaries and other temporary files.

Git is meant for tracking changes. In nearly all cases, the only meaningful difference between the contents of two binaries is that they are different. If you change source files, compile, and commit the resulting binary, git sees an entirely different file. The end result is that the git repository (which contains a complete history, remember) begins to become bloated with the history of many dissimilar binaries. Worse, there's often little advantage to keeping those files in the history. An argument can be made for periodically snapshotting working binaries, but things like object files, compiled python files, and editor auto-saves are basically wasted space.

Ignore files which should not be committed

Git comes with a built-in mechanism for ignoring certain types of files. Listing filenames or wildcards in a .gitignore file in the top-level directory (where the .git directory is also located) will cause git to ignore those files when checking file status. This is a good way to ensure you don't commit the wrong files accidentally, and it also makes the output of git status somewhat cleaner.
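
Here is a small sketch of a .gitignore in action. The patterns and file names are examples chosen for illustration (compiled Python files, object files, and editor backup files); adapt them to your own project.

```shell
set -eu

# Throwaway repository (hypothetical file names)
repo="$(mktemp -d)"
cd "$repo"
git init -q

# One pattern per line: compiled Python files, object files, editor backups
printf '%s\n' '*.pyc' '*.o' '*~' > .gitignore

touch main.py main.pyc main.o 'main.py~'

# Only .gitignore and main.py show up as untracked
git status --porcelain
visible="$(git status --porcelain | grep -c '^??')"

# git check-ignore reports whether a given path matches an ignore pattern
if git check-ignore -q main.pyc; then pyc_ignored="yes"; else pyc_ignored="no"; fi
if git check-ignore -q main.py; then py_ignored="yes"; else py_ignored="no"; fi

echo "main.pyc ignored: $pyc_ignored, main.py ignored: $py_ignored"
```

Remember to commit the .gitignore file itself so that everyone working on the repository shares the same rules.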

Always make a branch for new changes

While it's tempting to work on new code directly in the master branch, it's usually a good idea to create a new one instead, especially for team-based projects. The major advantage to this practice is that it keeps logically disparate change sets separate. This means that if two people are working on improvements in two different branches, when they merge, the actual workflow is reflected in the git history. Plus, explicitly creating branches adds some semantic meaning to your branch structure. Moreover, there is very little difference in how you use git.

Write good commit messages

I cannot overstate the importance of this.

Seriously. Write good commit messages.