Version control

Have you ever:

  • Made a change to code, realised it was a mistake and wanted to revert back?
  • Lost code or had a backup that was too old?
  • Had to maintain multiple versions of a product?
  • Wanted to see the difference between two (or more) versions of your code?
  • Wanted to prove that a particular change broke or fixed a piece of code?
  • Wanted to review the history of some code?
  • Wanted to submit a change to someone else's code?
  • Wanted to share your code, or let other people work on your code?
  • Wanted to see how much work is being done, and where, when and by whom?
  • Wanted to experiment with a new feature without interfering with working code?

In these cases, and no doubt others, a version control system should make your life easier.

Getting started with git

Git is a distributed version control system (DVCS).

  • Each copy or "clone" of a repository contains all of that repository's history.

Github

GitHub is a web service which hosts git repositories and adds extra services such as issue tracking and code review.

Because Git is decentralised there needn't be a "blessed" clone, but with Github there almost always is.

git clone

To create your own clone of a repository, simply run:

git clone <repository address>

Let's go ahead and clone https://github.com/pelson/tutorial_classroom to our machines.


In [ ]:
!git clone https://github.com/pelson/tutorial_classroom
%cd tutorial_classroom

We now have a full copy of the tutorial_classroom repository in our current working directory. We could make as many changes as we liked to the repository, commit them, and nobody need know that we've done so - this really is a standalone repository.

git commit

The concept of a commit is fundamental to Git. A commit has a unique ID (a SHA-1 hash) which is computed from the ID of a single "tree" (basically like a directory), meta-information such as timestamp and author, and, importantly, the ID(s) of the preceding commit(s) to which it points.

Therefore a single commit represents a state of the repository and can ultimately reference all preceding states. Because a commit's ID depends on the ID(s) of its preceding commit(s), it is not possible to change an old commit without also changing the IDs of all later commits.
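As a rough illustration of why old commits cannot be changed silently, the sketch below hashes simplified commit objects in Python. The header layout ("<kind> <size>" followed by a NUL byte) follows git's object format, but the commit body here is pared down, and the author string and messages are made up for the example.

```python
import hashlib


def object_id(kind, body):
    """SHA-1 over a '<kind> <size>' header, a NUL byte, and the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree, parents, meta, message):
    """Build a simplified commit body and return its ID."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {meta}", f"committer {meta}", "", message]
    return object_id("commit", "\n".join(lines).encode() + b"\n")


EMPTY_TREE = object_id("tree", b"")
meta = "A. Tutor <tutor@example.com> 1409000000 +0000"  # hypothetical author

first = commit_id(EMPTY_TREE, [], meta, "Initial commit")
second = commit_id(EMPTY_TREE, [first], meta, "Second commit")

# Editing the first commit gives it a new ID ...
first_edited = commit_id(EMPTY_TREE, [], meta, "Initial commit (edited)")
# ... so the commit built on top of it gets a new ID as well.
second_rebuilt = commit_id(EMPTY_TREE, [first_edited], meta, "Second commit")

print(first != first_edited, second != second_rebuilt)  # True True
```

Nothing about the second commit's own content changed, yet its ID did, because the ID of its parent is part of what gets hashed.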

Committing to your repository requires two phases:

  1. Add new and modified files to the staging area with git add
  2. Commit all staged changes with git commit

At all points, using git status will tell you what state your repository is in, and provide some useful hints with staging.
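The two phases can be pictured with a toy model in Python. This is only a mental model, not how git stores data: the class and attribute names are invented for the illustration, and the status report is far cruder than what git status prints.

```python
class ToyRepo:
    """A toy model of git's three areas: working tree, staging area, history."""

    def __init__(self):
        self.working = {}   # path -> content on disk
        self.staged = {}    # path -> content recorded by add()
        self.history = []   # list of (message, snapshot) pairs

    def add(self, path):
        """Like `git add <path>`: copy the current content into the staging area."""
        self.staged[path] = self.working[path]

    def commit(self, message):
        """Like `git commit -m <message>`: snapshot whatever is staged."""
        self.history.append((message, dict(self.staged)))

    def status(self):
        """List files whose working copy differs from what is staged."""
        return sorted(p for p, c in self.working.items() if self.staged.get(p) != c)


repo = ToyRepo()
repo.working["euroscipy_2014/ABOUT"] = "attendees"
print(repo.status())   # ['euroscipy_2014/ABOUT']

repo.add("euroscipy_2014/ABOUT")
repo.commit("Added the class attendees folder.")
print(repo.status())   # []
```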

git log

We can see a list of the commits which have led to the current state of the repository with git log:


In [ ]:
!git log

git branch

With git, branching is made exceptionally easy. A branch is a simple means to make changes to a repository without directly updating the master/main branch(es), making a branch the perfect place to try out new ideas, implement new features, fix bugs, and generally make any changes.

If a circle is used to represent a commit, then a branch, from any point, allows one to add commits without modifying other branches:

We can create a branch from any point in our history, or by default, from the last commit of the current branch (known as HEAD).

git branch <branch name> <reference point>

Switching between branches is a matter of simply doing git checkout <branch name>. Finding out what branches are available is just git branch.
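One way to think about this is that a branch is just a named pointer to a commit. The Python sketch below models that idea with two dictionaries; the commit IDs ("c1", "c2", "c3") and helper names are invented for the illustration and do not correspond to anything in git's own code.

```python
# Toy model: each commit records its parent, each branch records its tip commit.
commits = {"c1": None, "c2": "c1"}   # commit id -> parent id
branches = {"master": "c2"}          # branch name -> tip commit id


def create_branch(name, start_point):
    """Like `git branch <name> <start_point>`: record a new pointer, nothing else."""
    branches[name] = branches.get(start_point, start_point)


def commit(branch, new_id):
    """Add a commit on top of `branch` and advance only that branch's pointer."""
    commits[new_id] = branches[branch]
    branches[branch] = new_id


create_branch("euroscipy_2014", "master")
commit("euroscipy_2014", "c3")
print(branches)   # {'master': 'c2', 'euroscipy_2014': 'c3'}
```

Creating the branch copied no files and committing on it left master where it was, which is why branching is cheap.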

git merge

After separating work into individual branches, we inevitably will want to bring successful changes back into the master (or any other) branch. This is known as merging.

Sometimes merging is as simple as appending commits to the target branch, known as fast-forward merging. However, it is often the case that both branches have moved forwards, requiring what is known as a "3-way merge", as demonstrated in the following image:

Luckily, git takes care of this merging for you - to merge another branch into the current branch, simply run git merge <name of other branch>.
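The difference between a fast-forward and a 3-way merge comes down to whether one branch tip is an ancestor of the other. A minimal Python sketch of that check, using a hypothetical four-commit history (this mirrors the idea only, not git's actual implementation):

```python
def ancestors(commits, tip):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current])
    return seen


def merge_kind(commits, current, other):
    """Classify what merging `other` into `current` would involve."""
    if other in ancestors(commits, current):
        return "already up to date"
    if current in ancestors(commits, other):
        return "fast-forward"
    return "3-way merge"


# commit id -> list of parent ids (hypothetical history)
history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],   # a later commit on master
    "d": ["b"],   # a commit on a feature branch
}

print(merge_kind(history, "b", "d"))   # fast-forward
print(merge_kind(history, "c", "d"))   # 3-way merge
```

In the second case both branches have moved on from their common ancestor "b", so neither tip can simply be advanced to the other.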

git remote

As we already know, the repository that we have cloned is a complete copy of the original repository, but there are also complete copies of the repository elsewhere, which sometimes are convenient to reference.

These references are known as remotes in git, and by cloning https://github.com/pelson/tutorial_classroom earlier, git has already made github's version of the repository available as a remote. We can see this with:


In [ ]:
!git remote --verbose

We can add remotes with

git remote add <new remote name> <URI>

and can rename existing remotes with

git remote rename <old remote name> <new remote name>

Note: By convention, the "blessed" GitHub repository is called "upstream" and your own fork "origin".

Pulling it all together

Now, I'll add a file to my local repository to store a list of attendees of this tutorial.

First, I check the current state of the repository with git status:


In [ ]:
!git status

Create the branch

Next, I create a branch called "euroscipy_2014" based on the last commit in git log (aka HEAD), and check it out:


In [ ]:
!git branch euroscipy_2014
!git checkout euroscipy_2014

And just confirm that the status of the repository has changed.


In [ ]:
!git status

Add some content


In [ ]:
%mkdir euroscipy_2014

In [ ]:
%%writefile euroscipy_2014/ABOUT

This directory contains all the attendees of the
"Lessons for a scientific programmer" tutorial held at EuroScipy'14.

In [ ]:
!git status

Add the content to the staging area


In [ ]:
!git add euroscipy_2014/ABOUT
!git status

Commit the content to the branch


In [ ]:
!git commit -m "Added the class attendees folder."

Merging the branch back to master

Simply checking out master and then merging the euroscipy_2014 branch would be sufficient to pull these changes into our local master branch, e.g.

git checkout master
git merge euroscipy_2014

However, typically this is not what we would do if we wanted these changes to be integrated into the original repository on github.

Instead we would normally make this branch available publicly (on github) and submit a "pull request" (simply a request to merge) for the branch to be merged on the blessed repository.

Creating a PR on github

Forking and setting up your local clone with the new remote

To submit a pull request to merge our branch into the original github repository we need to make our branch available on github. However, since repositories generally have very limited permissions, we do not have sufficient privileges to write to the original github repository.

At this point we are going to need our own copy of the original repository under our own name on github. On github this is known as "forking" and is essentially just a git clone <upstream repo> to your own space on github's servers.

  • Rename the "origin" remote to "upstream" on your local clone.
  • Go to https://github.com/pelson/tutorial_classroom and fork it.
  • Your forked repo's clone URL will now show something like

    https://github.com/<YOUR USERNAME>/tutorial_classroom.git

    or

    git@github.com:<YOUR USERNAME>/tutorial_classroom.git

    Add this as a remote called "origin" to your local clone.

  • Verify that git remote --verbose shows the expected URLs and that the command

    git fetch --all

    successfully runs (effectively fetching any commits which have been applied since the clone was last updated).
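When checking the output of git remote --verbose, it helps to remember that the HTTPS and SSH forms of a clone URL name the same repository. The helper below is a small sketch for comparing them; the function name and regular expression are my own, and "someone" stands in for your username.

```python
import re

# Matches both clone URL styles shown above; the trailing ".git" is optional.
_GITHUB_URL = re.compile(
    r"^(?:https://github\.com/|git@github\.com:)([^/]+)/(.+?)(?:\.git)?/?$"
)


def github_slug(url):
    """Return (owner, repo) for an HTTPS or SSH-style GitHub URL, else None."""
    match = _GITHUB_URL.match(url)
    return match.groups() if match else None


upstream = github_slug("https://github.com/pelson/tutorial_classroom")
origin = github_slug("git@github.com:someone/tutorial_classroom.git")

print(upstream)                  # ('pelson', 'tutorial_classroom')
print(origin)                    # ('someone', 'tutorial_classroom')
print(upstream[0] != origin[0])  # True: the two remotes belong to different owners
```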


In [ ]:
!git remote rename origin upstream
# !git remote add origin git@github.com:<YOUR USERNAME>/tutorial_classroom.git

In [ ]:
!git remote --verbose

Pushing your branch to your fork

We can publish a local branch to a remote (the local copy is kept) with:

git push <remote name> <branch name>

So to push our branch to our fork:


In [ ]:
!git push origin euroscipy_2014

If this fails, add the --verbose flag and go through the following steps:

  • Check your network - are you connected to the internet!?
  • Check that you have set up git to work with GitHub correctly.
  • Check you are attempting to push to YOUR fork, not the original (pelson) repo.
  • Raise your hand.

Creating the PR

Now that we've pushed our branch to our fork, we can submit a PR. Go to the front page of your fork, and you should see that a new banner has appeared, offering to create a PR to merge your branch back into master of the original repository.

But...

Don't make the PR just yet!

Cleaning up your fork

Sometimes you have branches on your fork which are no longer desired.

Simply doing git branch -d <branch name> (or -D if you know that you really want it deleted) will remove a branch locally, but you cannot run git commands on your fork which is sitting on github's servers. Instead, we have to jump through a hoop and push "nothing" to this branch:

git push origin :euroscipy_2014

This command makes a little more sense when you know that git push accepts the following:

git push origin <local_branch_name>:<remote_branch_name>

But admittedly, it is still fairly obscure.
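To make the <local>:<remote> form a little less obscure, here is a small Python sketch that spells out how the three variants can be read. It covers only the simple cases discussed here and assumes a bare branch name pushes to a remote branch of the same name; the function name and the "attendees" branch are invented for the example.

```python
def describe_push(refspec):
    """Explain a simple `git push <remote> <refspec>` argument in words."""
    source, colon, destination = refspec.partition(":")
    if not colon:
        # No colon: assume the remote branch has the same name as the local one.
        destination = source
    if not source:
        # An empty left-hand side means "push nothing", i.e. delete.
        return f"delete remote branch {destination!r}"
    return f"update remote branch {destination!r} from local branch {source!r}"


print(describe_push("euroscipy_2014"))
print(describe_push("euroscipy_2014:attendees"))
print(describe_push(":euroscipy_2014"))
```

The last call reports a deletion, which is exactly what the "push nothing" command above does.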

Exercise

1. Split into groups of 2 or 3 and work on a single machine.

2. Update the git repo with git fetch --all and create a branch called my_euroscipy_group based on upstream/master (which is probably not the current branch).

3. Create a single file euroscipy_2014/<YOUR GITHUB USERNAME>.yaml listing all of your group's github usernames in the form:

members:
 - <username 1>
 - <username 2>

4. Commit the new file to your branch.

5. Push the my_euroscipy_group branch to your fork.

6. Submit a single PR per group to register as having attended this tutorial.
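If you would rather generate the file from step 3 than type it by hand, a few lines of Python are enough. This is just a convenience sketch: the usernames are placeholders, and the output follows the layout shown in step 3 without using a YAML library.

```python
def members_yaml(usernames):
    """Render a list of GitHub usernames in the layout requested in step 3."""
    lines = ["members:"] + [f" - {name}" for name in usernames]
    return "\n".join(lines) + "\n"


# Placeholder usernames; replace them with your group's real ones.
text = members_yaml(["username_1", "username_2"])
print(text, end="")
```

The resulting text could then be saved as euroscipy_2014/<YOUR GITHUB USERNAME>.yaml, for example with a %%writefile cell as used earlier.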
