J.R. Johansson (jrjohansson at gmail.com)
The latest version of this IPython notebook lecture is available at http://github.com/jrjohansson/scientific-python-lectures.
The other notebooks in this lecture series are indexed at http://jrjohansson.github.io.
In [13]:
from IPython.display import Image
In any software development, one of the most important tools is revision control software (RCS).
It is used in virtually all software development and in all environments, by everyone and everywhere (no kidding!).
RCS can be used on almost any digital content, so it is not restricted to software development; it is also very useful for manuscript files, figures, data and notebooks!
In an RCS, the source code or digital content is stored in a repository.
The repository does not only contain the latest version of all files, but the complete history of all changes to the files since they were added to the repository.
A user can check out the repository and obtain a local working copy of the files. All changes are made to the files in the local working directory, where files can be added, removed and updated.
When a task has been completed, the changes to the local files are committed (saved to the repository).
If someone else has been making changes to the same files, a conflict can occur. In many cases conflicts can be resolved automatically by the system, but in some cases we might manually have to merge different changes together.
It is often useful to create a new branch in a repository, or a fork or clone of an entire repository, when we are doing larger experimental development. The main branch in a repository is often called master or trunk. When work on a branch or fork is completed, it can be merged into the master branch/repository.
With distributed RCSs such as GIT or Mercurial, we can pull and push changesets between different repositories, for example between a local copy of the repository and a central online repository (for example on a community repository host site like github.com).
GIT (git) : http://git-scm.com/
Mercurial (hg) : http://mercurial.selenic.com/
In the rest of this lecture we will look at git, although hg is just as good and works in almost exactly the same way.
On Linux:
$ sudo apt-get install git
On Mac (with macports):
$ sudo port install git
The first time you start to use git, you'll need to configure your author information:
$ git config --global user.name 'Robert Johansson'
$ git config --global user.email robert@riken.jp
To create a brand new empty repository, we can use the command git init repository-name:
In [4]:
# create a new git repository called gitdemo:
!git init gitdemo
If we want to fork or clone an existing repository, we can use the command git clone repository:
In [5]:
!git clone https://github.com/qutip/qutip
Git clone can take a URL to a public repository, like above, or a path to a local directory:
In [6]:
!git clone gitdemo gitdemo2
We can also clone private repositories over secure protocols such as SSH:
$ git clone ssh://myserver.com/myrepository
Using the command git status we get a summary of the current status of the working directory. It shows if we have modified, added or removed files.
In [34]:
!git status
In this case, only the current IPython notebook has been added. It is listed as an untracked file, and is therefore not in the repository yet.
To add a new file to the repository, we first create the file and then use the git add filename command:
In [35]:
%%file README
A file with information about the gitdemo repository.
In [36]:
!git status
After having created the file README, the command git status lists it as an untracked file.
In [37]:
!git add README
In [38]:
!git status
Now that it has been added, it is listed as a new file that has not yet been committed to the repository.
In [39]:
!git commit -m "Added a README file" README
In [40]:
!git add Lecture-7-Revision-Control-Software.ipynb
In [41]:
!git commit -m "added notebook file" Lecture-7-Revision-Control-Software.ipynb
In [42]:
!git status
After committing the change to the repository from the local working directory, git status again reports that the working directory is clean.
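The untracked, staged and clean states described above can also be reproduced in a throwaway repository. The following is a minimal sketch, assuming git is installed and on the PATH; the helper names git and status_walkthrough, the demo identity and the file content are made up for this illustration. It uses git status --porcelain, which prints a compact two-character code per file instead of the long report.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity so that commits work without any global configuration.
IDENTITY = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(["git", *IDENTITY, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def status_walkthrough():
    """Return the short status after creating, staging and committing a file."""
    states = []
    with tempfile.TemporaryDirectory() as tmp:
        git("init", "--quiet", cwd=tmp)
        Path(tmp, "README").write_text("A file with information.\n")
        states.append(git("status", "--porcelain", cwd=tmp).strip())  # untracked
        git("add", "README", cwd=tmp)
        states.append(git("status", "--porcelain", cwd=tmp).strip())  # staged
        git("commit", "--quiet", "-m", "Added a README file", cwd=tmp)
        states.append(git("status", "--porcelain", cwd=tmp).strip())  # clean
    return states


if shutil.which("git"):  # skip the demo when git is not available
    print(status_walkthrough())  # ['?? README', 'A  README', '']
```

Here ?? marks an untracked file, A marks a file that is staged for the next commit, and an empty result means that the working directory is clean.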
When files that are tracked by GIT are changed, they are listed as modified by git status:
In [43]:
%%file README
A file with information about the gitdemo repository.
A new line.
In [44]:
!git status
Again, we can commit such changes to the repository using the git commit -m "message" command.
In [45]:
!git commit -m "added one more line in README" README
In [46]:
!git status
To remove a file that has been added to the repository, use git rm filename, which works similarly to git add filename:
In [47]:
%%file tmpfile
A short-lived file.
Add it:
In [48]:
!git add tmpfile
In [49]:
!git commit -m "adding file tmpfile" tmpfile
Remove it again:
In [51]:
!git rm tmpfile
In [52]:
!git commit -m "remove file tmpfile" tmpfile
The messages that are added to the commit command are supposed to give a short (often one-line) description of the changes/additions/deletions in the commit. If the -m "message" option is omitted when invoking the git commit command, an editor will be opened for you to type a commit message (useful, for example, when a longer commit message is required).
We can look at the revision log by using the command git log:
In [53]:
!git log
In the commit log, each revision is shown with a timestamp, a unique hash tag, author information and the commit message.
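The hash is not a random label: it is derived from the content it identifies. As a rough sketch of the idea, assuming git's default SHA-1 object format, the identifier of a file's content (a "blob") is the SHA-1 of a short header followed by the bytes of the file. Commit identifiers are computed in the same way over the text of the commit object. The function name git_blob_id is made up for this illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git would assign to a file with this content."""
    # The header is the object type, a space, the size in bytes and a NUL byte.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The id of an empty file, as also reported by `git hash-object` for such a file.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the content, two identical files always get the same id, and any change to the content gives a different one.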
Every commit results in a changeset, which has a "diff" describing the changes to the files associated with it. We can use git diff to see what has changed in a file:
In [54]:
%%file README
A file with information about the gitdemo repository.
README files usually contains installation instructions, and information about how to get started using the software (for example).
In [55]:
!git diff README
That looks quite cryptic but is a standard form for describing changes in files. We can use other tools, like graphical user interfaces or web-based systems, to get a more easily understandable diff.
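To see how this standard form (the "unified diff" format) is put together, we can generate one ourselves. This is only a sketch that uses the difflib module from the Python standard library instead of git; the second line of the modified file is shortened for readability, and the a/README and b/README labels simply mimic the ones git prints. The output of git diff additionally contains header lines starting with "diff --git" and "index", which difflib does not produce.

```python
import difflib

# The committed and the modified version of README, as lists of lines.
committed = [
    "A file with information about the gitdemo repository.\n",
    "A new line.\n",
]
modified = [
    "A file with information about the gitdemo repository.\n",
    "README files usually contains installation instructions.\n",  # shortened
]

diff = list(difflib.unified_diff(committed, modified,
                                 fromfile="a/README", tofile="b/README"))
print("".join(diff), end="")
```

In the output, the line starting with @@ gives the line ranges in the old and the new file (here "-1,2 +1,2", that is, two lines starting at line 1 in both). Lines starting with a space are unchanged context, lines starting with - were removed and lines starting with + were added.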
In github (a web-based GIT repository hosting service) it can look like this:
In [24]:
Image(filename='images/github-diff.png')
Out[24]:
To discard a change (revert to the latest version in the repository) we can use the checkout command like this:
In [58]:
!git checkout -- README
In [59]:
!git status
If we want to get the code for a specific revision, we can use git checkout and give it the hash code of the revision we are interested in as an argument:
In [60]:
!git log
In [61]:
!git checkout 1f26ad648a791e266fbb951ef5c49b8d990e6461
Now the content of all the files is as in the revision with the hash code listed above (the first revision):
In [62]:
!cat README
We can move back to "the latest" (master) with the command:
In [63]:
!git checkout master
In [64]:
!cat README
In [65]:
!git status
Tags are named revisions. They are useful for marking particular revisions for later reference. For example, we can tag our code with the tag "paper-1-final" when the simulations for "paper-1" are finished and the paper is submitted. Then we can always retrieve exactly the code used for that paper, even if we continue to work on and develop the code for future projects and papers.
In [66]:
!git log
In [67]:
!git tag -a demotag1 -m "Code used for this and that purpose"
In [68]:
!git tag -l
In [69]:
!git show demotag1
To retrieve the code in the state corresponding to a particular tag, we can use the git checkout tagname command:
$ git checkout demotag1
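The point of a tag is that it keeps pointing at the same revision while development continues. The sketch below, which assumes git is installed and on the PATH, tags a first commit, makes a second commit, and then reads the tagged version of a file. Instead of checking out the tag, it uses git show tagname:path, which prints a file as it was at that revision without touching the working directory. The helper names, the demo identity and the file contents are made up for this illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity so that commits and tags work without global configuration.
IDENTITY = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(["git", *IDENTITY, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def tag_demo():
    """Return (tag names, README at the tag, README in the working directory)."""
    with tempfile.TemporaryDirectory() as tmp:
        readme = Path(tmp, "README")
        git("init", "--quiet", cwd=tmp)
        readme.write_text("version used for paper-1\n")
        git("add", "README", cwd=tmp)
        git("commit", "--quiet", "-m", "first version", cwd=tmp)
        git("tag", "-a", "demotag1", "-m", "Code used for paper-1", cwd=tmp)

        # Development continues after the tag has been created.
        readme.write_text("later development\n")
        git("commit", "--quiet", "-a", "-m", "second version", cwd=tmp)

        tags = git("tag", "-l", cwd=tmp).split()
        tagged = git("show", "demotag1:README", cwd=tmp)
        return tags, tagged, readme.read_text()


if shutil.which("git"):  # skip the demo when git is not available
    print(tag_demo())
```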
With branches we can create diverging code bases in the same repository. They are, for example, useful for experimental development that requires a lot of code changes that could break the functionality in the master branch. Once the development of a branch has reached a stable state, it can always be merged back into the trunk. Branching-development-merging is a good development strategy when several people are involved in working on the same code base. But even in single-author repositories it can often be useful to always keep the master branch in a working state, to always branch/fork before implementing a new feature, and to merge it back into the main trunk later.
In GIT, we can create a new branch like this:
In [70]:
!git branch expr1
We can list the existing branches like this:
In [71]:
!git branch
And we can switch between branches using checkout:
In [81]:
!git checkout expr1
Make a change in the new branch.
In [74]:
%%file README
A file with information about the gitdemo repository.
README files usually contains installation instructions, and information about how to get started using the software (for example).
Experimental addition.
In [76]:
!git commit -m "added a line in expr1 branch" README
In [77]:
!git branch
In [78]:
!git checkout master
In [79]:
!git branch
We can merge an existing branch and all its changesets into another branch (for example the master branch) like this:
First change to the target branch:
In [82]:
!git checkout master
In [83]:
!git merge expr1
In [84]:
!git branch
We can delete the branch expr1 now that it has been merged into the master:
In [85]:
!git branch -d expr1
In [86]:
!git branch
In [88]:
!cat README
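The whole branch, commit, merge and delete cycle can be replayed in a throwaway repository. This is a minimal sketch, assuming git is installed and on the PATH; the helper names, the demo identity and the file contents are made up for this illustration. The initial branch is explicitly named master with git symbolic-ref, because the default name of the first branch depends on the git version and configuration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity so that commits work without any global configuration.
IDENTITY = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(["git", *IDENTITY, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def branch_demo():
    """Return (README on master before the merge, after the merge, branch names)."""
    with tempfile.TemporaryDirectory() as tmp:
        readme = Path(tmp, "README")
        git("init", "--quiet", cwd=tmp)
        git("symbolic-ref", "HEAD", "refs/heads/master", cwd=tmp)
        readme.write_text("A file with information.\n")
        git("add", "README", cwd=tmp)
        git("commit", "--quiet", "-m", "Added a README file", cwd=tmp)

        # Create the branch, switch to it and commit a change there.
        git("branch", "expr1", cwd=tmp)
        git("checkout", "--quiet", "expr1", cwd=tmp)
        readme.write_text("A file with information.\nExperimental addition.\n")
        git("commit", "--quiet", "-a", "-m", "added a line in expr1", cwd=tmp)

        # Back on master the change is not visible until expr1 is merged.
        git("checkout", "--quiet", "master", cwd=tmp)
        before = readme.read_text()
        git("merge", "--quiet", "expr1", cwd=tmp)
        after = readme.read_text()

        git("branch", "-d", "expr1", cwd=tmp)
        branches = git("for-each-ref", "--format=%(refname:short)",
                       "refs/heads", cwd=tmp).split()
        return before, after, branches


if shutil.which("git"):  # skip the demo when git is not available
    print(branch_demo())
```

Since master did not change while expr1 was being developed, the merge in this sketch is a simple "fast-forward": master is just moved to the latest commit of expr1, and no conflict can occur.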
If the repository has been cloned from another repository, for example on github.com, it automatically remembers the address of the parent repository (called origin):
In [5]:
!git remote
In [4]:
!git remote show origin
We can retrieve updates from the origin repository by "pulling" changesets from "origin" to our repository:
In [6]:
!git pull origin
We can register addresses to many different repositories, and pull in different changesets from different sources, but the default source is the origin from where the repository was first cloned (and the word origin could have been omitted from the line above).
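The relationship between a clone and its origin can be tried out without a server, since git clone also accepts a path to a local directory. The following sketch, which assumes git is installed and on the PATH, clones a small repository, adds a commit to the original and then pulls that commit into the clone. The directory names origin and copy, the helper names and the demo identity are made up for this illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity so that commits work without any global configuration.
IDENTITY = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]


def git(*args, cwd):
    """Run a git command in `cwd` and return its standard output as text."""
    result = subprocess.run(["git", *IDENTITY, *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def pull_demo():
    """Return (remote names known to the clone, README in the clone after pulling)."""
    with tempfile.TemporaryDirectory() as tmp:
        origin = Path(tmp, "origin")
        origin.mkdir()
        git("init", "--quiet", cwd=origin)
        Path(origin, "README").write_text("first line\n")
        git("add", "README", cwd=origin)
        git("commit", "--quiet", "-m", "first commit", cwd=origin)

        # The clone remembers where it came from under the name "origin".
        git("clone", "--quiet", "origin", "copy", cwd=tmp)
        copy = Path(tmp, "copy")
        remotes = git("remote", cwd=copy).split()

        # A new commit appears in the original repository ...
        Path(origin, "README").write_text("first line\nsecond line\n")
        git("commit", "--quiet", "-a", "-m", "second commit", cwd=origin)

        # ... and is brought into the clone by pulling.
        git("pull", "--quiet", "origin", cwd=copy)
        return remotes, Path(copy, "README").read_text()


if shutil.which("git"):  # skip the demo when git is not available
    print(pull_demo())
```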
After making changes to our local repository, we can push changes to a remote repository using git push. Again, the default target repository is origin, so we can do:
In [7]:
!git status
In [8]:
!git add Lecture-7-Revision-Control-Software.ipynb
In [9]:
!git commit -m "added lecture notebook about RCS" Lecture-7-Revision-Control-Software.ipynb
In [11]:
!git push
Github.com is a git repository hosting site that is very popular with both open source projects (for which it is free) and private repositories (for which a subscription might be needed).
With a hosted repository it is easy to collaborate with colleagues on the same code base, and you get a graphical user interface where you can browse the code and look at commit logs, track issues etc.
Some good hosted repositories are
In [14]:
Image(filename='images/github-project-page.png')
Out[14]:
There are also a number of graphical user interfaces for GIT. The available options vary a little bit from platform to platform:
In [15]:
Image(filename='images/gitk.png')
Out[15]: