Version control

In any project, whether software or documentation and scientific writing, version control software helps to keep some order. This is especially important for collaborations involving several people, but it can also be vital for personal projects. Keeping track of changes in a text or a piece of software is important in research and development. Going back to a working version of the software after making bad changes to the code, or recovering the results obtained before submitting a paper, can be tricky without version control software.

For this reason, it is a very good idea to learn the basics of version control and apply them to your research projects right away, especially since you can save your work remotely so that it can be accessed from anywhere and by everybody in your collaboration. A remote backup can also save you from unexpected crashes of your local computer.

There are two main purposes of version control:

Keep track of changes in the source code.
- Allow reverting back to an older revision if something goes wrong.
- Work on several "branches" of the software concurrently.
- Tag revisions to keep track of which version of the software was used for what (for example, "release-1.0", "paper-A-final", ...)
Make it possible for several people to collaboratively work on the same code base simultaneously.
- Allow many authors to make changes to the code.
- Clearly communicate and visualize changes in the code base to everyone involved.

Basic principles and terminology

In version control, the source code or digital content is stored in a repository.

The repository contains not only the latest version of all files, but also the complete history of all changes to the files since they were added to the repository.
A user can check out the repository and obtain a local working copy of the files. All changes are made to the files in the local working directory, where files can be added, removed and updated.
When a task has been completed, the changes to the local files are committed (saved to the repository).
If someone else has been making changes to the same files, a conflict can occur. In many cases conflicts can be resolved automatically by the system, but in some cases we might have to merge the different changes together manually.
It is often useful to create a new branch in a repository, or a fork or clone of an entire repository, when we are doing larger experimental development. The main branch in a repository is often called master. When work on a branch or fork is completed, it can be merged into the master branch/repository.
With distributed version control software such as git, we can pull and push changesets between different repositories, for example between a local copy of the repository and a central online repository (for example on a community repository host site like github.com).
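
The conflict scenario described above can be sketched with git, which is introduced later in this lecture. This is only an illustration under stated assumptions: the directory conflict-demo, the branch experiment, the file contents and the demo identity are all hypothetical, and everything happens in a throwaway directory.

```shell
# Throwaway repository; directory, branch and identity names are hypothetical
cd "$(mktemp -d)"
git init -q conflict-demo && cd conflict-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Mars is red" > mars.txt
git add mars.txt
git commit -q -m "first version"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on the git setup

# Change the line on a side branch...
git checkout -q -b experiment
echo "Mars is rusty" > mars.txt
git commit -q -am "experiment: rusty"

# ...and change the same line differently on the main branch
git checkout -q "$main_branch"
echo "Mars is dusty" > mars.txt
git commit -q -am "main: dusty"

# git cannot combine the two edits automatically, so the merge stops
git merge experiment || echo "merge stopped because of a conflict"
cat mars.txt    # both versions appear between <<<<<<< and >>>>>>> markers

# Resolve by hand: write the final text, stage it, and complete the merge
echo "Mars is dusty and rusty" > mars.txt
git add mars.txt
git commit -q -m "merge experiment, resolving the conflict"
```

If the two branches had changed different files, or different parts of the same file, git would have merged them without asking.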

Some good software

GIT (git) : http://git-scm.com/
Mercurial (hg) : http://mercurial.selenic.com/

In the rest of this lecture we will look at git, although hg is just as good and works in almost exactly the same way.

Installing git

On Linux:

$ sudo apt-get install git

On Mac (with macports):

$ sudo port install git

Configuring the installation

The first time you start to use git, you'll need to configure your author information:

git config --global user.name "your name"
git config --global user.email "your email"
git config --global color.ui "auto"
git config --global core.editor "emacs"

You can change these settings at any time by running the commands again with new values.

For core.editor you can choose your favorite editor; emacs is only an example.
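
As a small sketch of changing a setting later, the commands below overwrite and then remove values. To avoid touching a real configuration, the sketch points HOME at a throwaway directory; that redirection and the choice of nano are assumptions made only for this illustration.

```shell
# Use a throwaway HOME so the sketch does not alter your real ~/.gitconfig
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global core.editor "emacs"
git config --global core.editor "nano"   # a later call simply overwrites the value
git config --global --get core.editor    # prints: nano

# A setting can also be removed again
git config --global color.ui "auto"
git config --global --unset color.ui
```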



In [1]:

    
%%bash
git config --list

user.email=darioflute@gmail.com
user.name=Dario Fadda
color.ui=auto
core.editor=emacs

Creating and cloning a repository

On this page we will create a repository and work on it to illustrate the usage of git.

To create a brand new empty repository, we can use the command:

git init repository-name



In [ ]:

    
%%bash
mkdir planets
cd planets
git init
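
As a quick check of what git init does, the sketch below uses the one-step form git init repository-name in a throwaway location (an assumption of this example) and looks at the result.

```shell
cd "$(mktemp -d)"       # throwaway location, assumed for this sketch
git init -q planets     # creates the directory planets/ if it does not exist yet
ls -A planets           # the only entry is the hidden .git directory
git -C planets rev-parse --is-inside-work-tree   # prints: true
```

The whole history of the repository lives inside the .git directory; deleting it turns the project back into an ordinary folder.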

If we want to fork or clone an existing repository, we can use the command:

git clone repository

git clone can take a URL to a public repository, e.g.:

git clone https://github.com/TheAlgorithms/Python

or a path to a local directory:



In [ ]:

    
!git clone planets planets2

We can also clone private repositories over secure protocols such as SSH:

git clone ssh://myserver.com/myrepository

Status

Let's go inside the repository we just created.



In [ ]:

    
import os
os.chdir('planets')

Using the command git status we get a summary of the current status of the working directory. It shows if we have modified, added or removed files.



In [ ]:

    
!git status

In this case, the repository is still empty.

Adding files and committing changes

To add a new file to the repository, we first create the file and then use the git add filename command:



In [ ]:

    
%%file mars.txt

Mars is the 4th planet from the Sun



In [ ]:

    
!git status

After the file mars.txt has been created, the command git status lists it as an untracked file.



In [ ]:

    
!git add mars.txt



In [ ]:

    
!git status

Now that it has been added, it is listed as a new file that has not yet been committed to the repository.



In [ ]:

    
!git commit -m "Some notes on Mars" mars.txt
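
The three states a new file goes through (untracked, staged, committed) can be followed compactly with git status --short. The sketch below repeats the steps above in a throwaway repository; the scratch directory and the demo identity are assumptions of this example.

```shell
# Throwaway repository with a local, hypothetical identity
cd "$(mktemp -d)"
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Mars is the 4th planet from the Sun" > mars.txt
git status --short     # "?? mars.txt": the file is untracked
git add mars.txt
git status --short     # "A  mars.txt": the file is staged, waiting to be committed
git commit -q -m "Some notes on Mars"
git status --short     # no output: nothing left to commit
```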

To see the history of changes, we use the command:

git log



In [ ]:

    
!git log

Committing changes

When files that are tracked by git are changed, git status lists them as modified; a file that has never been added, like the README created here, is listed as untracked:



In [ ]:

    
%%file README

A file with information about the planets repository.

A new line.



In [ ]:

    
!git status

Again, we can commit such changes to the repository using the git commit -m "message" command.



In [ ]:

    
!git add README
!git commit -m "added one more line in README" README



In [ ]:

    
!git status

Activity

Create a file called jupiter.txt, add it and commit.

Solution



In [ ]:

    
%%bash
echo "Jupiter is the biggest planet in the solar system" > jupiter.txt
git add jupiter.txt
git commit -m "added file with jupiter notes"
git status

Removing files

To remove a file that has been added to the repository, use git rm filename, which works similarly to git add filename:



In [ ]:

    
%%file tmpfile

A short-lived file.

Add it:



In [ ]:

    
!git add tmpfile



In [ ]:

    
!git commit -m "adding file tmpfile" tmpfile

Remove it again:



In [ ]:

    
!git rm tmpfile



In [ ]:

    
!git commit -m "remove file tmpfile" tmpfile

Commit logs

The messages that are added to the commit command are supposed to give a short (often one-line) description of the changes/additions/deletions in the commit. If the -m "message" option is omitted when invoking the git commit command, an editor will be opened for you to type a commit message (useful, for example, when a longer commit message is required).

We can look at the revision log by using the command git log:



In [ ]:

    
!git log

In the commit log, each revision is shown with a timestamp, a unique hash, author information and the commit message.

Diffs

Every commit results in a changeset, which has a "diff" describing the changes to the files associated with it. We can use git diff to see what has changed in a file:



In [ ]:

    
%%bash
echo "The color of this planet is red." >> mars.txt
git status



In [ ]:

    
!git diff mars.txt



In [ ]:

    
!git commit -m "new Mars info"

The commit fails because the change has not been staged: we first have to add the file and then commit.



In [ ]:

    
%%bash
git add mars.txt
git commit -m "new Mars info"
git status

The output of git diff looks quite cryptic, but it is a standard format for describing changes in files. We can use other tools, like graphical user interfaces or web-based systems, to get a more easily understandable diff.

With the program gitk, the view is clearer.

To see the latest change or see the changes with as little information as possible, we can use the following commands:



In [ ]:

    
%%bash
git log -1  # last change
echo ""
git log --oneline # short output

To visualize the latest changes (with diff), we can use the command:



In [ ]:

    
!git diff HEAD~1 mars.txt  # Difference with previous version of the file

In particular we refer to the previous versions using HEAD, HEAD~1, HEAD~2:

HEAD the latest committed change
HEAD~1 the previous committed change
HEAD~2 the change before the previous commit, and so on...
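
To see how this notation counts backwards, here is a minimal sketch in a throwaway repository. The three commits, their messages and the demo identity are hypothetical and exist only to make the numbering visible.

```shell
# Throwaway repository with a local, hypothetical identity
cd "$(mktemp -d)"
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three commits, each appending one line to the same file
for n in 1 2 3; do
    echo "note $n" >> mars.txt
    git add mars.txt
    git commit -q -m "commit $n"
done

git log --format=%s -1 HEAD      # prints: commit 3
git log --format=%s -1 HEAD~1    # prints: commit 2
git log --format=%s -1 HEAD~2    # prints: commit 1
git show HEAD~2:mars.txt         # the file as it was two commits ago: only "note 1"
```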

So, if we add new information to the mars file:



In [ ]:

    
%%bash
echo "Mars has two nice moons." >> mars.txt
git add mars.txt
git commit -m "even more info about Mars"
git status

We can see the difference with respect to 2 versions ago:



In [ ]:

    
!git diff HEAD~2 mars.txt

Discard changes in the working directory

To discard a change (revert to the latest version in the repository) we can use the checkout command.

So, let's make a change in the working directory and go back to the last commit.



In [ ]:

    
%%bash
echo "Many space probes crashed on Mars." >> mars.txt
cat mars.txt



In [ ]:

    
%%bash
git checkout HEAD mars.txt
git status
cat mars.txt
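
It matters where checkout takes the file from. The sketch below contrasts restoring from the staging area (git checkout -- file) with restoring from the last commit (git checkout HEAD file). The throwaway repository, the demo identity and the two marker lines are assumptions of this example.

```shell
# Throwaway repository with a local, hypothetical identity
cd "$(mktemp -d)"
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Mars is the 4th planet from the Sun" > mars.txt
git add mars.txt
git commit -q -m "Some notes on Mars"

echo "line A (staged)" >> mars.txt
git add mars.txt                      # line A is now in the staging area
echo "line B (not staged)" >> mars.txt

git checkout -- mars.txt              # restore from the staging area: line B is gone, line A stays
cat mars.txt
git checkout HEAD mars.txt            # restore from the last commit: line A is gone as well
cat mars.txt
```

Both forms overwrite the file without asking, so changes that were never committed or staged cannot be recovered afterwards.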

Checking out old revisions

If we want to get the code for a specific revision, we can use git checkout and give it the hash code of the revision we are interested in as argument:



In [ ]:

    
!git log



In [ ]:

    
!git checkout ef01389982dae3dc8583ff10291df34060c30bdd

Now the content of all the files is as in the revision with the hash code listed above (the first revision).



In [ ]:

    
!cat mars.txt

We can move back to "the latest" (master) with the command:



In [ ]:

    
!git checkout master



In [ ]:

    
!cat mars.txt



In [ ]:

    
!git status
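
Since hash codes differ from one repository to another, the round trip above can be sketched without copying a hash by hand. The example assumes a throwaway repository with two hypothetical commits; git rev-list --max-parents=0 is used here only to look up the first commit, and git checkout - returns to the branch we were on before.

```shell
# Throwaway repository with a local, hypothetical identity
cd "$(mktemp -d)"
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Mars is the 4th planet from the Sun" > mars.txt
git add mars.txt
git commit -q -m "Some notes on Mars"
echo "The color of this planet is red." >> mars.txt
git commit -q -am "new Mars info"

first=$(git rev-list --max-parents=0 HEAD)   # hash of the very first commit
git checkout -q "$first"                     # "detached HEAD": no branch is checked out
cat mars.txt                                 # only the first line
git checkout -q -                            # "-" means the branch we were on before
cat mars.txt                                 # both lines again
```

git also accepts an unambiguous prefix of the hash, so the first several characters shown by git log --oneline are usually enough.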

Ignoring

If we want to ignore files or directories created in our repository, we can list them in the .gitignore file.

This is particularly useful when we compile code or create temporary files we are not interested in saving. For instance, if we don't want to save the results directory or any file that ends in .dat, we can use the following strategy:



In [ ]:

    
%%bash
mkdir results
touch results/a.out
touch results/b.out
git status



In [ ]:

    
%%bash
echo 'results/
*.dat' > .gitignore
git status



In [ ]:

    
%%bash
git add .gitignore
git commit -m "Added gitignore file"
git status
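
To check which rule in .gitignore is responsible for hiding a file, git check-ignore can be used. The sketch below is an illustration in a throwaway repository; the file spectrum.dat is a hypothetical name chosen to exercise the *.dat rule.

```shell
# Throwaway repository (no commits are needed for this check)
cd "$(mktemp -d)"
git init -q planets && cd planets

printf 'results/\n*.dat\n' > .gitignore
mkdir results
touch results/a.out spectrum.dat mars.txt

# -v prints the .gitignore line that matches each ignored path
git check-ignore -v results/a.out spectrum.dat

# Only .gitignore and mars.txt are reported as untracked
git status --short
```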

Activity

Overwrite and recover jupiter.txt

Solution



In [ ]:

    
%%bash
echo "nothing" > jupiter.txt
git status
cat jupiter.txt



In [ ]:

    
%%bash
git checkout HEAD jupiter.txt
cat jupiter.txt

Tagging and branching
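
Tags give a readable name to a specific revision, as in the "release-1.0" and "paper-A-final" examples mentioned at the start of this lecture. A minimal sketch, assuming a throwaway repository and a hypothetical demo identity:

```shell
# Throwaway repository with a local, hypothetical identity
cd "$(mktemp -d)"
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Mars is the 4th planet from the Sun" > mars.txt
git add mars.txt
git commit -q -m "Some notes on Mars"

# Annotated tag on the current commit
git tag -a paper-A-final -m "Version used for paper A"

# Work continues after the tag
echo "The color of this planet is red." >> mars.txt
git commit -q -am "new Mars info"

git tag                            # lists: paper-A-final
git show paper-A-final:mars.txt    # the file exactly as it was when tagged
```

A tag can be used wherever a hash code is accepted, for example in git checkout or git diff.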

Branches

With branches we can create diverging code bases in the same repository. They are useful, for example, for experimental development that requires a lot of code changes that could break the functionality in the master branch. Once the development of a branch has reached a stable state, it can be merged back into the master branch. The cycle of branching, development and merging is a good strategy when several people are working on the same code base. But even in single-author repositories it is often useful to keep the master branch in a working state at all times, to branch or fork before implementing a new feature, and to merge it back into the master branch later.

In GIT, we can create a new branch like this:



In [ ]:

    
!git branch expr1

We can list the existing branches like this:



In [ ]:

    
!git branch

And we can switch between branches using checkout:



In [ ]:

    
!git checkout expr1

Make a change in the new branch.



In [ ]:

    
%%file jupiter.txt

Jupiter is the biggest planet in the solar system
It is considered a failed star.



In [ ]:

    
!git commit -m "added a line in expr1 branch" jupiter.txt



In [ ]:

    
!git branch



In [ ]:

    
!cat jupiter.txt



In [ ]:

    
!git checkout master



In [ ]:

    
!cat jupiter.txt



In [ ]:

    
!git branch

We can merge an existing branch and all its changesets into another branch (for example the master branch) like this:

First change to the target branch:



In [ ]:

    
!git checkout master



In [ ]:

    
!git merge expr1



In [ ]:

    
!git branch



In [ ]:

    
!cat jupiter.txt

We can delete the branch expr1 now that it has been merged into master:



In [ ]:

    
!git branch -d expr1



In [ ]:

    
!git branch



In [ ]:

    
!cat jupiter.txt
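
The whole branch, commit, merge and delete cycle can be condensed as in the sketch below, which uses git checkout -b to create a branch and switch to it in one step. The throwaway repository, the branch name expr2 and the demo identity are assumptions of this example.

```shell
# Throwaway repository with a local, hypothetical identity
cd "$(mktemp -d)"
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "Jupiter is the biggest planet in the solar system" > jupiter.txt
git add jupiter.txt
git commit -q -m "added file with jupiter notes"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on the git setup

git checkout -q -b expr2          # same as: git branch expr2, then git checkout expr2
echo "It is considered a failed star." >> jupiter.txt
git commit -q -am "added a line in expr2 branch"

git checkout -q "$main_branch"
git merge -q expr2                # fast-forward: the main branch had no new commits of its own
git branch -d expr2               # safe: -d refuses to delete a branch that is not merged
git log --oneline --graph --all   # compact view of the resulting history
```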

Remote repositories

To host our repository remotely, we first need to sign up for an account on http://github.com

Once logged in to your account, you can create your first repository with the name of the directory containing your project. In this case: planets.

Since our local repository already exists, we can simply link it to the remote one and push the master branch. The second step has to be done interactively, since it asks for a username and password.

Creating a new repository

To create a new repository on GitHub, we press the New repository button on the start page.

Then we write the name of the repository on the next page.

After clicking "Create repository", we are led to a third page which contains the URL of our repository.

Pulling and pushing changesets between repositories

Once the repository is created we will have to link our local repository to the remote one and push it to the remote repository. We will label the URL as origin.



In [ ]:

    
!git remote add origin https://github.com/darioflute/planets.git

Then we push everything to the remote repository. This operation has to be done interactively, since it requires typing a username and password (GitHub now expects a personal access token in place of the account password for HTTPS pushes):

git push -u origin master

Username for 'https://github.com': darioflute
Password for 'https://darioflute@github.com': 
Counting objects: 25, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (21/21), done.
Writing objects: 100% (25/25), 2.42 KiB | 0 bytes/s, done.
Total 25 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), done.
To https://github.com/darioflute/planets.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.

To check the address of the parent repository, labelled origin:



In [ ]:

    
!git remote -v



In [ ]:

    
!git remote show origin

Automatic username

To avoid typing your username each time, you can edit the file .git/config by adding your username to the url:

url = https://username@repository-url.com
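
The same change can be made without opening .git/config in an editor, using git remote set-url. In this sketch the repository is a throwaway one and both URLs are placeholders (the path planets.git is hypothetical); no server is contacted.

```shell
# Throwaway repository
cd "$(mktemp -d)"
git init -q planets && cd planets

# Placeholder URL; adding a remote does not contact the server
git remote add origin https://repository-url.com/planets.git

# Rewrite the URL so that it carries the username
git remote set-url origin https://username@repository-url.com/planets.git

# The value is stored in .git/config under the key remote.origin.url
git config --get remote.origin.url
```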

pull

We can retrieve updates from the origin repository by "pulling" changesets from "origin" to our repository:



In [ ]:

    
!git pull origin

We can register the addresses of many different repositories and pull in changesets from different sources, but the default source is the origin from where the repository was first cloned (so the word origin could have been omitted from the line above).

push

After making changes to our local repository, we can push changes to a remote repository using git push. Again, the default target repository is origin, so we can do:



In [ ]:

    
!git status



In [ ]:

    
%%bash
echo 'A few notes about planets' > README



In [ ]:

    
!git add README



In [ ]:

    
!git commit -m "added README" README

At this point, we can push the update to the remote repository again:

git push origin master
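
The push and pull cycle can be tried out without a GitHub account by letting a local bare repository play the role of the remote. This is only a sketch under stated assumptions: hub.git stands in for the repository on github.com, planets2 is a second working copy belonging to a hypothetical collaborator, and both identities are invented.

```shell
cd "$(mktemp -d)"
git init -q --bare hub.git       # stands in for the repository on github.com

# First working copy: commit and push
git init -q planets && cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"
echo 'A few notes about planets' > README
git add README
git commit -q -m "added README"
branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on the git setup
git remote add origin ../hub.git
git push -q -u origin "$branch"  # -u remembers origin as the default for this branch

# A collaborator clones the shared repository and pushes a change
cd ..
git clone -q hub.git planets2 && cd planets2
git config user.name "Second User"
git config user.email "second@example.com"
echo "Mars is the 4th planet from the Sun" > mars.txt
git add mars.txt
git commit -q -m "Some notes on Mars"
git push -q origin "$branch"

# Back in the first working copy, pull brings in the collaborator's commit
cd ../planets
git pull -q origin "$branch"
ls                               # README and mars.txt
```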

Delete a remote repository

By far the easiest way is to log in to your GitHub account.

Go to your repository at https://github.com/yourUsername/yourRepository, for example:

https://github.com/darioflute/planets

Then click on Settings in the main toolbar of GitHub, or directly type the URL:

https://github.com/darioflute/planets/settings

Scroll down and you will find the Delete this repository button.

Hosted repositories

Github.com is a git repository hosting site that is very popular with both open source projects (for which it is free) and private repositories (for which a subscription might be needed).

With a hosted repository it is easy to collaborate with colleagues on the same code base, and you get a graphical user interface where you can browse the code, look at commit logs, track issues, etc.

Some good hosted repositories are

Github : http://www.github.com
Bitbucket: http://www.bitbucket.org

Graphical user interfaces

There are also a number of graphical user interfaces for git. The available options vary a little from platform to platform:

http://git-scm.com/downloads/guis

gitk is a popular browser for GIT

git-gui is a tool to use GIT in a graphical way

They can be easily installed on Linux with apt-get or synaptic.

Software	Version
Python	2.7.12 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
IPython	5.1.0
OS	Linux 4.4.0 57 generic x86_64 with debian jessie sid
version_information	1.0.3
Wed Jan 04 16:28:38 2017 CST
