In any project, whether software or documentation and scientific writing, version control software helps to keep things in order. This is especially important for collaborations involving several people, but it can also be vital for personal projects. Keeping track of changes in a text or a program is important in research and development: going back to a working version of a program after making bad changes to the code, or recovering the results obtained before submitting a paper, can be tricky without version control.
For this reason, it is a very good idea to learn the basics of version control and apply them to your research projects right away, especially since you can save your work remotely, where it can be accessed from anywhere and by everybody in your collaboration. A remote backup can also save you from unexpected crashes of your local computer.
In version control, the source code or digital content is stored in a repository.
The repository does not only contain the latest version of all files, but the complete history of all changes to the files since they were added to the repository.
A user can check out the repository and obtain a local working copy of the files. All changes are made to the files in the local working directory, where files can be added, removed and updated.
When a task has been completed, the changes to the local files are committed (saved to the repository).
If someone else has been making changes to the same files, a conflict can occur. In many cases conflicts can be resolved automatically by the system, but in some cases we might manually have to merge different changes together.
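A minimal sketch of how such an automatic merge can work, assuming that both edited copies and their common ancestor have the same number of lines. Real tools such as Git first align the lines with a diff algorithm, so this is a simplification, and the function name merge_lines is made up for this illustration. A line changed on only one side is taken from that side; a line changed differently on both sides is reported as a conflict.

```python
def merge_lines(base, ours, theirs):
    """Line-by-line three-way merge of equally long lists of lines.

    Returns (merged, conflicts), where conflicts holds the indices of
    the lines that both sides changed in different ways.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles files of equal length")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both sides agree (same change, or no change at all)
            merged.append(o)
        elif o == b:    # only "theirs" changed this line
            merged.append(t)
        elif t == b:    # only "ours" changed this line
            merged.append(o)
        else:           # both changed it differently: a person has to decide
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


base = ["Mars is a planet", "It is red", "It has moons"]
ours = ["Mars is the 4th planet", "It is red", "It has moons"]
theirs = ["Mars is a planet", "It is red", "It has two moons"]

merged, conflicts = merge_lines(base, ours, theirs)
print(merged)      # both independent edits are kept
print(conflicts)   # empty list: nothing to resolve by hand
```

If both copies had edited the first line in different ways, its index would appear in conflicts, which corresponds to the case where the merge has to be completed manually.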
It is often useful to create a new branch in a repository, or a fork or clone of an entire repository, when we are doing larger experimental development. The main branch in a repository is often called master. When work on a branch or fork is completed, it can be merged into the master branch/repository.
With distributed version control software such as Git, we can pull and push changesets between different repositories, for example between a local copy of the repository and a central online repository (for example on a community repository hosting site like github.com).
Git (git): http://git-scm.com/
Mercurial (hg): http://mercurial.selenic.com/
In the rest of this lecture we will look at git, although hg is just as good and works in almost exactly the same way.
On Linux:
$ sudo apt-get install git
On Mac (with macports):
$ sudo port install git
The first time you start to use git, you'll need to configure your author information:
git config --global user.name "your name"
git config --global user.email "your email"
git config --global color.ui "auto"
git config --global core.editor "emacs"
You can change these settings at any time, and core.editor can be set to whichever editor you prefer.
In [1]:
%%bash
git config --list
On this page we will create a repository and work on it to illustrate the usage of git.
To create a brand new empty repository, we can use the command:
git init repository-name
In [ ]:
%%bash
mkdir planets
cd planets
git init
If we want to fork or clone an existing repository, we can use the command:
git clone repository
git clone can take the URL of a public repository, e.g.:
git clone https://github.com/TheAlgorithms/Python
or a path to a local directory:
In [ ]:
!git clone planets planets2
We can also clone private repositories over secure protocols such as SSH:
git clone ssh://myserver.com/myrepository
In [ ]:
import os
os.chdir('planets')
Using the command git status
we get a summary of the current status of the working directory. It shows if we have modified, added or removed files.
In [ ]:
!git status
In this case, the repository is still empty.
To add a new file to the repository, we first create the file and then use the git add filename
command:
In [ ]:
%%file mars.txt
Mars is the 4th planet from the Sun
In [ ]:
!git status
After creating the file mars.txt, the command git status
lists it as an untracked file.
In [ ]:
!git add mars.txt
In [ ]:
!git status
Now that it has been added, it is listed as a new file that has not yet been committed to the repository.
In [ ]:
!git commit -m "Some notes on Mars" mars.txt
To see the history of changes, we use the command:
git log
In [ ]:
!git log
When files tracked by Git are changed, git status
lists them as modified; a newly created file, like the README below, is listed as untracked until it is added:
In [ ]:
%%file README
A file with information about the gitdemo repository.
A new line.
In [ ]:
!git status
Again, we can commit such changes to the repository using the git commit -m "message"
command. Because README is a new file, it has to be added first:
In [ ]:
!git add README
!git commit -m "added one more line in README" README
In [ ]:
!git status
In [ ]:
%%bash
echo "Jupiter is the biggest planet in the solar system" > jupiter.txt
git add jupiter.txt
git commit -m "added file with jupiter notes"
git status
To remove a file that has been added to the repository, use git rm filename, which works similarly to git add filename:
In [ ]:
%%file tmpfile
A short-lived file.
Add it:
In [ ]:
!git add tmpfile
In [ ]:
!git commit -m "adding file tmpfile" tmpfile
Remove it again:
In [ ]:
!git rm tmpfile
In [ ]:
!git commit -m "remove file tmpfile" tmpfile
The messages that are added to the commit command are supposed to give a short (often one-line) description of the changes/additions/deletions in the commit. If the -m "message" option is omitted when invoking the git commit command, an editor will be opened for you to type a commit message (useful, for example, when a longer commit message is required).
We can look at the revision log by using the command git log
:
In [ ]:
!git log
In the commit log, each revision is shown with a timestamp, a unique hash, author information and the commit message.
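To give an idea of where such hashes come from: Git names every object by a SHA-1 digest of its content plus a small header. The sketch below reproduces the identifier Git assigns to the content of a file (a "blob"); commit hashes are computed in the same way from the commit's own data (tree, parents, author, date, message). The helper name git_blob_hash is made up for this illustration, and a repository configured to use SHA-256 would give different values.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 id that Git assigns to a blob with this content."""
    # Git hashes a header "blob <size in bytes>" followed by a NUL byte,
    # and then the raw content itself.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The id of an empty file, as also reported by `git hash-object`
print(git_blob_hash(b""))

# The id depends only on the content, so any change gives a different hash
print(git_blob_hash(b"Mars is the 4th planet from the Sun\n"))
```

Because the identifier is derived from the content, two repositories holding the same revision agree on its hash without having to coordinate.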
Every commit results in a changeset, which has a "diff" describing the changes to the files associated with it. We can use git diff
to see what has changed in a file:
In [ ]:
%%bash
echo "The color of this planet is red." >> mars.txt
git status
In [ ]:
!git diff mars.txt
In [ ]:
!git commit -m "new Mars info"
This commit fails: we first have to add the modified file and only then commit.
In [ ]:
%%bash
git add mars.txt
git commit -m "new Mars info"
git status
The output of git diff looks quite cryptic, but it is a standard format for describing changes in files. We can use other tools, such as graphical user interfaces or web-based systems, to get a more easily understandable diff.
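The format in question is the so-called unified diff. As a rough illustration, Python's standard difflib module produces output in the same style. The file contents below are just the Mars notes from this page, and Git adds a few header lines of its own that difflib does not produce.

```python
import difflib

old = ["Mars is the 4th planet from the Sun\n"]
new = old + ["The color of this planet is red.\n"]

# Lines starting with '-' were removed and lines starting with '+' were added;
# lines starting with a space are unchanged context. The '@@ ... @@' header
# gives the line ranges in the old and in the new version of the file.
diff = list(
    difflib.unified_diff(old, new, fromfile="a/mars.txt", tofile="b/mars.txt")
)
print("".join(diff))
```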
Using the software gitk, the view is clearer:
To see the latest change or see the changes with as little information as possible, we can use the following commands:
In [ ]:
%%bash
git log -1 # last change
echo ""
git log --oneline # short output
To visualize the latest changes (with diff), we can use the command:
In [ ]:
!git diff HEAD~1 mars.txt # Difference with previous version of the file
In particular, we refer to the latest commit as HEAD and to earlier versions as HEAD~1, HEAD~2, and so on.
So, if we add new information to the Mars file:
In [ ]:
%%bash
echo "Mars has two nice moons." >> mars.txt
git add mars.txt
git commit -m "even more info about Mars"
git status
We can see the difference with respect to 2 versions ago:
In [ ]:
!git diff HEAD~2 mars.txt
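The HEAD~n notation can be read as "start at HEAD and step back n times along the first parent". Here is a toy model of that rule, with a shortened, hand-written history standing in for a real repository. The commit ids and the resolve helper are invented for this sketch; only the commit messages are taken from this page.

```python
# Each commit records its message and the id of its parent (None for the first).
history = {
    "c1": {"message": "Some notes on Mars", "parent": None},
    "c2": {"message": "new Mars info", "parent": "c1"},
    "c3": {"message": "even more info about Mars", "parent": "c2"},
}
HEAD = "c3"  # the commit that is currently checked out


def resolve(ref):
    """Resolve 'HEAD', 'HEAD~' or 'HEAD~n' to a commit id in the toy history."""
    name, sep, steps = ref.partition("~")
    if name != "HEAD":
        raise ValueError("this sketch only understands HEAD and HEAD~n")
    # A bare '~' means one step back, just like '~1'
    n = int(steps) if steps else (1 if sep else 0)
    commit = HEAD
    for _ in range(n):
        commit = history[commit]["parent"]
        if commit is None:
            raise ValueError(f"{ref} goes back past the first commit")
    return commit


for ref in ("HEAD", "HEAD~1", "HEAD~2"):
    print(ref, "->", history[resolve(ref)]["message"])
```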
To discard a change (revert to the latest version in the repository) we can use the checkout
command.
So, let's make a change in the working directory and then go back to the last commit.
In [ ]:
%%bash
echo "Many space probes crashed on Mars." >> mars.txt
cat mars.txt
In [ ]:
%%bash
git checkout HEAD mars.txt
git status
cat mars.txt
If we want to get the code for a specific revision, we can use git checkout and give it the hash of the revision we are interested in as argument:
In [ ]:
!git log
In [ ]:
!git checkout ef01389982dae3dc8583ff10291df34060c30bdd
Now the content of all the files is as it was in the revision with the hash listed above (the first revision).
In [ ]:
!cat mars.txt
We can move back to "the latest" (master) with the command:
In [ ]:
!git checkout master
In [ ]:
!cat mars.txt
In [ ]:
!git status
If we want to ignore files or directories created in our repository, we can
list them in the .gitignore
file.
This is particularly useful when we compile code or create temporary files we
are not interested in saving. For instance, if we don't want to save the results
directory or any file that ends in .dat
, we can use the following strategy:
In [ ]:
%%bash
mkdir results
touch results/a.out
touch results/b.out
git status
In [ ]:
%%bash
echo 'results/
*.dat' > .gitignore
git status
In [ ]:
%%bash
git add .gitignore
git commit -m "Added gitignore file"
git status
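To make the effect of the two patterns concrete, here is a rough Python approximation of how they are applied. It only covers the two forms used above (a directory name ending in "/" and a shell-style wildcard matched against the file name). Git's real rules are richer (negation with "!", anchoring with a leading "/", "**" and so on), and the function is_ignored is made up for this illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

patterns = ["results/", "*.dat"]


def is_ignored(path, patterns=patterns):
    """Approximate check of a repository-relative path against simple patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything inside a directory of that name
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # Wildcard pattern: compare against the file name only
            return True
    return False


for path in ["results/a.out", "run1.dat", "data/run2.dat", "mars.txt"]:
    print(path, "->", "ignored" if is_ignored(path) else "not ignored")
```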
In [ ]:
%%bash
echo "nothing" > jupiter.txt
git status
cat jupiter.txt
In [ ]:
%%bash
git checkout HEAD jupiter.txt
cat jupiter.txt
Tags are named revisions. They are useful for marking particular revisions for later reference. For example, we can tag our code with the tag "paper-1-final" when simulations for "paper-1" are finished and the paper is submitted. Then we can always retrieve exactly the code used for that paper, even if we continue to work on and develop the code for future projects and papers.
In [ ]:
!git log
In [ ]:
!git tag -a demotag1 -m "pre-release"
In [ ]:
!git tag -l
In [ ]:
!git show demotag1
To retrieve the code in the state corresponding to a particular tag, we can use the command:
git checkout tagname
In [ ]:
!git checkout demotag1
Let's go back to our master branch.
In [ ]:
!git checkout master
With branches we can create diverging code bases in the same repository. They are useful, for example, for experimental development that requires a lot of code changes that could break the functionality in the master branch. Once the development of a branch has reached a stable state, it can always be merged back into the master branch. The branch-develop-merge cycle is a good development strategy when several people are involved in working on the same code base. But even in single-author repositories it is often useful to keep the master branch in a working state, always branch/fork before implementing a new feature, and later merge it back into master.
In Git, we can create a new branch like this:
In [ ]:
!git branch expr1
We can list the existing branches like this:
In [ ]:
!git branch
And we can switch between branches using checkout
:
In [ ]:
!git checkout expr1
Make a change in the new branch.
In [ ]:
%%file jupiter.txt
Jupiter is the biggest planet in the solar system
It is considered a failed star.
In [ ]:
!git commit -m "added a line in expr1 branch" jupiter.txt
In [ ]:
!git branch
In [ ]:
!cat jupiter.txt
In [ ]:
!git checkout master
In [ ]:
!cat jupiter.txt
In [ ]:
!git branch
We can merge an existing branch and all its changesets into another branch (for example the master branch).
First change to the target branch, then merge:
In [ ]:
!git checkout master
In [ ]:
!git merge expr1
In [ ]:
!git branch
In [ ]:
!cat jupiter.txt
We can delete the branch expr1
now that it has been merged into the master:
In [ ]:
!git branch -d expr1
In [ ]:
!git branch
In [ ]:
!cat jupiter.txt
In our case, we need to sign up for an account on http://github.com
Once logged in to your account, you can create your first repository with the name of the directory containing your project. In this case: planets.
Since our repository already exists, we can simply link it and push to master. The second step has to be done interactively, since it asks for username and password.
At this point, write the name of the repository on the next page:
After clicking "Create repository", we are led to a third page which contains the URL of our repository:
Once the repository is created, we have to link our local repository to the remote one and push to it.
We will label the URL as origin
.
In [ ]:
!git remote add origin https://github.com/darioflute/planets.git
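Then, we will push everything to the remote repository. This operation has to be done interactively, since it requires typing a username and password (GitHub now expects a personal access token in place of the account password for pushes over HTTPS):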
git push -u origin master
Username for 'https://github.com': darioflute
Password for 'https://darioflute@github.com':
Counting objects: 25, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (21/21), done.
Writing objects: 100% (25/25), 2.42 KiB | 0 bytes/s, done.
Total 25 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), done.
To https://github.com/darioflute/planets.git
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
To check the address of the parent repository, labelled origin:
In [ ]:
!git remote -v
In [ ]:
!git remote show origin
We can retrieve updates from the origin repository by "pulling" changesets from "origin" to our repository:
In [ ]:
!git pull origin
We can register the addresses of many different repositories and pull changesets from different sources, but the default source is the origin from which the repository was first cloned (so the word origin could have been omitted from the line above).
After making changes to our local repository, we can push changes to a remote repository using git push
. Again, the default target repository is origin
, so we can do:
In [ ]:
!git status
In [ ]:
%%bash
echo 'A few notes about planets' > README
In [ ]:
!git add README
In [ ]:
!git commit -m "added README" README
At this point, we can push the update to the remote repository again:
git push origin master
By far the easiest way is to log in to your GitHub account:
https://github.com/darioflute/planets
https://github.com/darioflute/planets/settings
GitHub.com is a Git repository hosting site that is very popular with both open source projects (for which it is free) and private repositories (for which a subscription might be needed).
With a hosted repository it is easy to collaborate with colleagues on the same code base, and you get a graphical user interface where you can browse the code, look at commit logs, track issues, etc.
Some good hosted repositories are
There are also a number of graphical user interfaces for Git. The available options vary a little from platform to platform:
http://git-scm.com/downloads/guis
gitk is a popular repository browser for Git
git-gui is a tool for using Git in a graphical way
They can be easily installed on Linux with apt-get or synaptic.
In [1]:
%load_ext version_information
%version_information version_information