This is the list of software you should have installed on your computer to follow the classes:
Part of the course will explain the basics of UNIX using the bash shell. For this reason, I will support only the Linux and OS X operating systems.
You will also need to open an account on GitHub, so that you can use git to keep a remote copy of your code.
GitHub - http://www.github.com
To install Python, download the bash installer script from:
https://www.continuum.io/downloads
Please download the Python 2.7 version, since we will study an important package that has not yet been ported to Python 3. Running this script on your computer installs the Anaconda distribution in your home directory.
After installation, the installer adds a line to your ~/.bashrc file that puts anaconda/bin in your PATH.
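To see why that line matters, here is a minimal sketch of how the shell uses PATH. It does not touch your real installation: a temporary directory stands in for anaconda/bin, and the command name mypython is hypothetical. The export line mirrors the kind of line the installer appends to ~/.bashrc (with your own home path).

```shell
# A temporary directory stands in for anaconda/bin (hypothetical layout).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/anaconda/bin"

# Put a tiny executable there so we can see which copy the shell finds.
printf '#!/bin/sh\necho "demo python"\n' > "$demo_dir/anaconda/bin/mypython"
chmod +x "$demo_dir/anaconda/bin/mypython"

# The kind of line the installer appends to ~/.bashrc: the new directory
# goes in front, so it is searched before all the others.
export PATH="$demo_dir/anaconda/bin:$PATH"

first_entry=$(echo "$PATH" | cut -d: -f1)   # first directory searched
found=$(command -v mypython)                # where the shell finds the command
output=$(mypython)                          # runs it through the PATH lookup

echo "$first_entry"
echo "$found"
echo "$output"
```

Because directories are searched from left to right, putting anaconda/bin first is what makes the Anaconda python win over the system one.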
Once it is installed, we will learn how to use the "conda" command to install new packages, upgrade them, and finally build your own packages.
How you install git depends on whether your computer runs Linux or OS X.
Typically, on Linux distributions that use apt-get (such as Debian and Ubuntu), it is sufficient to run:
sudo apt-get install git
This requires root privileges, which sudo provides. On Mac OS X, you can use MacPorts.
Once Anaconda Python is downloaded and installed, it is time to use the conda package manager to check, upgrade, install, and remove packages.
We can start by upgrading conda itself.
conda update conda
We will have to install several new packages, for instance:
conda install astropy
If we cannot find a certain package, we can search conda for it:
anaconda search -t conda lmfit
Sometimes a package is not available directly, but only through a channel. This is the case of lmfit, which can also be found in the astropy channel. To install it, we run:
conda install -c astropy lmfit
Another small package we will use is version_information:
conda install -c pydy version_information
It allows us to record in the notebook the versions of the packages used:
In [1]:
%load_ext version_information
%version_information numpy, scipy, astropy, matplotlib, version_information
Anaconda comes with its own integrated development environment (IDE) called spyder. Of course, you can use any editor to edit your code. For instance, a widely used editor on Linux is emacs, which has its own Python mode. The advantage of an IDE is that it is self-contained. While you edit, it can suggest the classes, methods, etc. available in Python. It helps with debugging and immediately flags unused or incorrect lines of code. You can also run the code inside it, and define a project when you are working on a complicated package.
To experiment with spyder simply call it:
spyder
We will use this IDE to develop a package in the second series of lectures
A more advanced IDE is eclipse. This tool is worth studying if you plan to code in other languages as well, such as C++ or Java. To work with Python it needs an additional plugin. It is also possible to install a plugin to access GitHub directly through git commands. The installation is extraordinarily simple. The latest eclipse is available at:
https://eclipse.org/downloads/index.php?show_instructions=TRUE
To work with Python you will need to install the following plugin:
https://marketplace.eclipse.org/content/pydev-python-ide-eclipse
A nice tutorial for Python programming in Eclipse is available at: https://www.ics.uci.edu/~pattis/common/handouts/introtopythonineclipse/
Finally, if you want to add the EGit plugin to version your code more quickly from inside eclipse, you can install it from:
http://marketplace.eclipse.org/content/egit-git-team-provider
I will not cover eclipse in these lectures, but I encourage you to try it out if you start coding more seriously.
Anaconda also comes with another wonder: the notebook.
Calling:
you open your favorite browser, where you can navigate the directory and open *.ipynb files in which notes, code, and results are stored (as in these lessons). Once a notebook is open, the interactive Python session running in the background lets you run snippets of code and reproduce the results inside the notebook. Through magic commands, you can embed figures inside the document. The document can therefore be kept to reproduce your own research, or passed to a collaborator to illustrate what you did. It is a wonderful way to do your research, document it, and preserve it. We will make great use of this environment during the lectures to learn how to code in Python, before starting to write code in a serious package.
The notebook is covered in more depth in the 2nd lecture.
If you have ever used a computer with some flavor of UNIX, you are already familiar with many UNIX commands. But you are probably unaware of the many possibilities of shell commands, and perhaps you have never written a shell script.
I find that knowing how to use shell commands can help you organize your file structure, find things you thought were lost, make complicated changes in several files, run programs repeatedly, and so on.
So, my first advice to enhance your computer skills is to learn to use your shell efficiently. In this lecture, I will go through its main capabilities. Then it is up to you to learn more and expand your skills. Usually, a good way to find out how to do something is to ask Google: there are many answers out there if you know how to ask the question.
A shell is, as the name suggests, an enclosure around the operating system of the computer that allows us to interact with it while hiding its complexity. The most widely used Unix shell today is the Bash shell. This is a command shell, i.e. a shell that lets us interact with the computer through a series of cryptic commands typed on a keyboard.
Commands are terse so that they can be typed as quickly as possible, which makes them very cryptic. How many of you know what pwd means?
Yes, it's print working directory.
Although nowadays it is possible to interact with computers through graphical interfaces, mice, touchscreens, touchpads, etc., command lines remain very effective, and they are the only practical way to automate long sequences of orders and to interact with remote machines.
One of the most important things to do is to monitor the machine. If you are running a program and have to stop it, how do you identify it? And what is running right now?
In Unix every process is identified by a number, the process ID (PID). We can find a process, suspend it, move it to the background or foreground, and kill it.
To find out what you launched from the current shell:
jobs
To find out all the processes running on the machine:
ps aux
To stop (suspend) a running process, press Ctrl-Z.
Then, put it in the background:
bg
If you want to put it in the foreground again, find its job number i with jobs and run:
fg %i
Finally, if you want to kill a process:
kill i
with i the process number (the PID listed by ps; kill %i works with a job number from jobs). To have a dynamic view of what is going on inside your machine, use:
top
which you can stop by pressing q.
In [28]:
%%bash
ps -a
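The job-control steps above can also be tried inside a script. This is a minimal sketch assuming bash or another POSIX shell: sleep stands in for a long-running program, $! gives the PID of the last background job, and kill -0 only checks whether a process still exists. Ctrl-Z, bg, and fg need an interactive terminal, so they are left out.

```shell
# sleep stands in for a long-running program; & starts it in the background.
sleep 300 &
pid=$!                     # $! is the PID of the last background job

jobs                       # lists the jobs started by this shell
ps -p "$pid"               # shows only that process, selected by its PID

kill "$pid"                # sends SIGTERM, asking the process to terminate
wait "$pid" 2>/dev/null    # waits for it to exit and collects its status

# kill -0 sends no signal: it only checks whether the process still exists
if kill -0 "$pid" 2>/dev/null; then
  state="running"
else
  state="gone"
fi
echo "process $pid is $state"
```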
In [4]:
!echo $PS1
Redefining the prompt.
In [3]:
%%bash
PS1='$ '
echo $PS1
In [9]:
%%bash
whoami
pwd
These commands give the name of the user logged in to the account and the current working directory. pwd shows the absolute path, i.e. the entire path from the root directory /.
Next, we want to know what a directory contains and how to move around. This is accomplished with the commands ls and cd, which stand for list and change directory.
As we will see, any command can take several arguments. To learn everything about how to use a command, we use the command man, short for manual. For instance:
In [10]:
%%bash
man pwd
Another command that can help you find a command is apropos, which searches the short descriptions of the manual pages.
For instance, if we are looking for pwd, we can check whether a command mentions "working directory":
In [11]:
%%bash
apropos "working directory"
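Commands with many options usually have a help page, which can be shown by typing --help after the command. This works with the GNU tools found on Linux; the BSD versions shipped with OS X, such as ls, may not accept --help.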
In [18]:
%%bash
ls --help
In [25]:
%%bash
pwd
cd
pwd
Without arguments, cd takes us back to the home directory. With the argument .., we go to the parent directory instead.
'..' means 'the directory above the current one'; '.' on its own means 'the current directory'.
In [26]:
%%bash
pwd
cd ..
pwd
Finally, we can go to a given path. This can be an absolute path (the entire path, starting from the root directory /) or a relative path (starting from the current directory, e.g. a subdirectory or ../Notes).
In [27]:
%%bash
pwd
cd ../Notes
pwd
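A minimal sketch contrasting absolute and relative paths. It runs inside a temporary directory, so the directory names Notes and Code are hypothetical and your own files are untouched.

```shell
# Build a small hypothetical tree: <tmp>/Notes and <tmp>/Code
base=$(mktemp -d)
mkdir "$base/Notes" "$base/Code"

cd "$base/Code"      # absolute path: starts from the root directory /
here=$(pwd)

cd ../Notes          # relative path: up to the parent (..), then into Notes
after_relative=$(pwd)

cd .                 # . is the current directory, so nothing changes
after_dot=$(pwd)

echo "$here"
echo "$after_relative"
echo "$after_dot"
```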
ls is a very versatile command that can be used with a myriad of options. So, let's see together a few frequently used ones.
With -F, an identifier is added at the end of each item: for instance, a slash is appended to the names of directories.
The option -a shows all the files, including the hidden files, which in Unix start with a dot "." and are normally invisible.
The option -l shows the files with a lot of information, such as permissions, size, and dates. The extra option -h writes the sizes in human-readable form (such as K for kilobytes).
Finally, a useful option is -t, which orders the files by their last modification date, most recent first.
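A minimal sketch of these ls options in a temporary directory. The file names are hypothetical, and sleep 1 only makes sure the two text files get clearly different modification times.

```shell
# Work in a temporary directory with hypothetical contents.
demo=$(mktemp -d)
cd "$demo"
mkdir results
touch old.txt
sleep 1              # make sure the next file is clearly newer
touch new.txt .hidden

ls                   # plain listing: .hidden is not shown
ls -F                # directories get a trailing slash: results/
ls -a                # also shows .hidden, plus . and ..
ls -lh               # long format, sizes in human-readable form
ls -t *.txt          # most recently modified first: new.txt, then old.txt

plain=$(ls)
classified=$(ls -F)
everything=$(ls -a)
newest=$(ls -t *.txt | head -n 1)
```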
In [40]:
%%bash
ls -d */
mkdir Test
echo " ----- After creation ----- "
ls -d */
mv Test Test2
echo " ----- After moving -------"
ls -d */
rmdir Test2
echo " ----- After removing ------"
ls -d */
We proceed in a similar way with files. It is possible to create an empty file using the command touch. This command is usually used to update the time stamp of the file. If the file does not exist, it creates an empty file.
So, let's see another example.
In [44]:
%%bash
echo " ---- start ------"
ls test*
touch test
echo " ----- file created ----"
ls test*
mv test test2
echo " ------ file renamed -----"
ls test*
rm test2
echo " ------ file removed ------"
ls test*
A good precaution when using rm is to ask for confirmation. This is achieved with the option -i:
In [45]:
%%bash
touch test
rm -i test
Finally, we want to know how to copy files, directories, and entire trees. This is accomplished with the command cp.
In particular, we can copy files into an existing directory, copy all the files ending with txt into a directory, or copy a directory recursively into another location (option -r).
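A minimal sketch of the three kinds of copy, run in a temporary directory with hypothetical file names:

```shell
# Work in a temporary directory with hypothetical file names.
demo=$(mktemp -d)
cd "$demo"
mkdir backup project
echo "data" > notes.txt
echo "more" > list.txt
echo "pixels" > image.png
echo "print('hi')" > project/main.py

cp image.png backup/          # copy one file into an existing directory
cp *.txt backup/              # copy all the files ending in .txt
cp -r project project_copy    # copy a whole directory tree recursively

ls backup project_copy
```

Without -r, cp refuses to copy a directory, which protects you from copying a whole tree by accident.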
Editing files is another big chapter. For this purpose, one can use one of the major editors: vi, emacs, etc. Although I mainly use emacs, it is useful to know the basics of vi, since it is installed on virtually every Unix system and can be handy when managing big files.
The basic commands to learn are:
Inside the file, you can search for a string by typing a slash followed by the string (/string):
At this point, after introducing some basic commands, let's have a look at one of the most powerful features of the shell: combining commands.
To introduce some examples, we will use a popular command, wc (word count).
As usual, we will use it with an option, -l, to get the number of lines in a file.
We can use -w to count words and -c to count bytes (the same as characters for plain ASCII text; -m counts characters).
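A minimal sketch of these counts on a small hypothetical file. Reading the file through < makes wc print only the number, and the arithmetic expansion $(( )) strips the leading spaces that some versions of wc add:

```shell
demo=$(mktemp -d)
printf 'one two\nthree\n' > "$demo/sample.txt"

wc -l "$demo/sample.txt"    # lines, followed by the file name
wc -w "$demo/sample.txt"    # words
wc -c "$demo/sample.txt"    # bytes

# With <, wc reads stdin and prints only the number
lines=$(( $(wc -l < "$demo/sample.txt") ))
words=$(( $(wc -w < "$demo/sample.txt") ))
bytes=$(( $(wc -c < "$demo/sample.txt") ))
echo "$lines $words $bytes"   # 2 3 14
```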
In [47]:
%%bash
wc -l *.ipynb
Now we redirect this output to a file, using >:
In [48]:
%%bash
wc -l *.ipynb > lengths.txt
We can see the content of this file with the command cat (short for concatenate):
In [49]:
%%bash
cat lengths.txt
We can sort this file numerically (-n) according to the length, i.e. the first field of each line:
In [50]:
%%bash
sort -n lengths.txt
We can write this sorted list to a file, and then get the first line with head and the last line with tail:
In [52]:
%%bash
sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt
tail -n 1 sorted-lengths.txt
Instead of saving the result of a command in a file, we can directly pipe it into another command with |:
In [53]:
%%bash
wc -l *.ipynb | sort -n
And even pipe it again to get only the shortest file:
In [54]:
%%bash
wc -l *.ipynb | sort -n | head -1
This fact is made possible by the way commands work in Unix. Each command is a program which accepts data from a channel called standard input (or stdin) and outputs results on another channel called standard output (or stdout). A third channel, called standard error (or stderr) is used to communicate error messages.
When a pipe is used, the stdout of the first program becomes the stdin of the following program. So, in Unix it is possible to build many small programs and chain them to execute complicated operations efficiently.
More on redirection:
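A minimal sketch of the most common redirection operators, run in a temporary directory with hypothetical file names. Channel 1 is stdout and channel 2 is stderr:

```shell
# Work in a temporary directory with hypothetical file names.
demo=$(mktemp -d)
cd "$demo"

echo "first" > log.txt              # > creates the file, or overwrites it
echo "second" >> log.txt            # >> appends to the end instead
ls missing_file 2> errors.txt       # 2> sends stderr (channel 2) to a file
ls missing_file > all.txt 2>&1      # 2>&1 sends stderr wherever stdout goes;
                                    # the order matters: > first, then 2>&1
wc -l < log.txt                     # < feeds a file to the command's stdin

log=$(cat log.txt)
echo "$log"
```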
Let's make a little digression about wildcards. When we want to refer to a group of files whose names have a part in common, we can use two symbols: * and ?.
The first matches any string (including the empty one), the second any single character. We can also use bracket expressions, which contain a set of characters inside square brackets.
For instance, 'ls *[AB].txt' will match all the files ending in A.txt or B.txt. If we want to exclude only these files, we use 'ls *[^AB].txt' (bash accepts both ^ and ! for negation, while plain POSIX shells require '[!AB]'). The expression [0-4] matches any digit between 0 and 4, and [a-c] is equivalent to [abc].
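A minimal sketch of these wildcards on a few hypothetical empty files. The helper function count, introduced here only for illustration, prints how many names a pattern expands to:

```shell
# Work in a temporary directory with hypothetical empty files.
demo=$(mktemp -d)
cd "$demo"
touch dataA.txt dataB.txt dataC.txt data1.txt data12.txt

ls *[AB].txt        # dataA.txt and dataB.txt
ls *[!AB].txt       # the other three; bash also accepts *[^AB].txt
ls data?.txt        # exactly one character after "data", so not data12.txt
ls data[0-4].txt    # one digit between 0 and 4: only data1.txt

count() { echo $#; }    # hypothetical helper: number of arguments received
n_ab=$(count *[AB].txt)
n_not_ab=$(count *[!AB].txt)
n_one=$(count data?.txt)
n_digit=$(count data[0-4].txt)
echo "$n_ab $n_not_ab $n_one $n_digit"   # 2 3 4 1
```

Note that the shell, not ls, expands the pattern: ls only receives the list of matching names.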
In [57]:
%%bash
ls *[sk].ipynb
In [58]:
%%bash
ls *[^sk].ipynb
In [5]:
%%bash
wc -l *.ipynb
echo " "
wc -l *.ipynb | cut -c 8-25
In [11]:
%%bash
wc -l *.ipynb | tr -s ' ' | cut -d' ' -f 3
In [28]:
%%bash
echo 'one
two
three and four' | tr "\n" " " | tr -s ' '
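The cells above use cut and tr without comment. A minimal sketch of what each step does, on a line that imitates the output of wc -l (the file name is hypothetical):

```shell
line="  42 Lecture-1.ipynb"

# cut -c keeps a range of character positions (here positions 3 to 4)
chars=$(echo "$line" | cut -c 3-4)

# tr -s ' ' squeezes runs of spaces into one, so cut -d' ' can split fields;
# the leading space leaves field 1 empty, so the name is field 3
name=$(echo "$line" | tr -s ' ' | cut -d' ' -f 3)

# tr also translates characters, e.g. newlines into spaces
joined=$(printf 'one\ntwo\nthree\n' | tr '\n' ' ')

echo "$chars"    # 42
echo "$name"     # Lecture-1.ipynb
echo "$joined"
```

Counting character positions with cut -c is fragile when the numbers change width, which is why the field-based version with tr -s is usually more robust.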
In [77]:
%%bash
for filename in *.ipynb
do
echo $filename
head -40 $filename | tail -n 1
done
In this case, the wildcard expression *.ipynb creates a list of files. The for loop assigns each name in the list, in turn, to the variable filename (read back as $filename), and then runs the commands inside the loop on it.
A file name produced by the wildcard stays whole in the list, but when a variable is expanded without quotes, whitespace splits it into separate words. So, if a file has a space in its name, $filename is passed to head as two names. It is therefore good practice to avoid spaces in file names, or else to quote the variable, as in "$filename".
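A minimal sketch of the problem, with a hypothetical file whose name contains a space:

```shell
# Work in a temporary directory; the file name is hypothetical.
demo=$(mktemp -d)
cd "$demo"
printf 'hello\n' > "my notes.txt"      # a name containing a space

for filename in *.txt
do
  # Unquoted: the name is split into "my" and "notes.txt", and cat fails
  unquoted=$(cat $filename 2>/dev/null || echo "failed")
  # Quoted: cat receives the name as a single argument
  quoted=$(cat "$filename")
done

echo "unquoted: $unquoted"    # unquoted: failed
echo "quoted: $quoted"        # quoted: hello
```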
We can also write the same loop on one line, using semicolons to separate the commands:
In [78]:
%%bash
for filename in *.ipynb; do echo $filename; head -40 $filename | tail -n 1; done
Another way to repeat commands is to use the history:
history | tail -n 5
gives the last five commands. It does not work in the notebook, though, because the shells it starts are not interactive and do not keep a history. Each command has a number. So, if we want to repeat command number 450, we simply type:
!450
If we remember writing a command, we can search the history by pressing Ctrl-R and typing part of the command.
Finally, we can go through the history simply using the up- and down-arrows on the keyboard.
In [85]:
%%bash
echo '#!/usr/bin/env bash' > script.sh
echo '
for filename in *.ipynb
do
echo $filename
head -40 $filename | tail -n 1
done
' >> script.sh
chmod +x script.sh
At this point we can execute the file. Since the current directory is usually not in PATH, we prefix the name with ./:
In [91]:
%%bash
./script.sh
Now, a more complex example. We write a script that shows part of a file. It takes three arguments: the name of the file, the last line to show, and the number of lines to show. To make the script more readable, we also add comments (lines starting with #).
In [90]:
%%bash
echo '#!/usr/bin/env bash' > middle.sh
echo '# show the middle part of a file' >> middle.sh
echo '# Usage: middle.sh filename end_line number_of_lines ' >> middle.sh
echo 'head -n "$2" "$1" | tail -n "$3"' >> middle.sh
chmod +x middle.sh
./middle.sh Lecture-1.ipynb 40 1
Finally, an example with a list as input. In this case we use the variable $@, which refers to all of the input parameters; when quoted, as "$@", each parameter stays a separate word.
In [92]:
%%bash
echo '#!/usr/bin/env bash' > script2.sh
echo '
for filename in "$@"
do
echo $filename
head -40 $filename | tail -n 1
done
' >> script2.sh
chmod +x script2.sh
./script2.sh *.ipynb
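A minimal sketch of how "$@" keeps each argument separate, using a small hypothetical function show_args (functions receive their arguments in "$@" just like scripts do), compared with "$*", which joins them into one string:

```shell
# A hypothetical function: it receives its arguments in "$@", like a script.
show_args() {
  echo "received $# arguments"
  for arg in "$@"
  do
    echo "[$arg]"
  done
}

# "two words" stays a single argument thanks to the quotes around $@
show_args one "two words" three

# "$*" instead joins all the arguments into one string
joined() { echo "$*"; }

header=$(show_args one "two words" three | head -n 1)
together=$(joined one "two words" three)
echo "$header"      # received 3 arguments
echo "$together"    # one two words three
```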
Bash has a particular syntax for variables and arrays inside a script. Variables are assigned as (with no spaces around =):
var="variable"
and read back as $var.
Arrays are defined as:
ARRAY=( "val1" "val2" "val3" )
In a loop, all the elements are expanded as "${ARRAY[@]}", while a single element is accessed as ${ARRAY[1]} (indices start at 0, so this is the second element). Let's see an example:
In [129]:
%%bash
echo '#!/usr/bin/env bash' > script3.sh
echo '
dir0="/dir/"
ARRAY=(
"val1"
"val2"
"val3"
)
echo "2nd value is "${ARRAY[1]}
for name in "${ARRAY[@]}"
do
echo $dir0$name
done
' >> script3.sh
chmod +x script3.sh
./script3.sh
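A few more array operations, as a minimal sketch; it requires bash, since plain sh has no arrays. The last assignment shows why the braces in ${ARRAY[1]} matter:

```shell
#!/usr/bin/env bash
# Bash arrays: plain sh does not support them.
ARRAY=( "val1" "val2" "val3" )

first=${ARRAY[0]}        # indices start at 0
size=${#ARRAY[@]}        # number of elements: 3
ARRAY+=( "val4" )        # append one element
new_size=${#ARRAY[@]}    # now 4

# "${!ARRAY[@]}" expands to the list of indices
for i in "${!ARRAY[@]}"
do
  echo "$i -> ${ARRAY[$i]}"
done

# Without braces, $ARRAY is the first element followed by the literal text [1]
wrong="$ARRAY[1]"
echo "$wrong"            # val1[1]
```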
In [95]:
%%bash
echo '
Yesterday it worked
Today it is not working
Windows is like that
- - - - - - - - - - - -
Stay the patient course
Of little worth is your ire
The network is down
- - - - - - - - - - - -
Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.
- - - - - - - - - - - -
Chaos reigns within.
Reflect, repent, and reboot.
Order shall return.
- - - - - - - - - - - -
ABORTED effort:
Close all that you have.
You ask way too much.
- - - - - - - - - - - -
The Tao that is seen
Is not the true Tao, until
You bring fresh toner.
- - - - - - - - - - - -
A crash reduces
your expensive computer
to a simple stone.
- - - - - - - - - - - -
Error messages
cannot completely convey.
We now know shared loss.
' > haiku
We want to find lines with particular words. Our friend is grep.
grep allows one to search for a pattern inside a file. It can be a simple string:
In [96]:
%%bash
grep day haiku
Or, with -w, we can search only for the whole word "day", rather than every occurrence of the string:
In [97]:
%%bash
grep -w day haiku
There is no output, since "day" only appears inside longer words. We can also search for a phrase:
In [98]:
%%bash
grep -w "is not" haiku
Another useful option is -n, which gives the line number of each match:
In [99]:
%%bash
grep -n "it" haiku
And, of course, we can combine the two things:
In [100]:
%%bash
grep -n -w "it" haiku
Sometimes we want to make the search case insensitive, with -i:
In [102]:
%%bash
grep -i -n -w "the" haiku
Or invert the search with -v, to get all the lines without "the":
In [103]:
%%bash
grep -n -w -v -i "the" haiku
There are many more options, but the real power of grep comes from regular expressions. Regular expressions can be very complicated. We will talk more about them in the other lectures. In the meantime, I advise reading the Wikipedia page about them: https://en.wikipedia.org/wiki/Regular_expression
To give an idea, the following example searches for lines whose third character is "o" (^ anchors the match at the start of the line, and each . matches any single character):
In [107]:
%%bash
grep -e '^..o' haiku
Another case: search for lines with two commas, each followed by whitespace (in GNU grep, \s matches a whitespace character; .* matches anything in between):
In [110]:
%%bash
grep -e ',\s.*,\s' haiku
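A few more grep options, as a minimal sketch on a small hypothetical copy of the first haiku: -c counts matching lines, -E enables extended regular expressions where | means "or", and $ anchors a match at the end of the line:

```shell
# A small hypothetical file with the first haiku.
demo=$(mktemp -d)
printf 'Yesterday it worked\nToday it is not working\nWindows is like that\n' > "$demo/short_haiku"

# -c counts the matching lines instead of printing them
n_it=$(grep -c -w "it" "$demo/short_haiku")

# -E: extended regular expressions, where | means "or"
starts=$(grep -E '^(Yesterday|Today)' "$demo/short_haiku")

# $ anchors the match at the end of the line
ends=$(grep 'that$' "$demo/short_haiku")

echo "$n_it"      # 2
echo "$starts"
echo "$ends"      # Windows is like that
```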
While grep finds lines in a file, the find command finds files. This is another command with plenty of options.
The main thing to remember is that find needs a starting directory, then it will explore all the subdirectories to find files with specified features. A simple example:
In [111]:
%%bash
find . -name "*.ipynb"
Another case is searching for directories:
In [113]:
%%bash
find . -type d
We can easily combine find with other commands. For instance with wc:
In [114]:
%%bash
wc -l $(find . -name '*.ipynb')
Or with grep:
In [117]:
%%bash
grep "extraordinary simple" $(find . -name '*.ipynb')
Finally, find can be used with the option -exec, which is quite powerful since it runs a specified command on each selected file: {} is replaced by the file name, and \; marks the end of the command. For instance, to list the directories containing files with a specific extension, we can use the following syntax:
In [122]:
%%bash
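# sort -u drops repeated directory names even when they are not adjacent (uniq only removes adjacent duplicates)
find . -name '*.ipynb' -exec dirname {} \; | sort -u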
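A minimal sketch of -exec on a small hypothetical tree: with \; find runs the command once per file, while with + it passes many files to a single call:

```shell
# A small hypothetical tree.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
printf 'x\ny\n' > "$demo/a/one.txt"
printf 'z\n' > "$demo/a/b/two.txt"

# \; ends the command: wc runs once for each file, {} being its name
find "$demo" -name '*.txt' -exec wc -l {} \;

# + passes many file names to a single wc call, which also prints a total
find "$demo" -name '*.txt' -exec wc -l {} +

# -exec works with any command, e.g. grep -l lists the files that match
matches=$(find "$demo" -type f -name '*.txt' -exec grep -l 'z' {} +)
echo "$matches"
```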
In [119]:
%%bash
# sed s/old/new/ replaces the first match of old on each line
echo 'The cat runs on the roof' | sed 's/run/walk/'
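The cell above uses sed, the stream editor, to substitute text. A minimal sketch of how its s command behaves: by default it replaces only the first match on each line, and the g flag replaces all of them:

```shell
# By default s/old/new/ replaces only the first match on each line
once=$(echo 'run, run, run' | sed 's/run/walk/')
# The g flag replaces every match on the line
all=$(echo 'run, run, run' | sed 's/run/walk/g')
# The pattern also matches inside words: "runs" becomes "walks"
word=$(echo 'The cat runs on the roof' | sed 's/run/walk/')

echo "$once"    # walk, run, run
echo "$all"     # walk, walk, walk
echo "$word"    # The cat walks on the roof
```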
If you have perl installed on your laptop, it is possible to substitute strings inside files in place, i.e. without creating new files by hand. This is very handy, believe me!
In [124]:
%%bash
echo 'This file is rotten' > file.txt
cat file.txt
perl -pi -e 's/rotten/fresh/' file.txt
cat file.txt
Practicing is the only way to understand. So, after such a long introduction to the Unix bash shell, it is time to solve a few problems.
Write a command to find all the files of type *.dat whose name contains the string "ose" but not "temp"
Find the list of unique names of animals in this file (hint: use cut, sort, uniq):
2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1
In [134]:
%%bash
echo '
2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1
' > test.txt
cat test.txt | cut -d , -f 2 | sort | uniq