Lecture 1

Software required

This is the list of software you should have installed on your computer to follow the classes:

  • Python (anaconda distribution)
  • git
  • bash

Part of the course covers the basics of UNIX using the bash shell. For this reason, I will support only the Linux and OS X operating systems.

You will also need to open an account on GitHub, so that you can use git to keep a remote copy of your code.

GitHub - http://www.github.com

To install Python, you can simply get the bash installer script from:

https://www.continuum.io/downloads

Please retrieve the Python 2.7 version, since we will study an important package which is not yet ported to Python 3. You can run this script on your computer to install the anaconda distribution in your home directory.

During installation, the installer adds a line to your ~/.bashrc file which puts the anaconda bin directory on your PATH.
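
The effect of such a line can be illustrated with a short sketch. The directory name $HOME/anaconda2/bin is only an assumed location; the real one is whatever the installer wrote in your ~/.bashrc. The sketch builds the new search path as a plain string, without changing the PATH of your session.

```shell
# Assumed install location (hypothetical); check your own ~/.bashrc for the real one.
anaconda_bin="$HOME/anaconda2/bin"

# The line added to ~/.bashrc has the form: export PATH="<anaconda bin>:$PATH"
# Here we only build the string and leave the real PATH untouched.
new_path="$anaconda_bin:$PATH"

# Directories are searched from left to right, so the first entry wins.
first_entry=$(printf '%s\n' "$new_path" | cut -d: -f1)
echo "First directory searched: $first_entry"
```

Because the anaconda directory comes first, the python found there takes precedence over any system-wide python.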

Once it is installed, we will learn how to use the "conda" command to install new packages, upgrade them, and finally build your own packages.

To install git on your computer, the procedure depends on whether you have a Linux or an OS X computer.

Typically on Linux it is sufficient to install it with:

sudo apt-get install git

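This command applies to Debian-based distributions such as Ubuntu. The sudo prefix runs it with root privileges, which are needed to install software system-wide. On OS X, you can use MacPorts.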

Installing packages with conda

Once anaconda Python has been downloaded and installed, it is time to use the package manager conda to check, upgrade, install, and remove packages.

We can start by upgrading conda itself.

conda update conda

We will have to install several new packages, for instance:

conda install astropy

If we cannot find a certain package, we can search conda for it:

anaconda search -t conda lmfit

Sometimes a package is not available in the default repository, but only through a channel. This is the case for lmfit, which can be found in the astropy channel. To install it, we pass the channel to conda with the -c option:

conda install -c astropy lmfit

Another small package we will use is version_information:

conda install -c pydy version_information

This allows us to mark the notebook with the versions used.


In [1]:
%load_ext version_information
%version_information numpy, scipy, astropy, matplotlib, version_information


Out[1]:
Software             Version
Python               2.7.12 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
IPython              5.1.0
OS                   Linux 4.4.0 57 generic x86_64 with debian jessie sid
numpy                1.11.2
scipy                0.18.1
astropy              1.2.1
matplotlib           1.5.3
version_information  1.0.3
Wed Jan 04 16:28:04 2017 CST

IDE

Anaconda comes with its own integrated development environment (IDE) called spyder. Of course, you can use any editor to edit your code. For instance, a widely used editor on Linux is emacs, which has its own Python modes. The advantage of an IDE is that it is self-contained. It can suggest classes, methods, etc. which are available in Python while you edit. It helps with debugging, and it immediately flags useless lines of code and incorrect code. It is also possible to run the code inside it, and to define a project when you are working on a complicated package.

To experiment with spyder simply call it:

spyder

We will use this IDE to develop a package in the second series of lectures.

A more advanced IDE is eclipse. This tool is worth studying if you plan to code also in other languages such as C++ or Java. To work with Python it needs an additional plugin. It is also possible to install a plugin to access github directly through git commands. The installation is extraordinarily simple. The latest eclipse is available at:

https://eclipse.org/downloads/index.php?show_instructions=TRUE

To work with Python you will need to install the following plugin:

https://marketplace.eclipse.org/content/pydev-python-ide-eclipse

A nice tutorial for Python programming in Eclipse is available at: https://www.ics.uci.edu/~pattis/common/handouts/introtopythonineclipse/

Finally, if you are interested in adding the eGit plugin to save your code versions in a faster way, you can install it from:

http://marketplace.eclipse.org/content/egit-git-team-provider

I will not cover eclipse in these lectures, but I encourage you to try it out if you start to code in a more serious way.

Notebook

Anaconda also comes with another wonder: the notebook.

Calling:

  • jupyter notebook

opens your default browser, where you can navigate the directory and open *.ipynb files, in which notes, code, and results are kept together (as in these lessons). Once a notebook is open, the interactive Python session running in the background allows one to run snippets of code and reproduce the results inside the notebook. Through magic commands, one can embed figures inside the document. So, the document can be kept to reproduce your own research or passed to a collaborator to illustrate what you did. It is therefore a wonderful way to do your research, document it, and preserve it. We will make great use of this environment during the lectures to learn how to code in Python, before starting to write code in a serious package.

A more advanced treatment of the notebook is given in the 2nd lecture.

The Unix shell

If you have ever used a computer running some flavour of UNIX, you are already familiar with many UNIX commands. But you are probably unaware of the many possibilities of shell commands, and you have probably never written a shell script.

I find that knowing how to use shell commands can help you organize your file structure, find things which you thought were lost, make complicated changes in several files, run programs repeatedly, and so on.

So, my first advice to enhance your computer skills is to study how to use your shell efficiently. In this lecture, I will go through the main capabilities. Then, it is up to you to learn more and expand your skills. Usually, a good way to find out how to do something is to ask Google. There are many answers out there if you know how to ask a question.

What is a shell?

A shell is, as the name suggests, an enclosure around the operating system of the computer, which allows us to interact with it while hiding its complexity. The most used Unix shell today is the Bash shell. This is a command shell, i.e. a shell that allows us to interact with the computer through a series of cryptic commands which we type on a keyboard.

The commands are terse so that they can be typed as fast as possible. This makes them very cryptic. How many of you know what pwd means?

Yes, it's print working directory.

Although nowadays it is possible to interact with computers through graphical interfaces, using mice, touchscreens, touchpads, etcetera, the command line remains very effective, and it is the most practical way to automate long sequences of commands and to interact with remote machines.

Jobs

One of the most important things to do is monitoring the machine. If you are running a program and you have to stop it, how do you identify it? And what is running right now?

It turns out that in Unix every process is identified by a number, the process ID (PID). We can find the process, suspend it, put it into the background or foreground, and kill it.

To find out what you launched:

  • jobs

To find out all the processes running:

  • ps -a
  • ps -u username

To suspend a running process, press:

  • CTRL-z

Then, let it continue in the background:

  • bg

If you want to bring it to the foreground again, find its job number with jobs and run:

  • fg %i

Finally, if you want to kill a job:

  • kill -9 %i

with i the job number reported by jobs. To kill a process by its process ID, as listed by ps, omit the percent sign (kill -9 PID). To have a dynamic view of what is going on inside your machine, use:

  • top

which you can stop with:

  • CTRL-c

In [28]:
%%bash
ps -a


  PID TTY          TIME CMD
 5170 pts/18   00:00:05 jupyter-noteboo
 5958 pts/18   00:00:00 eclipse
 5961 pts/18   00:02:46 java
 6089 pts/18   00:00:03 python
 7303 pts/18   00:00:00 more
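
The same steps can be tried from a script, where a process is addressed by its process ID rather than by a job number. A minimal sketch, using sleep as a stand-in for a long-running program; the variable names are only illustrative:

```shell
# Start a long-running command in the background.
sleep 60 &
pid=$!                       # $! holds the PID of the last background process
echo "Started sleep with PID $pid"

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then running_before=yes; else running_before=no; fi

# Ask the process to terminate (default signal TERM); kill -9 is the last resort.
kill "$pid"
wait "$pid" 2>/dev/null || true    # collect the process so that it does not linger

if kill -0 "$pid" 2>/dev/null; then running_after=yes; else running_after=no; fi
echo "Running before: $running_before, after: $running_after"
```

A plain kill gives the program the chance to clean up; kill -9 cannot be caught by the program and is best kept for processes that do not react otherwise.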

Setting the prompt

The prompt is the string that appears at the start of the line where we type a command. It is usually quite complicated since it can contain your username, the directory path, the time, etcetera. We can redefine the prompt as a simple sign.


In [4]:
!echo $PS1


$

Redefining the prompt.


In [3]:
%%bash
PS1='$ '
echo $PS1


$

First of all, let us find out where and who we are. In Unix you can ask the computer these questions.


In [9]:
%%bash
whoami
pwd


dario
/media/Data/workspace/Workshops/CompSkills4Astro/Lessons

These commands give the name of the user who is logged in and the current working directory. The latter is shown as the entire tree (or absolute path).

At this point we want to know what is contained in a directory and how to move around. This is accomplished with the commands:

  • ls
  • cd

which mean list and change directory.

As we will see, any command can come with several arguments. To know everything about the use of a command, we will use the command:

  • man

which stands for manual. For instance:


In [10]:
%%bash
man pwd


PWD(1)                           User Commands                          PWD(1)



NAME
       pwd - print name of current/working directory

SYNOPSIS
       pwd [OPTION]...

DESCRIPTION
       Print the full filename of the current working directory.

       -L, --logical
              use PWD from environment, even if it contains symlinks

       -P, --physical
              avoid all symlinks

       --help display this help and exit

       --version
              output version information and exit

       NOTE:  your shell may have its own version of pwd, which usually super‐
       sedes the version described here.  Please refer to your  shell's  docu‐
       mentation for details about the options it supports.

AUTHOR
       Written by Jim Meyering.

REPORTING BUGS
       Report pwd bugs to bug-coreutils@gnu.org
       GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
       General help using GNU software: <http://www.gnu.org/gethelp/>
       Report pwd translation bugs to <http://translationproject.org/team/>

COPYRIGHT
       Copyright  ©  2013  Free Software Foundation, Inc.  License GPLv3+: GNU
       GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
       This is free software: you are free  to  change  and  redistribute  it.
       There is NO WARRANTY, to the extent permitted by law.

SEE ALSO
       getcwd(3)

       The  full  documentation for pwd is maintained as a Texinfo manual.  If
       the info and pwd programs are properly installed at your site, the com‐
       mand

              info coreutils 'pwd invocation'

       should give you access to the complete manual.



GNU coreutils 8.21                March 2016                            PWD(1)

Another command, which is useful to find a command whose name we do not know, is:

  • apropos

For instance, if we did not know about pwd, we could look for the commands whose description contains "working directory":


In [11]:
%%bash
apropos "working directory"


chdir (2)            - change working directory
Cwd (3perl)          - get pathname of current working directory
fchdir (2)           - change working directory
get_current_dir_name (3) - get current working directory
getcwd (2)           - get current working directory
getcwd (3)           - get current working directory
getwd (3)            - get current working directory
git-stash (1)        - Stash the changes in a dirty working directory away
pwd (1)              - print name of current/working directory
pwdx (1)             - report current working directory of a process

Commands with many options usually have a help page which can be displayed by typing --help after the command.


In [18]:
%%bash
ls --help


Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list implied . and ..
      --author               with -l, print the author of each file
  -b, --escape               print C-style escapes for nongraphic characters
      --block-size=SIZE      scale sizes by SIZE before printing them.  E.g.,
                               '--block-size=M' prints sizes in units of
                               1,048,576 bytes.  See SIZE format below.
  -B, --ignore-backups       do not list implied entries ending with ~
  -c                         with -lt: sort by, and show, ctime (time of last
                               modification of file status information)
                               with -l: show ctime and sort by name
                               otherwise: sort by ctime, newest first
  -C                         list entries by columns
      --color[=WHEN]         colorize the output.  WHEN defaults to 'always'
                               or can be 'never' or 'auto'.  More info below
  -d, --directory            list directory entries instead of contents,
                               and do not dereference symbolic links
  -D, --dired                generate output designed for Emacs' dired mode
  -f                         do not sort, enable -aU, disable -ls --color
  -F, --classify             append indicator (one of */=>@|) to entries
      --file-type            likewise, except do not append '*'
      --format=WORD          across -x, commas -m, horizontal -x, long -l,
                               single-column -1, verbose -l, vertical -C
      --full-time            like -l --time-style=full-iso
  -g                         like -l, but do not list owner
      --group-directories-first
                             group directories before files.
                               augment with a --sort option, but any
                               use of --sort=none (-U) disables grouping
  -G, --no-group             in a long listing, don't print group names
  -h, --human-readable       with -l, print sizes in human readable format
                               (e.g., 1K 234M 2G)
      --si                   likewise, but use powers of 1000 not 1024
  -H, --dereference-command-line
                             follow symbolic links listed on the command line
      --dereference-command-line-symlink-to-dir
                             follow each command line symbolic link
                             that points to a directory
      --hide=PATTERN         do not list implied entries matching shell PATTERN
                               (overridden by -a or -A)
      --indicator-style=WORD  append indicator with style WORD to entry names:
                               none (default), slash (-p),
                               file-type (--file-type), classify (-F)
  -i, --inode                print the index number of each file
  -I, --ignore=PATTERN       do not list implied entries matching shell PATTERN
  -k, --kibibytes            use 1024-byte blocks
  -l                         use a long listing format
  -L, --dereference          when showing file information for a symbolic
                               link, show information for the file the link
                               references rather than for the link itself
  -m                         fill width with a comma separated list of entries
  -n, --numeric-uid-gid      like -l, but list numeric user and group IDs
  -N, --literal              print raw entry names (don't treat e.g. control
                               characters specially)
  -o                         like -l, but do not list group information
  -p, --indicator-style=slash
                             append / indicator to directories
  -q, --hide-control-chars   print ? instead of non graphic characters
      --show-control-chars   show non graphic characters as-is (default
                             unless program is 'ls' and output is a terminal)
  -Q, --quote-name           enclose entry names in double quotes
      --quoting-style=WORD   use quoting style WORD for entry names:
                               literal, locale, shell, shell-always, c, escape
  -r, --reverse              reverse order while sorting
  -R, --recursive            list subdirectories recursively
  -s, --size                 print the allocated size of each file, in blocks
  -S                         sort by file size
      --sort=WORD            sort by WORD instead of name: none -U,
                             extension -X, size -S, time -t, version -v
      --time=WORD            with -l, show time as WORD instead of modification
                             time: atime -u, access -u, use -u, ctime -c,
                             or status -c; use specified time as sort key
                             if --sort=time
      --time-style=STYLE     with -l, show times using style STYLE:
                             full-iso, long-iso, iso, locale, +FORMAT.
                             FORMAT is interpreted like 'date'; if FORMAT is
                             FORMAT1<newline>FORMAT2, FORMAT1 applies to
                             non-recent files and FORMAT2 to recent files;
                             if STYLE is prefixed with 'posix-', STYLE
                             takes effect only outside the POSIX locale
  -t                         sort by modification time, newest first
  -T, --tabsize=COLS         assume tab stops at each COLS instead of 8
  -u                         with -lt: sort by, and show, access time
                               with -l: show access time and sort by name
                               otherwise: sort by access time
  -U                         do not sort; list entries in directory order
  -v                         natural sort of (version) numbers within text
  -w, --width=COLS           assume screen width instead of current value
  -x                         list entries by lines instead of by columns
  -X                         sort alphabetically by entry extension
  -Z, --context              print any SELinux security context of each file
  -1                         list one file per line
      --help     display this help and exit
      --version  output version information and exit

SIZE is an integer and optional unit (example: 10M is 10*1024*1024).  Units
are K, M, G, T, P, E, Z, Y (powers of 1024) or KB, MB, ... (powers of 1000).

Using color to distinguish file types is disabled both by default and
with --color=never.  With --color=auto, ls emits color codes only when
standard output is connected to a terminal.  The LS_COLORS environment
variable can change the settings.  Use the dircolors command to set it.

Exit status:
 0  if OK,
 1  if minor problems (e.g., cannot access subdirectory),
 2  if serious trouble (e.g., cannot access command-line argument).

Report ls bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
For complete documentation, run: info coreutils 'ls invocation'

Changing directory

We can navigate around the directory tree using the command cd. There are three possible uses of cd:

  1. cd
  2. cd ..
  3. cd path

In the first case, we go to the home directory.


In [25]:
%%bash
pwd
cd
pwd


/media/Data/workspace/Workshops/CompSkills4Astro/Lessons
/home/dario

In the second case, we go to the parent directory.

'..' means 'the directory above the current one'; '.' on its own means 'the current directory'.


In [26]:
%%bash
pwd
cd ..
pwd


/media/Data/workspace/Workshops/CompSkills4Astro/Lessons
/media/Data/workspace/Workshops/CompSkills4Astro

Finally, we can go to a path. This can be an absolute path (the entire path, starting from the root directory /) or a relative path (a path starting from the current directory, such as a subdirectory or ../Notes).


In [27]:
%%bash
pwd
cd ../Notes
pwd


/media/Data/workspace/Workshops/CompSkills4Astro/Lessons
/media/Data/workspace/Workshops/CompSkills4Astro/Notes
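
The three forms can be tried safely in a scratch directory. The sketch below is only an illustration: the directory names Lessons and Notes mirror the example above, and it also uses "cd -", which returns to the previous directory.

```shell
# Work in a scratch directory so that nothing real is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/Lessons" "$scratch/Notes"

cd "$scratch/Lessons"        # absolute path: it starts with /
cd ../Notes                  # relative path: up one level, then into Notes
here=$(basename "$PWD")
echo "Now in: $here"

cd - > /dev/null             # "cd -" goes back to the previous directory
back=$(basename "$PWD")
echo "Back in: $back"

cd /                         # leave the scratch area before deleting it
rm -r "$scratch"
```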

Listing a directory content

ls is a very versatile command, which can be used with a myriad of options. So, let us see together a few frequently used ones.

  1. ls -F
  2. ls -a
  3. ls -lh
  4. ls -t

In the first case, an indicator is appended to each name to show its type. For instance, a slash is added to the names of directories.

The option -a allows one to see all the files, including the hidden files, which in Unix start with a dot "." and are normally invisible.

The third option shows the files with a lot of information, such as permissions, owner, size, and date. The extra option -h writes the sizes in human-readable form (such as K for kilobytes).

Finally, the option -t orders the files according to their last modification date. The most recent are listed first.
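
A small experiment shows these options at work. This is a sketch in a scratch directory with made-up file names; touch -t is used here only to give the two text files known modification times.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir data
touch .hidden
touch -t 202001011200 old.txt    # explicit, older modification time
touch -t 202101011200 new.txt    # a newer one

ls -F                            # directories get a trailing slash: data/
ls -a                            # also lists .hidden, . and ..
ls -lh                           # long format with human-readable sizes

newest=$(ls -t *.txt | head -n 1)    # -t sorts by modification time, newest first
echo "Newest text file: $newest"
hidden_shown=$(ls -a | grep -c '^\.hidden$')    # 1 if ls -a lists the hidden file

cd /
rm -r "$scratch"
```

Options can be combined, so "ls -lht" gives a long listing with the most recently modified files on top.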

Creating and destroying

To organize files we need a directory structure. So, it is useful to know how to create, move, and destroy directories. This is accomplished with the commands:

  1. mkdir
  2. mv
  3. rmdir and rm -r

Let us see an example.


In [40]:
%%bash

ls -d */
mkdir Test
echo " ----- After creation ----- "
ls -d */
mv Test Test2
echo " ----- After moving  -------"
ls -d */
rmdir Test2
echo " ----- After removing ------"
ls -d */


Astropy/
 ----- After creation ----- 
Astropy/
Test/
 ----- After moving  -------
Astropy/
Test2/
 ----- After removing ------
Astropy/
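
Note that rmdir removes only empty directories, while rm -r removes a directory together with everything below it. The following sketch, with hypothetical directory and file names, shows the difference, together with mkdir -p, which creates nested directories in one step.

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir -p project/data/raw            # -p also creates the missing parent directories
touch project/data/raw/image.fits

# rmdir refuses to remove a directory that is not empty.
if rmdir project 2>/dev/null; then rmdir_result=removed; else rmdir_result=refused; fi
echo "rmdir on a non-empty directory: $rmdir_result"

rm -r project                        # removes the directory and all its content
if [ -e project ]; then after_rm=present; else after_rm=gone; fi
echo "After rm -r: $after_rm"

cd /
rm -r "$scratch"
```

There is no trash bin on the command line: what rm -r deletes is gone, so it is worth checking the path twice before pressing return.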

We proceed in a similar way with files. It is possible to create an empty file using the command touch. This command is usually used to update the time stamp of the file. If the file does not exist, it creates an empty file.

So, let's see another example.


In [44]:
%%bash
echo " ---- start ------"
ls test*
touch test
echo " ----- file created ----"
ls test*
mv test test2
echo " ------ file renamed -----"
ls test*
rm test2
echo " ------ file removed ------"
ls test*


 ---- start ------
 ----- file created ----
test
 ------ file renamed -----
test2
 ------ file removed ------
ls: cannot access test*: No such file or directory
ls: cannot access test*: No such file or directory

A good precaution when using "rm" is to ask for confirmation. This is achieved with the option "-i":


In [45]:
%%bash
touch test
rm -i test


rm: remove regular empty file ‘test’? 

Finally, we want to know how to copy files, directories, and entire trees. This is accomplished with the command cp.

In particular we can copy files into an existing directory:

  • cp file directory

Copy all files ending with .txt into a directory:

  • cp *.txt directory

Copy a directory recursively into another location:

  • cp -r directory1 path
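
The three forms can be checked in a scratch directory. A minimal sketch, where all the file and directory names are made up for the occasion:

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir src backup
echo "alpha" > src/a.txt
echo "beta"  > src/b.txt
echo "gamma" > src/notes.dat

cp src/a.txt backup              # one file into an existing directory
cp src/*.txt backup              # every file ending in .txt
cp -r src src_copy               # a whole directory tree to a new location

txt_in_backup=$(ls backup | wc -l | tr -d ' ')
copied_tree=$(ls src_copy | wc -l | tr -d ' ')
echo "backup holds $txt_in_backup files, src_copy holds $copied_tree files"

cd /
rm -r "$scratch"
```

Only the two .txt files end up in backup, while src_copy contains all three files of the original directory.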

Editing files is another big chapter. For this purpose one can use one of the major editors: vi, emacs, etcetera. Although I use mainly emacs, it is useful to know the basics of vi since it is always installed on a Unix system and can be handy when managing big files.

The basic commands to learn are:

  • vi filename to open a file
  • i text ESC to insert text at the cursor
  • a text ESC to append text after the cursor
  • x to delete a character
  • dd to delete a line
  • ZZ to save the file and quit
  • :q to quit (:q! to quit without saving the changes)

Inside the file, you can search for a string by typing a forward slash followed by the string:

  • /string

Pipes and filters

At this point, after introducing some basic commands, let's have a look at one of the most powerful features of the shell: combining commands.

To introduce some examples, we will use a popular command:

  • wc, short for word count

As usual, we will use it with an option, -l, to get the number of lines in a file.

We can use -c or -w to count bytes (which are characters for plain ASCII text) or words.


In [47]:
%%bash
wc -l *.ipynb


  1907 git.ipynb
  1066 Lecture 1.ipynb
  1658 Notebook.ipynb
  4373 pandas.ipynb
   314 PythonBasics.ipynb
  9318 total

At this point, we will redirect this output to a file, using >.


In [48]:
%%bash
wc -l *.ipynb > lengths.txt

We can see the content of this file with the command cat (short for concatenate):


In [49]:
%%bash
cat lengths.txt


  1907 git.ipynb
  1066 Lecture 1.ipynb
  1658 Notebook.ipynb
  4373 pandas.ipynb
   314 PythonBasics.ipynb
  9318 total

We can sort this file numerically according to the length (the first field in each line):


In [50]:
%%bash
sort -n lengths.txt


   314 PythonBasics.ipynb
  1066 Lecture 1.ipynb
  1658 Notebook.ipynb
  1907 git.ipynb
  4373 pandas.ipynb
  9318 total

We can write this sorted list to a file and get the first line with head and the last line with tail:


In [52]:
%%bash
sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt
tail -n 1 sorted-lengths.txt


   314 PythonBasics.ipynb
  9318 total

Instead of saving the result of a command in a file, we can directly pipe the result into another command:


In [53]:
%%bash
wc -l *.ipynb | sort -n


   314 PythonBasics.ipynb
  1161 Lecture 1.ipynb
  1658 Notebook.ipynb
  1907 git.ipynb
  4373 pandas.ipynb
  9413 total

And even pipe it again to get only the shortest file:


In [54]:
%%bash
wc -l *.ipynb | sort -n | head -1


   314 PythonBasics.ipynb

This is made possible by the way commands work in Unix. Each command is a program which accepts data from a channel called standard input (or stdin) and outputs results on another channel called standard output (or stdout). A third channel, called standard error (or stderr), is used to communicate error messages.

When a pipe is used, the stdout of the first program becomes the stdin of the following program. So, in Unix it is possible to build many little programs and chain them to execute complicated operations in an efficient way.

More on redirection:

  • < file , use the content of a file as stdin
  • >> file, append the result to an existing file
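
The three channels and the redirections can be observed with a short sketch. The file names are hypothetical, and the sketch also uses 2>, which redirects stderr in the same way as > redirects stdout.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "first line"  >  log.txt    # > creates the file or overwrites it
echo "second line" >> log.txt    # >> appends to the existing file

# < feeds the file to the stdin of the command, so wc prints no file name.
n_lines=$(wc -l < log.txt | tr -d ' ')
echo "log.txt has $n_lines lines"

# Only stdout travels through a pipe; stderr is redirected separately with 2>.
n_piped=$(ls no_such_file 2> errors.txt | wc -l | tr -d ' ')
n_errors=$(wc -l < errors.txt | tr -d ' ')
echo "lines through the pipe: $n_piped, lines on stderr: $n_errors"

cd /
rm -r "$scratch"
```

The error message of ls never reaches wc: it goes to stderr, which here is saved in errors.txt.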

Wildcards

Let us make a little digression about wildcards. When we want to consider a group of files whose names have a part in common, we can use two symbols:

  • *
  • ?

The first stands for any string (including an empty one), the second for any single character. We can also use bracket expressions, which contain a set of characters inside square brackets.

  • [AB]
  • [a-c]
  • [0-4]
  • [^AB]

For instance, 'ls *[AB].txt' will match all the files ending in A.txt or B.txt. If we want the .txt files which do not end in A.txt or B.txt, we will use 'ls *[^AB].txt'. The expression [0-4] means any digit between 0 and 4, and [a-c] is equivalent to [abc].


In [57]:
%%bash
ls *[sk].ipynb


Notebook.ipynb
pandas.ipynb
PythonBasics.ipynb

In [58]:
%%bash
ls *[^sk].ipynb


git.ipynb
Lecture 1.ipynb
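
A quick way to get familiar with wildcards is to count how many names each pattern matches. The sketch below uses made-up file names in a scratch directory. The negation is written as [!AB], the POSIX spelling, which bash treats in the same way as [^AB].

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch runA.txt runB.txt runC.txt run1.txt run12.txt

n_ab=$(ls *[AB].txt | wc -l | tr -d ' ')          # names ending in A.txt or B.txt
n_not_ab=$(ls *[!AB].txt | wc -l | tr -d ' ')     # .txt names not ending in A.txt or B.txt
n_one=$(ls run?.txt | wc -l | tr -d ' ')          # ? stands for exactly one character
n_digit=$(ls run[0-4]*.txt | wc -l | tr -d ' ')   # run, a digit from 0 to 4, then anything

echo "[AB]: $n_ab  [!AB]: $n_not_ab  ?: $n_one  [0-4]*: $n_digit"

cd /
rm -r "$scratch"
```

Here run12.txt is not matched by run?.txt, since two characters stand between "run" and ".txt".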

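Other useful commands:

  • tr

Translates, squeezes, or deletes characters. It reads from standard input, so we can, for instance, squeeze all the repeated blanks in a file with "tr -s ' ' < file".

  • cut

Cuts parts out of each line. We can select columns by giving a delimiter and the field numbers, or by giving the character positions.

  • uniq

Filters out repeated lines, leaving the unique entries. The input should be sorted, since only adjacent duplicates are removed.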


In [5]:
%%bash
wc -l *.ipynb
echo " "
wc -l *.ipynb | cut -c 8-25


   1907 git.ipynb
   2370 Lecture-1.ipynb
   1675 Lecture-2A.ipynb
   3975 Lecture-2B.ipynb
   5243 Lecture-3.ipynb
   3289 Lecture-4.ipynb
   1977 Lecture-5.ipynb
   1425 Lecture-7.ipynb
   4364 pandas.ipynb
     81 Solutions-1.ipynb
  26306 total
 
 git.ipynb
 Lecture-1.ipynb
 Lecture-2A.ipynb
 Lecture-2B.ipynb
 Lecture-3.ipynb
 Lecture-4.ipynb
 Lecture-5.ipynb
 Lecture-7.ipynb
 pandas.ipynb
 Solutions-1.ipynb
 total

In [11]:
%%bash
wc -l *.ipynb | tr -s ' ' | cut -d' ' -f 3


git.ipynb
Lecture-1.ipynb
Lecture-2A.ipynb
Lecture-2B.ipynb
Lecture-3.ipynb
Lecture-4.ipynb
Lecture-5.ipynb
Lecture-7.ipynb
pandas.ipynb
Solutions-1.ipynb
total

In [28]:
%%bash
echo 'one
two
three and  four' | tr "\n" " " | tr -s ' '


one two three and four 
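
The command uniq was not used in the cells above, so here is a minimal sketch with a made-up list of words. It counts the distinct entries and finds the most frequent one, using the option -c of uniq, which prefixes each entry with the number of its occurrences.

```shell
# A small unsorted list with repeated entries.
words=$(printf 'star\ngalaxy\nstar\nquasar\nstar\ngalaxy\n')

# uniq merges only adjacent duplicates, so we sort first.
n_unique=$(printf '%s\n' "$words" | sort | uniq | wc -l | tr -d ' ')

# uniq -c adds the counts; sort -rn puts the largest count first.
most_common=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn | head -n 1 \
              | tr -s ' ' | cut -d' ' -f 3)

echo "$n_unique distinct words, most common: $most_common"
```

The last two commands of the pipeline are the same tr and cut combination used above to extract a column.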

Loops

Sometimes we have to repeat the same command several times using different arguments. For instance, changing names of files or showing the beginning of a list of files. That's what loops are for!

Let's see a simple example:


In [77]:
%%bash
for filename in *.ipynb
do
    echo "$filename"
    head -n 40 "$filename" | tail -n 1
done


git.ipynb
    "RCS can used on almost any digital content, so it is not only restricted to software development, and is also very useful for manuscript files, figures, data and notebooks!\n",
Lecture-1.ipynb
    "Typically on Linux it is sufficient to install it with:\n",
Notebook.ipynb
    "\n",
pandas.ipynb
   "outputs": [
PythonBasics.ipynb
    "How to define a variable and check its type:"

In this case we create a list of files with the pattern "*.ipynb". The "for" loop assigns each name of the list in turn to the variable filename, whose value is read as $filename, and then runs the commands inside the loop on it.

The names produced by a pattern are kept whole, even if they contain spaces. However, when $filename is used without quotes, the shell splits its value at whitespace, so a file with a space in its name is treated as two files. So, it's good practice to avoid spaces in names and to write the variable inside double quotes, as "$filename". A name typed by hand also has to be passed inside quotes, such as "file .dat".
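
The behaviour of names with spaces can be checked with a small experiment. In this sketch the two file names are made up; the loop counts the files it visits and sums their lines, writing the variable inside double quotes.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'a\nb\n' > "two words.dat"
printf 'c\n'    > single.dat

count=0
total=0
for filename in *.dat
do
    # The pattern yields each name as one item; the quotes keep it whole when used.
    lines=$(wc -l < "$filename" | tr -d ' ')
    echo "$filename: $lines lines"
    count=$((count + 1))
    total=$((total + lines))
done
echo "$count files, $total lines"

cd /
rm -r "$scratch"
```

Removing the quotes around $filename makes the redirection fail for "two words.dat", because the name is split into two words.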

We can also give the same command in one line using semicolons to divide the commands:


In [78]:
%%bash
for filename in *.ipynb; do echo "$filename"; head -n 40 "$filename" | tail -n 1; done


git.ipynb
    "RCS can used on almost any digital content, so it is not only restricted to software development, and is also very useful for manuscript files, figures, data and notebooks!\n",
Lecture-1.ipynb
    "Typically on Linux it is sufficient to install it with:\n",
Notebook.ipynb
    "\n",
pandas.ipynb
   "outputs": [
PythonBasics.ipynb
    "How to define a variable and check its type:"

History

Another way to repeat commands is to use the history:

history | tail -n 5

gives the last five commands. (It does not work in the notebook, though.) Each command has a number. So, if we are interested in repeating command number 450, we simply type:

!450

If we remember part of a command we typed, we can search the history with Ctrl-R followed by that part of the command.

Finally, we can go through the history simply using the up- and down-arrows on the keyboard.

Shell scripts

When we frequently use a series of commands, we can save them in a shell script. A typical script starts with a shebang line (a line beginning with #!) which defines the interpreter (bash in this case), and the file conventionally has the extension .sh. We can make the file executable (chmod +x file.sh) and run it directly.


In [85]:
%%bash
echo '#!/usr/bin/env bash' > script.sh
echo '
for filename in *.ipynb
do
    echo "$filename"
    head -n 40 "$filename" | tail -n 1
done
' >> script.sh

chmod +x script.sh

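At this point we can execute the file. The ./ prefix tells the shell to look for it in the current directory, which is usually not on the PATH.


In [91]:
%%bash
./script.sh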


git.ipynb
    "RCS can used on almost any digital content, so it is not only restricted to software development, and is also very useful for manuscript files, figures, data and notebooks!\n",
Lecture-1.ipynb
    "Typically on Linux it is sufficient to install it with:\n",
Notebook.ipynb
    "\n",
pandas.ipynb
   "outputs": [
PythonBasics.ipynb
    "How to define a variable and check its type:"

Now, a more complex example. We write a script to show part of a file. The name of the file, the last line to show, and the number of lines to show are given as arguments, which the script reads as $1, $2, and $3. To make the script more readable we will also add comments (lines starting with #).


In [90]:
%%bash
echo '#!/usr/bin/env bash' > middle.sh
echo '# show the middle part of a file' >> middle.sh
echo '# Usage: middle.sh filename end_line number_of_lines  ' >> middle.sh
echo 'head -n "$2" "$1" | tail -n "$3"' >> middle.sh

chmod +x middle.sh

./middle.sh Lecture-1.ipynb 40 1


    "Typically on Linux it is sufficient to install it with:\n",

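A script with positional arguments is easier to use if it checks how many arguments it received. The sketch below is a hypothetical variant of the script above, named middle_checked.sh, written to a scratch directory with a here-document (cat > file << 'EOF'), which is an alternative to the repeated echo commands. The test file numbers.txt is also made up.

```shell
scratch=$(mktemp -d)
cd "$scratch"

cat > middle_checked.sh << 'EOF'
#!/usr/bin/env bash
# Usage: middle_checked.sh filename end_line number_of_lines
# $# is the number of arguments; $0 is the name of the script itself.
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 filename end_line number_of_lines" >&2
    exit 1
fi
head -n "$2" "$1" | tail -n "$3"
EOF

printf 'one\ntwo\nthree\nfour\nfive\n' > numbers.txt
result=$(bash middle_checked.sh numbers.txt 4 2)    # the 2 lines ending at line 4
echo "$result"

# With too few arguments the script stops with a non-zero exit status.
if bash middle_checked.sh numbers.txt 2> /dev/null; then few_args=accepted; else few_args=rejected; fi
echo "Too few arguments: $few_args"

cd /
rm -r "$scratch"
```

The script is run here through bash explicitly; after chmod +x it could also be called as ./middle_checked.sh, as in the cells above.
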
Finally, an example with a list as input. In this case we use the variable $@ which refers to all of the input parameters.


In [92]:
%%bash
echo '#!/usr/bin/env bash' > script2.sh
echo '
for filename in "$@"
do
    echo "$filename"
    head -n 40 "$filename" | tail -n 1
done
' >> script2.sh

chmod +x script2.sh
./script2.sh *.ipynb


git.ipynb
    "RCS can used on almost any digital content, so it is not only restricted to software development, and is also very useful for manuscript files, figures, data and notebooks!\n",
Lecture-1.ipynb
    "Typically on Linux it is sufficient to install it with:\n",
Notebook.ipynb
    "\n",
pandas.ipynb
   "outputs": [
PythonBasics.ipynb
    "How to define a variable and check its type:"

If we want to define variables and arrays inside a bash script, we have to use a particular syntax. Variables are assigned, with no spaces around the = sign, as:

var="variable"

and called as $var.

Arrays are defined as:

ARRAY=( "val1" "val2" "val3" )

Inside a loop, all the elements are referenced as "${ARRAY[@]}", while a single element is referenced as ${ARRAY[1]}. The braces are required and the indices start at 0. Let's see an example:


In [129]:
%%bash
echo '#!/usr/bin/env bash' > script3.sh
echo '
dir0="/dir/"
ARRAY=(
"val1"
"val2"
"val3"
)
echo "2nd value is "${ARRAY[1]}
for name in "${ARRAY[@]}"
do
    echo $dir0$name
done
' >> script3.sh

chmod +x script3.sh
./script3.sh


2nd value is val2
/dir/val1
/dir/val2
/dir/val3
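
A few more array operations are often useful. The following is a minimal sketch for bash only (plain sh has no arrays); it reuses the ARRAY name from the example above, and the appended value val4 is an assumption for illustration.

```shell
#!/usr/bin/env bash
ARRAY=( "val1" "val2" "val3" )

# Indices start at 0, so the first element is ${ARRAY[0]}.
echo "first value is ${ARRAY[0]}"

# ${#ARRAY[@]} gives the number of elements.
echo "number of values: ${#ARRAY[@]}"

# += appends new elements to the end of the array.
ARRAY+=( "val4" )
echo "last value is ${ARRAY[3]}"

# Without braces, $ARRAY[1] expands $ARRAY (the first element)
# and then appends the literal text [1].
echo "without braces: $ARRAY[1]"
```

The last line prints "without braces: val1[1]", which shows why the braces cannot be dropped when accessing a single element.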

Finding things

Let's start with some humour.


In [95]:
%%bash
echo '
 Yesterday it worked
 Today it is not working
 Windows is like that
  - - - - - - - - - - - -
 Stay the patient course
 Of little worth is your ire
 The network is down
  - - - - - - - - - - - -
 Three things are certain:
 Death, taxes, and lost data.
 Guess which has occurred.
  - - - - - - - - - - - -
 Chaos reigns within.
 Reflect, repent, and reboot.
 Order shall return.
  - - - - - - - - - - - -
 ABORTED effort:
 Close all that you have.
 You ask way too much.
  - - - - - - - - - - - -
 The Tao that is seen
 Is not the true Tao, until
 You bring fresh toner.
  - - - - - - - - - - - -
 A crash reduces
 your expensive computer
 to a simple stone.
  - - - - - - - - - - - -
 Error messages
 cannot completely convey.
 We now know shared loss.
' > haiku

We want to find lines with particular words. Our friend is grep.

grep searches for a pattern inside a file. The pattern can be a simple string:


In [96]:
%%bash
grep day haiku


 Yesterday it worked
 Today it is not working

Or we can search only for the whole word "day", ignoring occurrences of the string inside longer words:


In [97]:
%%bash
grep -w day haiku

There is no output, because "day" never appears as a separate word in the file. We can also search for a phrase:


In [98]:
%%bash
grep -w "is not" haiku


 Today it is not working

Another useful option is "-n", which gives the line number of each occurrence of a string:


In [99]:
%%bash
grep -n "it" haiku


2: Yesterday it worked
3: Today it is not working
7: Of little worth is your ire
14: Chaos reigns within.

And, of course, we can combine the two options:


In [100]:
%%bash
grep -n -w "it" haiku


2: Yesterday it worked
3: Today it is not working

Sometimes, we want to make the search case insensitive:


In [102]:
%%bash
grep -i -n -w "the" haiku


6: Stay the patient course
8: The network is down
22: The Tao that is seen
23: Is not the true Tao, until

Or invert the search to show all the lines without the word "the":


In [103]:
%%bash
grep -n -w -v -i "the" haiku


1:
2: Yesterday it worked
3: Today it is not working
4: Windows is like that
5:  - - - - - - - - - - - -
7: Of little worth is your ire
9:  - - - - - - - - - - - -
10: Three things are certain:
11: Death, taxes, and lost data.
12: Guess which has occurred.
13:  - - - - - - - - - - - -
14: Chaos reigns within.
15: Reflect, repent, and reboot.
16: Order shall return.
17:  - - - - - - - - - - - -
18: ABORTED effort:
19: Close all that you have.
20: You ask way too much.
21:  - - - - - - - - - - - -
24: You bring fresh toner.
25:  - - - - - - - - - - - -
26: A crash reduces
27: your expensive computer
28: to a simple stone.
29:  - - - - - - - - - - - -
30: Error messages
31: cannot completely convey.
32: We now know shared loss.
33:
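
When only the number of matches matters, the -c option prints a count of the matching lines instead of the lines themselves, and -q prints nothing and reports the result only through the exit status. The sketch below builds its own three-line sample (the variable name sample is an assumption for illustration) so that it can be run independently of the haiku file.

```shell
#!/usr/bin/env bash
# Three-line sample taken from the haiku text.
sample=$(mktemp)
printf '%s\n' 'Yesterday it worked' \
              'Today it is not working' \
              'The network is down' > "$sample"

# Count the lines containing the string "day".
grep -c day "$sample"

# Count the lines containing "the" as a whole word, ignoring case.
grep -c -i -w the "$sample"

# grep exits with a non-zero status when nothing matches,
# which makes it usable as the condition of an if statement.
if grep -q -w day "$sample"; then
    echo 'found the word "day"'
else
    echo 'no standalone word "day"'
fi
```

The two counts are 2 and 1, and the if statement takes the else branch because "day" only occurs inside longer words.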

grep has many more options, but its real power comes from the use of regular expressions. Regular expressions can be very complicated, and we will talk more about them in other lectures. In the meantime, I advise reading the Wikipedia page about them: https://en.wikipedia.org/wiki/Regular_expression

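To give an idea, the following example searches for lines whose third character is "o". Since every line of the haiku file starts with a space, this selects the lines whose first word has "o" as its second letter: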


In [107]:
%%bash
grep -e '^..o' haiku


 Today it is not working
 You ask way too much.
 You bring fresh toner.
 your expensive computer
 to a simple stone.

Another case: search for lines containing two commas, each followed by a whitespace character (\s is a shorthand for whitespace supported by GNU grep):


In [110]:
%%bash
grep -e ',\s.*,\s' haiku


 Death, taxes, and lost data.
 Reflect, repent, and reboot.
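
The building blocks used in these patterns can be tried one at a time. The following sketch uses a four-line sample taken from the haiku text, without the leading spaces; the variable name sample and the choice of patterns are assumptions for illustration.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
printf '%s\n' 'Today it is not working' \
              'You ask way too much.' \
              'Death, taxes, and lost data.' \
              'Order shall return.' > "$sample"

# ^ anchors the match at the start of the line, . matches any character:
# lines whose third character is "d" (Today, Order).
grep '^..d' "$sample"

# $ anchors the match at the end of the line; \. is a literal period.
grep '\.$' "$sample"

# -E enables extended regular expressions, where | means "or".
grep -E '^(Today|Order)' "$sample"

# [[:space:]] is the POSIX character class for whitespace,
# a portable alternative to the \s shorthand.
grep ',[[:space:]].*,[[:space:]]' "$sample"
```

The four commands match 2, 3, 2 and 1 lines of the sample respectively.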

While grep finds lines in a file, the find command finds files. This is another command with plenty of options.

The main thing to remember is that find needs a starting directory, then it will explore all the subdirectories to find files with specified features. A simple example:


In [111]:
%%bash
find . -name "*.ipynb"


./Astropy/FITS-tables.ipynb
./Astropy/redshift_plot.ipynb
./Astropy/FITS-images.ipynb
./Astropy/plot-catalog.ipynb
./Astropy/FITS-header.ipynb
./Astropy/Coordinates.ipynb
./Astropy/Quantities.ipynb
./Notebook.ipynb
./pandas.ipynb
./.ipynb_checkpoints/Notebook-checkpoint.ipynb
./.ipynb_checkpoints/Lecture-1-checkpoint.ipynb
./.ipynb_checkpoints/PythonBasics-checkpoint.ipynb
./git.ipynb
./Lecture-1.ipynb
./PythonBasics.ipynb
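
The list above also contains the automatic copies saved under .ipynb_checkpoints. Tests can be combined and negated to leave them out. The following is a sketch on a throwaway directory tree; the directory and file names are assumptions that only mimic the layout shown above.

```shell
#!/usr/bin/env bash
# Build a small directory tree that mimics the layout shown above.
workdir=$(mktemp -d)
mkdir -p "$workdir/Astropy" "$workdir/.ipynb_checkpoints"
touch "$workdir/git.ipynb" \
      "$workdir/Astropy/Quantities.ipynb" \
      "$workdir/.ipynb_checkpoints/git-checkpoint.ipynb"

# -type f keeps regular files only; ! negates the test that follows,
# here a -path pattern matching anything under .ipynb_checkpoints.
find "$workdir" -type f -name '*.ipynb' ! -path '*/.ipynb_checkpoints/*'
```

Only git.ipynb and Astropy/Quantities.ipynb are listed; the order in which find prints them is not guaranteed.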

Another case is searching for directories:


In [113]:
%%bash
find . -type d


.
./Astropy
./.ipynb_checkpoints

We can easily combine find with other commands. For instance with wc:


In [114]:
%%bash
wc -l $(find . -name '*.ipynb')


    532 ./Astropy/FITS-tables.ipynb
    562 ./Astropy/redshift_plot.ipynb
    577 ./Astropy/FITS-images.ipynb
    582 ./Astropy/plot-catalog.ipynb
    572 ./Astropy/FITS-header.ipynb
    555 ./Astropy/Coordinates.ipynb
    582 ./Astropy/Quantities.ipynb
   1658 ./Notebook.ipynb
   4373 ./pandas.ipynb
   1658 ./.ipynb_checkpoints/Notebook-checkpoint.ipynb
   2045 ./.ipynb_checkpoints/Lecture-1-checkpoint.ipynb
    314 ./.ipynb_checkpoints/PythonBasics-checkpoint.ipynb
   1907 ./git.ipynb
   2078 ./Lecture-1.ipynb
    314 ./PythonBasics.ipynb
  18309 total

Or with grep:


In [117]:
%%bash
grep "extraordinary simple" $(find . -name '*.ipynb')


./.ipynb_checkpoints/Lecture-1-checkpoint.ipynb:    "The installation is extraordinary simple.\n",
./Lecture-1.ipynb:    "The installation is extraordinary simple.\n",
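
The $(find ...) form used above splits the list of files at whitespace, so a file name containing a space is broken into pieces. One common alternative, sketched here with hypothetical file names in a temporary directory, is to pass the names through xargs using NUL bytes as separators. The -print0 and -0 options are available in the GNU and BSD versions of find and xargs.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/plain.txt"
printf 'a\nb\nc\n' > "$workdir/with space.txt"

# -print0 ends each file name with a NUL byte instead of a newline,
# and xargs -0 splits its input on NUL, so names with spaces survive.
find "$workdir" -name '*.txt' -print0 | xargs -0 wc -l
```

wc reports 2 and 3 lines for the two files and a total of 5, including the file whose name contains a space.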

Finally, find can be used with the option -exec which is quite powerful since it can run a specified command on the selected files. For instance, if we look for directories containing files with a specific extension we can use the following syntax:


In [122]:
%%bash
find . -name '*.ipynb' -exec dirname {} \; | uniq


./Astropy
.
./.ipynb_checkpoints
.
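
In the output above the directory . appears twice, because uniq only collapses duplicate lines that are adjacent. A minimal sketch of the usual remedy, sorting first, uses the four lines printed above as fixed input (the variable name dirs is an assumption for illustration):

```shell
#!/usr/bin/env bash
# The four directory names printed by the find command above.
dirs=$(printf '%s\n' ./Astropy . ./.ipynb_checkpoints .)

# uniq alone leaves both "." lines, since they are not next to each other.
printf '%s\n' "$dirs" | uniq | wc -l

# Sorting first makes duplicates adjacent; sort -u does both steps at once.
printf '%s\n' "$dirs" | sort -u | wc -l
```

The first count is 4 and the second is 3, so appending sort -u to the find pipeline gives each directory exactly once.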

Substitute strings

Another common task is making substitutions in files. sed is one possibility; other options are awk and perl.


In [119]:
%%bash
echo 'The cat runs of the roof' | sed 's/run/walk/'


The cat walks of the roof
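
By default the s command replaces only the first match on each line. The sketch below, with an input string made up for illustration, shows the effect of adding the g flag:

```shell
#!/usr/bin/env bash
# Without a flag, only the first match on each line is replaced.
echo 'run, run, run' | sed 's/run/walk/'

# The g (global) flag replaces every match on the line.
echo 'run, run, run' | sed 's/run/walk/g'
```

The first command prints "walk, run, run" and the second prints "walk, walk, walk".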

If you have perl installed on your laptop, it is possible to substitute strings inside files in place, i.e. without creating new files. This is very handy, believe me!


In [124]:
%%bash
echo 'This file is rotten' > file.txt
cat file.txt
perl -pi -e 's/rotten/fresh/' file.txt
cat file.txt


This file is rotten
This file is fresh

Suggested activities

Practicing is the only way to understand. So, after such a long introduction to the Unix bash shell, it is time to solve a few problems.

  • Write a command to find all the files of type *.dat whose name contains the string "ose" but not "temp"

  • Find the list of unique names of animals in this file (hint: use cut, sort, uniq):

2013-11-05,deer,5
2013-11-05,rabbit,22
2013-11-05,raccoon,7
2013-11-06,rabbit,19
2013-11-06,deer,2
2013-11-06,fox,1
2013-11-07,rabbit,18
2013-11-07,bear,1


In [134]:
%%bash
echo '
2013-11-05,deer,5  
2013-11-05,rabbit,22  
2013-11-05,raccoon,7  
2013-11-06,rabbit,19  
2013-11-06,deer,2  
2013-11-06,fox,1  
2013-11-07,rabbit,18  
2013-11-07,bear,1  
' > test.txt

cat test.txt | cut -d , -f 2 | sort | uniq


bear
deer
fox
rabbit
raccoon
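
For the first activity, one possible approach (a sketch, certainly not the only solution) combines two -name tests in find. The file names created below are hypothetical and serve only to test the command.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/glucose.dat" "$workdir/temp_glucose.dat" \
      "$workdir/sucrose.txt" "$workdir/maltose.dat"

# Keep *.dat files whose name contains "ose";
# the ! drops the names that also contain "temp".
find "$workdir" -type f -name '*ose*.dat' ! -name '*temp*' | sort
```

Only glucose.dat and maltose.dat are listed: sucrose.txt has the wrong extension and temp_glucose.dat is excluded by the negated test.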
