Teach you some Bash!
You will need the following programs to use this code:
Bash: see Wikipedia's entry, http://en.wikipedia.org/wiki/Bash_(Unix_shell).
There is certainly no shortage of tutorials on Bash and shell scripting. In fact, Google is going to be one of your best friends when it comes to debugging errors and other issues with Bash. Someone else has probably had the same problem before you and already posted the solution, so search before you email your course instructor or TA. Below you will find a list of online Bash tutorials that you may find useful:
We are going to assume that you are ssh’d into proteus and working out of the course's GitHub repo. We will keep this repo updated as the course goes on, as bugs are found and general improvements are made. If you have not already done so, clone the repo to your local folder by running:
git clone https://github.com/gditzler/bio-course-materials.git
which will create a clone of the repo in the folder where the command was called. From time to time you may feel the need to update the repo with the staff's latest changes. To do this, run:
git reset --hard
git pull origin master
in the directory where you cloned the repo. Note that if you have modified files in the repo, pulling may produce merge conflicts, and git reset --hard erases any uncommitted changes that you have made. Therefore, it is recommended that you either: (i) copy the files you wish to experiment with to a new file before modifying them, or (ii) accept that your changes will be erased whenever you reset the master branch. If you prefer to use IPython in our examples, run:
ipython notebook --pylab inline
Bash commands and concepts
Below you will find a list of common commands used in Bash programming. While these commands are rather simple, we can combine them with slightly more powerful commands to build complex expressions in relatively few lines of code. The : character in each line below only separates the command from its description; leave it out when calling the command.
In [1]:
# cd <some path> : change directory
# cp <file a> <file b> : copy `file a` to `file b`. note that `file a` still remains.
# mv <file a> <file b> : move `file a` to `file b`
# ls : list the contents of a directory
# cat <file a> : print the contents of `file a`
# cat <file a> <file b> ... : concatenate the contents of files `file a` `file b` ...
# echo "Hello World!" : basic hello world program
# head -M <file a> : print the first M lines of `file a`
# tail -M <file a> : print the last M lines of `file a`
# wget <web address> : download a file from a web address
# mkdir <directory> : create a new directory
# touch <file a> : create an empty file
# rm <file> : remove a file (cannot be undone)
# rm -Rf <folder> : remove a folder and all of its contents (cannot be undone)
# find <folder> : print the files in the directory and all of its subdirectories
# This is a comment!
Be extremely cautious when using the rm command, as its action cannot be undone. This is not like placing an item in the trash bin on your desktop. Once you rm a file or folder you can never get it back. You have been warned!
It's important to note that you can always get help with a command by viewing its man page (man is short for manual). While the man page can be helpful, Google is perhaps even more helpful! As is the tradition in shell scripting, asking for help often leads to the answer RTFM!
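A minimal sketch of several of the commands above, run inside a throwaway directory so that nothing important can be removed by mistake. The directory and file names (backup, notes.txt, and so on) are hypothetical and used only for illustration.

```shell
# Work in a temporary scratch directory (all names below are made up for this sketch)
scratch=$(mktemp -d)
cd "$scratch"

mkdir backup                          # create a new directory
touch notes.txt                       # create an empty file
echo "first line" > notes.txt         # put one line of text in it
cp notes.txt backup/notes-copy.txt    # copy: notes.txt still exists afterwards
mv notes.txt renamed.txt              # move: notes.txt no longer exists
find .                                # print everything below the current directory

rm renamed.txt                        # remove a single file (cannot be undone)
remaining=$(find . -type f)           # only regular files that are left
echo "$remaining"
```

Only the copy inside backup/ survives. Running rm -Rf "$scratch" afterwards would remove the scratch directory and everything in it.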
In [2]:
%%bash
man echo # prints out a little weird!
I have placed a very basic tab-delimited file in the data/ folder. Let's use some of the above Bash commands to pick the file apart. Since I am working in IPython, I need to add %%bash to the beginning of every cell. You can ignore it.
In [3]:
%%bash
ls -l ../data/ # list the ../data/ directory
# note: `..` tells us to look back one directory. -l is a flag that tells ls to print one entry per line in long format
echo " "
echo "Let's look at the files in this directory"
ls
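Let's perform the following tasks now that we know the location of the eesi-names.txt file.
1. Print the first line of eesi-names.txt
2. Print the last two lines of eesi-names.txt
3. Concatenate three copies of eesi-names.txt
4. Copy eesi-names.txt to eesi-names-mycopy.txt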
In [4]:
%%bash
head -1 ../data/eesi-names.txt
In [5]:
%%bash
tail -2 ../data/eesi-names.txt
In [6]:
%%bash
cat ../data/eesi-names.txt ../data/eesi-names.txt ../data/eesi-names.txt
In [7]:
%%bash
cp ../data/eesi-names.txt ../data/eesi-names-mycopy.txt
# check to make sure it's there
ls -l ../data/
Just as with any other programming language, Bash has variables. They can hold scalars, strings or arrays. In this section we go over some basic types and how we can manipulate them. We define variables just as we would in any other programming language (with no spaces around the = sign); however, when we access them we need to place $ in front of the name. For example,
In [8]:
%%bash
my_var=Greg
echo "Hello $my_var"
n=1
echo $(($n+1))
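A short sketch of a few details that tend to trip people up with variables: spaces around =, quoting, and curly brackets. The variable names here are our own and are used only for illustration.

```shell
# No spaces are allowed around = in an assignment
first_name=Greg
full_name="Greg Ditzler"            # quotes are needed when the value contains a space

# Curly brackets mark where the variable name ends
greeting="Hello ${first_name}ory"   # without them Bash would look for $first_nameory
echo "$greeting"

# $(( )) evaluates integer arithmetic; the $ in front of n is optional inside it
n=1
next=$((n + 1))
echo "$next"

# Single quotes prevent the variable from being expanded
literal='Hello $first_name'
echo "$literal"
```

Writing first_name = Greg (with spaces) would make Bash try to run a command called first_name, which is one of the places where whitespace matters.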
In this section, we are going to take care of a couple of topics (arrays, for loops and if statements). Something to keep in mind about Bash is that it is very picky when it comes to whitespace. Sometimes it matters and sometimes it doesn’t! This section will bring up some of the times when it is going to matter.
First, let us define an array by using parentheses and separating each of the entries with a space. All of the entries in our array are plain strings; there is nothing special about them. In general, curly brackets in Bash are used to group things together and square brackets are used to index something. Indexing starts at 0, so ${names[1]} is the second entry. We can use the @ symbol as an index to list all of the entries, which we are going to need for the for loop. As shown below, the for loop is pretty boilerplate compared to other scripting languages; however, the if statement is a bit different. Notice that there is whitespace padded inside of the square brackets. Removing this space will produce an error. This is one of those times where whitespace makes a difference. Furthermore, the test for equality is performed using a single = symbol, which is different from most other programming languages. Refer to this website for many examples of using conditional statements with Bash.
In [9]:
%%bash
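names=( Gail Yemin Greg Cricket Steve )
# indexing starts at 0, so position 1 holds the second entry (Yemin)
echo "The entry in position 1 is ${names[1]}"
# quoting "${names[@]}" keeps each entry intact even if it contains spaces
for name in "${names[@]}"; do
    echo "$name"
    # the spaces just inside [ and ] are required
    if [ "$name" = "Greg" ]; then
        echo "${name}ory"
    fi
done
echo "${names[@]}"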
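A small sketch of why quoting matters when looping over an array, along with the array length and zero-based indexing. One entry deliberately contains a space; the entries are illustrative only.

```shell
# One entry contains a space, so quoting decides how the array is split
names=( Gail Yemin "Greg Ditzler" )

echo "Number of entries: ${#names[@]}"   # ${#names[@]} is the length of the array
echo "First entry: ${names[0]}"          # indexing starts at 0

quoted=0
for name in "${names[@]}"; do            # quoted: one loop pass per entry
    quoted=$((quoted + 1))
done

unquoted=0
for name in ${names[@]}; do              # unquoted: entries are split again on spaces
    unquoted=$((unquoted + 1))
done

echo "quoted loop ran $quoted times, unquoted loop ran $unquoted times"
```

The quoted loop sees three entries, while the unquoted loop sees four words because "Greg Ditzler" is split in two.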
Pipes allow us to take the output of one program and feed it into the input of another program. While this concept is very simple, it allows us to build very complex expressions. Let's do an example to see how this works. A pipe is written as "|". Let's say that we want to print out the second-to-last name in eesi-names.txt. We know that head prints out the beginning of a file and tail prints out the end of a file. We can use tail to print out the last two names, then pipe tail's output into head to keep only the first of those two, which is the second-to-last name. In code this is given by
In [10]:
%%bash
cat ../data/eesi-names.txt | tail -2 | head -1
Let us finish this example by using a redirect. Redirects allow us to send the standard output to a file. That is, what would be printed to the screen is written to a file instead. Let us redirect the second-to-last EESI name to a file. Note that there are other, more clever ways to use redirects; however, we are only covering one usage.
In [11]:
%%bash
cat ../data/eesi-names.txt | tail -2 | head -1 > ../data/second-to-last-user.txt
cat ../data/second-to-last-user.txt
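For the curious, here is a hedged sketch of two other common redirects besides >: appending with >> and capturing error messages with 2>. The temporary files and the nonexistent directory are stand-ins chosen for this illustration.

```shell
out=$(mktemp)       # temporary files standing in for real output files
errors=$(mktemp)

echo "first"  >  "$out"    # > creates the file, or overwrites it if it already exists
echo "second" >> "$out"    # >> appends to the end of the file instead
cat "$out"

# 2> redirects error messages (standard error) separately from regular output
ls /no/such/directory > /dev/null 2> "$errors" || echo "ls failed; see $errors"
```

After this runs, the first file holds two lines and the second holds the error message that ls would otherwise have printed to the screen.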
In this section, we are going to look at a couple of commands that use regular expressions, or regex for short. The first command we want to look at is grep. The grep utility searches the given input files and selects the lines that match one or more patterns. We can do many more operations with grep; however, just printing out certain lines of a file is powerful enough on its own because it can lead to further manipulation. Let's come back to eesi-names.txt and print out only the lines that start with the pattern G ('^' denotes the start of a line in the expression, and $ denotes the end of a line).
In [12]:
%%bash
cat ../data/eesi-names.txt | grep '^G'
echo " "
cat ../data/eesi-names.txt | grep 'Yemin'
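A sketch of a few more grep patterns and flags. Since the contents of eesi-names.txt are not shown here, the example builds a small stand-in file; the names in it are an assumption and the real file may differ. Note that grep can also read a file directly, without cat.

```shell
# Stand-in for eesi-names.txt; the real file's contents may differ
names_file=$(mktemp)
printf '%s\n' Gail Yemin Gregory Cricket Steve > "$names_file"

starts_with_g=$(grep '^G' "$names_file")     # lines that start with G
ends_with_n=$(grep 'n$' "$names_file")       # lines that end with n
count_e=$(grep -c 'e' "$names_file")         # -c counts the matching lines
not_g=$(grep -v '^G' "$names_file")          # -v keeps the lines that do NOT match
any_case=$(grep -i 'steve' "$names_file")    # -i ignores upper/lower case

echo "$starts_with_g"
echo "Lines containing an e: $count_e"
```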
The final regex tool we are going to look at is sed. This is a very powerful tool; however, we are only going to be interested in its find/replace functionality. We give sed an expression telling it the pattern we want to search for and the pattern that we want to replace it with. For example, let's find any occurrence of Gregory and replace it with Greg in eesi-names.txt. Have a close look at how we are calling sed in the example below.
In [13]:
%%bash
cat ../data/eesi-names.txt | sed -e 's/Gregory/Greg/g'
echo " "
# we can also group things and replace them. see if you can tell what's going on here
cat ../data/eesi-names.txt | sed -e 's/\(^G[a-z]*\)/\1 MIDDLE /g'
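To make the grouping example easier to follow, here is a sketch on a small stand-in file (the names are an assumption; the real eesi-names.txt may differ). The ^ anchor is written outside the group here, which matches the same lines as the expression above.

```shell
# Stand-in for eesi-names.txt; the real file's contents may differ
names_file=$(mktemp)
printf '%s\n' Gail Yemin Gregory > "$names_file"

# s/pattern/replacement/g : replace every match of the pattern on each line
shortened=$(sed -e 's/Gregory/Greg/g' "$names_file")

# \( \) captures the text it matches; \1 in the replacement inserts that text back.
# ^ anchors the match to the start of the line, and G[a-z]* matches a G
# followed by any number of lowercase letters.
tagged=$(sed -e 's/^\(G[a-z]*\)/\1 MIDDLE/' "$names_file")

echo "$shortened"
echo "$tagged"
```

Lines that start with G keep their original text and gain the word MIDDLE after it; every other line passes through unchanged.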
There are several flavors of text editors for the shell. Some are:
If you are interested in looking through a file but not editing it, I would recommend using less or more.
Let's say I need to download some photos from a website, and I am far too lazy to right click on every photo and save it to my computer. I am, however, stubborn enough to write some Bash code to parse an HTML file and download the photos without needing to right click a single image! In this example, we are going to download all of the photos from Dr. Rosen's EESI webpage. Surprisingly, given the few simple commands we have learned, this can be achieved with just a few lines of code.
This problem is actually very easy to solve with Bash. First of all, let's think about the logical steps that need to be accomplished and how we can use these basic commands to achieve this task.
1. Download the PHP file. This is really the first step and you have already been given the web address! The file can be downloaded with wget and, after looking through the file, we find that the image location is specified with src=<path to the file>. Bash tools: wget.
2. Find the image links, which are marked by src= in a line of the PHP file. However, src need not refer only to images; it could refer to JavaScript. Therefore, we only need the lines with the jpg file extension. The links to the images should be saved into an array. Bash tools: cat, grep, and sed.
3. Download each image. Bash tools: wget.
Then 3 (actual) lines of code later!
In [14]:
%%bash
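# rm *.jpg
# rm people.php*
web_home=http://www.ece.drexel.edu/gailr/EESI
# download the page quietly (-q)
wget -q ${web_home}/people.php
# keep the lines with a jpg in src=, then strip everything except the image path
image_array=$(cat people.php | grep -E "src=.*jpg" | sed -e "s/.*src=\"\(.*\.jpg\)\".*/\1/g")
# image_array is a whitespace-separated list, so leave it unquoted to loop over each link
for image in ${image_array}; do
    wget -q "$web_home/$image"
done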
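The link-extraction step can be tried without a network connection. The sketch below runs the same grep and sed expressions on a hypothetical HTML snippet (the real people.php markup may differ) and stores the links in a true Bash array. It only prints what it would download.

```shell
# Hypothetical stand-in for people.php; the real page's markup may differ
page=$(mktemp)
cat > "$page" <<'EOF'
<script src="js/menu.js"></script>
<img src="images/gail.jpg" alt="Gail">
<p>No image on this line</p>
<img src="images/greg.jpg" alt="Greg">
EOF

# Keep only the lines whose src= points at a jpg, then strip everything but the path
links=$(mktemp)
grep -E 'src=.*jpg' "$page" | sed -e 's/.*src="\(.*\.jpg\)".*/\1/' > "$links"

# Read one link per line into an array
images=()
while IFS= read -r link; do
    images+=("$link")
done < "$links"

echo "Found ${#images[@]} images"
for image in "${images[@]}"; do
    echo "would download: $image"    # replace echo with wget to fetch the file
done
```

The JavaScript line is skipped because it has no jpg extension, which is exactly why the grep pattern includes it.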
Ideas
Bash scripts
Writing a Bash script is relatively easy once we know the commands. In fact, it's much like writing a script in any other language, such as Matlab. There are a few subtle differences from Matlab though. First, we can pass in an arbitrary number of arguments and access them with $1, $2, ..., where $1 is the first argument, $2 is the second argument, etc. Second, we should stick with a convention for our file names. Therefore, we will use the sh extension to denote that the script is a shell script. Finally, we are going to add #!/usr/bin/env bash to the top of every file. This line tells the system to run the script with Bash, as opposed to Python or Awk, when the script is executed directly. Here is an example of a very simple script:
In [15]:
%%bash
cat ../examples/bash-script.sh
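To run the script, call sh followed by the name of the script and any arguments. Note that calling sh explicitly bypasses the #! line, so a script that relies on Bash-only features should be run with bash instead.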
In [16]:
%%bash
sh ../examples/bash-script.sh Greg Ditzler
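As a rough sketch of how arguments work, the example below writes a small script to a temporary file and runs it. This is a hypothetical script written for illustration, not the course's bash-script.sh, whose contents may differ.

```shell
# Write a small, hypothetical script to a temporary file
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/env bash
# $# is the number of arguments; $1 and $2 are the first two
if [ "$#" -lt 2 ]; then
    echo "usage: $0 <first name> <last name>"
    exit 1
fi
echo "Hello $1 $2"
EOF

output=$(bash "$script" Greg Ditzler)     # run it with two arguments
echo "$output"

missing=$(bash "$script" Greg) || true    # too few arguments: prints the usage line
echo "$missing"
```

Checking $# before using $1 and $2 gives a helpful message instead of silently printing empty values. A script can also be made executable with chmod +x and then run by its path, in which case the #! line selects the interpreter.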