Workshop 1 - Jupyter notebooks and the bash kernel

This workshop is designed to introduce you to the two core tools that you will be using for the other workshops.

Before you begin this workshop you should know some very basic unix commands. These are covered in chapters 1-10 of the interactive guide. Unless you are already familiar with UNIX it is essential that you read over those chapters before you start (~ 30 minutes).

At the end of this workshop you should:

  1. Understand what a jupyter notebook is and how it relates to the unix command line
  2. Be able to edit text and run unix commands from within a jupyter notebook
  3. Know how to assess your learning by using the self-assessment exercises in a jupyter notebook

Jupyter notebooks

The document you are reading is a jupyter notebook.

It consists of a series of cells that contain either text or computer code.

Jupyter notebooks are very useful for bioinformatics because they allow text to be mixed together with code for manipulating data, running programs and creating plots.

Text cells and Markdown

The cell you are reading is a text cell. Click on it to make it the currently active/selected cell. The active cell will have a thin coloured border around it with a thicker border on the left. If the border is blue the cell is not editable.

Double click on this cell to make it editable.

You should see that its border turns green. You should also see that its content changes to plain text in Markdown format. Markdown is a way of writing documentation that is very simple but still allows some basic styling (headers, links, images, code, bold, italics, equations, quotes).

Code cells and the Bash kernel

The text you type into code cells should consist of valid commands that can be interpreted by the notebook's kernel. A notebook's kernel is the engine it uses to evaluate code cells. This notebook is running the Bash kernel. This means that when you run code cells they will be interpreted as if you typed the same text at the unix command prompt.
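For example, the lines below could be typed one at a time at a terminal prompt or placed together in a code cell; either way Bash runs them in order. This is a minimal sketch, and the variable name greeting is invented for illustration.

```shell
# Print the directory the shell is currently working in.
pwd

# Store some text in a shell variable, then print it.
greeting="hello from a code cell"
echo "$greeting"
```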

Jupyter notebooks support many types of kernels, including Python, R and Bash, which are particularly useful for bioinformatics.

Note: You can tell which kernel a notebook is running by looking at the kernel indicator in the top right corner.

Running cells

The notebook will not actually run your cells until you tell it to. You can do this by first selecting the cell and then using the menu to select Cell -> Run Cells.

The cell immediately below this one is a code cell.

The ls command in this cell should be familiar to you. Try running it.

Try double-clicking on a text cell to set it into edit mode. Then run the text cell. When text cells are run they aren't evaluated by the kernel but are rendered for display in your web browser.


In [1]:
ls


autograded_answer_example.png  E2  jupyter_intro.ipynb  setup.sh

IMPORTANT

Run the Setup Code

In order for this notebook to work properly you need to run the cell below before doing anything else. This will load custom functions and settings required to make the self-assessment exercises work.

If you restart your kernel you will also need to rerun the setup code.

Don't use the cd command

The answers to all self-assessment exercises assume that you don't change your directory from the default. You shouldn't ever need to use the cd command to answer an exercise.
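The rule above is easy to follow because most commands accept a path as an argument. A minimal sketch, using a throwaway directory made with mktemp (the names demo_dir and notes.txt are invented for illustration and are not part of the workshop files):

```shell
# Create a scratch directory so the example does not touch the workshop files.
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt"

# Remember where we started.
start_dir=$(pwd)

# List the directory by passing its path to ls; no cd is needed.
listing=$(ls "$demo_dir")
echo "$listing"

# The working directory is unchanged.
end_dir=$(pwd)
echo "$end_dir"

# Clean up the scratch directory.
rm -r "$demo_dir"
```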


In [2]:
# Essential Setup Code : Must be run first.
wget "https://www.dropbox.com/s/zqgacjshllprdcc/setup.sh?dl=0" -O setup.sh
source ./setup.sh


--2017-06-28 06:58:59--  https://www.dropbox.com/s/zqgacjshllprdcc/setup.sh?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.82.1
Connecting to www.dropbox.com (www.dropbox.com)|162.125.82.1|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://dl.dropboxusercontent.com/content_link/w2958rgSXwoFZSm5xESk6MpKuxMUEqfeGDYcwcIDxkTifYLJKt7lPVytttatfCKe/file [following]
--2017-06-28 06:59:02--  https://dl.dropboxusercontent.com/content_link/w2958rgSXwoFZSm5xESk6MpKuxMUEqfeGDYcwcIDxkTifYLJKt7lPVytttatfCKe/file
Resolving dl.dropboxusercontent.com (dl.dropboxusercontent.com)... 162.125.7.6
Connecting to dl.dropboxusercontent.com (dl.dropboxusercontent.com)|162.125.7.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1534 (1.5K) [text/x-sh]
Saving to: 'setup.sh'

setup.sh            100%[===================>]   1.50K  --.-KB/s    in 0s      

2017-06-28 06:59:03 (82.4 MB/s) - 'setup.sh' saved [1534/1534]

Setup Done
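If a test cell later reports that a command cannot be found, the setup code probably has not been run in the current kernel session. One way to check is sketched below. It assumes that setup.sh makes the test commands (such as test_e1) available as shell functions; the helper name is_loaded is hypothetical.

```shell
# Hypothetical helper: succeeds if the shell can find a command or
# function with the given name.
is_loaded() {
    command -v "$1" > /dev/null 2>&1
}

if is_loaded test_e1; then
    echo "setup code is loaded"
else
    echo "test_e1 not found: rerun the setup cell"
fi
```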

Keyboard shortcuts

Using the mouse to run every cell quickly becomes tedious. To save time, select the cell and type (command)-(enter). Depending on your keyboard this combination may be slightly different (e.g. (control)-(return) on a Mac).

Exercise 1

Your task: Write a command to list the contents of the current directory

This is deliberately easy (the answer is ls) so that you can focus on understanding the self-assessment mechanism.

Follow these steps for every exercise:

  1. Read the text describing the problem and figure out your answer. Feel free to create new cells to experiment with commands until you get things right. You might also want to use the terminal on the tutorial site.
  2. Enter your code into the answer cell. The answer cell contains a blank space for your answer, but it is important that you don't change the other code in the cell (see the example answer cell below).
  3. Be sure to run your answer cell. This will make your answer accessible to the test cell.
  4. Run the test cell to check your answer. The test cell is locked and always comes immediately after the answer cell.
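The pairing of answer cell and test cell can be pictured as follows. This is only a sketch of the idea, not the code in setup.sh: the function names demo_answer and check_demo, the comparison against a reference command, and the failure message are all assumptions made for illustration.

```shell
# Answer cell: running it defines (or redefines) a shell function.
# Nothing is listed yet; the function is only stored for later use.
demo_answer() {
    ls
}

# Hypothetical test cell: run the stored answer and compare its output
# with the output of a reference command.
check_demo() {
    if [ "$(demo_answer)" = "$(ls)" ]; then
        echo "Your answer is correct"
    else
        echo "Your answer is incorrect"
    fi
}

check_demo
```

This is why step 3 matters: until the answer cell has been run, the function holding your answer either does not exist or still holds an older version.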

In [3]:
e1_answer(){
### BEGIN SOLUTION
ls
### END SOLUTION
}

In [4]:
test_e1


Your answer is correct

In [5]:
# This code cell is for you to experiment with the ls command (see exercise below)
ls -aFl


total 76
drwxrwxr-x 4 iracooke iracooke  4096 Jun 28 06:59 ./
drwx------ 5 iracooke iracooke  4096 Jun 27 03:24 ../
-rw-rw-r-- 1 iracooke iracooke 26858 Jun 27 06:06 autograded_answer_example.png
drwxrwxr-x 2 iracooke iracooke  4096 Jun 28 06:59 E2/
drwxr-xr-x 2 iracooke iracooke  4096 Jun 27 03:24 .ipynb_checkpoints/
-rw-rw-r-- 1 iracooke iracooke 25768 Jun 28 06:56 jupyter_intro.ipynb
-rw-rw-r-- 1 iracooke iracooke  1534 Jun 28 06:59 setup.sh

Extending the ls command

Use the code cell above to try various optional arguments to the ls command. For example:

ls -F
ls -1
ls -a
ls -R
ls -S

Now try printing the help text for the ls command:

ls --help

Search through the help and look for each of the options in the commands above. Use the description for each option to understand the output you see when you run each command.
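Scrolling through the full help text is slow. As a sketch, the help can be filtered with grep, a standard tool that this workshop does not introduce, by sending the help text to it through a pipe. The helper name ls_option_help is hypothetical, and the pattern assumes the GNU ls help layout shown in this notebook.

```shell
# Hypothetical helper: print the first help line for one single-letter ls option.
# Assumes GNU ls, whose option lines start with spaces followed by "-X".
ls_option_help() {
    ls --help | grep -E "^ +-$1[, ]"
}

ls_option_help S
ls_option_help a
```

Only the first line of each description is shown, so options with multi-line descriptions still need a look at the full text.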

Note: Another way to bring up the help is the man command, but unfortunately this doesn't work well in a Jupyter notebook.

Exercise 2

Your task: Write a command to list the contents of your current directory (not including hidden files) in reverse order


In [6]:
e2_answer(){
### BEGIN SOLUTION
ls -r
### END SOLUTION
}

In [7]:
test_e2


Your answer is correct

Exercise 3

Your task: Write a command to list the contents of the E2 directory


In [8]:
e3_answer(){
### BEGIN SOLUTION
ls E2
### END SOLUTION
}

In [9]:
test_e3


Your answer is correct

Exercise 4

Your task: Write a command to list the contents of the E2 directory with one item per line, sorted by size in reverse order (smallest first)


In [10]:
e4_answer(){
### BEGIN SOLUTION
ls -1 -Sr E2
### END SOLUTION
}

In [11]:
test_e4


Your answer is correct
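Short options can be written separately or bundled behind a single dash, and -r reverses whatever sort order is in effect. A small sketch in a scratch directory (the file names are invented for illustration) shows that both spellings give the same listing:

```shell
# Build a scratch directory with three files of different sizes.
demo_dir=$(mktemp -d)
printf 'a' > "$demo_dir/small.txt"
printf 'aaaaa' > "$demo_dir/medium.txt"
printf 'aaaaaaaaaa' > "$demo_dir/large.txt"

# -S sorts largest first; adding -r reverses that to smallest first.
separate=$(ls -1 -S -r "$demo_dir")
bundled=$(ls -1Sr "$demo_dir")

echo "$separate"

# Clean up the scratch directory.
rm -r "$demo_dir"
```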

Exercise 5

Your task: Write a command to list the contents of the E2 directory, one item per line, so that the word HELLO is spelled. Your output should look like the text below:

E2/5_H.txt
E2/2_E.txt
E2/3_L.txt
E2/4_L.txt
E2/1_O.txt

Hint 1: You will need to use the wild-card character *. See chapter 13 of the guide for examples.
Hint 2: Look at the sizes of files using ls -l
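On the wild-card in Hint 1: the shell, not ls, expands a pattern such as *.txt into the list of matching paths before the command runs, which is why the directory prefix appears in the output. A minimal sketch with invented file names:

```shell
# Scratch directory with two .txt files and one file of another type.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.png"

# The shell replaces the pattern with the matching paths.
matches=$(printf '%s\n' "$demo_dir"/*.txt)
echo "$matches"

# ls receives those paths as arguments, so it prints them with the prefix.
ls -1 "$demo_dir"/*.txt

# Clean up the scratch directory.
rm -r "$demo_dir"
```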


In [12]:
e5_answer(){
### BEGIN SOLUTION
ls -1 -Sr E2/*.txt
### END SOLUTION
}

In [13]:
test_e5


Your answer is correct

In [14]:
ls --help


Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list implied . and ..
      --author               with -l, print the author of each file
  -b, --escape               print C-style escapes for nongraphic characters
      --block-size=SIZE      scale sizes by SIZE before printing them; e.g.,
                               '--block-size=M' prints sizes in units of
                               1,048,576 bytes; see SIZE format below
  -B, --ignore-backups       do not list implied entries ending with ~
  -c                         with -lt: sort by, and show, ctime (time of last
                               modification of file status information);
                               with -l: show ctime and sort by name;
                               otherwise: sort by ctime, newest first
  -C                         list entries by columns
      --color[=WHEN]         colorize the output; WHEN can be 'always' (default
                               if omitted), 'auto', or 'never'; more info below
  -d, --directory            list directories themselves, not their contents
  -D, --dired                generate output designed for Emacs' dired mode
  -f                         do not sort, enable -aU, disable -ls --color
  -F, --classify             append indicator (one of */=>@|) to entries
      --file-type            likewise, except do not append '*'
      --format=WORD          across -x, commas -m, horizontal -x, long -l,
                               single-column -1, verbose -l, vertical -C
      --full-time            like -l --time-style=full-iso
  -g                         like -l, but do not list owner
      --group-directories-first
                             group directories before files;
                               can be augmented with a --sort option, but any
                               use of --sort=none (-U) disables grouping
  -G, --no-group             in a long listing, don't print group names
  -h, --human-readable       with -l and/or -s, print human readable sizes
                               (e.g., 1K 234M 2G)
      --si                   likewise, but use powers of 1000 not 1024
  -H, --dereference-command-line
                             follow symbolic links listed on the command line
      --dereference-command-line-symlink-to-dir
                             follow each command line symbolic link
                               that points to a directory
      --hide=PATTERN         do not list implied entries matching shell PATTERN
                               (overridden by -a or -A)
      --indicator-style=WORD  append indicator with style WORD to entry names:
                               none (default), slash (-p),
                               file-type (--file-type), classify (-F)
  -i, --inode                print the index number of each file
  -I, --ignore=PATTERN       do not list implied entries matching shell PATTERN
  -k, --kibibytes            default to 1024-byte blocks for disk usage
  -l                         use a long listing format
  -L, --dereference          when showing file information for a symbolic
                               link, show information for the file the link
                               references rather than for the link itself
  -m                         fill width with a comma separated list of entries
  -n, --numeric-uid-gid      like -l, but list numeric user and group IDs
  -N, --literal              print raw entry names (don't treat e.g. control
                               characters specially)
  -o                         like -l, but do not list group information
  -p, --indicator-style=slash
                             append / indicator to directories
  -q, --hide-control-chars   print ? instead of nongraphic characters
      --show-control-chars   show nongraphic characters as-is (the default,
                               unless program is 'ls' and output is a terminal)
  -Q, --quote-name           enclose entry names in double quotes
      --quoting-style=WORD   use quoting style WORD for entry names:
                               literal, locale, shell, shell-always,
                               shell-escape, shell-escape-always, c, escape
  -r, --reverse              reverse order while sorting
  -R, --recursive            list subdirectories recursively
  -s, --size                 print the allocated size of each file, in blocks
  -S                         sort by file size, largest first
      --sort=WORD            sort by WORD instead of name: none (-U), size (-S),
                               time (-t), version (-v), extension (-X)
      --time=WORD            with -l, show time as WORD instead of default
                               modification time: atime or access or use (-u);
                               ctime or status (-c); also use specified time
                               as sort key if --sort=time (newest first)
      --time-style=STYLE     with -l, show times using style STYLE:
                               full-iso, long-iso, iso, locale, or +FORMAT;
                               FORMAT is interpreted like in 'date'; if FORMAT
                               is FORMAT1<newline>FORMAT2, then FORMAT1 applies
                               to non-recent files and FORMAT2 to recent files;
                               if STYLE is prefixed with 'posix-', STYLE
                               takes effect only outside the POSIX locale
  -t                         sort by modification time, newest first
  -T, --tabsize=COLS         assume tab stops at each COLS instead of 8
  -u                         with -lt: sort by, and show, access time;
                               with -l: show access time and sort by name;
                               otherwise: sort by access time, newest first
  -U                         do not sort; list entries in directory order
  -v                         natural sort of (version) numbers within text
  -w, --width=COLS           set output width to COLS.  0 means no limit
  -x                         list entries by lines instead of by columns
  -X                         sort alphabetically by entry extension
  -Z, --context              print any security context of each file
  -1                         list one file per line.  Avoid '\n' with -q or -b
      --help     display this help and exit
      --version  output version information and exit

The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).

Using colour to distinguish file types is disabled both by default and
with --color=never.  With --color=auto, ls emits colour codes only when
standard output is connected to a terminal.  The LS_COLORS environment
variable can change the settings.  Use the dircolors command to set it.

Exit status:
 0  if OK,
 1  if minor problems (e.g., cannot access subdirectory),
 2  if serious trouble (e.g., cannot access command-line argument).

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/ls>
or available locally via: info '(coreutils) ls invocation'

Optional

Playing with the command line is the best way to learn.

  1. Try the fortune command.
     fortune
    Run it a few times
  2. Try the cowsay command like this
     cowsay "keyboard good, mouse bad"
  3. To be a bit more faithful to the original we need to make the following change
     cowsay -f sheep "keyboard good, mouse bad"
  4. Now try combining the two commands
     fortune | cowsay

    This introduces a new concept, the pipe operator, |. A pipe allows the output of one command to be used as input for another. We will cover pipes in more detail in workshop 2.

  5. Try out various cows. You can find more inside the directory /usr/share/cowsay/cows.
  6. Read the cowsay man page to see if you can change the appearance of cows in other ways.
  7. If you are truly unsatisfied with the default cows you can find more here
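
The pipe operator from step 4 works with any pair of commands, not just fortune and cowsay. A minimal sketch using only standard tools (printf, sort and head), in case those two programs are not installed:

```shell
# printf writes three lines; sort reads them from the pipe and orders them.
sorted=$(printf 'mouse\nkeyboard\ncow\n' | sort)
echo "$sorted"

# Pipes can be chained: here head keeps only the first sorted line.
first=$(printf 'mouse\nkeyboard\ncow\n' | sort | head -n 1)
echo "$first"
```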