Operating systems are suites of programs that make the computer work. UNIX is one of the oldest, dating back to the 1960s. It has been under constant development, and is a stable, multi-user, multi-tasking system popular for running the servers used in bioinformatic analysis. There are different flavors of UNIX, the most popular of which is Linux, which is free and open source.
Linux is made up of the kernel, the shell and programs.
The kernel is low-level code that makes everything happen. It controls file access, memory allocation, and all the other tasks the computer needs to do.
The Linux kernel should not be confused with the Python (or R) kernels run in the Jupyter notebooks (see menu). These kernels also handle low-level functions, but specifically for the programming languages. They sit on top of the Linux kernel that works in the background.
The shell interfaces between the user and the kernel. It interprets commands and carries them out. Most frequently you interact with the shell using a command line interface in a terminal. Python notebooks in Jupyter allow you to execute commands on the Linux shell directly. You simply preface a command with an exclamation point:
In [1]:
! echo Hello, World!
This allows you to run system commands and incorporate them into your workflow.
How you get to the shell depends a lot on your particular flavor of UNIX, but Jupyter will take care of all our shell needs for the purpose of this exercise.
Most of the time you will be running some kind of program (such as Jupyter) to accomplish what you want. Linux typically comes with a rich suite of built-in programs, which allow you to perform many kinds of useful tasks, from file manipulation to list sorting. You can also install others, or write your own, depending on your needs.
Everything in UNIX is either a file or a process. A process is a running program. It gets a unique ID, and the kernel tracks its state. A file is a collection of data that lives on the hard drive. Files are organized into directories (aka folders) in a hierarchical manner.
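To make the idea of a process ID concrete, here is a minimal sketch that you could run in a %%bash cell. It relies on two standard shell variables, $$ (the ID of the shell itself) and $! (the ID of the most recently started background process); the sleep command is used here only as a harmless stand-in for a real program.

```shell
# Every running program is a process with its own numeric ID.
echo "This shell is process $$"

# Start a program in the background (&) and record its process ID.
sleep 30 &
sleep_pid=$!
echo "Started sleep as process $sleep_pid"

# Ask the kernel to stop that process again.
kill "$sleep_pid"
```

The two numbers printed will differ from run to run, because the kernel assigns IDs as processes are created.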
In [2]:
! tree ..
The tree command gives us an overview of the underlying file structure, which is hierarchical in nature. In this case we execute it with the path pointing to ../, which refers to one directory above the current one. We see that there are four folders in the project directory: examples, data, ref and src. Each of the directories has files, and data even has a sub-directory. The current notebook is called Introduction to Python.ipynb and resides in the src directory. Jupyter executes commands in the same folder where the notebook is, which is why we needed the '..' reference to point to the folder above for a general overview.
Just in case you're curious, here is what the other files are:
Dockerfile controls how the virtual machine is built. It contains all the instructions for configuring the kernel and programs we will use, as well as downloading the data.
index.ipynb is the landing page, which gives the general overview.
LICENSE.txt explains how all this code can be used.
README.md is the file you used to launch the Binder instance.
One of the principal tasks you will perform in the shell is manipulating files and directories.
Files can be copied using the command cp source destination. Let's give it a try. First, we'll examine the current directory using the ls command, which lists its contents:
In [3]:
! ls
Now we can copy a file here from the examples folder, and look again.
In [4]:
! cp ../examples/Sir\ Robin.txt example.txt
! ls
We copied a file called Sir Robin.txt to the current folder and also named the copy example.txt. If we wanted to keep the same name we could have issued the command cp ../examples/Sir\ Robin.txt .
As you can guess, . refers to the current directory, whereas .. refers to the one above. This is an example of relative file paths, i.e., paths that are specified relative to another location, here the current directory. There is also an absolute file path for every file, which is its unique address in the file system, but relative paths allow you to address files in surrounding directories. Using relative paths, you (a) save typing and (b) keep your code working even if you move your project directory elsewhere.
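The difference between relative and absolute paths can be sketched as follows. This is only an illustration: the project, src and examples folders are created inside a temporary directory (made with mktemp) so that nothing in the real project is touched, and pwd, which prints the absolute path of the current directory, is a standard command not used elsewhere in this notebook.

```shell
# Build a small throwaway directory tree to experiment in.
sandbox=$(mktemp -d)
mkdir "$sandbox/project"
mkdir "$sandbox/project/src" "$sandbox/project/examples"

# Go to src using an absolute path.
cd "$sandbox/project/src"
echo "Started in: $(pwd)"

# Go to examples using a relative path: up one level, then down.
cd ../examples
after_relative=$(pwd)
echo "Now in: $after_relative"
```

The relative path ../examples would keep working if the whole project folder were moved, whereas the absolute path would have to be changed.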
Now, if the file is named Sir Robin.txt, why did we have to type Sir\ Robin.txt into the command? The reason is that the shell uses whitespace to separate a command from its arguments, and the arguments from each other, for cp as for most Linux commands. Here, however, the space is part of the file name. We can tell the shell to treat this space as part of the file name by preceding it with a backslash, which is known as an escape character.
You can move files using the mv command, in the same manner as copying them:
In [5]:
! mv example.txt example.txt.bak
!ls
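The sketch below illustrates the two common uses of mv: renaming a file in place, and moving it into another directory. It runs in a temporary directory, and the names draft.txt, final.txt and archive are hypothetical; mkdir, which creates a directory, is a standard command.

```shell
# Work in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"
echo "some text" > draft.txt

# Rename: unlike cp, the original name no longer exists afterwards.
mv draft.txt final.txt

# Move into a directory: the file keeps its name.
mkdir archive
mv final.txt archive/

ls archive
```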
In [6]:
%%bash
cp ../examples/Sir\ Robin.txt example.txt
mkdir myDir
cd myDir
cp ../example.* .
ls
This code snippet introduces a few new things. First, we see an IPython magic, a keyword that allows special commands. In this case %%bash executes the rest of the cell in the Linux shell, which is more convenient than typing ! before every command if there are many of them. Second, we see the appearance of a wildcard character, *. Wildcards allow us to specify multiple files at once. In this case we ask cp to copy everything in the directory above whose name starts with example. into the current directory. Another useful wildcard is ?, which matches any single character. Finally, mkdir creates a new directory and cd makes it the current directory.
In [7]:
! cp example.txt Example.txt
! ls ?xample.txt
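To see the difference between the two wildcards side by side, here is a minimal sketch. The file names are hypothetical and are created empty (with the standard touch command) in a temporary directory; echo is used simply to print whatever names the shell substitutes for each pattern.

```shell
# Work in a throwaway directory with a few empty files.
demo=$(mktemp -d)
cd "$demo"
touch notes1.txt notes2.txt notes10.txt summary.txt

# * matches any number of characters: all three notes files.
star_matches=$(echo notes*.txt)
echo "notes*.txt matches: $star_matches"

# ? matches exactly one character: notes1.txt and notes2.txt,
# but not notes10.txt, which has two characters in that position.
question_matches=$(echo notes?.txt)
echo "notes?.txt matches: $question_matches"
```

Note that it is the shell, not the command, that expands the pattern into a list of file names before the command runs.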
Note that each cell starts in the notebook's directory: if you run cd somedir in a cell block, the next block will not be started in somedir.
You see that the filenames generally have the form *.*, with the base name before the dot and the extension after the dot. The extension specifies the file type. There are standard extensions, such as txt, which specifies text files. In bioinformatics many file types have standard extensions, which we'll deal with later.
We can remove files and folders using the rm and rmdir commands, respectively. Only empty directories can be removed with rmdir, so we need to delete their contents first.
In [8]:
%%bash
rm myDir/*
rmdir myDir
ls
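The following sketch shows what happens if the order is reversed, assuming a hypothetical directory called scratch inside a temporary folder. The if ... then ... else construct is ordinary shell syntax that lets the example continue after rmdir refuses to act.

```shell
# Work in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"
mkdir scratch
touch scratch/a.txt scratch/b.txt

# rmdir refuses to delete a directory that still has files in it.
if rmdir scratch 2>/dev/null; then
    first_attempt="removed"
else
    first_attempt="refused"
fi
echo "rmdir on a non-empty directory: $first_attempt"

# Empty the directory first, then remove it.
rm scratch/*
rmdir scratch
```

Be careful with rm: deleted files do not go to a recycle bin and cannot easily be recovered.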
The cat command prints the entire contents of a file to the screen:
In [9]:
! cat example.txt
Often you will be dealing with large files, and you only want to display a small part of them, say the beginning or the end:
In [10]:
! head -5 example.txt
In [11]:
! tail -5 example.txt
The grep command prints the lines of a file that match a pattern; with the -c option it prints the number of matching lines instead:
In [12]:
! grep Robin example.txt
! grep -c Robin example.txt
You can save the output of a command to a file using the > character, which overwrites the file if it already exists:
In [13]:
! head -5 example.txt > example2.txt
! cat example2.txt
You can append to a file using the >> characters:
In [14]:
! head -5 example.txt >> example2.txt
! cat example2.txt
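The difference between > and >> is easy to get wrong, so here is a small sketch using a hypothetical file called log.txt in a temporary directory. It counts the lines in the file with grep -c after each step.

```shell
# Work in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"

# > creates the file (or replaces it if it already exists).
echo "first line" > log.txt

# >> adds to the end of the existing file.
echo "second line" >> log.txt
lines_after_append=$(grep -c line log.txt)
echo "After >> the file has $lines_after_append lines"

# Using > again throws away the old contents.
echo "third line" > log.txt
lines_after_overwrite=$(grep -c line log.txt)
echo "After > the file has $lines_after_overwrite line"
```

Because > silently replaces existing files, it is worth double-checking the file name before running a command that uses it.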
Many commands can be linked together with a pipe, represented by |, which takes the output that would usually be printed to the screen (otherwise known as standard output) and passes it to the next command as its input.
In [15]:
! grep Robin example.txt | sort
In this example, we take the output of grep and pass it to the sort program, which then sorts the lines in alphabetical order.
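Pipes are not limited to two commands. The sketch below chains three, using a hypothetical file fruit.txt created in a temporary directory. It also uses two standard programs not covered above: uniq, which drops repeated adjacent lines, and wc -l, which counts lines.

```shell
# Work in a throwaway directory with a small test file.
demo=$(mktemp -d)
cd "$demo"
printf 'pear\napple\nfig\napple\n' > fruit.txt

# Sort the lines, then show only the first one.
first=$(sort fruit.txt | head -1)
echo "First in alphabetical order: $first"

# Sort, drop the duplicates, then count what is left.
unique_count=$(sort fruit.txt | uniq | wc -l)
echo "Number of different fruits:" $unique_count
```

Each program in the chain does one small job, and the pipe lets you combine them without creating intermediate files.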
Follow the link to the exercise workbook
Continue on to Introduction to Python