Unix is the standard operating system on most large computer systems in scientific research, in the same way that Microsoft Windows is the dominant operating system on desktop PCs.
Unix and MS Windows both perform the important job of managing the computer's hardware (screen, keyboard, mouse, hard disks, network connections, etc.) on your behalf. They also provide you with tools to manage your files and to run application software. Both offer a graphical user interface (desktop). These desktop interfaces look different between the two operating systems, use different names for things (e.g. directory versus folder) and use different icons, but they mostly offer the same functionality.
Unix is a powerful, secure, robust and stable operating system that allows dozens of people to run programs on the same computer at the same time. This is why it is the preferred operating system for large-scale scientific computing. Unix and Unix-like systems such as Linux run on all kinds of machines, from mobile phones (Android is built on the Linux kernel) and desktop PCs to supercomputers.
Increasingly, the output of biological research exists as in silico data, usually in the form of large text files. Unix is particularly well suited to working with such files and has several powerful and flexible commands that can be used to process and analyse this data. One advantage of learning Unix is that many of the commands can be combined in an almost unlimited number of ways. So if you learn just six Unix commands, you will be able to do a lot more than six things.
Unix contains hundreds of commands, but you will probably need only 10 or so to do most of your analysis. In this tutorial we will introduce some basic Unix commands, then some more advanced ones, and provide examples of how they can be used in bioinformatics analyses.
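As a minimal sketch of how commands combine, the block below works on a tiny FASTA file that it creates itself (the file name example.fa and its sequences are hypothetical). It uses grep, wc and tr, which may or may not be among the commands this tutorial goes on to cover. The pipe character | sends the output of one command into the next.

```shell
# Create a small, hypothetical FASTA file to experiment with
printf '>seq1\nACGTACGT\n>seq2\nGGCC\n>seq3\nTTAATTAA\n' > example.fa

# One command on its own: grep prints the header lines (those starting with '>')
grep '^>' example.fa

# Two commands joined by a pipe: grep's output becomes wc's input,
# so wc -l counts the header lines, i.e. the number of sequences
grep '^>' example.fa | wc -l

# Three commands: drop the headers, delete the newlines, then count the
# remaining characters, giving the total number of bases in the file
grep -v '^>' example.fa | tr -d '\n' | wc -c
```

The same three building blocks, rearranged, answer different questions. This is why a handful of commands goes a long way.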
This tutorial consists of two sections, Introduction to UNIX and Advanced UNIX for Bioinformatics. By the end of the first section you can expect to be able to:
By the end of the second section you can expect to be able to:
Introduction to UNIX comprises the following sections:
Advanced UNIX for Bioinformatics comprises the following sections:
This tutorial was created by Jacqui Keane and Martin Hunt.
You can run the commands in this tutorial either directly from the Jupyter notebook (if using Jupyter), or by typing the commands in your terminal window.
If you are using Jupyter, command cells (like the one below) can be run by selecting the cell and clicking Cell -> Run from the menu above, or by pressing Ctrl+Enter. Let's try this by printing the current working directory with the pwd command and then listing the files in it with ls -l. Run the commands in the two cells below.
In [ ]:
pwd
In [ ]:
ls -l
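If you want to see what ls shows with and without its options in a predictable setting, the sketch below builds a throwaway directory with mktemp -d and puts one hypothetical file, reads.txt, in it. The directory and file are assumptions for illustration only. The block does not change your working directory.

```shell
# Make a temporary directory containing one hypothetical file,
# so the listings below are predictable
demo_dir=$(mktemp -d)
printf 'ACGT\n' > "$demo_dir/reads.txt"

pwd                  # print the current working directory (unchanged)
ls "$demo_dir"       # file names only
ls -l "$demo_dir"    # long listing: permissions, links, owner, group, size, date, name
ls -lh "$demo_dir"   # same as -l, but sizes shown in human-readable units (K, M, G)
```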
You can also follow this tutorial by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.
To get started, select the cell below with the mouse and then either press Ctrl+Enter or choose Cell -> Run from the menu at the top of the page.
In [ ]:
echo cd $PWD
Now open a new terminal on your computer, type the command printed by the previous cell and press Enter. The command will look similar to this:
In [ ]:
cd /home/manager/pathogen-informatics-training/Notebooks/Unix/
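The echo cd $PWD cell works because $PWD is a shell variable holding the current working directory, so echo simply prints the word cd followed by that path. The sketch below shows that $PWD normally matches what pwd prints. It also shows a variant (an assumption, not part of the original notebook) that wraps the path in quotes so the printed command still works if the path contains spaces.

```shell
# $PWD holds the current working directory; it normally matches pwd's output
echo "$PWD"
pwd

# Print a ready-to-paste cd command with the path quoted, so a directory
# name containing spaces is treated as a single argument
printf 'cd "%s"\n' "$PWD"
```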
Now you can follow the instructions in the tutorial from here.
We've also included a cheat sheet. It probably won't make a lot of sense now, but it might be a useful reminder of this module later in the tutorial.
To get started with the tutorial, head to the first section: Basic unix