The command line, or 'terminal', is an interface you can use to run programs and analyse your data. If this is your first time using one, it will seem pretty daunting at first but, with just a few commands, you'll start to see how it helps you get things done much more quickly. You're probably more familiar with software that uses a graphical user interface, also known as a GUI; unfortunately, most of the best bioinformatics software has not been programmed with one.
In [ ]:
echo "cd $PWD"
Running the command above should print something like cd /home/manager/pathogen-informatics-training/Notebooks/Unix/basic. Type whatever it printed into your terminal and press Enter.
Then continue through the course, entering any commands that you encounter into your terminal window.
Before getting started, here are some general points to remember that will make your life easier:
- Unix is case sensitive: typing ls is not the same as typing LS.

Directories are the Unix equivalent of folders on a PC or Mac. They are organised in a hierarchy, so directories can have sub-directories and so on. Directories are very useful for organising your work and keeping your account tidy. For example, if you have more than one project, you can organise the files for each project into different directories to keep them separate. You can think of directories as rooms in a house. You can only be in one room (directory) at a time. When you are in a room you can see everything in that room easily. To see things in other rooms, you have to go to the appropriate door and crane your head around. Unix works in a similar manner, moving from directory to directory to access files. The location or directory that you are in is referred to as the current working directory.
If there is a file called genome.seq in the dna directory, its location or full pathname can be expressed as /nfs/dna/genome.seq.
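Since /nfs probably doesn't exist (or isn't writable) on your own machine, here is a minimal sketch that rebuilds the same nfs/dna/genome.seq layout inside a throwaway directory made with mktemp, then uses pwd to show how the full pathname is built up from the top of the hierarchy. The scratch directory is an assumption for illustration, and mkdir -p (make a directory, creating any missing parent directories) is not covered elsewhere in this section.

```shell
# Remember where we started so we can return afterwards
start_dir=$(pwd)

# A throwaway directory stands in for the root of the file system (illustration only)
scratch=$(cd "$(mktemp -d)" && pwd)

# Recreate the nfs/dna hierarchy with an empty genome.seq file inside it
mkdir -p "$scratch/nfs/dna"
touch "$scratch/nfs/dna/genome.seq"

# Move into dna; pwd prints the full path from the top down to here
cd "$scratch/nfs/dna"
dna_dir=$(pwd)
echo "$dna_dir"

# The file's full pathname is its directory's path, a slash, then its name
genome_path="$dna_dir/genome.seq"
ls "$genome_path"

# Return to where we started and delete the scratch copy
cd "$start_dir"
rm -rf "$scratch"
```

On the system described above, where dna sits directly inside /nfs, running pwd inside the dna directory would print /nfs/dna rather than a scratch path.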
The command pwd stands for print working directory. A command (also known as a program) is something which tells the computer to do something. Commands are therefore often the first thing that you type into the terminal (although we'll show you some advanced exceptions to this rule later).
As described above, directories are arranged in a hierarchical structure. To determine where you are in the hierarchy, you can use the pwd command to display the name of the current working directory. The current working directory may be thought of as the directory you are in, i.e. your current position in the file-system tree.
To find out where you are, type this into your terminal.
In [ ]:
pwd
Remember that Unix is case sensitive: PWD is not the same as pwd.
pwd will list each of the folders you would need to navigate through to get from the root of the file system to your current directory. This is sometimes referred to as your 'absolute path', to show that it gives a complete route, as opposed to a 'relative path', which tells you how to get from one folder to another. More on that shortly.
To list the contents of your current working directory, use the ls (list) command:
In [ ]:
ls
You should see that there are 4 items in this directory.
To list the contents of a directory with extra information about the items type:
In [ ]:
ls -l
Instead of printing out a simple list, this should have printed out additional information about each file. Note that there is a space between the command ls and the -l, but no space between the dash and the letter l.
-l is our first example of an option. Many commands have options which change their behaviour but are not always required.
What does each of the columns represent?
To list all contents of a directory including hidden files and directories type:
In [ ]:
ls -a -l
This is an example of a command which can take multiple options at the same time. Different commands take different options and sometimes (unhelpfully) use the same letter to do different things.
How many hidden files and directories are there?
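Hidden files and directories are simply ones whose names start with a dot, and plain ls skips them. A minimal sketch in a scratch directory (the file names notes.txt and .hidden_settings are made up) that compares what ls and ls -a report:

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# One ordinary file and one hidden file (its name starts with a dot); made-up names
touch notes.txt .hidden_settings

# ls skips names beginning with a dot...
visible=$(ls)
# ...while ls -a shows them, plus the special entries . and ..
everything=$(ls -a)

echo "ls:"
echo "$visible"
echo "ls -a:"
echo "$everything"

cd "$start_dir"
rm -rf "$scratch"
```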
Try the same command but with the -h option:
In [ ]:
ls -alh
You'll also notice that we've combined -a -l -h into what appears to be a single -alh option. It's almost always OK to do this for options which are made up of a single dash followed by a single letter.
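One way to convince yourself that -a -l -h and -alh really are the same is to run both forms on one directory and compare the output. A minimal sketch using a scratch directory and made-up files (sample.fa and .hidden are for illustration only):

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
# List a sub-directory so that its parent (..) is our own scratch directory,
# which nothing else will modify between the two listings
mkdir "$scratch/demo"
cd "$scratch/demo"

# Made-up files so the listing has something in it: one visible, one hidden
echo "ACGT" > sample.fa
echo "made-up" > .hidden

# Options given separately...
separate=$(ls -a -l -h)
# ...and the same options bundled behind a single dash
bundled=$(ls -alh)

if [ "$separate" = "$bundled" ]; then
    echo "Same output"
else
    echo "Different output"
fi

cd "$start_dir"
rm -rf "$scratch"
```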
What does the -h option do?
To list the contents of the directory called Pfalciparum with extra information type:
In [ ]:
ls -l Pfalciparum/
In this case we gave ls an argument describing the relative path to the directory Pfalciparum from our current working directory. Arguments are very similar to options (and I often use the terms interchangeably), but they usually refer to things which are not prefixed with dashes, such as the names of files and directories.
How many files are there in this directory?
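A relative path is read starting from your current working directory, whereas an absolute path starts from the root, so both can point at the same place. A minimal sketch, with a scratch directory and a made-up Pfalciparum/example.gff standing in for the course data:

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
# Made-up stand-in for the course's Pfalciparum directory
mkdir "$scratch/Pfalciparum"
touch "$scratch/Pfalciparum/example.gff"

cd "$scratch"
# Relative path: interpreted from the current working directory
relative_listing=$(ls Pfalciparum/)
# Absolute path: starts at the root, so it works from anywhere
absolute_listing=$(ls "$scratch/Pfalciparum/")

echo "relative: $relative_listing"
echo "absolute: $absolute_listing"

cd "$start_dir"
rm -rf "$scratch"
```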
Typing out file names is really boring and you're likely to make typos which will at best make your command fail with a strange error and at worst overwrite some of your carefully crafted analysis. Tab completion is a trick which normally reduces this risk significantly.
Instead of typing out ls Pfalciparum/, try typing ls P and then pressing the Tab key (instead of Enter). The rest of the folder name should just appear. If you have two folders with similar names (e.g. my_awesome_scripts/ and my_awesome_results/), you might need to give your terminal a bit of a hand to work out which one you want. In this case you would type ls -l m; when you press Tab, the terminal would fill in ls -l my_awesome_. You could then type s followed by another Tab and it would work out that you meant my_awesome_scripts/.
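Every file and directory has a set of permissions which restrict what can be done with it. These are shown in the first column of the ls -l output: the first character shows what kind of item it is (d for a directory, - for a regular file) and the next nine characters show whether reading (r), writing (w) and executing (x) are allowed, with a dash where a permission is not granted.
The first set of permissions (characters 2, 3 and 4) refers to what the owner of the file can do, the second set (characters 5, 6 and 7) refers to what members of the file's Unix group can do and the third set (characters 8, 9 and 10) refers to what everyone else can do.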
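To see how those ten characters map onto the three sets, the sketch below creates a made-up file in a scratch directory, sets its permissions with chmod (a command not covered in this section; the number 640 means read and write for the owner, read only for the group and nothing for everyone else) and then reads the first column back from ls -l.

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Made-up file name, for illustration only
touch results.txt
# 6 = rw- for the owner, 4 = r-- for the group, 0 = --- for everyone else
chmod 640 results.txt

# Keep only the first 10 characters of the ls -l line (type + 9 permission characters);
# some systems append a '.' or '+' after them to flag extra security information
perms=$(ls -l results.txt | cut -c1-10)
echo "$perms"

# Split the string into the file type and the three permission sets
file_type=$(printf '%s' "$perms" | cut -c1)
owner=$(printf '%s' "$perms" | cut -c2-4)
group=$(printf '%s' "$perms" | cut -c5-7)
others=$(printf '%s' "$perms" | cut -c8-10)
echo "type=$file_type owner=$owner group=$group others=$others"

cd "$start_dir"
rm -rf "$scratch"
```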
The command cd stands for change directory. The cd command changes the current working directory to another one; in other words, it allows you to move up or down in the directory hierarchy.
To move into the Styphi directory, type the following. You'll remember this more easily if you type it into the terminal rather than copying and pasting it. Also remember that you can use tab completion to save typing all of it.
In [ ]:
cd Styphi/
Now use the pwd command to check your location in the directory hierarchy and the ls command to list the contents of this directory.
In [ ]:
pwd
ls
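You should see that there are 3 files called Styphi.fa, Styphi.gff and Styphi.noseq.gff.
Unix also has some special directory names: a single dot (.) means the current directory, two dots (..) mean the directory above it (its parent) and a tilde (~) means your home directory. Try listing each of them: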
In [ ]:
ls .
In [ ]:
ls ..
In [ ]:
ls ~
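The special names . (current directory), .. (parent directory) and ~ (home directory) work anywhere a directory name is expected, including with cd. A minimal sketch in a scratch directory with a made-up Styphi sub-directory, recording where pwd says we are after each move:

```shell
start_dir=$(pwd)
scratch=$(cd "$(mktemp -d)" && pwd)
# Made-up stand-in for the course's Styphi directory
mkdir "$scratch/Styphi"
cd "$scratch/Styphi"
here=$(pwd)

# "cd ." stays put: . is the current directory
cd .
same_place=$(pwd)

# "cd .." moves up one level, to the parent directory
cd ..
parent=$(pwd)

# "cd ~" goes to your home directory (the same place as $HOME)
cd ~
home=$(pwd)

echo "here:   $here"
echo "parent: $parent"
echo "home:   $home"

cd "$start_dir"
rm -rf "$scratch"
```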
Try moving between directories a few times. Can you get into the Pfalciparum/ directory and then back into Styphi/?
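Moving sideways between two sibling directories means going up to their shared parent with .. and then down into the other one, which you can do with a single relative path. A minimal sketch with made-up, empty Pfalciparum and Styphi directories in a scratch location:

```shell
start_dir=$(pwd)
scratch=$(cd "$(mktemp -d)" && pwd)
# Made-up stand-ins for the two course directories
mkdir "$scratch/Pfalciparum" "$scratch/Styphi"

cd "$scratch/Styphi"
# Up to the shared parent, then down into Pfalciparum
cd ../Pfalciparum
now_in=$(pwd)

# And back again the same way
cd ../Styphi
back_in=$(pwd)

echo "$now_in"
echo "$back_in"

cd "$start_dir"
rm -rf "$scratch"
```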
To copy the file Styphi.gff to a new file called StyphiCT18.gff, type:
In [ ]:
cp Styphi.gff StyphiCT18.gff
Use ls to check the contents of the current directory for the copied file:
In [ ]:
ls
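cp leaves the original file in place and creates a second file with the same contents. The sketch below checks that with cmp, which compares two files byte by byte; with -s it prints nothing and reports the result only through its exit status. The Styphi.gff here is a made-up stand-in containing just a GFF header line.

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Made-up stand-in for the course's Styphi.gff
echo "##gff-version 3" > Styphi.gff

cp Styphi.gff StyphiCT18.gff

# Both files now exist
files=$(ls)
echo "$files"

# cmp -s succeeds only when the two files are identical
if cmp -s Styphi.gff StyphiCT18.gff; then
    copy_status="identical"
else
    copy_status="different"
fi
echo "$copy_status"

cd "$start_dir"
rm -rf "$scratch"
```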
The mv command stands for move.
The mv command moves a file from one location to another. Because it moves the file rather than copying it, you end up with only one file rather than two. When using the command, the path or pathname tells Unix where to find the file. You refer to files in other directories by listing the hierarchy of directory names, separated by slashes. For example, the file called bases in the directory genome has the path genome/bases. If no path is specified, Unix assumes that the file is in the current working directory.
To move the file StyphiCT18.gff from the current directory to the directory above, type:
In [ ]:
mv StyphiCT18.gff ..
Use the ls command to check the contents of the current directory and the directory above to see that StyphiCT18.gff has been moved.
In [ ]:
ls
In [ ]:
cd ..
ls
The rm command stands for remove. To delete the file StyphiCT18.gff, type:
In [ ]:
rm StyphiCT18.gff
Use the ls command to check the contents of the current directory to see that the file StyphiCT18.gff has been removed.
In [ ]:
ls
Unfortunately there is no "recycle bin" on the command line to recover the file from, so you have to be careful.
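Because deleted files cannot be recovered, one common safeguard (not part of this exercise) is rm -i, which asks for confirmation before deleting each file and only deletes it if you answer yes (y). In this sketch the answers are supplied with echo so it runs unattended; normally you would type them yourself. The file names are made up.

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Made-up files, for illustration only
touch StyphiCT18.gff keep_me.gff

# Answer "n" to the prompt: the file is kept
echo n | rm -i keep_me.gff
# Answer "y" to the prompt: the file is deleted
echo y | rm -i StyphiCT18.gff

remaining=$(ls)
echo "Remaining: $remaining"

cd "$start_dir"
rm -rf "$scratch"
```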
To find all files in the current directory and all its subdirectories that end with the suffix gff:
In [ ]:
find . -name "*.gff"
How many gff files did you find?
To find all the subdirectories contained in the current directory type:
In [ ]:
find . -type d
How many subdirectories did you find?
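To check what these two searches return on a tree whose contents you know in advance, here is a minimal sketch in a scratch directory with made-up file and directory names. wc -l, which counts lines, turns each list of matches into a number. Note that find . -type d also reports . itself, so its count is one more than the number of sub-directories.

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)
cd "$scratch"

# Made-up layout: two sub-directories (one nested), two gff files and one fasta file
mkdir -p Styphi Pfalciparum/annotation
touch Styphi/Styphi.gff Pfalciparum/annotation/Pfalciparum.gff Styphi/Styphi.fa

# Quote the pattern so the shell passes *.gff to find instead of expanding it itself
gff_files=$(find . -name "*.gff")
# -type d matches directories only, including . itself
dirs=$(find . -type d)

echo "$gff_files"
echo "$dirs"

# One match per line, so counting lines counts matches
gff_count=$(printf '%s\n' "$gff_files" | wc -l | tr -d ' ')
dir_count=$(printf '%s\n' "$dirs" | wc -l | tr -d ' ')
echo "gff files: $gff_count, directories (including .): $dir_count"

cd "$start_dir"
rm -rf "$scratch"
```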
These are just two basic examples of the find command but it is possible to use the following find options to search in many other ways:
-mtime: search for files by modification date
-atime: search for files by last access date
-size: search for files by file size
-user: search for files by the user they belong to

Many people panic when they are confronted with a Unix prompt! Don’t! All the commands you need to solve these exercises are provided above, so don't be afraid to make a mistake. If you get lost, ask a demonstrator. If you are already skilled at Unix, be patient; this is only a short exercise.
To begin, open a terminal window and navigate to the basic directory in the Unix directory (remember to use the Unix command cd), then complete the exercise below.
1. Use the ls command to show the contents of the basic directory.
2. … Pfalciparum directory?
3. … Pfalciparum directory?
4. … Pfalciparum directory.
5. … fasta directory?
6. … Pfalciparum.bed in the Pfalciparum directory into the annotation directory.
7. … Pfalciparum to the fasta directory.
8. … fasta directory?
9. Use the find command to find all gff files in the Unix directory. How many files did you find?
10. Use the find command to find all the fasta files in the Unix directory. How many files did you find?