Basic Unix

The Command Line

The command line, or 'terminal', is an interface you can use to run programs and analyse your data. If this is your first time using one, it may seem daunting, but with just a few commands you'll start to see how it helps you get things done much more quickly. You're probably more familiar with software that uses a graphical user interface (GUI); unfortunately, most of the best bioinformatics software has not been programmed with this capability.

Getting started

Before we get started, let's check that you're in the right place. Please click on the cell below and press the Ctrl and Enter keys together. If you're not sure what this command does, don't worry for now; we'll explain it in more detail later.


In [ ]:
echo "cd $PWD"

It should say something like cd /home/manager/pathogen-informatics-training/Notebooks/Unix/basic. Type whatever it said into your terminal and press Enter.

Then continue through the course, entering any commands that you encounter into your terminal window.

However, before getting started there are some general points to remember that will make your life easier:

  • Unix is case sensitive - typing ls is not the same as typing LS.
  • Often when you have problems with Unix, it is due to a spelling mistake. Check that you have not missed or added a space. Pay careful attention when typing commands across a couple of lines.

Files and directories

Directories are the Unix equivalent of folders on a PC or Mac. They are organised in a hierarchy, so directories can have sub-directories and so on. Directories are very useful for organising your work and keeping your account tidy - for example, if you have more than one project, you can organise the files for each project into different directories to keep them separate.

You can think of directories as rooms in a house. You can only be in one room (directory) at a time. When you are in a room you can see everything in that room easily. To see things in other rooms, you have to go to the appropriate door and crane your head around. Unix works in a similar manner, moving from directory to directory to access files. The location or directory that you are in is referred to as the current working directory.

For example, if a file called genome.seq is in the dna directory, and dna is itself inside the nfs directory at the root of the file system, the file's location or full pathname can be expressed as /nfs/dna/genome.seq.

pwd - find where you are

The command pwd stands for print working directory. A command (also known as a program) is something which tells the computer to do something. Commands are therefore often the first thing that you type into the terminal (although we'll show you some advanced exceptions to this rule later).

As described above, directories are arranged in a hierarchical structure. To determine where you are in the hierarchy you can use the pwd command to display the name of the current working directory. The current working directory may be thought of as the directory you are in, i.e. your current position in the file-system tree.

To find out where you are, type this into your terminal.


In [ ]:
pwd

Remember that Unix is case sensitive: PWD is not the same as pwd.

pwd will list each of the folders you would need to navigate through to get from the root of the file system to your current directory. This is sometimes referred to as your 'absolute path', because it gives the complete route, as opposed to a 'relative path', which tells you how to get from one folder to another. More on that shortly.
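
The difference between an absolute and a relative path can be sketched as follows. This is only an illustration: the directory names (project, data) and the variables start, demo and where are made up, and mktemp -d is used to create a throwaway directory so that nothing in your course folder is touched.

```shell
# Remember where we started so we can come back at the end.
start=$(pwd)

# Build a small throwaway directory tree (all names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/project/data"
touch "$demo/project/data/reads.txt"
cd "$demo/project"

# Absolute path: starts at the root (/) and works from any location.
ls "$demo/project/data"

# Relative path: starts from the current working directory.
ls data

# pwd reports the absolute path of the current working directory.
where=$(pwd)
echo "$where"

# Return to the directory we started in.
cd "$start"
```

Both ls commands list the same directory; only the way of describing its location differs.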

ls - list the contents of a directory

The command ls stands for list. The ls command can be used to list the contents of a directory.

To list the contents of your current working directory type:


In [ ]:
ls

You should see that there are 4 items in this directory.

To list the contents of a directory with extra information about the items type:


In [ ]:
ls -l

Instead of printing out a simple list, this should have printed out additional information about each file. Note that there is a space between the command ls and the -l. There is no space between the dash and the letter l.

-l is our first example of an option. Many commands have options which change their behaviour but are not always required.

What does each of the columns represent?

To list all contents of a directory including hidden files and directories type:


In [ ]:
ls -a -l

This is an example of a command which can take multiple options at the same time. Different commands take different options and sometimes (unhelpfully) use the same letter to do different things.

How many hidden files and directories are there?

Try the same command but with the -h option:


In [ ]:
ls -alh

You'll also notice that we've combined -a -l -h into what appears to be a single -alh option. It's almost always OK to do this for options which are made up of a single dash followed by a single letter.

What does the -h option do?
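
If you want to convince yourself that the separate and combined forms behave the same way, a minimal sketch like the one below may help. It assumes a throwaway directory made with mktemp -d; the file names and the variables demo, separate and combined are invented for the example.

```shell
# Make a throwaway directory containing one hidden and one visible file.
demo=$(mktemp -d)
touch "$demo/.hidden_file" "$demo/visible_file"

# Three options given separately...
separate=$(ls -a -l -h "$demo")

# ...and the same three options combined after a single dash.
combined=$(ls -alh "$demo")

# Compare the two listings.
if [ "$separate" = "$combined" ]; then
    echo "Both forms produced the same listing"
fi
```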

To list the contents of the directory called Pfalciparum with extra information type:


In [ ]:
ls -l Pfalciparum/

In this case we gave ls an argument describing the relative path to the directory Pfalciparum from our current working directory. Arguments are very similar to options (and I often use the terms interchangeably) but they often refer to things which are not prefixed with dashes.

How many files are there in this directory?
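
As a rough illustration of how a command, its options and its arguments fit together, here is a small sketch. The reads directory and the sample file names are hypothetical and live in a temporary directory, so they are not part of the course data.

```shell
# Make a throwaway directory with two files in it.
demo=$(mktemp -d)
mkdir "$demo/reads"
touch "$demo/reads/sample1.txt" "$demo/reads/sample2.txt"

# command: ls    option: -l    argument: the path to list
ls -l "$demo/reads/"

# A command can take more than one argument.
ls "$demo/reads/sample1.txt" "$demo/reads/sample2.txt"

# With no argument at all, ls lists the current working directory.
ls
```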

Tab completion

Typing out file names is really boring and you're likely to make typos which will at best make your command fail with a strange error and at worst overwrite some of your carefully crafted analysis. Tab completion is a trick which normally reduces this risk significantly.

Instead of typing out ls Pfalciparum/, try typing ls P and then pressing the Tab key (instead of Enter). The rest of the folder name should just appear. If you have two folders with similar names (e.g. my_awesome_scripts/ and my_awesome_results/), you might need to give your terminal a bit of a hand to work out which one you want. In this case you would type ls -l m and press Tab, and the terminal would fill in ls -l my_awesome_. You could then type s followed by another Tab, and it would work out that you meant my_awesome_scripts/.

File permissions

Every file and directory has a set of permissions which restrict what can be done with it.

  • Read (r): permission to read from a file/directory
  • Write (w): permission to modify a file/directory
  • Execute (x): Tells the operating system that the file contains code for the computer to run, as opposed to a file of text which you open in a text editor.

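These permissions are shown in the first column of the ls -l output as a string of ten characters. The first character shows the type of item (for example, d for a directory or - for an ordinary file). The first set of permissions (characters 2,3,4) refers to what the owner of the file can do, the second set of permissions (5,6,7) refers to what members of the file's Unix group can do and the third set of permissions (8,9,10) refers to what everyone else can do.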
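
To see how the three sets of permissions line up, the sketch below creates a file, sets its permissions and prints the first ten characters of its ls -l line. Note the assumptions: chmod (which changes permissions) and cut (which keeps only certain characters) are not covered in this section, and the file name notes.txt and the variables demo and perms are invented for the example.

```shell
# Make a throwaway file to experiment on.
demo=$(mktemp -d)
touch "$demo/notes.txt"

# chmod sets permissions; 640 means
#   owner:         read + write
#   group:         read only
#   everyone else: nothing
chmod 640 "$demo/notes.txt"

# Keep only the first 10 characters of the ls -l line: the file type
# followed by the three sets of permissions.
perms=$(ls -l "$demo/notes.txt" | cut -c1-10)
echo "$perms"    # -rw-r-----
```

Reading the result from left to right: - (an ordinary file), rw- (owner), r-- (group), --- (everyone else).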

cd - change current working directory

The command cd stands for change directory.

The cd command will change the current working directory to another; in other words, it allows you to move up or down in the directory hierarchy.

To move into the Styphi directory type the following. Note, you'll remember this more easily if you type this into the terminal rather than copying and pasting. Also remember that you can use tab completion to save typing all of it.


In [ ]:
cd Styphi/

Now use the pwd command to check your location in the directory hierarchy and the ls command to list the contents of this directory.


In [ ]:
pwd
ls

You should see that there are 3 files called: Styphi.fa, Styphi.gff, Styphi.noseq.gff

Tips

There are some short cuts for referring to directories:

  • . Current directory (one full stop)
  • .. Directory above (two full stops)
  • ~ Home directory (tilde)
  • / Root of the file system (like C:\ in Windows)

Try the following commands. What do they do?


In [ ]:
ls .

In [ ]:
ls ..

In [ ]:
ls ~

Try moving between directories a few times. Can you get into Pfalciparum/ and then back into Styphi/?
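
If you get stuck, the general pattern for hopping between two directories that sit side by side might look like the sketch below. The names project_a and project_b are hypothetical stand-ins created in a temporary directory, and start, demo and siblings are just variables used for the example.

```shell
# Remember where we started so we can come back at the end.
start=$(pwd)

# Two sibling directories inside a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/project_a" "$demo/project_b"

cd "$demo/project_a"

# .. is the directory above, so ../project_b is the sibling directory.
cd ../project_b
pwd

# The same trick takes us back again.
cd ../project_a
pwd

# . is the current directory, so this lists project_a itself (it is empty).
ls .

# .. is the directory above, which contains both project directories.
siblings=$(ls ..)
echo "$siblings"

# Return to the directory we started in.
cd "$start"
```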

cp - copy a file

The command cp stands for copy.

The cp command will copy a file from one location to another and you will end up with two copies of the file.

To copy the file Styphi.gff to a new file called StyphiCT18.gff type:


In [ ]:
cp Styphi.gff StyphiCT18.gff

Use ls to check the contents of the current directory for the copied file:


In [ ]:
ls

mv - move a file

The command mv stands for move.

The mv command will move a file from one location to another. This moves the file rather than copies it, therefore you end up with only one file rather than two. When using the command, the path or pathname is used to tell Unix where to find the file. You refer to files in other directories by using the list of hierarchical names separated by slashes. For example, the file called bases in the directory genome has the path genome/bases. If no path is specified, Unix assumes that the file is in the current working directory.
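
The path rules described above can be sketched as follows. This reuses the genome and bases names from the paragraph purely as an illustration; the archive sub-directory and the variables start and demo are assumptions made for the example, and everything happens in a temporary directory.

```shell
# Remember where we started so we can come back at the end.
start=$(pwd)

# Build a throwaway genome directory containing an empty file called bases.
demo=$(mktemp -d)
mkdir -p "$demo/genome/archive"
cd "$demo/genome"
touch bases

# No path given: Unix looks for bases in the current working directory.
ls bases

# Move the file into the archive sub-directory using a relative path.
mv bases archive/

# The file is now reached through a path with names separated by slashes.
ls archive/bases

# From inside archive, .. refers to the directory above (genome).
cd archive
mv bases ..
ls ../bases

# Return to the directory we started in.
cd "$start"
```

At every step there is only ever one copy of bases; mv changes where it lives rather than duplicating it.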

To move the file StyphiCT18.gff from the current directory to the directory above type:


In [ ]:
mv StyphiCT18.gff ..

Use the ls command to check the contents of the current directory and the directory above to see that StyphiCT18.gff has been moved.


In [ ]:
ls

In [ ]:
cd ..
ls

rm - delete a file

The command rm stands for remove.

The rm command will delete a file permanently from your computer so take care!

To remove the copy of the S. typhi file called StyphiCT18.gff, type:


In [ ]:
rm StyphiCT18.gff

Use the ls command to check the contents of the current directory to see that the file StyphiCT18.gff has been removed.


In [ ]:
ls

Unfortunately there is no "recycle bin" on the command line to recover the file from, so you have to be careful.
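
One way to protect yourself, sketched below, is to make a backup copy with cp and check what you are about to delete with ls before running rm. The file results.txt, the backup directory and the variable demo are hypothetical, and the example runs in a temporary directory.

```shell
# Make a throwaway file to practise on.
demo=$(mktemp -d)
touch "$demo/results.txt"

# Keep a backup copy somewhere else first (a hypothetical backup directory).
mkdir "$demo/backup"
cp "$demo/results.txt" "$demo/backup/results.txt"

# List exactly what you are about to delete before deleting it.
ls "$demo/results.txt"
rm "$demo/results.txt"

# The original is gone for good, but the backup copy survives.
ls "$demo"
ls "$demo/backup"
```

You can also run rm with the -i option, which asks for confirmation before each file is deleted.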

find - find a file

The find command can be used to find files matching a given expression. It can be used to recursively search the directory tree for a specified name, seeking files and directories that match the given name.

To find all files in the current directory and all its subdirectories that end with the suffix gff:


In [ ]:
find . -name "*.gff"

How many gff files did you find?

To find all the subdirectories contained in the current directory type:


In [ ]:
find . -type d

How many subdirectories did you find?

These are just two basic examples of the find command but it is possible to use the following find options to search in many other ways:

  • -mtime : search for files by when they were last modified
  • -atime : search for files by when they were last accessed
  • -size : search for files by file size
  • -user : search for files by the user they belong to
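
As a hedged sketch of how some of these options might be used, the example below creates two files of different sizes and searches for them. The file names, the annotation directory and the variable demo are made up; head -c and the > redirect are used only to create a file of about 3 kilobytes and are not covered in this section.

```shell
# Make a throwaway directory with one empty file and one larger file.
demo=$(mktemp -d)
mkdir "$demo/annotation"
touch "$demo/annotation/small.gff"
head -c 3000 /dev/zero > "$demo/annotation/large.gff"

# Files (not directories) larger than 1 kilobyte.
find "$demo" -type f -size +1k

# Files modified within the last 7 days.
find "$demo" -type f -mtime -7

# Tests can be combined: gff files larger than 1 kilobyte.
find "$demo" -type f -name "*.gff" -size +1k
```

The + and - signs mean "more than" and "less than", so -size +1k matches files bigger than one kilobyte and -mtime -7 matches files changed less than seven days ago.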

Exercises

Many people panic when they are confronted with a Unix prompt. Don't! All the commands you need to solve these exercises are provided above, so don't be afraid to make a mistake. If you get lost, ask a demonstrator. If you are already skilled at Unix, be patient; this is only a short exercise.

To begin, open a terminal window and navigate to the basic directory in the Unix directory (remember to use the Unix command cd), then complete the exercises below.

  1. Use the ls command to show the contents of the basic directory.
  2. How many files are there in the Pfalciparum directory?
  3. What is the largest file in the Pfalciparum directory?
  4. Move into the Pfalciparum directory.
  5. How many files are there in the fasta directory?
  6. Copy the file Pfalciparum.bed in the Pfalciparum directory into the annotation directory.
  7. Move all the fasta files in the directory Pfalciparum to the fasta directory.
  8. How many files are there in the fasta directory?
  9. Use the find command to find all gff files in the Unix directory. How many files did you find?
  10. Use the find command to find all the fasta files in the Unix directory. How many files did you find?