Unix Basics

The command line lets you type commands to do things such as create folders, delete and copy files, and extract information from files.

By the end of this notebook, you will...

  • Have opened (and, if necessary, installed) the command line on your personal laptop and used it
  • Have created files and folders, and looked around a directory, using just the keyboard
  • Have used command line tools like sed, awk, and grep to find and replace text, fetch columns, and find words in files

Opening the command line.

  • From Mac OS
    • Applications folder, open Utilities and launch Terminal
  • From Linux machine
    • Applications, Accessories and launch Terminal
  • From Windows

Getting Started: Navigating your folders and files

You start any terminal session in your "home area". View your "present working directory"

$ pwd

Your default home folder (also called $HOME) is represented by the character alias ~ (tilde)

$ echo ~

Change directory

$ cd ~/Desktop

List all the files in the present working directory using

$ ls
$ ls .

Arguments for unix commands

$ man ls

Creating a folder

$ mkdir data
$ mkdir software

Change directory into data or software. You can use tab completion to finish a name, or the Up and Down arrow keys to recall earlier commands. [TAB] means to press the tab key on your keyboard, not to write out the characters.

$ cd da[TAB]

Change back to the parent directory from any subdirectory:

$ cd ..
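The steps so far can be strung together into one short session. This is only a sketch: it runs inside a throwaway directory made with mktemp (an assumption, so that nothing on your Desktop changes), and the variable names start and inside are made up for illustration.

```shell
# Make a scratch directory and move into it.
start=$(mktemp -d)
cd "$start"

# Create two folders, step into one, and check where we are.
mkdir data software
cd data
inside=$(basename "$(pwd)")   # last part of the present working directory
echo "now inside: $inside"

# Go back up to the parent and list what is there.
cd ..
ls
```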

Create an empty file

$ touch emptyfile.txt

Write some text in it

$ echo "hello world" > emptyfile.txt

Look at the contents of the file with cat

$ cat emptyfile.txt

Append to your file with >>

$ echo "I love bioinformatics" >> emptyfile.txt

Exercise: look at the file

Count the number of lines with wc -l

$ wc -l emptyfile.txt
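The difference between > and >> is easy to miss, so here is a small sketch that shows both and counts lines after each step. It uses a scratch directory and a made-up file name (demo.txt) rather than your emptyfile.txt.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "first line" > demo.txt       # > creates demo.txt (or overwrites it)
echo "second line" >> demo.txt     # >> adds to the end of the file
lines_after_append=$(wc -l < demo.txt | tr -d ' ')
echo "lines after >>: $lines_after_append"

echo "replacement" > demo.txt      # > again: the old contents are replaced
lines_after_overwrite=$(wc -l < demo.txt | tr -d ' ')
echo "lines after >: $lines_after_overwrite"
cat demo.txt
```

The tr -d ' ' is only there because some versions of wc pad the number with spaces.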

Move or rename a file

$ mv emptyfile.txt notempty.txt

Copy a file

$ cp notempty.txt deleteme.txt

Delete a file

$ rm deleteme.txt

Create a shortcut aka a pointer ("symbolic link" or "symlink") to a file

$ ln -s notempty.txt pointer
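To see what a symlink actually does, the sketch below recreates notempty.txt in a scratch directory (an assumption, so it runs anywhere), reads and writes through the link, and then removes only the link.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "hello world" > notempty.txt

ln -s notempty.txt pointer
ls -l pointer                        # the listing shows pointer -> notempty.txt

via_link=$(cat pointer)              # reading the link reads the target file
echo "read through the link: $via_link"

echo "I love bioinformatics" >> pointer   # writing through the link changes the target
target_lines=$(wc -l < notempty.txt | tr -d ' ')
echo "lines in notempty.txt: $target_lines"

rm pointer                           # removes the link, not the file it points to
[ -f notempty.txt ] && echo "notempty.txt is still there"
```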

File Manipulation: Getting some data from UCSC's Table Browser

Go to the UCSC Table browser and choose "position" to pick a single chromosome (chr10+) to save the knownGene table with "all fields from selected table" (should be the default) as knownGene.txt.

Exercise 1

Move knownGene.txt to Desktop using the command line. What is the command?

Exercise 2

less and more are other commands (besides cat) you can use to look at the contents of files. How are they different?

See what's in the first n lines (in this case 10)

$ head -n 10 knownGene.txt

How many lines are in the file?

$ wc -l knownGene.txt

What if we ran wc with no option flags, just the file name? What does that output? Check man wc to read about it.

$ wc knownGene.txt

What's in the last n lines?

$ tail -n 10 knownGene.txt

Extract specific columns. In this first line, we're extracting column 1

$ cut -f 1 knownGene.txt

Whoa that's a lot of output. Let's save it to a file

$ cut -f 1 knownGene.txt > column1.txt

Exercise 3: cut and paste

  1. Cut the second column of knownGene.txt.
  2. Use the "paste" command below to glue the two columns together into a new file
$ paste column1.txt column2.txt > 2columns.txt
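If you want to check your understanding of cut and paste before trying them on the real table, here is a sketch on a tiny made-up file. toy.txt and its contents are invented for illustration and are not UCSC data.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A made-up, tab-separated table with three columns: name, chromosome, start.
printf 'geneA\tchr10\t100\ngeneB\tchr10\t250\n' > toy.txt

cut -f 1 toy.txt > names.txt       # keep only column 1
cut -f 3 toy.txt > starts.txt      # keep only column 3

# paste glues the files together line by line, separated by a tab.
paste names.txt starts.txt > names_starts.txt
cat names_starts.txt
```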

Exercise 4: How many genes start on a nucleotide count that ends in 222?

Let's build this command stepwise. Let's say the fourth column has the gene start nucleotide of knownGene.txt.

$ cut -f 4 knownGene.txt

Whoa that's a lot of output. Let's shorten it by using the "|" character (pronounced "pipe") to send the output to head

$ cut -f 4 knownGene.txt | head

We'll use Regular Expressions "regex" (cheatsheet) to search for three 1's in a row in this cut up column.

$ cut -f 4 knownGene.txt | grep 111 | head

We can do the same thing using the {} notation.

$ cut -f 4 knownGene.txt | grep '1{3}' | head

If that didn't give any output, you need to add the -E flag for "extended regular expressions". grep is a really old program, and in its original "basic" syntax the braces are ordinary characters unless you escape them as \{3\}. The {} repetition syntax without backslashes came later, as part of extended regular expressions, and since all the programs had to stay backwards compatible, you turn it on with -E. (Perl-style regular expressions are yet another dialect; GNU grep exposes them with -P, not -E.)

$ cut -f 4 knownGene.txt | grep -E '1{3}' | head

We can then pipe this to wc -l to count the matching lines!

$ cut -f 4 knownGene.txt | grep -E '1{3}' | wc -l

This shows us how many genes have a "111" in the transcript start.
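Here is a sketch of how the same pattern behaves with and without -E, on a handful of made-up numbers instead of the real column. The file starts.txt and its values are invented, and the last line is the literal text 1{3}, included only to show what basic grep is really matching.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up coordinates, one per line (not real knownGene values).
printf '5111\n1121\n91117\n1{3}\n' > starts.txt

basic=$(grep -c '1{3}' starts.txt)         # basic syntax: the braces are literal text
extended=$(grep -c -E '1{3}' starts.txt)   # extended syntax: three 1s in a row
escaped=$(grep -c '1\{3\}' starts.txt)     # basic syntax with escaped braces
at_end=$(grep -c -E '1{3}$' starts.txt)    # $ anchors the match to the end of the line

echo "basic=$basic extended=$extended escaped=$escaped at_end=$at_end"
```

The $ anchor in the last pattern is the piece you need when a question asks for numbers that end in a given run of digits, rather than numbers that merely contain it.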

How many genes have "222" in the transcript END?

Exercise 5: How many genes have 3 exons?

Let's build this command stepwise. Let's pretend the fourth column has the exon count of knownGene.txt.

$ grep -c 'REGEXSEARCHTERM' target.txt

How many genes have 1...max # exons? We can use the "pipe" (|) to send the output of one command into the next. Let's build this command stepwise.

$ cut -f 1 filename.txt
$ cut -f 1 filename.txt | sort -n
$ cut -f 1 filename.txt | sort -n | uniq -c
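The sort step matters because uniq only collapses lines that are next to each other. A sketch on a made-up column of exon counts (exon_counts.txt is invented, standing in for the output of cut):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up exon counts, one gene per line.
printf '3\n1\n3\n12\n1\n3\n' > exon_counts.txt

# Without sort, uniq -c counts runs of adjacent duplicates only.
uniq -c exon_counts.txt

# With sort -n first, equal values sit together and each gets one total.
sort -n exon_counts.txt | uniq -c

# Counting a single value directly: lines that are exactly "3".
three_exons=$(grep -c '^3$' exon_counts.txt)
echo "genes with 3 exons: $three_exons"
```

The ^ and $ anchors keep grep from also counting values such as 13 or 30.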

File permissions

Which user are you logged in as?

$ whoami

What groups is that user associated with?

$ groups

What is the ownership status of all files in my current directory? Here's some help for interpreting output.

$ ls -lrt

Changing permissions

$ chmod 775 filename.txt

The three digits indicate the affected user subset:

  • Front = Owner
  • Middle = Group
  • Rear = Others (everyone who is neither the owner nor in the group)

Each digit is the sum of the permissions it grants: read = 4, write = 2, execute = 1. For example, read + execute = 4 + 1 = 5. 775 and 755 are the most common permission setups: you, the owner, can do everything to your files; the rest of the group can either do everything (775) or only read and execute (755); and "all" or "world" can only read and execute your programs, not overwrite them.

#  Permission               rwx
7  read, write and execute  rwx
6  read and write           rw-
5  read and execute         r-x
4  read only                r--
3  write and execute        -wx
2  write only               -w-
1  execute only             --x
0  none                     ---
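The table can be checked by hand: read is worth 4, write 2, and execute 1, and each digit is the sum for one set of users. The sketch below builds the mode 750 from those values and applies it to a scratch file. The file name report.txt and the variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch report.txt

# read = 4, write = 2, execute = 1; add them up for each digit.
owner=$((4 + 2 + 1))    # rwx
group=$((4 + 1))        # r-x
other=0                 # ---

chmod "$owner$group$other" report.txt    # same as: chmod 750 report.txt

# The first 10 characters of ls -l show the file type and the three rwx groups.
mode=$(ls -l report.txt | cut -c 1-10)
echo "$mode"
```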

Changing Files Recursively

$ chmod -R 777 Directory/
$ chmod -R o-rwx ~/

Changing executable nature of files

$ chmod +x programname
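A common reason to use chmod +x is to make a script you wrote runnable by name. A minimal sketch, assuming a made-up script called hello.sh in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a two-line script: an interpreter line, then one command.
printf '#!/bin/sh\necho "hello from a script"\n' > hello.sh

# A freshly created file normally has no execute permission.
[ -x hello.sh ] || echo "hello.sh is not executable yet"

chmod +x hello.sh        # add execute permission
output=$(./hello.sh)     # now it can be run directly
echo "$output"
```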

Introduction to awk

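awk is a command-line tool to process text files line by line, splitting each line into columns (fields) that you can print, compare, and do arithmetic on.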

Another way to extract all lines

$ awk -F "\t" '{print;}' knownGene.txt

What if we only wanted one column

$ awk -F "\t" '{print $8;}' knownGene.txt  | head

What if we wanted the length of genes?

$ awk -F "\t" '{ len = $5-$4;} {print len;}' knownGene.txt | head

Length of all genes summed?

$ awk -F "\t" '{ len = $5-$4;} {tot = tot + len;} END {print tot;}' knownGene.txt | head
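To see the per-line and END parts of an awk program at work, here is a sketch on three made-up rows, small enough to check by hand. toy_genes.txt is invented; its columns only imitate the layout assumed above, with the start in column 4 and the end in column 5.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up rows: name, chromosome, strand, start, end (tab-separated, no header).
printf 'geneA\tchr1\t+\t100\t400\ngeneB\tchr1\t-\t1000\t1500\ngeneC\tchr2\t+\t50\t75\n' > toy_genes.txt

# The block in braces runs once per line: print the name and the length.
awk -F "\t" '{ print $1, $5 - $4 }' toy_genes.txt

# tot keeps its value between lines; END runs once after the last line.
total=$(awk -F "\t" '{ tot += $5 - $4 } END { print tot }' toy_genes.txt)
echo "total length: $total"
```

Here tot += x is shorthand for tot = tot + x.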

Don't process the header line (introduction to conditionals)

$ awk -F "\t" '{
    if (FNR==1){
        next;
    };
    tot = tot + $5-$4;
};
END {print tot;}' knownGene.txt | head

What if you only want the total length of genes in chromosome 1?

$ awk -F "\t" '{
    if (FNR==1){
        next;
        };
    chr =$2;
    if (chr == "chr1") {
        tot = tot + $5-$4;
    }
};
END {print tot;}' knownGene.txt
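awk programs can also put the condition in front of the braces instead of using if inside them. The sketch below runs both styles on a made-up file with a header line and shows they give the same total. The file name and its values are invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up table with a header line (tab-separated).
printf 'name\tchrom\tstrand\ttxStart\ttxEnd\n' > toy_with_header.txt
printf 'geneA\tchr1\t+\t100\t400\ngeneB\tchr1\t-\t1000\t1500\ngeneC\tchr2\t+\t50\t75\n' >> toy_with_header.txt

# Style 1: if statements inside the block, as in the commands above.
if_total=$(awk -F "\t" '
{
    if (FNR == 1) { next }
    if ($2 == "chr1") { tot += $5 - $4 }
}
END { print tot }' toy_with_header.txt)

# Style 2: a condition before the block; the block runs only when it is true.
pattern_total=$(awk -F "\t" 'FNR > 1 && $2 == "chr1" { tot += $5 - $4 } END { print tot }' toy_with_header.txt)

echo "if style: $if_total, pattern style: $pattern_total"
```

Which style to use is mostly taste; the pattern form tends to be shorter for simple filters.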

Getting comfortable with unix commands

The purpose of this homework is to familiarize yourself with the different commands available on all UNIX machines. This is not meant to teach you how to use each command at an expert level, but rather to show you that these commands exist.

For each of the commands below, do the following:

  1. Figure out how to run the command by getting its usage help. The four ways you can get help are below:
    1. $ programname
      1. Without any arguments, this will call the program. Sometimes this will give you usage information, sometimes it will result in an error.
    2. $ programname -h
      1. The -h is (generally) a universal short version flag for "help me!!! I don't know what's going on!!" Though sometimes that doesn't work either ...
    3. $ programname --help
      1. The expanded version of -h
    4. $ man programname
      1. man is for "manual" and will bring up the manual pages for that command
  2. Get it to run with no errors. This may mean you need to give it a file name or a command name
  3. Run it again with no errors, with at least one option flag
  4. (Optional) Search Google or StackOverflow (a "Yahoo Answers" for programmers) for more information about the command, like "when would you actually use programname?" You may need to add "unix" to your search terms

Tips:

  • If you're in a program and want to exit, try these steps:
    1. Press q (for "quit")
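    2. Press Control-C (interrupt). If that doesn't work, try Control-X (the exit key in some editors, such as nano) or Control-Z (which suspends the program rather than quitting it)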

List of commands

  1. echo
  2. uptime
  3. Help with commands and programs
    1. man
    2. history
    3. whatis
    4. which
  4. Working with files and directories
    1. cd
      1. How can you go UP a directory? Hint: ".."
    2. pwd
    3. ls
    4. mkdir
    5. touch
      1. If you touch a file that's already there, what changes about it? Try using ls -l
    6. cp
    7. mv
    8. rm
      1. How can you remove a directory?
      2. rm -rf will force remove directories recursively - use with caution!! There is no "trash bin" so once it's gone - it's gone.
  5. Looking at files
    1. cat
    2. head
    3. tail
    4. less
    5. more
      1. How are less and more different?
  6. Getting information about files
    1. wc
  7. Looking at which programs are running and turning them off
    1. ps
    2. top
    3. kill
    4. killall
      1. How are kill and killall different?
  8. Working with compressed files
    1. gzip
    2. gunzip
    3. zcat
  9. Getting information about available or occupied disk space
    1. df
    2. du
      1. How are df and du different? Try using -h (human readable) with them
  10. Making shortcuts for files and programs
    1. alias
    2. ln
      1. How are alias and ln different?
  11. Manipulating and searching files
    1. sed
    2. grep
    3. awk
    4. expand
    5. cut
    6. paste
    7. sort
    8. uniq
  12. Dealing with users and permissions
    1. chmod
    2. whoami
    3. groups

Plus 10 commands of your choice. Here is a nice list of all possible commands.