The command line lets you type commands to do things like create folders, copy and delete files, and extract information from files.
By the end of this notebook, you will be able to use sed, awk, and grep to find and replace text, fetch columns, and find words in files.
You start any terminal session in your "home area". View your "present working directory":
$ pwd
Your default home folder (also stored in the variable $HOME) is represented by the alias ~ (tilde):
$ echo ~
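A quick way to see that ~ is just shorthand for $HOME is to compare the two directly. This is a minimal sketch assuming a bash-compatible shell; note that quoting the tilde turns the expansion off.

```shell
#!/usr/bin/env bash
# Tilde expansion: the shell replaces an unquoted ~ with the value of $HOME
# before echo ever runs, so these two lines print the same path.
echo ~
echo "$HOME"

# Inside quotes the shell does not expand ~, so echo prints a literal tilde.
echo "~"
```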
Change directory
$ cd ~/Desktop
List the files in the present working directory with either of these (. means "the current directory"):
$ ls
$ ls .
Read the manual page to learn the arguments (options) a Unix command accepts (press q to quit):
$ man ls
Creating a folder
$ mkdir data
$ mkdir software
Change directory into data or software. Use tab completion, or press the Up and Down arrow keys to scroll through previous commands. [TAB] means to press the Tab key on your keyboard, not to type out the characters.
$ cd da[TAB]
Change back to the parent directory from any subdirectory:
$ cd ..
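To practice mkdir, cd, and cd .. without cluttering your Desktop, the sketch below works inside a throwaway directory created with mktemp -d (an assumption for this example; the notebook itself uses ~/Desktop).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Make a scratch directory so nothing in your real home area is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir data software   # mkdir accepts several folder names at once
ls                    # lists the two new folders

cd data               # relative path: "data" inside the current directory
pwd                   # now ends in /data
cd ..                 # ".." always means "the parent of where I am"
pwd                   # back in the scratch directory
```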
Create an empty file
$ touch emptyfile.txt
Write some text in it
$ echo "hello world" > emptyfile.txt
Look at the contents of the file with cat
$ cat emptyfile.txt
Append to your file with >>
$ echo "I love bioinformatics" >> emptyfile.txt
Count the number of lines with wc -l
$ wc -l emptyfile.txt
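The difference between > and >> is easiest to see by running the same command twice. In this sketch (run in a scratch directory so it does not touch your real files), > empties the file before writing every time, while >> adds to the end.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory for the demo

touch emptyfile.txt                             # 0-byte file
echo "hello world" > emptyfile.txt              # > replaces the contents
echo "hello world" > emptyfile.txt              # ...so repeating it still leaves one line
echo "I love bioinformatics" >> emptyfile.txt   # >> appends a second line

cat emptyfile.txt
wc -l emptyfile.txt     # line count followed by the file name
wc -l < emptyfile.txt   # reading from stdin prints only the count
```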
Move or rename a file
$ mv emptyfile.txt notempty.txt
Copy a file
$ cp notempty.txt deleteme.txt
Delete a file
$ rm deleteme.txt
Create a shortcut aka a pointer ("symbolic link" or "symlink") to a file
$ ln -s notempty.txt pointer
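Here is the same mv/cp/rm/ln sequence as a runnable sketch in a scratch directory, with a few commands that confirm what each step did. readlink prints the path a symlink points to, and cat on the link reads the target file.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
echo "hello world" > emptyfile.txt

mv emptyfile.txt notempty.txt   # rename: emptyfile.txt no longer exists
cp notempty.txt deleteme.txt    # copy: an independent second file
rm deleteme.txt                 # delete: there is no trash bin
ln -s notempty.txt pointer      # symlink: "pointer" just stores the path notempty.txt

ls -l pointer      # the listing ends with: pointer -> notempty.txt
readlink pointer   # prints the stored target: notempty.txt
cat pointer        # reading through the link prints the target's contents
```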
Go to the UCSC Table Browser, choose "position" to pick a single chromosome (chr10+), and save the knownGene table with the output format "all fields from selected table" (this should be the default) as knownGene.txt.
Move knownGene.txt to Desktop using the command line. What is the command?
less and more are other commands (besides cat) you can use to look at the contents of files. How are they different?
See what's in the first n lines (in this case 10)
$ head -n 10 knownGene.txt
How many lines are in the file?
$ wc -l knownGene.txt
What if we run wc without the -l option? What does it output? Check man wc to read about it.
$ wc knownGene.txt
What's in the last n lines?
$ tail -n 10 knownGene.txt
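If you want to try head, tail, and wc before downloading, here is a tiny stand-in. The rows below are made up and keep only the first 8 columns of the table (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount); the real download has more columns and far more rows. Note that a header line counts toward wc -l.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Made-up stand-in for knownGene.txt: one header line plus three tab-separated rows.
printf '#name\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount\n' > knownGene.txt
printf 'tx1\tchr1\t+\t1110\t2220\t1200\t2000\t3\n' >> knownGene.txt
printf 'tx2\tchr1\t-\t5000\t6000\t5100\t5900\t2\n' >> knownGene.txt
printf 'tx3\tchr2\t+\t3111\t4222\t3200\t4100\t3\n' >> knownGene.txt

head -n 2 knownGene.txt   # header plus the first data row
tail -n 1 knownGene.txt   # only the last data row
wc -l knownGene.txt       # 4 lines, because the header is counted too
wc knownGene.txt          # without -l: lines, words, and bytes
```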
Extract specific columns with cut (it splits on tabs by default). Here we extract column 1:
$ cut -f 1 knownGene.txt
Whoa that's a lot of output. Let's save it to a file
$ cut -f 1 knownGene.txt > column1.txt
cut and paste
Use cut to save column 2 of knownGene.txt as column2.txt, then use the "paste" command below to glue the two columns together into a new file:
$ paste column1.txt column2.txt > 2columns.txt
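A small sketch of the cut-and-paste round trip on a made-up three-column file (sample.tsv is a hypothetical name). paste joins the files line by line with a tab, so the result matches asking cut for both columns at once.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical tab-separated file: name, chrom, txStart
printf 'tx1\tchr1\t1110\ntx2\tchr1\t5000\ntx3\tchr2\t3111\n' > sample.tsv

cut -f 1 sample.tsv > column1.txt
cut -f 2 sample.tsv > column2.txt
paste column1.txt column2.txt > 2columns.txt   # line N of each file joined by a tab

cat 2columns.txt
cut -f 1,2 sample.tsv   # the same two columns in a single step
```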
Let's build this command stepwise. The fourth column of knownGene.txt (txStart) holds the transcript start coordinate:
$ cut -f 4 knownGene.txt
Whoa that's a lot of output. Let's shorten it by using the "|" character (pronounced "pipe") to send the output to head
$ cut -f 4 knownGene.txt | head
We'll use regular expressions ("regex"; see a regex cheatsheet) to search for three 1s in a row in this column:
$ cut -f 4 knownGene.txt | grep 111 | head
We can do the same thing with the {} notation, where {3} means "the preceding character, three times in a row":
$ cut -f 4 knownGene.txt | grep '1{3}' | head
If that didn't give any output, add the -E flag for "extended regular expressions". grep is a really old program, and by default it uses basic regular expressions, in which { and } are ordinary characters unless you escape them as \{3\}. Because every version had to stay backwards compatible, the extended syntax, where {3} works as written, is switched on with -E. (GNU grep also offers -P for Perl-compatible regular expressions.)
$ cut -f 4 knownGene.txt | grep -E '1{3}' | head
We can then pipe this to wc -l to count the matching lines!
$ cut -f 4 knownGene.txt | grep -E '1{3}' | wc -l
This shows us how many genes (rows) have "111" somewhere in the transcript start.
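The sketch below runs these patterns on a few made-up start coordinates so you can see which forms match. It assumes a POSIX grep (GNU or BSD): in basic regular expressions the interval must be written \{3\}, while -E accepts {3}. set -e is left off on purpose, because grep exits with status 1 when nothing matches.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"

# Made-up transcript start coordinates, one per line.
printf '1110\n5000\n3111\n2111\n1234\n' > starts.txt

grep 111 starts.txt          # plain substring: 1110, 3111, 2111
grep '1{3}' starts.txt       # basic regex: { } are literal here, so nothing matches
grep '1\{3\}' starts.txt     # basic regex interval, escaped: the same three lines
grep -E '1{3}' starts.txt    # extended regex: {3} means three 1s in a row

# Two ways to count the matching lines:
grep -E '1{3}' starts.txt | wc -l
grep -cE '1{3}' starts.txt
```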
How many genes have "222" in the transcript END?
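Hint: grep -c counts matching lines by itself, so you can skip wc -l (replace REGEXSEARCHTERM and target.txt with your own pattern and file, or pipe cut's output into grep -c):
$ grep -c 'REGEXSEARCHTERM' target.txt
How many genes have 1, 2, ... up to the maximum number of exons? In knownGene.txt the exon count is column 8 (exonCount), so substitute that column and file name into the template below. We'll use the "pipe" (|) to send the output of one command into the next and build the command stepwise.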
$ cut -f 1 filename.txt
$ cut -f 1 filename.txt | sort -n
$ cut -f 1 filename.txt | sort -n | uniq -c
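Why sort before uniq -c? uniq only collapses identical lines that are next to each other, and sort -n orders numbers by value rather than character by character. The hypothetical exon counts below (one per gene) show all three cases.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Hypothetical exon counts, one gene per line.
printf '3\n12\n2\n3\n1\n2\n3\n' > exoncounts.txt

uniq -c exoncounts.txt             # unsorted: only adjacent repeats merge, so nothing is grouped here
sort exoncounts.txt | uniq -c      # text sort puts 12 before 2
sort -n exoncounts.txt | uniq -c   # numeric sort: counts for 1, 2, 3, then 12
```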
Which user are you logged in as?
$ whoami
What groups is that user associated with?
$ groups
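Who owns each file in your current directory, and what are its permissions? -l gives the long listing (permissions, owner, group, size, date), -t sorts by modification time, and -r reverses the order so the newest files appear last: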
$ ls -lrt
Changing permissions
$ chmod 775 filename.txt
The three digits set the permissions for, in order, the owner (user), the group, and everyone else (others, also called "world").
Each digit is a sum of read = 4, write = 2, and execute = 1. For example, read + execute = 4 + 1 = 5. 775 and 755 are the most common setups: with either one you, the owner, can do everything to your files. 775 also lets the rest of your group write to them, while 755 limits the group to read and execute. Either way, everyone else can only read and execute your programs, not overwrite them.
| # | Permission | rwx |
|---|---|---|
| 7 | read, write and execute | rwx |
| 6 | read and write | rw- |
| 5 | read and execute | r-x |
| 4 | read only | r-- |
| 3 | write and execute | -wx |
| 2 | write only | -w- |
| 1 | execute only | --x |
| 0 | none | --- |
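The table is just the sum read = 4, write = 2, execute = 1 applied to each rwx triplet. The sketch below does that arithmetic with a small helper function (perm_digit is a hypothetical name, not a standard command) and then checks it against what chmod and ls -l actually produce.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn one rwx triplet such as "r-x" into its octal digit.
perm_digit() {
  local t=$1 d=0
  [ "${t:0:1}" = r ] && d=$((d + 4))   # read adds 4
  [ "${t:1:1}" = w ] && d=$((d + 2))   # write adds 2
  [ "${t:2:1}" = x ] && d=$((d + 1))   # execute adds 1
  echo "$d"
}

# Split a 9-character permission string into user, group, and others triplets.
mode=rwxrwxr-x
echo "$(perm_digit "${mode:0:3}")$(perm_digit "${mode:3:3}")$(perm_digit "${mode:6:3}")"   # 775

# Cross-check with the real chmod: the first 10 characters of ls -l are type + permissions.
cd "$(mktemp -d)"
touch filename.txt
chmod 775 filename.txt
ls -l filename.txt | cut -c 1-10   # -rwxrwxr-x
```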
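Changing files recursively: -R applies the change to a directory and everything inside it. Symbolic modes like o-rwx combine u/g/o/a (user, group, others, all) with + to add or - to remove permissions.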
$ chmod -R 777 Directory/
$ chmod -R o-rwx ~/
Making files executable (replace filename.sh with your own script):
$ chmod +x filename.sh
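A sketch of symbolic modes on a scratch directory. It sets umask 022 first so the starting permissions are predictable (your own default may differ), and hello.sh is a hypothetical script name.

```shell
#!/usr/bin/env bash
set -euo pipefail
umask 022            # new files start as rw-r--r--, new directories as rwxr-xr-x
cd "$(mktemp -d)"

mkdir -p Directory/sub
printf '#!/bin/sh\necho "hello"\n' > Directory/sub/hello.sh   # hypothetical script

ls -l Directory/sub/hello.sh       # -rw-r--r--
chmod +x Directory/sub/hello.sh    # no u/g/o given: add x for everyone not blocked by umask
ls -l Directory/sub/hello.sh       # -rwxr-xr-x

chmod -R o-rwx Directory/          # recursively strip every permission from "others"
ls -l Directory/sub/hello.sh       # -rwxr-x---
ls -ld Directory/sub               # drwxr-x---
```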
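awk
awk is a command-line tool (and small programming language) for processing text line by line, splitting each line into numbered columns ($1, $2, ...).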
Another way to print all lines (-F "\t" sets the column separator to a tab):
$ awk -F "\t" '{print;}' knownGene.txt
What if we only wanted one column? Here, column 8 (exonCount):
$ awk -F "\t" '{print $8;}' knownGene.txt | head
What if we wanted the length of genes?
$ awk -F "\t" '{ len = $5-$4;} {print len;}' knownGene.txt | head
Length of all genes summed?
$ awk -F "\t" '{ len = $5-$4;} {tot = tot + len;} END {print tot;}' knownGene.txt | head
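To check the per-gene and summed length arithmetic by hand, here is the same awk logic on three made-up rows with only the first five knownGene columns (name, chrom, strand, txStart, txEnd). += is shorthand for tot = tot + ..., and awk treats an unset variable as 0.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Made-up rows: name, chrom, strand, txStart, txEnd (tab-separated)
printf 'tx1\tchr1\t+\t1000\t2500\ntx2\tchr1\t-\t5000\t6000\ntx3\tchr2\t+\t3000\t3200\n' > genes.tsv

# Length of each gene, txEnd - txStart: tx1 1500, tx2 1000, tx3 200
awk -F "\t" '{ print $1, $5 - $4 }' genes.tsv

# Sum of all lengths: 2700
awk -F "\t" '{ tot += $5 - $4 } END { print tot }' genes.tsv
```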
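Don't process the header line (introduction to conditionals). FNR is the line number within the current file, and next skips straight to the following line:
$ awk -F "\t" '{
    if (FNR == 1) {
        next
    }
    tot = tot + $5 - $4
}
END {print tot;}' knownGene.txt | head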
What if you only want the total length of genes in chromosome 1?
$ awk -F "\t" '{
    if (FNR == 1) {
        next
    }
    chr = $2
    if (chr == "chr1") {
        tot = tot + $5 - $4
    }
}
END {print tot;}' knownGene.txt
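awk also lets you put the condition in front of the braces as a pattern, which replaces the if/next blocks. This sketch runs the chr1 filter and a per-chromosome total (using an awk associative array, tot[$2]) on made-up rows; for (c in tot) visits keys in no guaranteed order, so the output is piped to sort.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"

# Made-up data with a header line: name, chrom, strand, txStart, txEnd
printf '#name\tchrom\tstrand\ttxStart\ttxEnd\n' > genes.tsv
printf 'tx1\tchr1\t+\t1000\t2500\ntx2\tchr1\t-\t5000\t6000\ntx3\tchr2\t+\t3000\t3200\n' >> genes.tsv

# pattern { action }: skip line 1 and keep only chr1 rows, giving 2500
awk -F "\t" 'FNR > 1 && $2 == "chr1" { tot += $5 - $4 } END { print tot }' genes.tsv

# One total per chromosome in a single pass: chr1 2500, chr2 200
awk -F "\t" 'FNR > 1 { tot[$2] += $5 - $4 } END { for (c in tot) print c, tot[c] }' genes.tsv | sort
```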
The purpose of this homework is to familiarize yourself with the different commands available on all UNIX machines. This is not meant to teach you how to use each command at an expert level, but rather to show you that these commands exist.
For each of the commands below, do the following:
1. Run the command on its own:
$ programname
2. Try the -h flag. -h is (generally) a universal short flag for "help me!!! I don't know what's going on!!" Though sometimes that doesn't work either... (for ls and df, for example, -h means "human readable").
$ programname -h
3. Try --help, the long version of -h:
$ programname --help
4. Read the manual page. man is for "manual" and will bring up the manual pages for that command:
$ man programname

Tips:
- Press q (for "quit") to leave a manual page.

Commands:
- echo
- uptime
- man
- history
- whatis
- which
- cd (try cd .. too)
- pwd
- ls
- mkdir
- touch: if you touch a file that's already there, what changes about it? Try using ls -l
- cp
- mv
- rm: rm -rf will force-remove directories recursively, so use it with caution!! There is no "trash bin", so once it's gone, it's gone.
- cat
- head
- tail
- less
- more: how are less and more different?
- wc
- ps
- top
- kill
- killall: how are kill and killall different?
- gzip
- gunzip
- zcat
- df
- du: how are df and du different? Try using -h (human readable) with them.
- alias
- ln: how are alias and ln different?
- sed
- grep
- awk
- expand
- cut
- paste
- sort
- uniq
- chmod
- whoami
- groups

Plus 10 commands of your choice. Here is a nice list of all possible commands.