In this lecture, we'll eschew all things Python and Biology, and focus entirely on the step before either of these: becoming familiar with the command line (or command prompt). By the end of this lecture, you should be able to:
If you've never used a command line before... don't be intimidated!
If you're on a Windows machine, you can either:
I have a macOS laptop, an Ubuntu workstation, a bunch of RedHat servers, and a Windows 10 home desktop.
I'm most at home with either macOS or Ubuntu.
It's like learning another language: you'll only get better at it if you immerse yourself in it, even when you don't want to.
Last login: Mon Jan 9 18:36:07 on ttys006
example1:~ squinn$ ls
Applications    Dropbox         Music           SpiderOak Hive
Desktop         Google Drive    Pictures        metastore_db
Documents       Library         Programming     nltk_data
Downloads       Movies          Public          rodeo.log
example1:~ squinn$
example1:~ squinn$ ls -l
total 264
drwx------ 7 squinn staff 238 Oct 23 2015 Applications
drwx------+ 59 squinn staff 2006 Jan 9 17:49 Desktop
drwx------+ 20 squinn staff 680 Dec 23 09:35 Documents
drwx------+ 5 squinn staff 170 Jan 9 18:27 Downloads
drwx------@ 17 squinn staff 578 Jan 8 18:03 Dropbox
drwx------@ 49 squinn staff 1666 Jan 4 15:47 Google Drive
drwx------+ 74 squinn staff 2516 Nov 17 15:06 Library
drwx------+ 6 squinn staff 204 May 20 2015 Movies
drwx------+ 5 squinn staff 170 Oct 22 2014 Music
drwx------+ 18 squinn staff 612 Jul 29 11:31 Pictures
drwxr-xr-x 37 squinn staff 1258 Jan 4 15:57 Programming
drwxr-xr-x+ 5 squinn staff 170 Oct 21 2014 Public
drwx------@ 8 squinn staff 272 Jun 30 2015 SpiderOak Hive
drwxr-xr-x 9 squinn staff 306 Sep 17 2015 metastore_db
drwxr-xr-x 4 squinn staff 136 Apr 27 2016 nltk_data
-rw-r--r-- 1 squinn staff 131269 Jan 9 18:32 rodeo.log
example1:~ squinn$
Anything that starts with a d on the left is a folder (or directory), otherwise it's a file.
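To see that leading-character rule in isolation, here is a minimal sketch that builds a throwaway folder and inspects the first character of each `ls -l` line. The names `scratch`, `data`, and `notes.txt` are made up for illustration; nothing in your home folder is touched.

```shell
# Work in a temporary directory (hypothetical names throughout).
scratch=$(mktemp -d)
mkdir "$scratch/data"          # a folder
touch "$scratch/notes.txt"     # an empty file

# ls -l prints one line per entry; the first character is the entry type.
ls -l "$scratch"

# ls -ld describes the directory itself instead of listing its contents;
# cut -c 1 keeps only the first character of the line.
dir_type=$(ls -ld "$scratch/data" | cut -c 1)
file_type=$(ls -l "$scratch/notes.txt" | cut -c 1)
echo "data starts with: $dir_type"
echo "notes.txt starts with: $file_type"

rm -r "$scratch"               # clean up
```

The folder reports `d` and the regular file reports `-`, which is the distinction described above.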
Ok, that's cool. I can tell what is what where I currently am. ...but wait, how do I even know where I am?
example1:~ squinn$ pwd
/home/squinn
example1:~ squinn$
Great! Now I know where I am, and what is what where I am. How do I move somewhere else?
example1:~ squinn$ cd Music/
example1:Music squinn$ ls
iTunes
example1:Music squinn$
You'll notice the output of the ls command has now changed, which hopefully isn't surprising.
Since we've Changed Directories with the cd command--you essentially double-clicked the "Music" folder--now we're in a different folder with different contents; in this case, a lone "iTunes" folder.
Folders within folders represent a recursive hierarchy. We won't delve too much into this concept, except to say that, unless you're in the root directory (/ on Linux, C:\ on Windows), there is always a parent directory--the enclosing folder around the folder you are currently in.
Therefore, while you can always change to a very specific directory by supplying the full path--
example1:~ squinn$ cd /home/squinn/Dropbox
example1:Dropbox squinn$ ls
Cilia_Papers      Imaging_Papers     OdorAnalysis    Public
Computer Case     LandUseChange      OrNet           cilia movies
Icon?             NSF_BigData_2015   OrNet Videos
example1:Dropbox squinn$
--you can also navigate to the parent folder of your current location, irrespective of where that is, using the special .. notation.
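Here is a small sketch of both styles of navigation, a full path going in and `..` coming back out. The folder names `course` and `week1` are invented for the example, and everything happens inside a temporary directory.

```shell
start=$(pwd)                          # remember where we began
scratch=$(mktemp -d)
mkdir -p "$scratch/course/week1"      # hypothetical nested folders

cd "$scratch/course/week1"            # jump in using a full path
here=$(basename "$(pwd)")             # basename keeps only the last path component
echo "now in: $here"

cd ..                                 # step up to the enclosing folder
parent=$(basename "$(pwd)")
echo "after cd ..: $parent"

cd "$start"                           # go back and clean up
rm -r "$scratch"
```

Note that `cd ..` never needs to know the name of the parent; it works the same way from any folder other than the root.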
Let's see some other examples!
example1: squinn$ ls
Lecture1.ipynb
example1: squinn$ ls -l
total 40
-rw-r--r-- 1 squinn staff 18620 Jan 5 19:54 Lecture1.ipynb
example1: squinn$ pwd
/home/squinn/teaching/4835/lectures
example1: squinn$ cd ..
example1: squinn$ pwd
What prints out?
~
/
/home/squinn
/home/squinn/teaching
/home/squinn/teaching/4835

$ ls -l
total 8
-rw-rw-r-- 1 squinn staff 19 Sep 3 09:08 hello.txt
drwxrwxr-x 2 squinn staff 4096 Sep 3 09:08 lecture
$ ls *.txt
What prints out?
hello.txt
*.txt
hello.txt lecture

The first word you type is the program you want to run. bash will search PATH for an appropriately named executable and run it with the specified arguments.
$ echo Hello > h.txt
$ echo World >> h.txt
$ cat h.txt
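If you want to watch that PATH lookup happen, the sketch below asks the shell what it would run for a given name. `command -v` is a standard shell builtin; the program name `no_such_program_4835` is deliberately made up so the lookup fails.

```shell
# PATH is a colon-separated list of directories, searched in order.
# Show the first three entries, one per line.
echo "$PATH" | tr ':' '\n' | head -n 3

# command -v reports what the shell would run for a given name.
# (In an interactive shell with aliases it may print the alias instead.)
ls_path=$(command -v ls)
echo "ls resolves to: $ls_path"

# A name that is not found on PATH cannot be run.
if command -v no_such_program_4835 >/dev/null 2>&1; then
  found=yes
else
  found=no
fi
echo "no_such_program_4835 found: $found"
```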
What prints out?
$ echo Hello > h.txt
$ echo World > h.txt
$ cat h.txt
What prints out?
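Once you have committed to your answers, a sketch like the following lets you check the difference between the two redirection operators on a scratch file. The file name `log.txt` and its contents are assumptions for the example.

```shell
scratch=$(mktemp -d)
log="$scratch/log.txt"        # hypothetical file name

echo "first"  >  "$log"       # > creates the file, or empties it if it already exists
echo "second" >> "$log"       # >> adds to the end of the file
appended=$(wc -l < "$log" | tr -d ' ')

echo "third"  >  "$log"       # > again: the earlier lines are discarded
overwritten=$(wc -l < "$log" | tr -d ' ')

echo "lines after >>: $appended"
echo "lines after  >: $overwritten"
rm -r "$scratch"
```

The `tr -d ' '` is only there because some versions of `wc` pad the count with spaces.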
cat dump file to stdout
more paginated output
head show first 10 lines
tail show last 10 lines
wc count lines/words/characters
sort sort file by line and print out (-n for numerical sort)
uniq remove adjacent duplicates (-c to count occurrences)
cut extract fixed width columns from file
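The commands above are easiest to get a feel for on a tiny file. This is a minimal sketch on invented data (the file `numbers.txt` and its six values are assumptions), not an exhaustive tour of each tool's options.

```shell
scratch=$(mktemp -d)
f="$scratch/numbers.txt"              # hypothetical sample file
printf '%s\n' 10 9 2 33 2 2 > "$f"

head -n 2 "$f"                        # first two lines: 10, 9
tail -n 1 "$f"                        # last line: 2
lines=$(wc -l < "$f" | tr -d ' ')     # line count only

# Plain sort compares text, so "10" comes before "2"; -n compares numbers.
text_first=$(sort "$f" | head -n 1)
num_first=$(sort -n "$f" | head -n 1)

# uniq only merges adjacent duplicates, so sort first to group them.
distinct=$(sort -n "$f" | uniq | wc -l | tr -d ' ')

# cut -c 1 keeps the first character of every line.
first_chars=$(cut -c 1 "$f" | tr -d '\n')

echo "lines=$lines text_first=$text_first num_first=$num_first"
echo "distinct=$distinct first_chars=$first_chars"
rm -r "$scratch"
```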
$ cat text
a
b
a
b
b
$ cat text | uniq | wc
What is the first number to print out?
$ cat text
a
b
a
b
b
$ cat text | sort | uniq | wc
What is the first number to print out?
$ grep a text | wc
What is the first number to print out?
$ sed 's/a/b/' text | uniq | wc
What is the first number to print out?
awk: a pattern scanning and processing language. We'll mostly use it to extract columns/fields. It processes a file line by line and, if a condition holds, runs a simple program on that line.
awk 'optional condition {awk program}' file
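As a concrete instance of that template, here is a sketch on a three-row toy CSV. The file `toy.csv`, its column names, and its IDs and values are all invented; it only mimics the general shape of a comma-separated table with a header.

```shell
scratch=$(mktemp -d)
csv="$scratch/toy.csv"         # hypothetical file with made-up rows
printf '%s\n' 'id,t0,t1' 'YAA1,5,-2' 'YAB2,-10,3' 'YBA3,20,15' > "$csv"

# -F, splits each line on commas. NR is the current line number,
# so the condition NR > 1 skips the header. $1, $2, ... are the fields.
awk -F, 'NR > 1 {print $1}' "$csv"

# Condition and program together: print the ID when column 2 is positive.
positive=$(awk -F, 'NR > 1 && $2 > 0 {print $1}' "$csv" | tr '\n' ' ')
echo "positive t0: $positive"

# With no condition, the program runs on every line. NF is the field count.
fields=$(awk -F, '{print NF}' "$csv" | sort -u)
echo "fields per line: $fields"
rm -r "$scratch"
```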
wc Spellman.csv (the first number is the line count; because of the header line, it is one more than the number of data rows)
grep YA Spellman.csv |wc
grep ^YA Spellman.csv |wc (this is a bit better, ^ matches beginning of line)
grep ^YA -c Spellman.csv (grep can provide the count itself)
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-2 | sort | uniq -c
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-3 | sort | uniq -c
awk -F, 'NR > 1 && $2 > 0 {print $0}' Spellman.csv | wc
awk -F, 'NR > 1 {print $1,$2}' Spellman.csv | sort -k2,2 -n | tail
awk -F, 'NR > 1 {print $1,$2}' Spellman.csv | sort -k2,2 -n -r | tail
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $0}' Spellman.csv |wc
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $4-$2,$0}' Spellman.csv | sort -n -k1,1
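Since Spellman.csv may not be at hand, the sketch below runs the same kinds of pipelines on a toy stand-in so you can check each stage by eye. The file `toy.csv` and every ID and value in it are invented for illustration and say nothing about the real dataset.

```shell
scratch=$(mktemp -d)
csv="$scratch/toy.csv"         # hypothetical stand-in, not Spellman.csv
printf '%s\n' 'id,t0,t1' 'YAA1,5,9' 'YAB2,-10,3' 'YBA3,20,15' 'YAC4,1,2' > "$csv"

# Count IDs by their first two characters: 3 start with YA, 1 with YB.
awk -F, 'NR > 1 {print $1}' "$csv" | cut -b 1-2 | sort | uniq -c

# print $1,$2 joins the fields with a space, so sort sees whitespace-separated
# columns; -k2,2 -n sorts numerically on the second column only.
largest=$(awk -F, 'NR > 1 {print $1,$2}' "$csv" | sort -k2,2 -n | tail -n 1)
smallest=$(awk -F, 'NR > 1 {print $1,$2}' "$csv" | sort -k2,2 -n -r | tail -n 1)

# Rows where the value rises from column 2 to column 3.
rising=$(awk -F, 'NR > 1 && $3 > $2 {print $0}' "$csv" | wc -l | tr -d ' ')

echo "largest: $largest"
echo "smallest: $smallest"
echo "rising rows: $rising"
rm -r "$scratch"
```

Adding `-r` reverses the sort, so the same `tail` that showed the largest value now shows the smallest.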
grep ^ATOM 1shs.pdb > newpdb.pdb (^ matches beginning of line)
grep ^ATOM 1shs.pdb | awk '$5 == "A" {print $0}'
# This is UNSAFE with PDB files, since there is no guarantee that fields
# will be whitespace separated; safer is:
grep ^ATOM 1shs.pdb | awk ' substr($0,22,1) == "A" {print $0}' > newpdb.pdb
grep ^ATOM 1shs.pdb | awk ' substr($0,22,1) == "A" {print $0}' | cut -b 78- | sort | uniq -c
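To make the whitespace problem concrete, this sketch writes three synthetic ATOM records in the fixed-column PDB layout and compares the two selection methods. The helper `pdb_atom`, the file `toy.pdb`, and the atoms themselves are invented for illustration; the one thing that matters is that the second record has residue number 1000, which fills its four columns and so touches the chain ID in column 22.

```shell
# Hypothetical helper: print one ATOM record with fixed-width columns.
# args: serial atomName resName chainID resSeq element (coordinates are dummies)
pdb_atom() {
  printf 'ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s\n' \
    "$1" "$2" "$3" "$4" "$5" 0 0 0 1 0 "$6"
}

scratch=$(mktemp -d)
pdb="$scratch/toy.pdb"
{
  pdb_atom 1 " N"  MET A 1    N
  pdb_atom 2 " CA" GLY A 1000 C    # chain and residue number run together: A1000
  pdb_atom 3 " O"  SER B 2    O
} > "$pdb"

# Whitespace splitting sees "A1000" as field 5, so that chain A atom is missed.
by_fields=$(awk '$5 == "A" {n++} END {print n+0}' "$pdb")

# Reading character 22 directly finds both chain A atoms.
by_columns=$(awk 'substr($0,22,1) == "A" {n++} END {print n+0}' "$pdb")

echo "chain A atoms by field:  $by_fields"
echo "chain A atoms by column: $by_columns"
rm -r "$scratch"
```

The same reasoning is why the element column is pulled out by position with `cut -b` rather than by field number.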
Did everyone finish the pre-test? It was due today before lecture. https://docs.google.com/forms/d/1ka9yH5G3bOCfdJUTaeZXV2BdtvqqsiPaxnvKI2f4YK4/
Office hours: Tuesdays (today!) at 11:00 - 12:30. Boyd GSRC 638A.