In this lecture, we'll eschew all things Python and Biology, and focus entirely on the step before either of these: becoming familiar with the command line (or command prompt). By the end of this lecture, you should be able to:
If you've never used a command line before, don't be intimidated!
If you're on a Windows machine, you can either:
I have a macOS laptop, an Ubuntu workstation, a bunch of RedHat servers, and a Windows 10 home desktop.
I'm most at home with either macOS or Ubuntu.
It's like learning another language: you'll only get better at it if you immerse yourself in it, even when you don't want to.
Last login: Mon Jan 9 18:36:07 on ttys006
example1:~ squinn$ ls
Applications    Dropbox         Music           SpiderOak Hive
Desktop         Google Drive    Pictures        metastore_db
Documents       Library         Programming     nltk_data
Downloads       Movies          Public          rodeo.log
example1:~ squinn$
example1:~ squinn$ ls -l
total 264
drwx------   7 squinn  staff     238 Oct 23  2015 Applications
drwx------+ 59 squinn  staff    2006 Jan  9 17:49 Desktop
drwx------+ 20 squinn  staff     680 Dec 23 09:35 Documents
drwx------+  5 squinn  staff     170 Jan  9 18:27 Downloads
drwx------@ 17 squinn  staff     578 Jan  8 18:03 Dropbox
drwx------@ 49 squinn  staff    1666 Jan  4 15:47 Google Drive
drwx------+ 74 squinn  staff    2516 Nov 17 15:06 Library
drwx------+  6 squinn  staff     204 May 20  2015 Movies
drwx------+  5 squinn  staff     170 Oct 22  2014 Music
drwx------+ 18 squinn  staff     612 Jul 29 11:31 Pictures
drwxr-xr-x  37 squinn  staff    1258 Jan  4 15:57 Programming
drwxr-xr-x+  5 squinn  staff     170 Oct 21  2014 Public
drwx------@  8 squinn  staff     272 Jun 30  2015 SpiderOak Hive
drwxr-xr-x   9 squinn  staff     306 Sep 17  2015 metastore_db
drwxr-xr-x   4 squinn  staff     136 Apr 27  2016 nltk_data
-rw-r--r--   1 squinn  staff  131269 Jan  9 18:32 rodeo.log
example1:~ squinn$
Anything that starts with a d on the left is a folder (or directory); otherwise it's a file.
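If you want to convince yourself of this, here is a minimal sketch you can paste into a terminal. It works in a throwaway directory, and the names `notes` and `hello.txt` are made up for illustration. It pulls out just the first character of each `ls -l` line, which is the part that tells folders and files apart.

```shell
# Work in a throwaway directory so nothing in your home folder is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

mkdir notes          # a directory (hypothetical name)
touch hello.txt      # an empty regular file (hypothetical name)

ls -l                # the line for notes starts with d, the line for hello.txt with -

# The first character of a long-listing line is the entry type.
# (ls -d lists the directory itself rather than its contents.)
notes_type=$(ls -ld notes | cut -c 1)
hello_type=$(ls -ld hello.txt | cut -c 1)
echo "notes: $notes_type   hello.txt: $hello_type"
```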
Ok, that's cool. I can tell what is what where I currently am. ...but wait, how do I even know where I am?
example1:~ squinn$ pwd
/home/squinn
example1:~ squinn$
Great! Now I know where I am, and what is what where I am. How do I move somewhere else?
example1:~ squinn$ cd Music/
example1:Music squinn$ ls
iTunes
example1:Music squinn$
You'll notice the output of the ls command has now changed, which hopefully isn't surprising.
Since we've Changed Directories with the cd command--you essentially double-clicked the "Music" folder--now we're in a different folder with different contents; in this case, a lone "iTunes" folder.
Folders within folders represent a recursive hierarchy. We won't delve too much into this concept, except to say that, unless you're in the root directory (/ on Linux, C:\ on Windows), there is always a parent directory--the enclosing folder around the folder you are currently in.
Therefore, while you can always change to a very specific directory by supplying the full path--
example1:~ squinn$ cd /home/squinn/Dropbox
example1:Dropbox squinn$ ls
Cilia_Papers    Imaging_Papers      OdorAnalysis    Public
Computer Case   LandUseChange       OrNet           cilia movies
Icon?           NSF_BigData_2015    OrNet Videos
example1:Dropbox squinn$
--you can also navigate to the parent folder of your current location, irrespective of your specific location, using the special .. notation.
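Here is a small sketch of both styles of navigation side by side. It builds a scratch directory tree first (the folder names `teaching` and `lectures` are only placeholders), so it should be safe to run anywhere.

```shell
# Build a small nested tree in a scratch location (names are illustrative).
top=$(mktemp -d)
mkdir -p "$top/teaching/lectures"

cd "$top/teaching/lectures"   # jump straight there by giving the full path
here=$(pwd)

cd ..                         # move to the parent, wherever we happen to be
parent=$(pwd)

cd ..                         # and up one more level, back to the top of the tree
top_again=$(pwd)

echo "$here"
echo "$parent"
echo "$top_again"
```

Notice that `cd ..` never needed to know the name of the folder it was moving into.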
Let's see some other examples!
example1: squinn$ ls
Lecture1.ipynb
example1: squinn$ ls -l
total 40
-rw-r--r-- 1 squinn staff 18620 Jan 5 19:54 Lecture1.ipynb
example1: squinn$ pwd
/home/squinn/teaching/4835/lectures
example1: squinn$ cd ..
example1: squinn$ pwd
What prints out?
~/
/home/squinn
/home/squinn/teaching
/home/squinn/teaching/4835
$ ls -l
total 8
-rw-rw-r-- 1 squinn staff   19 Sep 3 09:08 hello.txt
drwxrwxr-x 2 squinn staff 4096 Sep 3 09:08 lecture
$ ls *.txt
What prints out?
hello.txt
*.txt
hello.txt lecture
The first word you type is the program you want to run. bash will search PATH for an appropriately named executable and run it with the specified arguments.
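To see that lookup for yourself, the sketch below asks the shell what it would run for a given name. `command -v` is a standard shell builtin; the exact paths it prints will differ from machine to machine, so treat the output as illustrative.

```shell
# PATH is a colon-separated list of directories, searched in order.
echo "$PATH"

# command -v reports what the shell would run for a given name.
# For ls this is normally the full path of an executable found via PATH
# (in an interactive shell with an ls alias, it reports the alias instead).
ls_location=$(command -v ls)
echo "ls resolves to: $ls_location"

# A few commands, cd among them, are built into the shell itself,
# so there is no file to find and command -v just echoes the name.
cd_location=$(command -v cd)
echo "cd resolves to: $cd_location"
```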
$ echo Hello > h.txt
$ echo World >> h.txt
$ cat h.txt
What prints out?
$ echo Hello > h.txt
$ echo World > h.txt
$ cat h.txt
What prints out?
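Once you've made your guesses, here is a sketch you can run to check how the two redirection operators differ. It uses a scratch directory and a made-up file name, `log.txt`, so it won't clobber anything of yours.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo first  >  log.txt   # > creates log.txt (or empties it) and writes one line
echo second >> log.txt   # >> appends to whatever is already there
appended=$(wc -l < log.txt | tr -d ' ')   # tr strips the padding some wc versions add

echo third  >  log.txt   # > again: the previous contents are replaced
overwritten=$(cat log.txt)

echo "lines after >>: $appended"
echo "contents after the second >: $overwritten"
```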
cat dump file to stdout
more paginated output
head show first 10 lines
tail show last 10 lines
wc count lines/words/characters
sort sort file by line and print out (-n for numerical sort)
uniq remove adjacent duplicates (-c to count occurrences)
cut extract fixed width columns from file
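These tools get interesting once you chain them together with pipes. The sketch below runs a few of them on a tiny made-up file, `counts.txt`, with a five-character label in columns 1-5 and a number starting at column 7. The data is invented purely for illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Five lines: a fixed-width label, a space, then a number.
printf 'geneB 10\ngeneA 9\ngeneB 100\ngeneA 9\ngeneC 2\n' > counts.txt

head -n 2 counts.txt     # just the first two lines
wc -l counts.txt         # how many lines in total

# cut -c pulls out character columns; sort -n compares them as numbers.
largest=$(cut -c 7- counts.txt | sort -n | tail -n 1)
# Without -n, sort compares text, so "9" lands after "100".
text_last=$(cut -c 7- counts.txt | sort | tail -n 1)

# uniq only collapses duplicates that sit next to each other,
# which is why it is almost always preceded by sort.
labels_unsorted=$(cut -c 1-5 counts.txt | uniq | wc -l | tr -d ' ')
labels_sorted=$(cut -c 1-5 counts.txt | sort | uniq | wc -l | tr -d ' ')

cut -c 1-5 counts.txt | sort | uniq -c   # how often each label occurs

echo "largest: $largest  last in text order: $text_last"
echo "labels via uniq alone: $labels_unsorted  via sort | uniq: $labels_sorted"
```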
$ cat text
a
b
a
b
b
$ cat text | uniq | wc
What is the first number to print out?
$ cat text
a
b
a
b
b
$ cat text | sort | uniq | wc
What is the first number to print out?
$ grep a text | wc
What is the first number to print out?
$ sed 's/a/b/' text | uniq | wc
What is the first number to print out?
awk is a pattern scanning and processing language. We'll mostly use it to extract columns/fields. It processes a file line by line and, if a condition holds, runs a simple program on the line.
awk 'optional condition {awk program}' file
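As a warm-up, here is a sketch of that pattern on a toy comma-separated file. The file `toy.csv` and its contents are invented for illustration; `-F,` tells awk to split fields on commas, and `NR` is awk's built-in counter for the current line number.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Made-up CSV with a header row: a name, then two measurements.
printf 'name,t0,t1\ng1,0.5,1.5\ng2,-0.2,0.1\ng3,2.0,1.0\n' > toy.csv

# No condition: the program runs on every line, header included.
awk -F, '{print $1}' toy.csv

# Condition NR > 1: skip the first line, then print the first field.
names=$(awk -F, 'NR > 1 {print $1}' toy.csv)

# Conditions can compare fields: keep rows where column 3 exceeds column 2.
rising=$(awk -F, 'NR > 1 && $3 > $2 {print $1}' toy.csv)

echo "$names"
echo "$rising"
```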
wc Spellman.csv    # gives the number of lines; because of the header this is off by one
grep YA Spellman.csv | wc
grep ^YA Spellman.csv | wc    # this is a bit better, ^ matches the beginning of the line
grep ^YA -c Spellman.csv    # grep can provide the count itself
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-2 | sort | uniq -c
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-3 | sort | uniq -c
awk -F, 'NR > 1 && $2 > 0 {print $0}' Spellman.csv | wc
awk -F, 'NR > 1 {print $1,$2}' Spellman.csv | sort -k2,2 -n | tail
awk -F, 'NR > 1 {print $1,$2}' Spellman.csv | sort -k2,2 -n -r | tail
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $0}' Spellman.csv | wc
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $4-$2,$0}' Spellman.csv | sort -n -k1,1
grep ^ATOM 1shs.pdb > newpdb.pdb    # ^ matches the beginning of the line
grep ^ATOM 1shs.pdb | awk '$5 == "A" {print $0}'
# this is UNSAFE with pdb files since there is no guarantee that fields
# will be whitespace separated, safer is:
grep ^ATOM 1shs.pdb | awk 'substr($0,22,1) == "A" {print $0}' > newpdb.pdb
grep ^ATOM 1shs.pdb | awk 'substr($0,22,1) == "A" {print $0}' | cut -b 78- | sort | uniq -c
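To see why the whitespace-based version is unsafe, here is a sketch using three synthetic ATOM records rather than the real 1shs.pdb. The atoms, residues and coordinates are invented. The lines are written with printf so that each value lands in its fixed PDB column, with the chain ID in column 22 and the residue number in columns 23-26. When a residue number needs all four digits, it runs straight into the chain ID and awk no longer sees "A" as its own field.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Fixed-width layout of an ATOM record (synthetic values, for illustration only).
fmt='ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s\n'
{
  printf "$fmt" 1 ' N'  '' MET A 1    '' 1.000 2.000 3.000 1.00 10.00 N
  printf "$fmt" 2 ' CA' '' GLY A 1000 '' 4.000 5.000 6.000 1.00 10.00 C
  printf "$fmt" 3 ' CA' '' GLY B 1    '' 7.000 8.000 9.000 1.00 10.00 C
} > toy.pdb

cat toy.pdb   # in the second record, chain and residue number read "A1000"

# Whitespace fields: the second record's fifth field is "A1000", so it is missed.
by_field=$(awk '$5 == "A" {n++} END {print n+0}' toy.pdb)

# Fixed column: character 22 is the chain ID no matter what surrounds it.
by_column=$(awk 'substr($0,22,1) == "A" {n++} END {print n+0}' toy.pdb)

echo "chain A atoms by field: $by_field, by column: $by_column"
```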
Did everyone finish the pre-test? It was due today before lecture. https://docs.google.com/forms/d/1ka9yH5G3bOCfdJUTaeZXV2BdtvqqsiPaxnvKI2f4YK4/
Office hours: Tuesdays (today!) at 11:00 - 12:30. Boyd GSRC 638A.