Introduction to Unix

Very general question time: What is it that computers do? Computers can interact with us, store information for us, run programs, etc. Computers can help us do science! We, as scientists, interact with computers in a couple of basic ways, through the command line interface (CLI) or a graphical user interface (GUI). What are some examples of GUIs? Well, almost everything you're used to using is a GUI! Word, Excel, etc. GUIs are great because they require no memorization of syntax or knowledge of programming--you can simply use menus and icons to move, delete, or open a file. So why bother with CLI at all? Because it can be very convenient and powerful, and we will learn how!

COMMAND SHELL: The command shell is a program that helps you communicate with your computer. You type something into the terminal, and the shell works out which commands the computer needs to run and then tells the computer to run them. A commonly used shell for Unix is the Bash shell. Let's start off with some very basic shell commands. NOTE: Because I'm in this python notebook environment, I need to start each cell with a line containing percentage signs (%%bash). This is called a magic function, and it just helps me simulate being in the shell environment. You DO NOT need to type it.

First a little computer set-up...

Before we begin the lesson, we need to do a little bit of set-up. Do not worry about what this command means for now, just know that it is helping you to correctly set up things on your computer so that things like printing, running this notebook, and using python will work. Simply type the following on the command line of your terminal:

/astro/store/gradscratch/tmp/windemut/premap_setup2017

To make sure this has worked, try a couple of things. First type the following and you should see an interactive environment for python come up. It should say Python 3.

ipython

To exit this session use Ctrl+D, or type exit and enter.

Basic Unix Commands

We can ask the computer who the current user is using whoami, or we can print the working directory (where we are) using pwd. In both cases the shell finds the program (either whoami or pwd), runs that program, and then displays the output for us.

Type the following into your terminal window and press enter:

whoami

also try:

pwd

which stands for "print the working directory".

From the pwd command, we can see which directory we are currently working in.

/astro/users/<username> refers to your so-called home directory. This is like the top-level directory for your own files on the computer (remember: folder = directory).
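To see the "shell finds the program" step for yourself, you can ask the shell where it would look. This is only a small sketch: command -v is a standard shell feature that is not otherwise used in this lesson, and the variable names are made up for illustration.

```shell
# Ask the shell what it would run for each command name.
whoami_path=$(command -v whoami)   # usually a path such as /usr/bin/whoami
pwd_kind=$(command -v pwd)         # in bash this is typically just "pwd" (built into the shell)

echo "whoami is found as: $whoami_path"
echo "pwd is found as:    $pwd_kind"

# Run the two commands from the lesson and keep their output in variables.
current_user=$(whoami)
current_dir=$(pwd)
echo "user:              $current_user"
echo "working directory: $current_dir"
```

The exact locations printed will differ from machine to machine.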

We need to be working in a specific directory to do some of the examples in this lesson. Type the following into your command line and press enter.

cd ~/Lectures/UnixIntro

We'll worry about what this command means later. Let's investigate what is in our current working directory with the ls command:

ls

The ls command lists what is in the current working directory. You'll see something like this:

bar.txt
baz.txt
bio.txt
foo.txt
helloworld.py
wordcount.txt

This is showing you the same things that you'd see in a file browser.

The listing is not very exciting right now, because all I have in here is this notebook and a few other files. Let's move to another directory to check out more stuff. You can either follow this command (which changes to my home directory) or insert your own uwnetid instead of my username:


In [4]:
%%bash
cd /astro/users/premapta
ls -F


UnixIntro.ipynb
bar.txt
baz.txt
bio.txt
foo.txt
helloworld.py
wordcount.txt
bash: line 1: cd: /astro/users/premapta: No such file or directory

So now we see a lot more stuff! (In the output recorded above, the cd failed because that directory did not exist on the machine where the notebook was run, so the listing is still of the notebook's own directory.) I also added the "-F" flag to ls to make it clearer which things are directories (it adds a trailing "/" to directories) and which things are files. We can often tell what type a file is from its extension (e.g. names ending in ".txt" or ".pdf"). How did I change to this directory? I used the cd command to change directories, followed by the PATH to the new destination directory. The cd command can also be followed by a few special names that help us move around the file system. Let's look at a couple:


In [5]:
%%bash
pwd
cd .
pwd
cd .. 
pwd 
cd ~
pwd


/Users/KG/Grad/Pre-MAP2015/UnixIntro
/Users/KG/Grad/Pre-MAP2015/UnixIntro
/Users/KG/Grad/Pre-MAP2015
/Users/KG

So what did we do? We printed our working directory (the UnixIntro directory), then changed to the "." directory and printed it (still UnixIntro), then changed to the ".." directory and printed it (the directory that contains UnixIntro), then changed to the home directory and printed it. The output above was recorded on a different computer, which is why its paths start with /Users/KG; on the astrolab machines the same commands would print paths under /astro/users instead. So what are ".", "..", and "~"? They stand for the current directory, the parent directory, and the home directory, respectively. The first two are real entries in every directory, and we can get them to show up when we use a special flag with our ls command:


In [6]:
%%bash 
ls -a


.
..
.ipynb_checkpoints
UnixIntro.ipynb
bar.txt
baz.txt
bio.txt
foo.txt
helloworld.py
wordcount.txt

This stands for "list all." Now I want everyone to take a moment to do some exploring. If you type cd without a path to any directory, what happens? If you use the "-s" flag with ls what output do you get?
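If you would like a safe place to experiment with these ls flags and with "." and "..", here is a minimal sketch that builds a throwaway directory first. The file and directory names are made up, and mktemp -d (which creates an empty temporary directory) is not part of this lesson; it is used only so that nothing in your real files is touched.

```shell
# Work in a throwaway directory so nothing in your real files is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical contents: one directory, one ordinary file, one hidden file.
mkdir data
touch notes.txt .hidden

plain_listing=$(ls)       # names starting with "." are normally left out
all_listing=$(ls -a)      # adds ".", "..", and ".hidden"
marked_listing=$(ls -F)   # directories get a trailing "/"

echo "ls:";    echo "$plain_listing"
echo "ls -a:"; echo "$all_listing"
echo "ls -F:"; echo "$marked_listing"

# Go down into data, then use ".." to come back up one level.
cd data
cd ..
back_again=$(pwd)
echo "after 'cd data' then 'cd ..' we are in: $back_again"
```

The last line should print the same scratch directory you started in.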

Now that we've seen how to navigate through the directory structure and see what is inside directories, let's see how to create things. Navigate back to your home directory--cd /astro/users/your_uwnetid_here--so that you will be able to create directories and write files. If we want to make a new directory we will use the following command:


In [7]:
%%bash
mkdir bio

The mkdir command stands for "make directory" so with this we have created a new directory that I named bio. If you move into this directory and try to see what is in it you will see that it is empty. Go ahead and do that now. The next thing we'll want to do is create a file in this new directory. Let's make this a draft file. To create a file we will use the touch command followed by the name of the file we want to create. (More on creating files with text editors later on).


In [8]:
%%bash 
cd bio
touch draft.txt
ls


draft.txt

So we can see that we have created a "draft.txt" file in our bio directory, but this file is empty. Let's say we want to make a copy of this file and move it to another directory. To make copies of files we use the cp command. To move files we use the mv command. NOTE: I am only doing the cd bio command again because this notebook automatically reverts me back to the "/astro/users/premapta/Lectures/UnixIntro" directory when I start a new cell. In the terminal you would NOT need to do this.


In [9]:
%%bash
cd bio
cp draft.txt draftcopy.txt
ls
mv draftcopy.txt ..
ls


draft.txt
draftcopy.txt
draft.txt

So let's see what we have done with this series of commands. We copied "draft.txt" into a new file called "draftcopy.txt." By using the ls command I can see that both of these files exist in the directory. Next I used the mv command, followed by the file to move and then the destination (".." means the parent directory, recall from earlier), to move the copy up one directory. Now if I do ls again I see that ONLY "draft.txt" remains in this directory. Now let's say that I only want "draftcopy.txt", so I decide to remove the whole bio directory. The command to remove something is rm (Note: be careful, since removing is PERMANENT). Move back up one directory and try rm bio. Does this command work? Why or why not?

You should find that it does not work: on its own, rm removes files but refuses to remove a directory. There are several things you could do here. You could go into the "bio" directory, remove the file using rm draft.txt, and then delete the now-empty directory, or you could use one of the following commands:


In [10]:
%%bash
rm -r bio
rmdir bio


rmdir: bio: No such file or directory

The first command deletes everything in the directory and then deletes the directory itself ("r" stands for "recursive"). The second command, rmdir, is the equivalent of rm for a directory, but it only works if the directory is already empty. In the output above it reports "No such file or directory" because rm -r had already removed bio by the time rmdir ran. Again, be VERY careful when doing this. Deleting is permanent.
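The differences between rm, rmdir, and rm -r are easiest to see by trying each one in a throwaway directory. The following is a sketch under that assumption: mktemp -d is used only to create a scratch area, and the variable names are invented for illustration.

```shell
# Work in a throwaway directory so nothing real can be deleted.
scratch=$(mktemp -d)
cd "$scratch"

mkdir bio
touch bio/draft.txt
cp bio/draft.txt bio/draftcopy.txt   # copy: both files now exist in bio
mv bio/draftcopy.txt .               # move: the copy leaves bio for the current directory

# Plain rm refuses to remove a directory.
if rm bio 2>/dev/null; then rm_plain="removed"; else rm_plain="refused"; fi

# rmdir refuses because bio still contains draft.txt.
if rmdir bio 2>/dev/null; then rmdir_full="removed"; else rmdir_full="refused"; fi

# Empty the directory first, and then rmdir succeeds.
rm bio/draft.txt
if rmdir bio; then rmdir_empty="removed"; else rmdir_empty="refused"; fi

# rm -r removes a directory and its contents in one step.
mkdir bio
touch bio/draft.txt
rm -r bio

echo "rm bio (a directory):        $rm_plain"
echo "rmdir bio (not empty):       $rmdir_full"
echo "rmdir bio (after emptying):  $rmdir_empty"
ls
```

The final ls should show only draftcopy.txt, since bio has been removed both ways by that point.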

The next thing to look at is how we can combine existing programs (commands) to do powerful things from the command line. Imagine that I have a very crowded directory with lots of different files. How will I find just the files of a certain type? We can use something called a wildcard to accomplish this. For example, let's say I only want to look at ".txt" files. Here's how we might accomplish that:


In [11]:
%%bash
ls *.txt


bar.txt
baz.txt
bio.txt
draftcopy.txt
foo.txt
wordcount.txt

The "*" wildcard matches zero or more characters. The "?" wildcard matches exactly one character. You can use these in combination with one another to get at more specific file names. For example, what would ls b*.txt output, versus ls ?a*, versus ls ba?.txt? Try these out to see what you get! Note that wildcards can be used with any other shell command, for example rm *. DON'T DO THIS. It's a bad idea to delete everything.
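Here is a small sketch of how "*" and "?" differ, using made-up file names in a throwaway directory so it does not give away the exercise above. It uses echo rather than ls simply to print the names the shell turns each pattern into. The LC_ALL=C line is an addition that keeps the ordering of names the same on every machine.

```shell
# Keep the ordering of matched names independent of language settings.
export LC_ALL=C

# Throwaway directory with hypothetical file names.
scratch=$(mktemp -d)
cd "$scratch"
touch a.txt ab.txt abc.txt b.txt

star_matches=$(echo a*.txt)       # "*" may match nothing at all, so a.txt is included
question_matches=$(echo a?.txt)   # "?" must match exactly one character
both_matches=$(echo ?b*.txt)      # any one character, then "b", then anything

echo "a*.txt  -> $star_matches"
echo "a?.txt  -> $question_matches"
echo "?b*.txt -> $both_matches"
```

Note that b.txt is not matched by the last pattern, because "?" insists on one character before the "b".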

Let's say I want to write some of my output from the shell to a file. Let's use the wc command to count the number of lines, words, and characters in each file and then redirect that to a new file called "wordcount.txt."


In [12]:
%%bash
wc *.txt > wordcount.txt

The greater-than symbol (>) tells the shell to redirect the command's output to a file instead of printing it to the screen. If we want to see what is in this new file "wordcount.txt," we can use the cat command, which stands for "concatenate" and prints the contents of the file to the screen.


In [13]:
%%bash 
cat wordcount.txt


       3      19      93 bar.txt
       1      14      70 baz.txt
       4      88     565 bio.txt
       0       0       0 draftcopy.txt
       2      12      57 foo.txt
       0       0       0 wordcount.txt
      10     133     785 total

Imagine that I had to do this for a REALLY large file. Would I want all the output printed directly to my screen? Probably not. Imagine I only care about that summary line at the end. In order to get at just the first or last few lines of a file we can use the head or tail commands as follows:


In [14]:
%%bash
head -1 wordcount.txt
tail -1 wordcount.txt


       3      19      93 bar.txt
      10     133     785 total

The "-1" means either first line, or last line, respectively. We could change these numbers to get the first three lines, or last three lines, for example. Let's look at how we could search for instances of specific words in lines in files:


In [15]:
%%bash
cat *.txt | grep -n "file"


1:Here is another dummy file.
2:This is the second dummy file I made.
4:Here is another dummy file I made, but with only one line this time. 
9:This is a dummy file.
10:There are two lines in this file. 

The vertical bar is referred to as a "pipe". It tells the shell to take the output of the command on the left as the input to the command on the right. grep is a command that finds lines in files that match a particular pattern. It is a contraction of "global/regular expression/print." The "-n" flag means to print the line number where the expression we are searching for occurs. There are many other flag options to go along with grep, which you can find by doing man grep (which stands for manual). There are also lots of other ways to search for specific "regular expressions" using grep. As one example, let's look at what the following does:


In [16]:
%%bash
cat *.txt | grep '^Here'


Here is another dummy file.
Here is another dummy file I made, but with only one line this time. 

The caret (^) matches the start of a line, so this prints only those lines that begin with "Here."
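To compare a plain search, a numbered search, and a search anchored to the start of the line, you can try something like the following sketch. The two files and their contents are invented for illustration and live in a throwaway directory. The last command also chains two pipes, passing grep's output on to tail.

```shell
# Throwaway directory with two small hypothetical text files.
scratch=$(mktemp -d)
cd "$scratch"
printf 'Here is another dummy file.\nNot at the start: Here.\n' > bar.txt
printf 'This is a dummy file.\nThere are two lines in this file.\n' > foo.txt

numbered=$(cat *.txt | grep -n "file")            # matching lines, with line numbers
unanchored=$(cat *.txt | grep 'Here')             # "Here" anywhere in the line
anchored=$(cat *.txt | grep '^Here')              # "Here" only at the start of the line
last_only=$(cat *.txt | grep "file" | tail -n 1)  # two pipes: keep just the last match

echo "grep -n file:";  echo "$numbered"
echo "grep Here:";     echo "$unanchored"
echo "grep ^Here:";    echo "$anchored"
echo "last match:";    echo "$last_only"
```

The unanchored search finds two lines, while the anchored one finds only the line that actually begins with "Here".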

We can also use the sort command on files. Here I will show an example of doing an alphabetical sort, but note that it is possible to do a numerical sort as well (you will try this later for yourselves). What if I tried this with sort -k 2 alpha.dat instead?


In [17]:
%%bash
sort -k 1 alpha.dat


sort: open failed: alpha.dat: No such file or directory
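The recorded output above is an error because alpha.dat was not present where the notebook was run. As a stand-in, here is a sketch that creates a small two-column file with made-up contents (the real alpha.dat may look quite different) and sorts it three ways. The LC_ALL=C line is an addition so that the ordering is the same on every machine.

```shell
# Keep the sort order independent of language settings.
export LC_ALL=C

# Throwaway directory with a hypothetical two-column data file.
scratch=$(mktemp -d)
cd "$scratch"
printf 'delta 10\nalpha 9\ncharlie 100\nbravo 25\n' > alpha.dat

by_first=$(sort -k 1 alpha.dat)             # alphabetical, starting from column 1
by_second_text=$(sort -k 2 alpha.dat)       # column 2 compared as text, character by character
by_second_number=$(sort -n -k 2 alpha.dat)  # column 2 compared as numbers

echo "sort -k 1:";    echo "$by_first"
echo "sort -k 2:";    echo "$by_second_text"
echo "sort -n -k 2:"; echo "$by_second_number"
```

Sorting column 2 as text puts 100 before 25 and 9 last, because the characters are compared one at a time. Adding -n gives the numerical order 9, 10, 25, 100.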

Logging onto machines from home

There are even more Unix commands (not covered here) that are listed on the Unix Cheat Sheet that you have. This includes things like how to download a file from the internet (given a web address of the file), how to create a tarball, how to copy things between machines, etc. Now that we've gone over these basic commands today I'll show you one last thing, which is how to remotely log into your machines. This command is also covered on the Unix Cheat Sheet, but requires a little set-up.

Mac Users

The first thing you need is the correct "config" file in the .ssh folder in your home directory. You only need to create this ONCE. First, check to see that this folder exists by doing the following command:

cd ~/.ssh

If this works, then the directory already exists. If you get an error, then you need to create this directory first, which you would do with mkdir ~/.ssh. Next, you need to create the config file. The command to do that is as follows:

touch ~/.ssh/config

Now open up this config file with your favorite text editor (something like emacs ~/.ssh/config) and edit it to include the following lines:

Host gateway
    User UWNETID
    Hostname gateway.phys.washington.edu

Host astrolabXX
    User UWNETID
    Hostname astrolabXX
    ProxyCommand ssh -q -W %h:%p gateway

Remember that to SAVE the file in emacs you press Ctrl+X followed by Ctrl+S, and to EXIT emacs you press Ctrl+X followed by Ctrl+C. In this file change UWNETID to your login username, and XX to the default lab computer you would like to log in to. Once you have this file all set up on your laptop, you should be able to remotely log into an astrolab computer with the following command (also on the Unix Cheat Sheet):

ssh astrolabXX

In order to log out of a remote session you just need to type exit in the terminal window. The next time you want to login all you have to do is type ssh astrolabXX again and it should prompt you for your password (twice) as before.
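The one-time directory and file set-up described in this section can also be done in a single pass, without checking first whether ~/.ssh exists. The sketch below makes two assumptions that are not from the lesson. First, it works on a stand-in directory so that running it does not touch your real settings. Second, it tightens the permissions with chmod, since ssh may refuse a config file that other users are able to write to.

```shell
# Stand-in for your home directory, so this sketch does not touch real settings.
# To do it for real, you would use: ssh_dir="$HOME/.ssh"
demo_home=$(mktemp -d)
ssh_dir="$demo_home/.ssh"

mkdir -p "$ssh_dir"           # -p: no error if the directory already exists
chmod 700 "$ssh_dir"          # only you can read or enter the directory
touch "$ssh_dir/config"       # creates the file if missing, leaves contents alone otherwise
chmod 600 "$ssh_dir/config"   # only you can read or write the file

ls -la "$ssh_dir"
```

Because mkdir -p and touch are both harmless when the target already exists, running these lines a second time changes nothing.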

For Windows Users

First, download putty--use putty.exe--and then download Xming. For convenience, drag both of these to your desktop to make icons so that you don't have to go searching for them later. Make sure both have installed properly. Make sure Xming is open (just double click; nothing will show up on the screen) before opening putty. Once this is done, you can open up putty.

Where it says "Host Name (or IP address)" type gateway.phys.washington.edu. On the left hand side click the "+" next to "SSH." More options should come up. Next click on X11 and on the new screen check the box that says "enable X11 forwarding." Now go back to the first screen (click "Session" on the left hand side). Under "Saved Sessions" type something like "astrolab" and hit "Save." This saves the settings you just input under the name astrolab. Next click "Open." An Xming terminal should pop up and prompt you for your username. Enter your uwnetid and your password. If this works, you have tunneled through gateway! Only one more step. In the terminal type the following:

ssh -l UWNETID astrolabXX.astro.washington.edu

And once again enter your password when prompted. Note that the "-l" is a lowercase L (not a 1), and that you should put in your username and preferred astrolab computer where it says UWNETID and XX. Now you should be logged into your astrolab computer! Try doing a pwd to make sure you are in your home directory on the astrolab machine, or do an ls to see what is there. You will need to enter exit once to log out of your astrolab machine, and then exit once more to log out of gateway. Once you've done this your Xming terminal window should disappear.

The next time you want to do a remote login just make sure that Xming is open first and then open putty. Then you should be able to click on "astrolab" in your "Saved Sessions" and just hit "Open." An Xming terminal will again open and prompt you for your uwnetid and password. Then you type the ssh -l UWNETID astrolabXX.astro.washington.edu command above and you will be logged in again!

Next time we will cover a brief bit of astro background, talk about using text editors (to more easily create and edit files), and then get started on our first Unix assignment!

