Today's Agenda

  • Basic bash commands
  • Paths
  • Jupyter notebooks
  • Python 2 vs Python 3

Quick Info

  • Make sure you have completed the form at https://goo.gl/y2ammQ so that you can be added to the Slack group and the GitHub group.
  • Mike and I will be meeting to finalize the schedule and start the interactive part of the course.
  • Starting this week, the notebooks will be available the Sunday before the class, so that you know the schedule ahead of time and can decide whether you want to join.
  • We added a new plug-in Anamak Bot to make it easier to ask questions if you want to remain anonymous.
  • Next week we'll start with the interactive session of Python.

Before we begin, let's ...

First, you will need to clone the existing repository for the course.

cd /Path/to/Repositories/folder

Then clone the repository https://github.com/VandyAstroML/Vanderbilt_Computational_Bootcamp

git clone https://github.com/VandyAstroML/Vanderbilt_Computational_Bootcamp

Then move into the newly cloned directory

cd Vanderbilt_Computational_Bootcamp

And pull the latest changes to the repository

git pull

Now, whenever you do "git pull" from inside that directory, you'll have the updated version of the repository and notebooks.

Introduction to bash

bash is a command-line interpreter, or shell, that runs in a terminal and instructs the operating system to carry out commands, either line by line or written together as scripts. bash is the Unix shell that we will be focusing on, but there are others out there (such as tcsh). If you're not sure what shell you're using, running the commands 'echo $0' or 'ps -p $$' will *usually* tell you. In bash, a "$" at the start of a name denotes a variable.

Basic bash commands

  • pwd
    - Prints the full path of the directory that you are currently in (i.e. how to get to your current directory)
  • cd [path]
    - Change the directory you are in
  • ls
    - list the files in the current directory
  • locate [file]
    - find file with a particular name
  • cp [file1] [file2]
    - Copy a file to another place (original file remains)
  • mv [file1] [file2]
    - Move a file to another place (original file is gone)
  • mkdir [name]
    - Create a new directory
  • rm [file1]
    - Remove a file **WARNING: Using this incorrectly, you can delete EVERYTHING**
  • rmdir [directory]
    - Remove a directory. This one will only delete __empty__ directories, so it's not that useful after all
  • man [command]
    - Opens the manual page for a command, with full usage instructions and all available options. You can also use '[command] --help' for most standard commands to see how to use them
  • echo [string/variable]
    - prints the string or variable to the terminal
  • head [file]
    - prints the first 10 lines of the file to screen
  • tail [file]
    - prints the last 10 lines of the file to screen
  • cat [file]
    - prints out the content of [file] to the screen.

Still not sure how to use something, or whether it does what you want? Google usually helps (and will usually bring you to a mostly useful answer on Stack Overflow).

Paths

The PATH variable is a list of locations that the shell checks, in order, when you try to run an executable. This allows you to download or write new programs/scripts and call them from anywhere without always having to include the full path to the file. For example, PATH typically includes locations like /usr/bin.
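
As a rough illustration of the lookup described above, here is a minimal Python sketch (Python 3 assumed). The helper find_executable is a hypothetical name made up for this example; it is not something bash or Python provides. The standard-library function shutil.which does the same job and is shown for comparison.

```python
import os
import shutil

# PATH is a single string; its directories are separated by os.pathsep
# (":" on Linux and macOS).
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

print("Directories searched, in order:")
for directory in path_dirs:
    print("  ", directory)


def find_executable(name, directories):
    """Return the first executable file called `name` found in `directories`, or None.

    Hypothetical helper for illustration only.
    """
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Both lines should report the same location for a standard command such as "ls".
print(find_executable("ls", path_dirs))
print(shutil.which("ls"))
```

The first directory that contains a matching executable wins, which is why the order of the entries in PATH matters.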

To check what is in the path, you can run 'echo $PATH' and see a list of the directories that will be checked. You can add new directories to the list of stored paths by opening .profile in a text editor, and adding a line like this:

PATH=$PATH:newdirectorypath 
After adding this, every time you start a terminal, this location will be part of the directories that will be checked for executables.

An example of this is Anaconda. Its installer adds a line to your '~/.bashrc' file that puts the Anaconda directory on your PATH.

Aliases

Aliases are ways to call a set of commands that you use frequently with a short nickname. In Linux systems, these are often stored in the .bashrc file (a file that contains the settings for interactive bash terminals) or in .profile (a file read by login shells). For Macs, these are often stored in .bash_profile (also read at login, but only by bash, not by other shells). We can talk about them later, but it is useful to know they're an available tool.

Adding aliases to your ~/.bashrc file

Normally, you add an alias like this:

alias somename='cd /useful/directory/for/research'

and then you can type "somename" to go to that directory.

However, an alias defined this way only works for the current shell session. Normally you want your aliases to be available at any time, so you need to add them to your ~/.bashrc file.

  1. First create a "~/.aliases" file
    touch ~/.aliases
    
  2. Open the file using your preferred text editor. I use vim for this example
    vim ~/.aliases
    
  3. Add useful aliases to the file.
  4. Source it from your ~/.bashrc file: open ~/.bashrc and add the line "source ~/.aliases" to it.
    vim ~/.bashrc
    

Every time that you start a new shell session, you will be able to use the aliases in the ~/.aliases file.

Some useful aliases are:

alias   lll='ls -lah'     # Lists files and directories (including hidden ones) as a list
alias   lla=lll           # Same as above
alias   llh='ls -lh'      # Lists files and directories (only visible ones) in a human-readable format
alias   llt='ls -lahtr'   # Lists files sorted by _date of last modification_, oldest first
alias    LS='ls'          # In case you misspell "ls" 
alias    sl='ls'          # In case you misspell "ls" 
alias    SL=ls            # In case you misspell "ls" 
alias CLEAR='clear'
alias   clc='clear'
alias   CLC='clear'
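alias    en='emacs -nw'   # Opens Emacs inside the terminal (no new window)
alias pushd='cd -'        # Returns to the previous directory (note: this shadows the bash builtin "pushd")
alias untar_file='tar -zxvf' # Untars a gzipped tar file to the current directory; an alias does not take "$1", the file name you type is simply appended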

# You can also use functions as aliases
scppics() {
    # Copies a remote file or folder ($1) to a local destination
    scp -rp username@host.address:"$1" /path/to/destination/folder
}

git_ref() {
    # Brings the local master branch up to date with upstream
    git remote -v
    git fetch upstream
    git checkout master
    git merge upstream/master
}

# You can add the ssh-keys for easy access to Github, or other computers
alias  skeys='ssh-add ~/.ssh/id_rsa; ssh-add ~/.ssh/personalid; ssh-add ~/.ssh/vpac; ssh-add ~/.ssh/github_vandy'

Quick bash tutorial

To go over some of the commands, we're going to try doing some basic operations with some bash commands. To do this, first open up a terminal window. Run one of the commands from earlier to check that you're running bash by default. If you're not, you should be able to start a bash shell by just running 'bash'.

Now we'll set up a quick directory to play in. I'll call it compshop

mkdir compshop
And then go into this directory
cd compshop
We can quickly check to see the full path of where we are now
pwd
And we can confirm that right now the directory is empty, too
ls
Because we're working with computers, we have to print this to the screen at some point....
echo Hello, World!
Since a lot of what bash is useful for is moving files around, we'll make a file to work with. We can also have echo write to a file by adding '> [filename]' at the end of the command.
echo Hello, World! > filegreeting
A quick check again with ls and we'll see we now have one file in the directory. We can also see that that file contains exactly what we wanted it to
head filegreeting
We can make a second copy of this file
cp filegreeting filegreeting2
Checking ls again, we now have two files. If we wanted to rename one of them, we can just use the mv command for that
mv filegreeting2 filegreetings
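Check ls again, and we still have two files, but we changed the name of one of them. If we want to learn more about what ls can do, there's always Google, but we can also check from the command line. Try each of these ('ls --help' works with the GNU version of ls found on Linux; the ls that ships with macOS does not support it, so use 'man ls' there):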
man ls
ls --help
You'll see that they both start off with how we have to use ls, and then include a lot of information about all the different options we can run. For example, we can list all files, even those that would normally be hidden, by running this:
ls -a
If we want to see the last modification dates and file sizes of all the files, we can run
ls -l
We can even combine these two, so we see all files (even ones that might be hidden) and get their sizes, dates, etc.
ls -al
Finally, as we've tried out most of the bash commands here, we'll get rid of these two files. We can just give a list of files to be removed.
rm filegreeting filegreetings

All of these operations can get a lot more complex or sophisticated (like echo did when we had it write to a file), but these are a lot of the most common commands to show up in bash.
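
For comparison, the same walk-through can be sketched in Python 3 using the standard pathlib and shutil modules. This is only an illustration, not part of the course material: it runs inside a temporary directory (an assumption made so that nothing in your own folders is touched), and the variable names are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway temporary directory so nothing real is touched.
base = Path(tempfile.mkdtemp())

workdir = base / "compshop"
workdir.mkdir()                                # mkdir compshop
print(workdir.resolve())                       # pwd
print(list(workdir.iterdir()))                 # ls (nothing there yet)

greeting = workdir / "filegreeting"
greeting.write_text("Hello, World!\n")         # echo Hello, World! > filegreeting
print(greeting.read_text(), end="")            # head filegreeting

copy = workdir / "filegreeting2"
shutil.copy(str(greeting), str(copy))          # cp filegreeting filegreeting2

renamed = workdir / "filegreetings"
copy.rename(renamed)                           # mv filegreeting2 filegreetings

names_before_cleanup = sorted(p.name for p in workdir.iterdir())
print(names_before_cleanup)                    # ls

greeting.unlink()                              # rm filegreeting filegreetings
renamed.unlink()
remaining = list(workdir.iterdir())

# Remove the temporary directory itself once we are done.
shutil.rmtree(str(base))
```

For quick interactive work the bash commands are shorter; a script like this becomes useful once the file handling is part of a larger Python program.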

Jupyter notebooks

A Jupyter notebook is a file (like this one) that contains cells that can contain either plain text or code. So far, we've only been using plain text, but as we start working with Python, we can explain stuff in the text, and then you'll be able to run the sections that contain code and see what happens, as well as change the code to see how those changes affect the results. The first example of this is going to be in the last section today.

Running Jupyter notebooks

There are two ways of viewing Jupyter notebooks, both of which take place inside a browser. On the GitHub repository we're using, the notebooks can be viewed directly on GitHub. In this sense, it works the same as simply visiting a website. However, this will only allow you to view a notebook as a static page. It won't allow you to use any of the interactive features of a Jupyter notebook. For that, you'll need to use the second method, which takes place locally.

The second way requires having Jupyter installed, although Anaconda should have included this. From here, you may have an icon to start Jupyter. You can also start it from the terminal by running 'jupyter notebook'. This starts a Jupyter server, and will also open up a tab in your default browser. In that browser tab, go through your directories to locate the notebook file that you want to run (you can also run 'jupyter notebook' while you're in the directory that file is in to speed this up).

Create Jupyter notebooks

Creating a Jupyter notebook file is very similar to the second way of accessing a Jupyter notebook file. As before, you have to run 'jupyter notebook' to start up the Jupyter server. In this case, though, you go to the directory you want the file to be located, and then select 'new' in the upper-right corner and select a Python notebook.

For now, we won't go into the details of formatting within a Jupyter notebook.

To create a new Jupyter (formerly IPython) notebook from the terminal, run

    $:  cd /path/to/directory
    $:  jupyter notebook

This will start a Jupyter server in that directory.

You then click on "New > Python 2" (or Python 3 if you have that one installed) and it will start a new notebook.


In [1]:
# You can easily run and debug your code
%matplotlib inline

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x = np.arange(50)
y = x.copy()

print( len(x))


50
/Users/victor2/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')

In [2]:
# Printing arrays
print(x)


[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]

In [3]:
# Plotting arrays
import numpy as np
from scipy.stats import kendalltau
import seaborn as sns
sns.set(style="ticks")

rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)

# Note: recent seaborn releases expect x=x, y=y as keywords and no longer accept stat_func
sns.jointplot(x, y, kind="hex", stat_func=kendalltau, color="#4CB391")


Out[3]:
<seaborn.axisgrid.JointGrid at 0x11458e390>

In [4]:
# Pairwise scatter plots of the iris dataset, colored by species

df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")


Out[4]:
<seaborn.axisgrid.PairGrid at 0x114511bd0>

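You can also run more complex code, such as starting several processes at once. Because the processes run concurrently, their output can appear out of order, and some of it may even show up under a later cell: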


In [5]:
import multiprocessing

def worker(num):
    """process worker function"""
    print ('Worker:', num)
    return

jobs = []
for i in range(5):
    p = multiprocessing.Process(target=worker, args=(i,))
    jobs.append(p)
    p.start()


Worker: 0
Worker: 1
Worker: 3
Worker: 2

Python 2 vs Python 3

It's important to take a moment to make a general mention about Python. In most cases in programming, it's fair to assume that everyone is working in the 'newest' stable version of the language. Python, however, is in a unique spot. Python 2.7 came out in 2010, but Python 3 started being released in 2008, and is currently up to Python 3.6, which came out in 2016. So the most recent Python 3 release is 6 years newer. However, Python 2 and Python 3 have some differences that mean not all Python 2 code can work with Python 3 without being rewritten. This means that if you're using old code that hasn't been updated, you may need to use Python 2.

There are two basic differences to mention, but more can be found here, with examples of how the code runs in both Python 2 and Python 3.

The first big difference is how they treat division. In Python 2, division of two integers will return an integer answer. This is fine for 8/2, which will always be 4. However 7/2 will return 3 as the answer. Older code may be written to take advantage of this on purpose. Python 3, however, will not do this, and so 7/2=3.5 (if you want the integer result in either version, write 7//2). This is correct math, but may not be what older code wanted to have happen.
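
A short sketch of the division behavior described above, meant to be run under Python 3. The variable names are made up for the example.

```python
# Run this under Python 3. Under Python 2, the first line would give 3
# unless "from __future__ import division" appears at the top of the file.
true_quotient = 7 / 2     # true division
floor_quotient = 7 // 2   # floor division, same result in Python 2 and 3
even_quotient = 8 / 2     # a float in Python 3, even when it divides evenly

print(true_quotient)      # 3.5
print(floor_quotient)     # 3
print(even_quotient)      # 4.0

# Floor division rounds toward negative infinity, not toward zero.
print(-7 // 2)            # -4
```

When porting old code that relied on integer division, replacing "/" with "//" in those places keeps the original behavior.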

The second big difference is that print is a function in Python 3, so its arguments must be wrapped in parentheses: print("Hello") rather than the Python 2 statement print "Hello". This means that Python 2 code with any print statements in the old format will raise a syntax error when it's run in Python 3.
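
As a small illustration of print being a function, here is a sketch to run under Python 3. It only uses the built-in print and the standard io and sys modules; the names buffer and captured are made up for the example.

```python
import io
import sys

# The Python 2 statement form is a syntax error in Python 3:
#     print "Hello"
# The function form works in Python 3 (and in Python 2 for a single argument):
print("Hello")

# Because print is an ordinary function, it accepts keyword arguments.
print("x", "y", "z", sep=", ")        # prints: x, y, z
print("no newline here", end="")      # suppresses the trailing newline
print()                               # prints just a newline
print("a warning", file=sys.stderr)   # writes to standard error instead

# Output can also be sent to any file-like object.
buffer = io.StringIO()
print("captured", 42, sep="=", file=buffer)
captured = buffer.getvalue()
```

In Python 2 these keyword arguments only become available after "from __future__ import print_function".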

The block of code below is some simple Python that prints out which version of Python is being run. To run it, select the cell and then hit shift+enter; the Python code will run, confirming which version you're using.


In [6]:
import sys
print (sys.version)


2.7.11 |Anaconda 2.5.0 (x86_64)| (default, Dec  6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)]
Worker: 4

Normally you have "Python 2" code that you want to use in "Python 3". But "Python 2" code can sometimes be incompatible with "Python 3".

This is why there is a built-in module called "__future__", together with the separately installed "future" package (which provides "builtins" on Python 2), that allows a script to be run in both Python 2 and 3

You would need to add the following lines


In [7]:
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

from builtins import (
         bytes, dict, int, list, object, range, str,
         ascii, chr, hex, input, next, oct, open,
         pow, round, super,
         filter, map, zip)

Adding these lines at the top of your script will make it easier to use/migrate to Python 3.
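
When a script has to behave differently depending on the interpreter, one common pattern is to check the version at run time. The following is a minimal sketch of that idea, not a replacement for the imports above; the names is_python3 and text_type are made up for the example.

```python
import sys

major, minor = sys.version_info[:2]
is_python3 = major >= 3

if is_python3:
    text_type = str       # text strings are Unicode by default in Python 3
else:
    text_type = unicode   # noqa: F821 (this name only exists in Python 2)

print("Running Python {}.{}".format(major, minor))
print("Text type: {}".format(text_type.__name__))
```

Keeping such checks in one place near the top of a script makes them easy to delete once Python 2 support is no longer needed.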

This page is a Quickstart on how to convert Python 2 scripts into Python 3, and (maybe, if possible) vice versa.