First, you will need to clone the existing repository for the course.
cd /Path/to/Repositories/folder
Then clone the repository https://github.com/VandyAstroML/Vanderbilt_Computational_Bootcamp
git clone https://github.com/VandyAstroML/Vanderbilt_Computational_Bootcamp
And now move into the cloned directory and pull all the changes to the repository
cd Vanderbilt_Computational_Bootcamp
git pull
Now, whenever you do "git pull" from inside that directory, you'll have the updated version of the repository and notebooks.
bash is a command-line interpreter, or shell, that instructs the operating system to carry out commands, either line by line as you type them in a terminal or collected together in scripts. bash is the Unix shell that we will be focusing on, but there are others out there (such as tcsh). If you're not sure what shell you're using, running the commands 'echo $0' or 'ps -p $$' will *usually* tell you. In bash, a "$" at the start of a name denotes a variable.
pwd
- Prints the full path of the directory you are currently in (i.e. how to get to your current directory)
cd [path]
- Change the directory you are in
ls
- list all files in the current location
locate [file]
- find file with a particular name
cp [file1] [file2]
- Copy a file to another place (original file remains)
mv [file1] [file2]
- Move a file to another place (original file is gone)
mkdir [name]
- Create a new directory
rm [file1]
- Remove a file **WARNING: Using this incorrectly, you can delete EVERYTHING**
rmdir [directory]
- Remove a directory. This one will only delete __empty__ directories, so it's not that useful after all
man [command]
- Opens the manual page for a command, with full usage instructions and all available options. You can also use '[command] --help' for most standard commands to see how to use them
echo [string/variable]
- prints the string or variable to the terminal
head [file]
- prints the first 10 lines of the file to screen
tail [file]
- prints the last 10 lines of the file to screen
cat [file]
- prints out the content of [file] to the screen.
Still not sure how to use something, or whether it does what you want it to? Google usually helps (and will usually bring you to a mostly useful answer on Stack Overflow).
When you try to run an executable, the PATH variable is the list of locations that the shell will check to find the file to run. This allows you to download or write new programs/scripts and call them from anywhere without always having to include the full path to the file. For example, PATH typically includes locations like /usr/bin.
To check what is in the path, you can run 'echo $PATH' and see a colon-separated list of the directories that will be checked. You can add new directories to the list of stored paths by opening .profile in a text editor, and adding a line like this:
PATH=$PATH:newdirectorypath
After adding this, every time you start a terminal, this location will be part of the directories that will be checked for executables.
An example of this is Anaconda. Its installer adds a line to your '~/.bashrc' file that puts the Anaconda directory on your PATH.
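To make the lookup described above concrete, here is a minimal Python sketch of the same idea. The helper names `split_path` and `find_in_path` are hypothetical and written only for illustration, and the sketch assumes Python 3 (`shutil.which` is not available in Python 2). It is a simplified picture of what the shell does, not its actual implementation.

```python
import os
import shutil


def split_path(path_value, sep=os.pathsep):
    """Split a PATH-style string into its directories, skipping empty entries."""
    return [entry for entry in path_value.split(sep) if entry]


def find_in_path(name, directories):
    """Return the first executable file called `name` found in `directories`, else None."""
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# The directories that will be searched, in order
search_dirs = split_path(os.environ.get("PATH", ""))
print(search_dirs)

# The standard library already provides this lookup
print(shutil.which("ls"))  # prints None if no `ls` is found on this system
```

The order of the directories matters: the first match wins, which is why a directory appended to the end of PATH does not override a program of the same name earlier in the list.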
Aliases are ways to call a set of commands that you use frequently with a short nickname. In Linux systems, these are often stored in the .bashrc file (a file that contains the settings for when you run an interactive bash terminal) or in .profile (a file that is read when you log in). For Macs, these are stored in .bash_profile (again read at login, but it will not be seen by shells other than bash). We can talk about them later, but it's useful to know they're an available tool.
~/.bashrc file
Normally, you add aliases like this:
alias somename='cd /useful/directory/for/research'
and then you can use "somename" to go to that directory.
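But that only works for the current shell session, and normally you want to be able to use these aliases at any time. One way to do that is to keep them in a separate file such as ~/.aliases and have your ~/.bashrc load it, for example with a line like 'source ~/.aliases'. The commands below create the aliases file, open it so you can add your aliases, and then open ~/.bashrc so you can add that line.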
touch ~/.aliases
vim ~/.aliases
vim ~/.bashrc
Every time you start a new shell session, you will be able to use the aliases in the ~/.aliases file.
Some useful aliases are:
alias lll='ls -lah' # Lists files and directories (including hidden ones) as a list
alias lla=lll # Same as above
alias llh='ls -lh' # Lists files and directories (only visible ones) in a human-readable format
alias llt='ls -lahtr' # Lists files sorted by _date last modified_, oldest first
alias LS='ls' # In case you misspell "ls"
alias sl='ls' # In case you misspell "ls"
alias SL=ls # In case you misspell "ls"
alias CLEAR='clear'
alias clc='clear'
alias CLC='clear'
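alias en='emacs -nw' # Opens Emacs inside the terminal (no new window)
alias pushd='cd -' # Returns to the previous directory (note: this shadows the built-in pushd)
alias untar_file='tar -zxvf' # Untars a file to the current directory; aliases take no $1, the file name you type is simply appended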
# You can also use functions as aliases
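scppics() {
    # Copies a remote file or folder (first argument) to a local folder
    scp -rp username@host.address:"$1" /path/to/destination/folder
}
git_ref() {
    # Updates the local master branch from the "upstream" remote
    git remote -v
    git fetch upstream
    git checkout master
    git merge upstream/master
}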
# You can add the ssh-keys for easy access to Github, or other computers
alias skeys='ssh-add ~/.ssh/id_rsa; ssh-add ~/.ssh/personalid; ssh-add ~/.ssh/vpac; ssh-add ~/.ssh/github_vandy'
To go over some of the commands, we're going to try doing some basic operations with some bash commands. To do this, first open up a terminal window. Run one of the commands from earlier to check that you're running bash by default. If you're not, you should be able to start a bash shell by just running 'bash'.
Now we'll set up a quick directory to play in. I'll call it compshop
mkdir compshop
And then go into this directory
cd compshop
We can quickly check to see the full path of where we are now
pwd
And we can confirm that right now the directory is empty, too
ls
Because we're working with computers, we have to print this to the screen at some point....
echo Hello, World!
Since a lot of what bash is useful for is moving files around, we'll make a file to work with. We can also have echo write to a file by adding '> [filename]' at the end of the command.
echo Hello, World! > filegreeting
A quick check again with ls and we'll see we now have one file in the directory. We can also see that that file contains exactly what we wanted it to
head filegreeting
We can make a second copy of this file
cp filegreeting filegreeting2
Checking ls again, we now have two files. If we wanted to rename one of them, we can just use the mv command for that
mv filegreeting2 filegreetings
Check ls again, and we still have two files, but we changed the name of one of them. If we want to see more functionality about what ls can do, there's always google, but we can also check from the command line too. Try each of these:
man ls
ls --help
You'll see that they both start off with how we have to use ls, and then include a lot of information about all the different options we can run. For example, we can list all files, even those that would normally be hidden, by running this:
ls -a
If we want to see the last modification dates and file sizes of all the files, we can run
ls -l
We can even combine these two, so we see all files (even ones that might be hidden) and get their sizes, dates, etc.
ls -al
Finally, as we've tried out most of the bash commands here, we'll get rid of these two files. We can just give a list of files to be removed.
rm filegreeting filegreetings
All of these operations can get a lot more complex or sophisticated (like echo did when we had it write to a file), but these are a lot of the most common commands to show up in bash.
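For comparison, the same walkthrough can be sketched from Python, which is the other language used in these notebooks. This is only an illustrative sketch assuming Python 3 (`pathlib` is not in the Python 2 standard library). It works inside a temporary directory created by `tempfile.mkdtemp()`, so the location of `compshop` here is an assumption and not the directory created above.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing real is touched
workdir = Path(tempfile.mkdtemp()) / "compshop"
workdir.mkdir()                                    # mkdir compshop

greeting = workdir / "filegreeting"
greeting.write_text("Hello, World!\n")             # echo Hello, World! > filegreeting

copy_path = workdir / "filegreeting2"
shutil.copy(str(greeting), str(copy_path))         # cp filegreeting filegreeting2
copy_path.rename(workdir / "filegreetings")        # mv filegreeting2 filegreetings

names = sorted(p.name for p in workdir.iterdir())  # ls
print(names)

for name in names:                                 # rm filegreeting filegreetings
    (workdir / name).unlink()

remaining = list(workdir.iterdir())                # ls again: nothing left
print(remaining)

shutil.rmtree(str(workdir.parent))                 # remove the temporary directory
```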
A Jupyter notebook is a file (like this one) made up of cells that can contain either plain text or code. So far, we've only been using plain text, but as we start working with Python, we can explain stuff in the text, and then you'll be able to run the sections that contain code and see what happens, as well as change the code to see how those changes affect the results. The first example of this is going to be in the last section today.
There are two ways of viewing Jupyter notebooks, both of which take place inside a browser. On the GitHub repository we're using, the notebooks can be viewed directly on GitHub. In this sense, it works the same as simply visiting a website. However, this will only allow you to view a notebook as a static page. It won't allow you to use any of the interactive features of a Jupyter notebook. For that, you'll need to use the second method, which takes place locally.
The second way requires having Jupyter downloaded, although Anaconda should have included this. From here, you may have an icon to start Jupyter. You can also start it from the terminal by running 'jupyter notebook'. This starts a jupyter server, and will also open up a tab in your default browser. Using that default browser, go through your directories to locate the notebook file that you want to run (you can also run 'jupyter notebook' while you're in the directory that file is in to speed this up).
Creating a Jupyter notebook file is very similar to the second way of accessing a Jupyter notebook file. As before, you have to run 'jupyter notebook' to start up the Jupyter server. In this case, though, you go to the directory you want the file to be located, and then select 'new' in the upper-right corner and select a Python notebook.
For now, we won't go into the details of formatting within a Jupyter notebook.
To create a new IPython notebook, start Jupyter from the directory you want to work in:
$: cd /path/to/directory
$: jupyter notebook
This will start the Jupyter server in the current directory.
You then click on "New > Python 2" (or Python 3 if you have that one installed) and it will start a new IPython notebook.
In [1]:
# You can easily run and debug your code
%matplotlib inline
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = np.arange(50)
y = x.copy()
print(len(x))
In [2]:
# Printing arrays
print(x)
In [3]:
# Plotting arrays
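import numpy as np
from scipy.stats import kendalltau
import seaborn as sns
sns.set(style="ticks")
rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)
# Newer seaborn releases no longer accept stat_func and expect x/y as keywords,
# so the Kendall tau statistic is computed separately here.
tau, p_value = kendalltau(x, y)
print("Kendall tau = {:.2f}".format(tau))
sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")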
Out[3]:
In [4]:
# Pairwise relationships between the columns of the iris dataset
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")
Out[4]:
You can also run complex commands like the following:
In [5]:
import multiprocessing
def worker(num):
    """Worker function run in a separate process"""
    print('Worker:', num)
    return
jobs = []
for i in range(5):
    p = multiprocessing.Process(target=worker, args=(i,))
    jobs.append(p)
    p.start()
# Wait for all worker processes to finish
for p in jobs:
    p.join()
It's important to take a moment to make a general mention about Python. In most cases in programming, it's fair to assume that everyone is working in the 'newest' stable version of the language. Python, however, is in a unique spot. Python 2.7 came out in 2010, but Python 3 started being released in 2008, and is currently up to Python 3.6, which came out in 2016. So the most recent Python 3 major update is 6 years newer. However, Python 2 and Python 3 have some differences that mean not all Python 2 code can work with Python 3 without being rewritten. This means if you're using old code, you may need to use Python 2 if it hasn't been updated.
There are two basic differences to mention, but more can be found here, with examples of how the code runs in both Python 2 and Python 3.
The first big difference is how they treat division. In Python 2, division of two integers will return an integer answer. This is fine for 8/2, which will always be 4. However 7/2 will return 3 as the answer. Older code may be written to take advantage of this on purpose. Python 3, however, will not do this, and so 7/2=3.5. This is correct math, but may not be what older code wanted to have happen.
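As a quick illustration of the division behaviour described above, here is a short sketch that assumes it is run under Python 3. The variable names are only for illustration. The `//` operator gives the floor division that Python 2 applied to integers by default.

```python
# True division: always returns a float in Python 3
true_quotient = 7 / 2

# Floor division: what Python 2 did for 7/2 with integers
floor_quotient = 7 // 2

# Floor division rounds down, so negative results move away from zero
negative_floor = -7 // 2

print(true_quotient, floor_quotient, negative_floor)
```

When porting older code that relied on integer division, replacing `/` with `//` in those places keeps the old behaviour explicit.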
The second big difference is that print is a function in Python 3, so its arguments have to be wrapped in parentheses. This means that Python 2 code with any print statements in the old format (for example, print "hello") will have a syntax error when it's run in Python 3.
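To make the print difference concrete, here is a small sketch assuming Python 3, where `print` is an ordinary function that accepts the keyword arguments `sep`, `end`, and `file`. The `buffer` and `captured` names are just illustrative; writing into an in-memory buffer makes it easy to inspect exactly what was printed.

```python
import io

buffer = io.StringIO()

# Several arguments are joined with a single space by default
print("Hello,", "World!", file=buffer)

# `sep` changes the separator, `end` changes what is written after the last item
print("a", "b", "c", sep="-", end="\n", file=buffer)

captured = buffer.getvalue()
print(captured, end="")
```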
The block of code below is some simple Python that prints out which Python version is being run. To run it, select the cell and then hit shift+enter; the output confirms which version you're using.
In [6]:
import sys
print (sys.version)
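If a script needs to react to the version rather than just display it, one possible approach is to compare `sys.version_info` instead of parsing the `sys.version` string. This is a minimal sketch; the names `major`, `minor`, and `is_python3` are chosen here only for illustration.

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, ...)
major, minor = sys.version_info[0], sys.version_info[1]

# Tuple comparison makes version checks straightforward
is_python3 = sys.version_info >= (3,)

print("Running Python {}.{}".format(major, minor))
if not is_python3:
    print("This is Python 2; some newer code may need changes to run here.")
```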
Normally you have "Python 2" code that you want to use in "Python 3". But "Python 3" code can sometimes not be compatible with "Python 2".
This is why there is the built-in "__future__" module, together with a package called "future" (which provides "builtins" on Python 2), that allows a script to be run in both Python 2 and 3.
You would need to add the following lines
In [7]:
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
from builtins import (
    bytes, dict, int, list, object, range, str,
    ascii, chr, hex, input, next, oct, open,
    pow, round, super,
    filter, map, zip)
Adding these lines at the top of your script will make it easier to use/migrate to Python 3.
This page is a Quickstart on how to convert Python 2 scripts into Python 3, and (maybe, if possible) vice versa.