Our hands-on exercises will use the Python programming language. No previous experience with Python is necessary!
Python is a powerful tool out of the box, but its strengh comes from the huge number of external packages. The following are vital for data analysis:
We'll cover the basics of Python itself and then dive in to some applications to explore each of these packages.
We are using the Jupyter Notebook interface, which like Mathematica allows you to intermingle computer code with generated plots and output.
The notebook contains two main types of cells: Markdown
cells for text, like this one...
In [ ]:
# and "code" cells for computation and output, like this one! Press Shift-Enter to execute it.
2+2
Double-click in this text cell to edit it:
WRITE YOUR NAME HERE
and then press Shift-Enter to execute it.
In [ ]:
You can change the types of cells as well.
This cell is currently a code cell--change it to Markdown using the toolbar at the top of the page.
Python and all of the packages that we discuss in this tutorial are open source software, so there are multiple options for installing them.
You should have already installed Python before our tutorial, but this section is here for your future reference.
The most current version of Python is 3.5, but there is a large existing codebase of Python 2.7 code, so you may see both in the wild. Unfortunately there are small differences between the two versions. Here is one guide to the differences.
Anaconda and Canopy are free, self-contained Python distributions for Linux, Mac, Windows. Both come with an excellent selection of up-to-date scientific Python libraries, including Astropy.
TIP These are both great Python environments for everyday use. If you need the latest version of Python in a headless environment like a cluster or web server, consider the lightweight Miniconda distribution.
You can install multiple versions of python alongside each other using conda environments: for example, to run python 2.7 as well as 3.5. Documentation is here; here is the key idea:
conda create --name summerschool python=3.5 anaconda
source activate summerschool
Most operating systems already come with Python. Release schedules vary, but generally the 'testing' versions of Debian and Ubuntu come with very up-to-date versions of Python. Debian Jessie and Ubuntu 13.10 (Saucy Salamander) both come with the latest version of Astropy. On these systems, you can set up a pretty complete Python environment with:
$ sudo apt-get install ipython-notebook python-matplotlib python-scipy python-pip python-astropy
Every version of Mac OS comes with a Python interpreter, but it's slightly easier to obtain Matplotlib and Numpy if you use a package manager such as MacPorts, HomeBrew, or Fink. I use MacPorts (and contribute to it, too), so that's what I suggest. Install MacPorts and then do:
$ sudo port install py35-pip py35-matplotlib py35-scipy py35-astropy py35-ipython +notebook
For Python 2.7, replace 35 above with 27.
The Python community has developed a huge collection of packages for various tasks. The best place to find Python libraries is the Python Package Index (PyPI, https://pypi.python.org/).
No matter how you installed Python, you can install any Python package from PyPI using pip. For example:
$ pip install astroquery
or
$ pip install --user astroquery
In Anaconda, you can also install some popular packages using conda
. For example:
$ conda install astropy
On Debian or Ubuntu, you can get many Python packages with apt-get
and on MacPorts with port
.
We're going to give you a whirlwind tour of Python--just enough to get you up to speed for the rest of this week's hands-on exercises. Because Python is widely used outside of astronomy, there are lots of great resources to learn more when you are ready!
In [ ]:
4-10
In [ ]:
2*3
Python's numeric types include integers and both real and complex floating point numbers:
In [ ]:
a = 30 # an integer
#b = 030 # WARNING! octal in python 2, =24 in decimal; probably not what you meant
#b = 0o30 # python 3 octal
c = 3.14159 # a floating point number
d = 5.1e10 # scientific notation
e = 2.5 + 5.3j # a complex number
hungry = True # boolean literal
need_coffee = False # another boolean literal
The expressions above assign values to the variables, so we can use (or modify) the variables instead of the values.
By the way, all of the text on a given line after the trailing hash sign (#
) is a comment, ignored by Python.
The arithmetic operators in Python are similar to C, C++, Java, and so on. There is addition (and subtraction):
In [ ]:
a + c
Multiplication:
In [ ]:
a * e
Division:
In [ ]:
a / c
But beware that, like in C, C++, Java, etc., in python 2, division of integers gives you integers. In Python 3, division of integers is true division.
In [ ]:
7 / 3
If you want true division in python 2, convert one or both of the operands to floating point. Or even better, use the future
module to make your python 2 code compatible with python 3!
If you are using Python 2.2 or higher, you can enable true division by putting the following statement at the top of your script:
from __future__ import division
However, in both Python 2 and Python 3, the double-slash operator //
represents integer division:
In [ ]:
a = 7
b = 3
float(a) / b
In [ ]:
7 // 3
The %
sign is the remainder (modulus) operator:
In [ ]:
32 % 26
Exponentiation is accomplished with the **
operator:
In [ ]:
5 ** 3
In [ ]:
# COMPLETE
In [ ]:
hello = 'Hello world!'
In the iPython notebook, executing a cell with an expression in it returns that expression, evaluated:
In [ ]:
hello
To display strings in functions, etc., we will print
them:
In [ ]:
print('Hello, world!')
This is a Python statement, consisting of the built-in command print
and a string surrounded by single quotes.
Note: print
changed from a built-in statement to a function in Python 3. If you are using Python 3, just put the arguments in parentheses. To get this behavior in previous versions of Python, put this at the top of your script:
from __future__ import print_function
If you need single or double quotes inside a string, use the opposite kind outside the string:
In [ ]:
print('She said, "Hello, world!"')
print("She said, 'Hello, world!'")
If you need a string that contains newlines, use triple quotes ('''
) or triple double quotes ("""
):
In [ ]:
print("""MIRANDA
O, wonder!
How many goodly creatures are there here!
How beauteous mankind is! O brave new world
That has such people in't!""")
Strings are concatenated (joined) by "adding" them:
In [ ]:
'abc' + 'def'
Part of the magic of Python is that every datatype in python is an object. This means that, along with data there are methods that operate on that data. Strings have many useful methods built-in. Let's use a filename as an example:
In [ ]:
filename = '../data/PTF_d022683_f02_c06_u000114210_p12_refimg.fits'
If we want to check if a file ends with a certain suffix, we can use the .endswith()
method:
In [ ]:
filename.endswith('fits')
We can split the filename into pieces (a list--see the next section!) based on certain characters:
In [ ]:
filename.split('/')
Note that we can get help on Python functions with help
.
In [ ]:
help(filename.split)
Or if you are typing them in a notebook, try pressing Shift-Tab after you type the function name!
In [ ]:
# put your cursor after "split" below and type Shift-Tab a few times:
filename.split
The dir
function lists the methods of a given python object. Methods with names surrounded by __double underscores__
are internal; you can ignore them for now.
In [ ]:
dir(filename)
In [ ]:
# COMPLETE
Lists are flexible containers which can contain python objects of any type. You can add or remove items or alter existing items. Lists are surrounded by square brackets and have items separated by commas.
In [ ]:
some_list = ['a','b',3,10]
Once you have made a list, you might want to retrieve a value from it. You index a list with square brackets, starting from zero:
In [ ]:
some_list[0]
In [ ]:
some_list[1]
You can access whole ranges of values using slice notation:
In [ ]:
some_list[1:4]
Or, to count backward from the end of the list, use a negative index:
In [ ]:
some_list[-2]
Lists can be nested:
In [ ]:
another_list = [4,5,[6,7,8]]
In [ ]:
another_list[-1]
In [ ]:
another_list[-1][1]
len
gives the number of items in a list, but note that nested lists only count as one object:
In [ ]:
len(another_list)
Strings can be treated just like lists of individual charaters:
In [ ]:
person = 'Miranda'
print(person[3:6])
In [ ]:
your_list = ['foo', 'bar', 'bat', 'baz']
my_list = ['xyzzy', 1, 3, 5, 7]
You can change elements:
In [ ]:
my_list[1] = 2
print(my_list)
Or append elements to an existing list:
In [ ]:
my_list.append(11)
print(my_list)
Or delete elements:
In [ ]:
del my_list[0]
print(my_list)
In [ ]:
# COMPLETE
Sometimes, you want a collection that is like a list, but whose indices are strings or other Python values. That's a dictionary. Dictionaries are handy for any type of database-like operation, or for storing mappings from one set of values to another. You create a dictionary by enclosing a list of key-value pairs in curly braces:
In [ ]:
my_grb = {'name': 'GRB 130702A', 'redshift': 0.145, 'ra': (14, 29, 14.78), 'dec': (15, 46, 26.4)}
my_grb
You can index items in dictionaries with square braces []
, similar to lists, but you provide the name of the mapping element rather than a numerical position. Dictionaries may store items in any order!
In [ ]:
my_grb['dec']
or add items to them:
In [ ]:
my_grb['url'] = 'http://gcn.gsfc.nasa.gov/other/130702A.gcn3'
my_grb
or delete items from them:
In [ ]:
del my_grb['url']
my_grb
Dictionary keys can be any immutable kind of Python object: tuples, strings, integers, and floats are all fine. Values in a dictionary can be any Python value at all, including lists or other dictionaries.
In [ ]:
menu_options = {
'entrees': ['chicken', 'veggie burger', 'BBQ'],
'cheeses': ['muenster', 'gouda', 'camembert', 'mozarella'],
'dessert': 'icecream',
'number_of_guests': 42
}
In [ ]:
# COMPLETE
Sometimes you need to represent the absence of a value, for instance, if you have a gap in a dataset. You might be tempted to use some special value like -1
or 99
for this purpose, but don't! Use the built-in object None
.
In [ ]:
a = None
In Python, control flow statements such as conditionals and loops have blocks indicated with indentation. Any number of spaces or tabs is fine, as long as you are consistent within a block. Four spaces is the recommended standard.
You can use the if
...elif
...else
statement to have different bits of code run depending on the truth or falsehood of boolean expressions. For example:
In [ ]:
a = 5
if a < 3:
print("i'm in the 'if' block")
message = 'a is less than 3'
elif a == 3:
print("i'm in the 'elif' block")
message = 'a is 3'
else:
print("i'm in the 'else' block")
message = 'a is greater than 3'
print(message)
You can chain together inequalities just like in mathematical notation:
In [ ]:
if 0 < a <= 5:
print('a is greater than 0 but less than or equal to 5')
You can also combine comparison operators with the boolean and
, or
, and not
operators:
In [ ]:
if (a < 6) or (a > 8):
print('yahoo!')
In [ ]:
if not a == 5: # same as a != 5
print('a is not 5')
In [ ]:
if : # COMPLETE
print('a is an odd number less than 6!')
The comparison operator is
tests whether two Python values are not only equal, but represent the same object. Since there is only one None
object, the is
operator is particularly useful for detecting None
.
In [ ]:
food = None
if food is None:
print('No soup for you!')
else:
print('Here is your', food)
Likewise, there is an is not
operator:
In [ ]:
if food is not None:
print('Yum!')
The in
and not in
operators are handy for testing for membership in a string, list, or dictionary:
In [ ]:
if 3 in [1, 2, 3, 4, 5]:
print('indeed it is')
In [ ]:
if 'i' not in 'team':
print('there is no "i" in "team"')
When referring to a dictionary, the in
operator tests if the item is among the keys of the dictionary.
In [ ]:
d = {'foo': 3, 'bar': 5, 'bat': 9}
if 'foo' in d:
print('the key "foo" is in the dictionary')
In Python, there are just two types of loops: for
and while
. for
loops are useful for repeating a set of statements for each item in a collection (tuple, set, list, dictionary, or string). while
loops are not as common, but can be used to repeat a set of statements until a boolean expression becomes false.
In [ ]:
for i in [0, 1, 2, 3]:
print(i**2)
The built-in function range
, which returns a list of numbers, is often handy here:
In [ ]:
for i in range(4):
print(i**2)
Or you can have the range start from a nonzero value:
In [ ]:
for i in range(-2, 4):
print(i**2)
You can iterate over the keys and values in a dictionary with .items()
:
In [ ]:
for key, val in d.items():
print(key, '...', val**3)
The syntax of the while
loop is similar to the if
statement:
In [ ]:
a = 1
while a < 5:
a = a * 2
print(a)
In [ ]:
# COMPLETE
Sometimes you need a loop to create one list from another. List comprehensions make this very terse. For example, the following for
loop:
In [ ]:
a = []
for i in range(5):
a.append(i * 10)
is equivalent to this list comprehension:
In [ ]:
a = [i * 10 for i in range(5)]
You can even incorporate conditionals into a list comprehension. The following:
In [ ]:
a = []
for i in range(5):
if i % 2 == 0:
# i is even
a.append(i * 10)
can be written as:
In [ ]:
a = [i * 10 for i in range(5) if i % 2 == 0]
In [ ]:
# COMPLETE
Functions are created with the def
statement. A function may either have or not have a return
statement to send back a return value.
In [ ]:
def square(n):
return n * n
a = square(3)
print(a)
If you want to return multiple values from a function, separate them by commas:
In [ ]:
def powers(n):
return n**2, n**3
print(powers(3))
If a function returns multiple values, you can automatically unpack them into multiple variables:
In [ ]:
square, cube = powers(3)
print(square)
If you pass a mutable value such as a list to a function, then the function may modify that value. For example, you might implement the Fibonacci sequence like this:
In [ ]:
def fibonacci(seed, n):
while len(seed) < n:
seed.append(seed[-1] + seed[-2])
# Note: no return statement
seed = [1, 1]
fibonacci(seed, 10)
print(seed)
You can also give a function's arguments default values, such as:
In [ ]:
def fibonacci(seed, n=6):
"""Fill a seed list with values from the Fibonacci sequence until
it has length n."""
# If the first line of a function is a string, then that string is
# used to create online help for the function.
while len(seed) < n:
seed.append(seed[-1] + seed[-2])
# Note: no return statement
seed = [1, 1]
fibonacci(seed)
print(seed)
If a function has a large number of arguments, it may be easier to read if you pass the arguments by keyword, as in:
In [ ]:
seq = [1, 1]
fibonacci(seed=seq, n=4)
In [ ]:
def fizzbuzz(n):
#COMPLETE
fizzbuzz(20)
Python comes with an extensive standard library consisting of individual modules that you can opt to use with the import
statement.
You can import specific functions or variables with the from x import y
syntax:
In [ ]:
from math import pi
pi
or import the whole module in a "namespace" with import x
or import long_module as y
. Then you can use all of the functions and variables by calling x.func()
or y.variable
.
In [ ]:
import math
math.log(math.e)
Some particularly useful parts of the Python standard library are:
random
: random number generatorspickle
: read/write Python objects into filesos
: operating system servicesos.path
: file path manipulationsqlite3
: SQLite database accessubprocess
: launch external processesemail
: compose, parse, receive, or send e-mailpdb
: built-in debuggerre
: regular expressionsSimpleHTTPServer
: built-in lightweight web serveroptparse
: build pretty command-line interfacesitertools
: exotic looping constructsmultiprocessing
: parallel processinghelp()
and dir()
are useful for finding what commands are available in the standard library.
Use the os
module to print the current working directory. Then list its contents. (Look here for functions.)
In [ ]:
# COMPLETE
It can be important for your code to be able to handle error conditions. For example, let's say that you are implementing a sinc function:
In [ ]:
def sinc(x):
return math.sin(x) / x
print(sinc(0))
Oops! We know that by definition $\mathrm{sinc}(0) = 1$ , so we should catch this error:
In [ ]:
def sinc(x):
try:
result = math.sin(x) / x
except ZeroDivisionError:
result = 1
return result
print(sinc(0))
The built-in open
function opens a file and returns a file
object that you can use to read or write data. Here's an example of writing data to a file:
In [ ]:
myfile = open('myfile.txt', 'w') # open file for writing
myfile.write("red 1\n")
myfile.write("green 2\n")
myfile.write("blue 3\n")
myfile.close()
And here is reading it:
In [ ]:
d = {} # create empty dictionary
for line in open('myfile.txt', 'r'): # open file for reading
color, num = line.split() # break apart line by whitespace
num = int(num) # convert num to integer
d[color] = num #
print(d)
Numpy provides array operations and linear algebra to Python. A Numpy array is a bit like a Python list, but arrays usually contain only numbers and support elementwise arithmetic. For example:
In [ ]:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = 2 * x
print(y)
By default, multiplication is elementwise:
In [ ]:
x * y
To perform matrix multiplication, either convert arrays to np.matrix
or use np.dot
:
In [ ]:
np.dot(x, y)
Numpy arrays may have any number of dimensions:
In [ ]:
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x
An array has a certain number of dimensions denoted .ndim
:
In [ ]:
x.ndim
and the dimensions' individual lengths are given by .shape
:
In [ ]:
x.shape
and the total number of elements by .size
:
In [ ]:
x.size
You can also perform comparison operations on arrays...
In [ ]:
x > 5
Although a boolean array doesn't directly make sense in an if
statement:
In [ ]:
if x > 5:
print 'oops'
In [ ]:
if np.any(x > 5):
print('at least some elements are greater than 5')
You can use conditional expressions like indices:
In [ ]:
x[x > 5] = 5
x
Or manipulate individual rows:
In [ ]:
x[1, :] = -1
x
Or individual columns:
In [ ]:
x[:, 1] += 100
x
Other useful features include various random number generators:
In [ ]:
r = np.random.randn(10000)
And statistical functions like mean and standard devation:
In [ ]:
print(np.mean(r), np.std(r))
Matplotlib is the standard python plotting package. Line plots, histograms, scatter plots, and error bar plots are the most useful:
In [ ]:
import matplotlib.pyplot as plt
%matplotlib inline
x = np.linspace(-10, 10)
y = 1 / (1 + np.exp(x))
plt.plot(x, y)
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.title('Title')
In [ ]:
plt.hist(np.random.randn(10000))
In [ ]:
plt.scatter(np.random.rand(500),np.random.randn(500),alpha=0.5,color='red')
In [ ]:
plt.errorbar(np.arange(10), np.arange(10), yerr=np.ones(10),alpha=0.5,fmt='_')
In [ ]:
# COMPLETE
Astropy is a core Python package for astronomy. It is formed from the merger of a number of other Python astronomy packages, but also contains a lot of original code. Core features include:
astropy.constants
, astropy.units
: Physical constants, units, and unit conversionastropy.time
: Manipulation of dates and timesastropy.coordinates
: Representation of and conversion between astronomical coordinate systemsastropy.table
: Tables and gridded dataastropy.io.fits
: Manipulating FITS filesastropy.io.ascii
: Manipulating ASCII tables of many different formatsastropy.io.votable
: Virtual Observatory tablesastropy.wcs
: World Coordinate System transformationsastropy.cosmology
: Cosmological calculationsastropy.stats
: Astrostatisticsastropy.modeling
: multi-D model fitting Swiss army knifeThe Astropy project also has sevearl "affiliated packages" that have similar design but are maintained separately, including:
Let's experiment by opening up a P48 image. We'll need several modules from the Astropy package for this exercise.
In [ ]:
import astropy.coordinates as coords
import astropy.constants as const
import astropy.units as u
import astropy.io.fits
import astropy.stats
import astropy.table
import astropy.wcs
We just have time to scratch the surface of what astropy is capable of. astropy.constants
and astropy.units
are both quite useful:
In [ ]:
const.c
Unit conversion is as simple as adding .to(unit)
:
In [ ]:
const.c.to('AU/year')
In [ ]:
# COMPLETE
Let's work with one of the PTF coadded images we'll use later in the hands-on sessions.
In [ ]:
image_filename = '../data/PTF_Refims_Files/PTF_d022683_f02_c06_u000114210_p12_refimg.fits'
hdulist = astropy.io.fits.open(image_filename)
hdulist
Let's grab the first HDU ("header data unit") of this FITS file:
In [ ]:
hdu = hdulist[0]
hdu.scale('int16','old') # this file needs to be rescaled, for some reason
Then let's take a look at the contents of the header:
In [ ]:
hdu.header
Now let's plot the image data. But let's use sigma-clipping to pick a nice scale for the image.
In [ ]:
clipped_data = astropy.stats.sigma_clip(hdu.data.flatten())
mid = np.median(clipped_data)
std = np.std(clipped_data - mid)
In [ ]:
plt.figure(figsize=(20, 10))
plt.imshow(hdu.data, vmin=0, vmax=mid+std, cmap='binary')
plt.xlabel('pixel $x$')
plt.ylabel('pixel $y$')
Let's look at a star that will quickly become our favorite: it has coordinates $\alpha_\mathrm{J2000}, \delta_\mathrm{J2000} = (312.503802, -0.706603)$.
We'll open the catalog file from IPAC and find this object.
In [ ]:
catalog_filename = image_filename.replace('refimg.fits','sexcat.ctlg')
catalog_table = astropy.table.Table.read(catalog_filename)
catalog_table
The astropy SkyCoord
code enables efficient crossmatching between positions.
Let's create a coordinates object to represent the target that we are searching for:
In [ ]:
target_coord = astropy.coordinates.SkyCoord(312.503802, -0.706603,unit=u.deg)
target_coord
Now, let's match this position against the source catalog:
In [ ]:
# Coordinates of objects in catalog
catalog_coords = astropy.coordinates.SkyCoord(
ra=catalog_table['ALPHAWIN_J2000'],
dec=catalog_table['DELTAWIN_J2000'])
# Do source matching
index, separation, distance = target_coord.match_to_catalog_sky(catalog_coords)
index, separation
In uncrowded PTF fields like this one, we typically use a match radius of $1.5^{\prime\prime}$. separation
is an astropy Angle
, which has units, so we can easily test if the matched object is the correct one:
In [ ]:
separation < 1.5*u.arcsec
Here is the closest-matching row:
In [ ]:
index = np.asscalar(index)
matching_row = catalog_table[index]
matching_row['MAG_AUTO']
In [ ]:
plt.figure(figsize=(10, 10))
plt.imshow(hdu.data, vmin=mid-std, vmax=mid+std, cmap='binary', interpolation='nearest')
plt.xlim(matching_row['XWIN_IMAGE'] - 128, matching_row['XWIN_IMAGE'] + 128)
plt.ylim(matching_row['YWIN_IMAGE'] - 128, matching_row['YWIN_IMAGE'] + 128)
# I happen to know that these images are upside down; use APLPy or the like to
# automatically orient North upward
plt.gca().invert_yaxis()
plt.xlabel('pixel $x$')
plt.ylabel('pixel $y$')
Let's look at where these catalog sources appear on the image. We'll need to convert the RA, Dec in the table to pixel coordinates. That's where the World Coordinate System (WCS) transformation comes in.
In [ ]:
wcs = astropy.wcs.WCS(hdu.header)
In [ ]:
plt.figure(figsize=(10, 10))
plt.imshow(hdu.data, vmin=mid-std, vmax=mid+std, cmap='binary')
plt.xlim(matching_row['XWIN_IMAGE'] - 128, matching_row['XWIN_IMAGE'] + 128)
plt.ylim(matching_row['YWIN_IMAGE'] - 128, matching_row['YWIN_IMAGE'] + 128)
# Note: last argument is 'origin': FITS standard uses 1-based Fortran-like
# indexing, but Python uses 0-based C-like indexing. In this case, we are
# aligning these locations to a Python array, so we want 0-based indexing.
x, y = wcs.all_world2pix(catalog_coords.ra.deg, catalog_coords.dec.deg, 0)
plt.scatter(x, y, facecolor='none', edgecolor='red')
# I happen to know that these images are upside down; use APLPy or the like to
# automatically orient North upward
plt.gca().invert_yaxis()
plt.xlabel('pixel $x$')
plt.ylabel('pixel $y$')
For more sophisticated plotting of astronomical images, WCSAxes
or APLPy
are useful modules.
The catalog file above is for R-band. Load the g-band catalog also and crossmatch both catalogs against each other using a $1.5^{\prime\prime}$ match radius. Scatter plot g-r color against r-band magnitude. (You'll want to adjust the x & y limits to zoom in on the data, i.e. ignore extreme outliers - sources with MAG_AUTO
= 99.)
In [ ]:
g_catalog_filename = '../data/PTF_Refims_Files/PTF_d022683_f01_c06_u000097743_p12_sexcat.ctlg'
g_catalog_table = astropy.table.Table.read(g_catalog_filename)
g_catalog_coords = astropy.coordinates.SkyCoord( # COMPLETE
indexes, separations, distances = g_catalog_coords.match_to_catalog_sky( # COMPLETE
wmatch = separations < # COMPLETE
gmags = g_catalog_table['MAG_AUTO'][wmatch]
rmags = catalog_table['MAG_AUTO'][indexes][wmatch]
In [ ]:
plt.scatter( # COMPLETE
plt.xlim( # COMPLETE
plt.ylim( # COMPLETE
plt.xlabel('R-band magnitude')
plt.ylabel('g-R magnitude')