AST 337 In-Class Lab #2

Wednesday, September 13, 2017

Names: [insert your names here]


In this lab, you will (1) work directly with published astronomical data from the VizieR database, (2) continue using pandas and matplotlib to read, manipulate, and plot datasets, and (3) gain experience writing functions.

On the science end, you will create color-magnitude diagrams -- the observer's H-R diagram -- for various stellar populations, compare the clusters, and relate these to the results from last week.


In [ ]:
# Load packages
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

The first step is to download the datasets we need from VizieR for the following clusters:

  • M22 (globular cluster) -- BVI photometry of M22 (Monaco+, 2004) -- we'll download this one together as a class
  • M4 (globular cluster) -- M4 UBV color-magnitude diagrams (Mochejska+, 2002)
  • M67 (open cluster) -- BVI photometry in M67 (Stassun+, 2002)
  • NGC 188 (open cluster) -- A star catalog for the open cluster NGC 188 (Stetson+, 2004)

Save each of these as semicolon delimited .tsv files in the directory with this notebook.

First, we'll need to read the datasets into pandas, as we did in last week's lab. However, VizieR gives us more information than just the raw data to look at, so we'll need to give pd.read_csv additional information so it can parse the datafile.

Let's first look at the actual M22 datafile, by opening the file in Jupyter's text editor mode. Go back to the browser tab with the directory containing this notebook, and double-click on the file you just saved, M22.tsv. Take a look at the contents of the file, then come back to this notebook.

Questions:

1) What information is contained in the header?

2) How many commented "#" lines are there before the dataset begins? (see the line numbers)

As useful as the header information is for us, pandas only needs to know where the data values start and what the column headers are.

To help pandas parse the data easily, edit the text file in the other Jupyter tab to add the '#' symbol at the beginnings of the two rows describing the column units and dashed lines. Be sure to save the text file!

We will now tell pandas to skip any commented rows and that the file is semicolon delimited by adding parameters to the read_csv function, separated by commas:

  • comment = '#'
  • delimiter = ';'

EDIT the cell below to add these parameters to the regular pd.read_csv command, then run the cell to read in the file.


In [ ]:
m22 = pd.read_csv('M22.tsv')

As before, we would like to check the table and its contents:


In [ ]:
m22

And we would like to check the column headers of the dataset and the data types (dtypes). Do so in the following two cells for the M22 dataset.


In [ ]:
# Check the columns here

In [ ]:
# Check the datatypes here. Do any columns need to be converted from object to float?

Calculating values for color-magnitude diagrams (CMDs)

We are most interested in the measured B and V magnitude columns for the stars in these clusters. However, these are both apparent magnitudes, and we want to use absolute magnitude for the V vs. B-V color-magnitude diagram.

Therefore, for each dataset we downloaded, we will need to do four things:

  • Read in the datafiles.

  • Ensure the data columns we want to manipulate have the appropriate data type. (Depending on the dataset, we might need to use the pd.to_numeric function, as in Lab1.)

  • Use the apparent magnitude and distance to calculate the absolute V magnitude (Y-axis proxy for luminosity).

  • Use the B and V magnitudes to calculate the color (X-axis proxy for temperature).

In the next steps, we are going to calculate new pandas series for absolute V magnitude and B-V color and add them to our existing M22 dataframe.

Questions:

3) What quantities are needed to calculate an absolute magnitude?

4) Do we need to use apparent magnitudes or absolute magnitudes to calculate the B-V color? Why?

Adding a new column to an existing pandas dataframe

For the X-axis of our CMD, we want to add a column to our existing data frame with the new calculated B-V value. With pandas, this is very simple -- all we have to do is define a new column label, and subtract the existing columns!

Let's first remind ourselves what the M22 dataframe/table looks like right now. We can view a snippet of the top of the table by using the following method, which shows the first five rows only:


In [ ]:
m22.head()

In pandas, dataframes have both "heads" and "tails". What do you expect the "tail" method to do? Try in the cell below.


In [ ]:

Now add a new column for B-V color to the existing M22 dataframe. We can label this column whatever we like, as long as the label isn't already used for a different column -- if the column label already exists, it will overwrite it!

Let's label the new column "BVcolor", the values of which are the differences between values in the existing B and V columns:


In [ ]:
m22['BVcolor'] = m22['Bmag'] - m22['Vmag']

Did that work? Check by viewing the table in the cell below.


In [ ]:
m22.head() # Could also do m22.columns

What if we wanted to calculate the V-I color instead of B-V? Add a new V-I column to the dataframe, and check to ensure the dataframe has updated:


In [ ]:
# Calculate a new column labeled "VIcolor"

In [ ]:
# Check for the updated dataframe column

A brief overview of functions in python

Programs often involve tasks that must be done repetitively, or there are tasks that we want to perform that are common to many programs. We will write a function that uses the distance modulus equation and calculates absolute magnitudes.

An example of a standard python function is the logarithm function, which is built into python within the numpy package (short for "numerical python"). We saw last week that np.log10 takes the base 10 logarithm of the input, and returns the corresponding power:


In [ ]:
input_value = 1000.0 # define a variable
return_value = np.log10(input_value) # use that variable within a function
print(return_value) # print the output of the function, which has been saved to the new variable "return_value"

There are many, many reasons why one might want to take the log of something, so it is useful to have the log function defined once and for all in a standard python package. This way, any program that needs to take the log can do so, rather than having the user come up with it again and again. But what if the function we want to use does not exist in python?

The capability that we are seeking is provided by defining new functions. This allows us to make our own functions that are just like the log function, and can be called in a similar way. Functions are defined by the following syntax:


In [ ]:
def myfunc(arg1, arg2):
    """
    This is a function that does nothing in particular
    """ 
    print("I am a function! Here are my arguments:")
    print(arg1)
    print(arg2)
    print("I am returning my first argument now!")
    return(arg1)

This defines a very simple function. Let's walk through this declaration step by step:

The first line begins with def, then the name of the function, and then in parentheses a list of arguments for the function, then a colon. Arguments are inputs to the function. For example, the np.sin function takes an angle as an argument, and calculates the sine of that angle. In this example our function has two arguments. The number of arguments is arbitrary, and can be zero, in which case the parentheses are just left empty. It is also possible to write functions where the number of arguments is variable, and need not be the same every time the function is called.

After the define line, we begin the body of the function. Note that all the lines in the function body are indented. This indentation is IMPORTANT. In python, indentation is used to indicate that a particular line belongs to a particular function, loop, or other block of code. All the lines of the function are indented four spaces. If you're using entering this manually in ipython, either at the command line or in the notebook, you don't need to type in those four spaces by hand; the ipython shell will automatically enter them for you after seeing the def line. If you're using emacs or another text editor, you can just hit the tab key and the correct number of spaces will be entered for you.

Within the body of the function, we can enter whatever commands we like. We can print things, for example. Or do a calculation. The arguments in that appeared in parentheses in the definition are accessible within the function, and can be manipulated however we like.

At the end of the function, we have a statement that begins return. A return function causes the function to give back a value, which the calling program can print, assign to a variable, or do something else with. For example, the np.log10 function returns the base 10 log for any positive number (or array of numbers!). Return values are optional: functions don't have to return anything, and can just end.

OK, with that in mind, let's run the myfunc function above, with a string and a float number as input arguments:


In [ ]:
myfunc('star', 3.14159265)

Note that myfunc includes something called a "docstring" (denoted with the triple quotations at the start and end). This is a description of what the function does and is visible when call that function with a question mark (as below). Many companies (e.g. Google) have extensive rules about what should be included in a docstring. For example, here is a sample Google docstring.

Generally speaking, docstrings should include a description of what the function does, some notes about the input and output, and specifics about any optional inputs ("keywords") and what they do. Keep your eye out for these as we proceed as we'll be asking you to include docstrings with all of the functions that you write this semester.

You can view the docstring for any function in python (built-in ones, and ones you write yourself!) using the question mark. Try it below:


In [ ]:
myfunc?

Try writing a simple function below called test_function that takes two numbers a and b as input, multiplies them togther, then divides the product by 2.0, and returns the answer.


In [ ]:
def test_function(a,b):
    # Your docstring goes here, in triple quotations
    # Your code goes here
    # Your return statement goes here

If all went according to plan, the following should return the value 42:


In [ ]:
answer_to_life_the_universe_and_everything = test_function(28,3)

print(answer_to_life_the_universe_and_everything)

Using a function to calculate absolute magnitude

Recall that the distance modulus equation is as follows:

$M - m = -5 log10(d) + 5$

In the cell below, write a function called absmagcalc that takes in two variables (distance in parsecs and apparent magnitude), calculates the absolute magnitude, then returns the value of the absolute magnitude.


In [ ]:
# Write your function here (don't forget a docstring!):

Let's test your new function with the Sun, which has an apparent magnitude of -26.74. The Sun is, on average, located at a distance of 4.848e-6 pc from Earth.

Question:

5) What is the absolute magnitude of the Sun?


In [ ]:
# Use your new function here:

Now that we have a handy function to calculate absolute magnitudes from apparent ones, we can add a new column for absolute magnitude to our existing dataframe. First, we'll need the approximate distances to each of the clusters, provided here.


In [ ]:
# all values in parsecs
dist_m22 = 3000.0
dist_ngc188 = 1770.0
dist_m67 = 850.0
dist_m4 = 1904.5

Now we will add a new column for absolute magnitude, Mv to our existing M22 dataframe. Use your new absmagcalc function to calculate the absolute magnitudes from the distance and existing apparent V magnitude column, and provide the output for this new column below:


In [ ]:
# Edit to continue this calculation using the absmagcalc function you defined
m22['Mv'] =

In the cell below, check your dataframe to see if it has been updated with the new column:


In [ ]:
# Check the dataframe again -- do you have all the columns you need?

We are now ready to plot!

Plotting from a pandas dataframe

Using the matplotlib.pyplot skills we learned last week, we can now plot our first color magnitude diagram. One convenient aspect of pandas is that we can plot columns taken directly from the dataframe itself. For example, for M22:

  • the X-axis is the series: m22['BVcolor']
  • the Y-axis is the series: m22['Mv']

In the following exercises, you will plot a color-magnitude diagram for M22, and then load and manipulate new pandas dataframes for two open clusters and a globular cluster.

Exercise 1

Plot the V vs. B-V color-magnitude diagram for M22. Scale the plot as necessary to show all of the data clearly (there are a lot of data points, so you may want to make the symbols small using markersize = 3 or another size). Don't forget to add axes labels+units and a title.

Hint #1: When scaling the axes, think about how this plot is analogous to the H-R diagrams from last week. Which way should the axes go?

Hint #2: Using a subscript for the Y-axis, you can use M$_{V}$ to indicate absolute magnitude (double-click the cell to see).

(For plotting methods, you may find it useful to refer back to Homework0 and Lab1.)


In [ ]:
# Plot your data directly from the m22 dataframe below:

Exercise 2 (Additional Clusters)

Now you will read in the data for the other clusters you downloaded from VizieR at the beginning of the lab and plot these data. For each cluster:

  1. Comment the datafile as needed
  2. Load the datafile using pandas (pd.read_csv)
  3. Calculate the B-V color
  4. Use your absolute magnitude function to calculate M$_{V}$ from the V mag and the distance to each cluster
  5. Make plots! (multipanel and overlay)

In [ ]:
# Load datafiles into new pandas dataframes. Be sure to check the datatypes -- if necessary, use pd.to_numeric.

In [ ]:
# Calculate B-V colors for NGC 188 and M67.

# Freebie! For M4, the B-V values are already provided in the table -- no need to calculate this one.

In [ ]:
# Calculate absolute V magnitudes (recall that the distances to each cluster are given earlier in the lab)

In [ ]:
# Make a multipanel plot showing each of the four clusters. 

# In each of the panel titles, put the name of the cluster and its type (e.g., "M22 - globular"):

fig,((ax1,ax2),(ax3,ax4)) = plt.subplots(2, 2, figsize=(10,10))
fig.suptitle('Globular and Open Cluster Comparison')
# Continue the rest of the plotting below

In [ ]:
# Make an overlay plot showing all four clusters on the same axes. 

# Hint: You may want to try plotting the datasets in various orders to make sure all datasets are visible.

Exercise 3 (Comprehension Questions)

1) How do the color-magnitude diagrams (CMDs) you have plotted compare to the H-R diagrams from last week? What features are similar? What features are different?

2) Why do you think there is so much scatter in the observational data? Do you think this is an astrophysical effect or an instrumental one? What are some potential sources of error?

3) Which clusters do you think are older or younger? Rank NGC 188, M4, M22, and M67 by relative age. How can you tell?

4) Why might the main sequences be offset for each of the clusters? (Hint: How would uncertainty in a measured/estimated value shift values up/down on the Y-axis?)

5) Bonus question, if there's time: Earlier, we also calculated a V-I color column for M22. Plot the V vs. V-I CMD for M22. Why do you think the plot looks different from the V vs. B-V one?


In [ ]:
# Bonus Plot