Lab 11 - Curve Fitting and Isochrones

Names:


Overview

In this final lab, we are going to do something a little bit different -- we will try to determine the distance to the Andromeda Galaxy, M31! Along the way, we will learn about fitting relations to data, which you will ultimately compare with your approach to fitting isochrones to your star cluster data in the final project.

At the end of the lab, you will download a few isochrones (the same ones to be used for the final project) and plot them to start to understand differences in the HR diagram with age, composition, etc.

Data for this Lab: Cepheid variable star data

Cepheid variable stars are intrinsic variable stars which pulsate in a predictable way. Furthermore, a Cepheid star's period (how often it pulsates) is directly related to its luminosity or brightness. Cepheid variables are extremely luminous and very distant ones can be observed and measured.

In this lab, you will work with the datafile cepheids2.dat, which should be in the same folder you pulled from Github. This file contains a table of data from Sebo et al (2002, ApJS, 142, 71) for 605 Cepheid variable stars in the Large Magellanic Cloud (LMC), with the following columns:

RAh - Hour of Right Ascension (J2000)
RAm - Minute of Right Ascension (J2000)
RAs - Second of Right Ascension (J2000)
DEd - Degree of Declination (J2000)
DEm - Arcminute of Declination (J2000)
DEs - Arcsecond of Declination (J2000)
DelV - Calculated Vmag error
Period - Cepheid period in days
Bmag - The mean phase-weighted B band magnitude
Vmag - The mean phase-weighted V band magnitude
Rmag - The mean phase-weighted R band magnitude
Imag - The mean phase-weighted I band magnitude
Name - Cepheid name

You will use these data to re-derive the famous Cepheid period-luminosity relation (PLR) pioneered by Henrietta Leavitt over 100 years ago.

11.1 - Creating and Evaluating a Model: Cepheid Variable Stars


In [ ]:
# The standard fare
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import astropy.io.fits as fits
import os.path
%matplotlib inline

In [ ]:
# Use pandas to read in the cepheid data file as a new dataframe. HINT: use "delim_whitespace = True".

cepheids =

In [ ]:
# Examine the data frame - did everything read in correctly?

cepheids

We will assume that the log of the period of the star's pulsation, P, and its apparent magnitude, V, follow a linear relationship, and can be given by the form:

$V = c_{1} log P + c_{2}$

Where $c_{1}$ and $c_{2}$ are simply the parameters for the linear relation.


In [ ]:
# Add a new dataframe to your column to calculate the log of the period:

cepheids['logP'] =

In the cell below, plot $V$ (y-axis) vs. $log(P)$ (on the x-axis). Be sure to plot $V$ with brightness increasing upwards! Adjust the scales, labels, etc., as needed.


In [ ]:
# Your plotting code below

Now, make an educated guess for what you think would be reasonable estimates for $c_{1}$ (the slope) and $c_{2}$ (the intercept), assuming a linear fit:

$c_{1}$ =

$c_{2}$ =

Create a model

In the cell below, define the model coefficients for your first estimate of a linear fit, based on the $c_{1}$ and $c_{2}$ values you just guessed. Then, you will use a new function, np.polyval, that reads in the coefficients and your x-values (i.e., logP) and provides the y-values for a linear model with those parameters. If you are not sure how to use np.polyval, check the docstring.


In [ ]:
coeffs = [??, ??] # c1, c2, your estimates here

Vfit = # use np.polyval here to calculate the model y-values, that is, f(x_i). Form is np.polyval(coeffs, your_x_data)

In the cell below, re-plot the Cepheid data as before, but now add the estimated model fit you just calculated. Add labels to each dataset (e.g., plt.plot(xdata, ydata, format='-', label='model')) and add a legend using plt.legend().


In [ ]:
# Replot the data along with the model, adding labels to the datasets and a legend

By eye, how well does your model fit the data?
Answer:

Calculate $\chi^{2}$ by hand, providing $c_{1}$ and $c_{2}$ values:

We will calculate $\chi^{2}$ explicitly for various combinations of $c_{1}$ and $c_{2}$ values. Recall that the $\chi^{2}$ statistic is given by:

$\chi^{2}$ = $\sum\limits_{i}^{} $ $(\frac{y_{i} - f(x_{i})}{\sigma_{y,i}})^{2}$

Which column corresponds to $\sigma_{y,i}$ in your data frame?
Answer:


In [ ]:
# Check dataframe

In the cell below, now estimate the $\chi^{2}$ values for the Vfit model data you just created, using np.sum to sum over all of the data points.


In [ ]:
# Calculate chi^2:
chi_sq = 

chi_sq

Try some more values of c1, c2:

In addition to the pair you just used, run the cells again to try at least three more pairs of c1, c2 values to see how the resulting $\chi^{2}$ changes. Remember that you want to minimize the value to get the best fit.

$c_{1}$ $c_{2}$ $\chi^{2}$
## ## ##
## ## ##
## ## ##
## ## ##

Finally, use your best-fit values and make a plot of the residuals $r_{i}$ -- that is, the differences between your model points and data points, $y_{i} - f(x_{i})$. Plot this as a function of logP. We will do this twice -- once for your best $\chi^{2}$ value, and again for the worst $\chi^{2}$ value.


In [ ]:
# Make a residual plot - plot residuals from best fit here:

In [ ]:
# Make a residual plot - plot residuals from worst fit here:

Now, instead of our guess-and-check approach so far, we will let python do the heavy lifting. Look closely at the comments in the code below to understand how it works. curve_fit returns two variables. The first is a list of best-fitting model parameters, and the second is the covariance matrix, which we won't delve into, but you can assume it is a matrix whose diagonal contains the uncertainties on our fit.

The only place you need to modify in the cell below is the initial guess, p0.


In [ ]:
def linear_func(x, m, b):
    """
    Function to calculate a straight line.
    
    Inputs:
    x: np.ndarray, float
        the positions at which to evaluate the function
    m: float
        the slope of the straight line
    b: float
        the intercept of the straight line
        
    Returns:
    np.ndarray, float
        the values predicted by our model at positions x
    """
    return m*x + b



from scipy.optimize import curve_fit

# We need an initial guess for our starting parameters, just like in the version calculated by hand.
p0 = [??, ??] # enter some estimates here

"""
The actual fitting takes place below --

1st argument: function that defines our model
2nd and 3rd arguments: our x and y data
4th argument: starting guess for the parameters
5th argument (optional): the errorbars. Without errors, curve_fit will do 
                        a sum-of-the-squares statistic instead of chi^2.
"""

# Remove infs and nans first, otherwise fit will fail:
logP = np.array(cepheids['logP'])
Vmag = np.array(cepheids['Vmag'])
Verr = np.array(cepheids['DelV'])

logP[np.isinf(logP)] = 0
Vmag[np.isinf(Vmag)] = 0
Verr[np.isinf(Verr)] = 0

popt, pcov = curve_fit(linear_func, logP, Vmag, p0)

# Print the fits, and use np.diag to pick out the diagonal elements of the covariance matrix.
print(popt)
print(np.diag(pcov))

m, b = popt
em, eb = np.diag(pcov)

# And we can print it out in a nicer format:
print('Slope     = {0:.3f} +/- {1:.3f}'.format(m, em))
print('Intercept = {0:.3f} +/- {1:.3f}'.format(b, eb))

Now we can compare the results from our best fit and the curve_fit estimate. Below, replot the data, your best fit, and the best fit from the curve_fit analysis on a single figure, with legend. How do they compare?
Answer:


In [ ]:
# Replot data, along with your best-estimate model and the curve_fit model. Add labels to the datasets and a legend.

11.2 - Applying a Model: Find the Distance to M31!

Knowing that the distance to the Large Magellanic Cloud is $d_{LMC} = 48$ kpc, say you are now Edwin Hubble observing M31, the Andromeda Galaxy, over several weeks in 1923. You discover a Cepheid variable star!

The newly-found star in M31 has a period $P = 39.81 \pm 0.01$ days and an average apparent V magnitude $V = 18.61$. How far away is M31?

Start by calculating the distance modulus for the LMC, $\mu = V - M_{V}$:


In [ ]:
# Calculate distance modulus here

d = 48000. # pc

mu_LMC =

In [ ]:
mu_LMC

Use the period for the new Cepheid in M31 and your derived relation for the LMC in 11.1 to determine the apparent magnitude of the M31 star, if it were located in the LMC.


In [ ]:
c1 = 
c2 = 
V =

In [ ]:
V

Now use the distance modulus for the LMC to determine $M_{V}$ of a star with that period:


In [ ]:
M_V =

In [ ]:
MV

Finally, use the actual apparent magnitude of the M31 Cepheid to determine the distance to M31 in parsecs:


In [ ]:
d_M31 = 

d_M31

With a quick query to Google, what is the actual distance to M31 in parsecs?


In [ ]:
d_M31_actual = 

d_M31_actual

In the cell below, calculate the percent error on your measurement compared to the actual distance to M31. How did you do?


In [ ]:
# what's the percent error?
error = 

error

11.3 - Working with Isochrone Data

In this directory, there are six different isochrone files at a variety of ages. Note that they are all the same stellar composition -- for the final project, you will also have access to different isochrones with various metallicities on Moodle.

Read the isochrone datasets into separate pandas dataframes:


In [ ]:
# Read in data files

Now, plot color-magnitude diagrams from the isochrone dataframes. Use the same filters as you're using for the final project (e.g., V vs. V-R, R vs. R-I, etc). You may have to add new columns to the dataframes to calculate the necessary quantities. Plot all six isochrones on the same figure, using color-coding and indicating the ages with labels and a legend.


In [ ]:
# Your multi-isochrone figure here

Describe the isochrones qualitatively. What are the major differences between the different ages?
Answer:

What stages of stellar evolution are present?
Answer:

What mass range of stars are we exploring?
Answer:

Finally, load in your calibrated M52 data from last week, and include the color-magnitude data on your multi-isochrone plot.


In [ ]:
# Load in last week's M52 data that you used to make a CMD, and plot the isochrones on top

Now, select one isochrone and add X- and Y- shifts as need when plotting to try to align the isochrone with the M52 data. By how much did you have to shift in each axis?
Answer:


In [ ]:
# Replot, with just one isochrone, shifted to fit.

Unfortunately, isochrones do not have a nice analytical form like the linear relations we've been fitting. (Stars are complex!) For the final project, you will repeat the process above for your own cluster data -- a "chi-by-eye" approach -- but will also quantify this with an estimate of the chi^2 value in one axis.