This tutorial is designed to help a new user of the SNCosmo library get accustomed to running built-in analyses of data files (simulated or otherwise). It also provides some background on the statistics involved, on supernova data, and on the results of running the analyses in SNCosmo, which is useful for people new to the SNCosmo setup, SN data, or the methods being used.

For you Python wunderkinds, taking a look at the Light Curve Fitting page, or even just the Fitting Photometric Data API, will probably give you enough information to get started.

Fitting Light Curves in SNCosmo

So you've installed SNCosmo correctly, initialized some models, and now have some light curves (simulated or otherwise). The next step is to fit that data to a parameterized model.

SNCosmo provides three built-in fitting functions, fit_lc(), mcmc_lc(), and nest_lc(), that can be used to find best-fit parameters given a set of light curves and a model to fit them with.

General Fitting Requisites

While their usage differs somewhat, all fitting functions require the following input arguments: a data set, a model to be fit, and a list of the parameters you wish to vary (vparam_names). See the Fitting Photometric Data API for the correct data type for each of these inputs.

Each fitting function returns two objects: 1) a Result object, a dict which contains metadata about the fit, and 2) an instance of the Model class, where the parameters that were varied are replaced with the values of the fitted parameters. Specific examples for each function are shown below.

Provided your variable names are (following the principle of least surprise) something along the lines of data and model (in addition to the list of parameters to be varied), then a standard initialization of SNCosmo could look like:


In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

import sncosmo
import triangle

In [2]:
source = sncosmo.get_source('salt2-extended')
model = sncosmo.Model(source=source)

sample_data = sncosmo.load_example_data()
model.set(**sample_data.meta)

Statistical Modeling

There is always some uncertainty in data taking. Therefore, even if our theoretical understanding were perfectly correct (which it rarely, if ever, is), there would still be some uncertainty in the results. We use statistical fitting to find what we hope are the "true" values of the various parameters describing the data, to quantify how likely it is that we arrive at an incorrect result, and to estimate by how much our "answer" differs from the "truth".

For us, we specify an empirical light curve model to describe the photon flux from a Type Ia supernova event. It is empirical because it is based on matching observational data to a function whose parameters can be varied to fit most Type Ia supernova light curves.

For this tutorial, we are using the SALT2 model, which has the form:

$$ SED(p, \lambda) = x_0 [ M_0(p, \lambda) + x_1 M_1(p, \lambda) ]\exp(-c CL(\lambda))$$

where $M_0, M_1 $ and $CL$ are fixed functions determined from previous data.

The parameters that can be varied to modify this functional form, denoted $ \theta $, are:

$$ \theta = \lbrace x_{0}, x_{1}, c, t_{0} \rbrace $$

These are the values we would like to estimate, using statistical modeling.
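
As a quick sanity check, here is a minimal sketch (using the model object created above) of how these parameters live on the sncosmo Model. Note that the redshift z is also a model parameter; in this tutorial it is held fixed at its known value rather than being fit.

# Inspect the SALT2 parameters on the Model object built earlier.
print(model.param_names)    # ['z', 't0', 'x0', 'x1', 'c']
print(model.parameters)     # the current values, in the same order
print(model.get('x1'))      # read a single parameter by name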

The Likelihood Function

We usually think of data as reflecting the predictions of an underlying physical model plus realization-dependent noise. For example, we may think of the data as a set of measurements $D = \{d_i\}$, where the corresponding quantities are predicted by the physical model as functions $\{f_i(\theta)\}$, with $\theta$ the parameters of the model. One may write this as $$D = f(\theta) + n$$ where $n$ is a random variable (think of it as a vector with one component for each $i$) drawn from a probability distribution.

One can then express the probability of obtaining the data $D$ as $$P = \Pr( D \mid \theta )$$ which is now dependent on the probability distribution of $n$.

As you can see, this probability statement is the probability of getting that exact data set. How do we know what that probability is? It is calculated using our knowledge of how the data was collected and the factors affecting those measurements.

Sometimes, this knowledge tells us that the noise corresponding to each data point we measured was independent of every other measurement. Then, just using the basic laws of probability, the probability of getting the given data set is the product of the individual probabilities of each data point:

$$P = \Pr(D \mid \theta) = \prod_{i} p_{i} = \prod_{i} Pr(d_{i} \mid \theta)$$

Once we have the probability for the data set we can construct the likelihood for our parameters. The likelihood is your probability, $P$, with the value of your data set explicitly plugged in for $D$. Thus, the likelihood is a quantity that is only a function of your parameters, which we have denoted $\theta$:

$$L(\theta) = \prod_{i=1}^{N} \Pr(d_{i} \mid \theta)$$

So far, this has been general, and we haven't specified any particular form for the likelihood yet. We will look at a concrete example below.

fit_lc()

This function uses Maximum Likelihood Estimation to find the 'best fit' values of the parameters. As you would probably imagine, the goal of Maximum Likelihood Estimation (MLE) is to find the parameters that maximize the value of the likelihood. Conceptually, this means finding a stationary point of the likelihood function, i.e. a point where its first derivative with respect to the parameters vanishes; in practice the optimization is carried out numerically. Let's work through this explicitly.

To do that, we need to find an explicit form for the likelihood, $L(\theta)$.

Let's start with the individual probability, $p_{i}$, that a data point will be a certain value. For our data, we assume that the probability distribution is a Gaussian whose mean is equal to the value calculated by your parameterized model.

$$ p_{i} = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(d_{i} - M(\theta))^{2}}{2\sigma^{2}}\right)$$

Conceptual check: If your model were perfectly correct, and you had no noise in your data, what would this probability distribution look like?

Answer: It would be a delta function, single-valued at the value predicted by the model.

Now, we've assumed that each data point is an independent measurement, and thus the total probability, $P$, of getting the exact data set you've measured is the product of the individual probabilities, $p_{i}$:

$$ P = \prod_{i} p_{i} = \prod_{i} \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left(-\frac{(d_{i} - M(\theta))^{2}}{2\sigma^{2}}\right)$$

Once we plug in our values for the data, then this will become a function that only depends on the values of the parameters: our likelihood. This is the function for which we would like to find the maximum.

Obviously, the most rigorous way to accomplish this would be to write out the full form of the likelihood and solve for the maximum analytically. However, there are a few tricks we can employ to make this faster. (These tricks won't give sensible results in every case, but for us they should be fine.)

  • Trick #1 - We don't really need to find the maximum of the likelihood itself, just of its natural logarithm; because the logarithm is monotonic, both are maximized by the same parameter values. In fact, the log of the likelihood is used so often that it has its own name, the log likelihood, and its own notation: a cursive L, $ \mathcal{L} $. Taking the log also turns the product over data points into a sum, as the sketch below illustrates.
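
To make the trick concrete, here is a small toy sketch (plain NumPy, not part of sncosmo; the data and model are made up purely for illustration) showing that, for Gaussian noise, maximizing the log likelihood amounts to minimizing the familiar chi-square:

import numpy as np

# Toy data and per-point uncertainties (made up for illustration).
d     = np.array([1.1, 0.9, 1.3, 0.8])
sigma = np.array([0.2, 0.2, 0.3, 0.2])

# A deliberately simple "model": it predicts the same constant, theta,
# for every data point.
def log_likelihood(theta):
    chisq = np.sum((d - theta) ** 2 / sigma ** 2)
    norm = -0.5 * np.sum(np.log(2 * np.pi * sigma ** 2))
    # log of a product of Gaussians = sum of the individual log terms
    return -0.5 * chisq + norm

# The normalization term does not depend on theta, so the theta that
# maximizes log L is exactly the theta that minimizes chi-square.
thetas = np.linspace(0.0, 2.0, 2001)
best = thetas[np.argmax([log_likelihood(t) for t in thetas])]
print(best)   # close to the inverse-variance weighted mean of d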

In [3]:
fit_res, fit_model = sncosmo.fit_lc(sample_data, model, vparam_names=['t0', 'x0', 'x1', 'c'], 
                                    bounds={'c':(-0.3, 0.3), 'x1':(-3.0, 3.0)}, minsnr=3.0)

After the fit is run, it returns a tuple (just a pair of objects).

The first object is the dict that contains the metadata: information about the fit that isn't the values of the fitted parameters. We can look at it like this:


In [4]:
fit_res


Out[4]:
       errors: OrderedDict([('t0', 0.3901949305763992), ('x0', 3.830865767960497e-07), ('x1', 0.3261206420921381), ('c', 0.0279148755306238)])
   parameters: array([  5.00000000e-01,   5.51003256e+04,   1.19024366e-05,
         5.10838367e-01,   2.10226126e-01])
      success: True
         ndof: 36
   covariance: array([[  1.52253005e-01,  -1.81942707e-08,   3.04270767e-02,
          3.75117277e-04],
       [ -1.81942707e-08,   1.46755325e-13,  -7.03606771e-08,
         -7.54341683e-09],
       [  3.04270767e-02,  -7.03606771e-08,   1.06788943e-01,
          1.42031789e-03],
       [  3.75117277e-04,  -7.54341683e-09,   1.42031789e-03,
          7.83699659e-04]])
 vparam_names: ['t0', 'x0', 'x1', 'c']
        chisq: 34.360508398774016
  param_names: ['z', 't0', 'x0', 'x1', 'c']
      message: 'Minimization exited successfully.'
        ncall: 122

The second object is a new instance of the sncosmo.Model class, in which all the varied parameters have been set to the values found by whatever fit you ran.

We can look at it, and even compare it to the original model:


In [5]:
print fit_model, model


<Model at 0x108f1bf90>
source:
  class      : SALT2Source
  name       : 'salt2-extended'
  version    : 1.0
  phases     : [-20, .., 50] days
  wavelengths: [300, .., 18000] Angstroms
parameters:
  z  = 0.5
  t0 = 55100.325556689495
  x0 = 1.1902436597436016e-05
  x1 = 0.51083836650092129
  c  = 0.21022612556663939 <Model at 0x1081205d0>
source:
  class      : SALT2Source
  name       : 'salt2-extended'
  version    : 1.0
  phases     : [-20, .., 50] days
  wavelengths: [300, .., 18000] Angstroms
parameters:
  z  = 0.5
  t0 = 55100.0
  x0 = 1.20482820761e-05
  x1 = 0.5
  c  = 0.20000000000000001

Bayesian Statistics

Bayesian statistics treats the data and the parameters differently than the frequentist paradigm does.

In Bayesian statistics the data are considered fixed, and the parameters are the random variables. Thus, the conditional probability statement of interest becomes $$P = \Pr ( \theta \mid D )$$ This quantity is called the posterior distribution, often shortened to the posterior.

The posterior is related to the likelihood via Bayes' Theorem, which states that:

$$ \Pr ( \theta \mid D ) = \frac{\Pr ( \theta ) \Pr ( D \mid \theta )}{\Pr ( D ) }$$

Hopefully you recognize one thing on the right hand side of this expression: the probability distribution $ \Pr ( D \mid \theta ) $ is exactly the likelihood for the parameters. The other two pieces are the prior, $\Pr(\theta)$, which encodes what we believe about the parameters before seeing the data, and the evidence, $\Pr(D)$, which acts as a normalization constant.

In a high-dimensional parameter space, sampling a distribution is often the best way to extract information from it.
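
Before turning to the samplers, it may help to see Bayes' theorem in action on a toy one-dimensional problem. The sketch below is purely illustrative (plain NumPy, not part of sncosmo, with made-up numbers); it evaluates prior times likelihood on a grid and normalizes, and the closing comment notes why this brute-force approach stops being practical as the number of parameters grows.

import numpy as np

# Toy 1-D illustration of Bayes' theorem (made-up numbers, not sncosmo).
theta = np.linspace(-1.0, 3.0, 4001)                         # grid of parameter values
prior = np.where((theta > -0.5) & (theta < 2.5), 1.0, 0.0)   # flat (top-hat) prior
d, sigma = 1.0, 0.3                                          # a single "measurement"
likelihood = np.exp(-0.5 * (d - theta) ** 2 / sigma ** 2)

# The posterior is proportional to prior * likelihood; dividing by the
# integral plays the role of the evidence Pr(D).
posterior = prior * likelihood
posterior /= np.trapz(posterior, theta)

# A grid like this is fine in one dimension, but the number of grid points
# grows exponentially with the number of parameters, which is why we sample
# the posterior (MCMC, nested sampling) instead of evaluating it everywhere.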

In SNCosmo, there are two different Bayesian sampling algorithms, and they both sample the posterior as stated above. However, they differ in how they choose what points in the posterior to sample.

mcmc_lc()

This function runs Markov Chain Monte Carlo (MCMC) sampling using emcee under the hood.

Proposal density

An MCMC sampler wanders through parameter space by repeatedly proposing a new point, drawn from a proposal density, and then accepting or rejecting that proposal based on the posterior values at the current and proposed points. In emcee's case, an ensemble of 'walkers' is evolved together, and each walker's proposals are generated from the current positions of the other walkers.


In [6]:
mcmc_res, mcmc_model = sncosmo.mcmc_lc(sample_data, model, vparam_names=['t0', 'x0', 'x1', 'c'], 
                                    bounds={'c':(-0.3, 0.3), 'x1':(-3.0, 3.0)}, minsnr=3.0)

The MCMC method generates an array of samples, which it can be helpful to histogram. You can do this simply by passing the samples attribute of the result to plt.hist:


In [7]:
# this will histogram the first (zeroth index) varied parameter in the samples array
plt.hist(mcmc_res.samples[:,0])


Out[7]:
(array([  9.00000000e+00,   9.00000000e+00,   3.00000000e+00,
          1.00000000e+01,   5.70000000e+01,   5.91000000e+02,
          3.11800000e+03,   4.54200000e+03,   1.57000000e+03,
          9.10000000e+01]),
 array([ 55096.67257557,  55097.17727053,  55097.68196549,  55098.18666045,
         55098.69135542,  55099.19605038,  55099.70074534,  55100.2054403 ,
         55100.71013526,  55101.21483022,  55101.71952519]),
 <a list of 10 Patch objects>)

You can also use the result's attributes in any of your normal Python functions that take that type of input. For example, you can check the type and number of samples by doing:


In [8]:
type(mcmc_res.samples)


Out[8]:
numpy.ndarray

In [9]:
len(mcmc_res.samples)


Out[9]:
10000

nest_lc()

This function also uses Bayesian statistics and samples the posterior distribution, similarly to mcmc_lc(); however, it differs in the method it uses to choose where in the distribution to sample. It uses a nested sampling algorithm.


In [10]:
nest_res, nest_model = sncosmo.nest_lc(sample_data, model, vparam_names=['t0', 'x0', 'x1', 'c'], 
                                    bounds={'c':(-0.3, 0.3), 'x1':(-3.0, 3.0)}, guess_amplitude_bound=True, minsnr=3.0, 
                                    verbose=True)


it=  1549 logz=-32.307644

If you want to compare the two Bayesian fits, then you'll need to weight the nested samples by the weighting factor that is included in the nest_lc() res object, as sketched below.
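
For example, a minimal sketch of a weighted histogram of the first varied parameter (this assumes the nest_lc() result exposes samples, weights, and vparam_names attributes analogous to the MCMC result above):

# Weighted histogram of the nested-sampling posterior for the first varied
# parameter, for comparison with the MCMC histogram above.
plt.hist(nest_res.samples[:, 0], weights=nest_res.weights, bins=20)
plt.xlabel(nest_res.vparam_names[0])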

Visualizing Your Fits

Once you've run one (or more) of the fitting functions, you can then choose to visualize the results in several ways.

Plot Photometric Data

You can plot the simulated data (with error bars), along with model light curves whose parameters are set either to the values found by the fits or to the "true" input parameters:


In [11]:
figure = sncosmo.plot_lc(sample_data, [model, mcmc_model, nest_model], 
                         pulls=True)



In [12]:
figure.autofmt_xdate()
figure.suptitle('Light Curve with two sets of Light curve parameters')
figure.tight_layout()
figure.savefig('../graphics/lcPlot.jpg')

Corner Plots (histograms and covariance plots packaged together)

For the two fitting algorithms based on Bayesian statistics, you can nicely view the histogrammed samples for all of the varied parameters (along with covariance plots), with a corner plot. We'll just do one here; let's pick the MCMC results:


In [13]:
# just syntactic sugar
mcmc = mcmc_res.vparam_names

# initialize triangle plot
mcmc_ndim, mcmc_nsamples = len(mcmc), len(mcmc_res.samples)
mcmc_samples = mcmc_res.samples

print "number of mcmc dimensions:", mcmc_ndim
print "number of mcmc samples:", mcmc_nsamples


number of mcmc dimensions: 4
number of mcmc samples: 10000

The above block of code creates new variables with more human-readable names for pre-existing variables. It is not strictly necessary; the corner function would understand len(mcmc) just as easily as mcmc_ndim, but it does make it easier for the user to understand what is going on.

Now that we have some neater variables, we can go ahead and plot the corner plot with the following function call:


In [14]:
figure_mcmc = triangle.corner(mcmc_samples, labels=[mcmc[0], mcmc[1], mcmc[2], mcmc[3]],
                         truths=[model.get(mcmc[0]), model.get(mcmc[1]),
                                 model.get(mcmc[2]), model.get(mcmc[3])],
                         range=mcmc_ndim*[0.9999],
                         show_titles=True, title_args={"fontsize": 12})

figure_mcmc.gca().annotate("mcmc sampling", xy=(0.5, 1.0), xycoords="figure fraction",
                      xytext=(0, -5), textcoords="offset points",
                      ha="center", va="top")


Out[14]:
<matplotlib.text.Annotation at 0x109de92d0>

In [15]:
figure_mcmc.savefig('../graphics/LightCurveFit.jpg')

The first line might require a bit more explanation. We're calling the function corner(), which is a function in the triangle software package. It creates a set of the histograms of all the varied parameter samples, along with all the covariance plots between each unique pair of parameters, then puts all the plots together in a logical fashion.

To do this we're passing the function a few things, some of which might be confusing.

The truths parameter is a list of the parameter values of the original supernova model. This is the model that was used to create the simulated supernova, so these values are the "true" parameters that we're trying to recover.

The range parameter controls the plotted extent of each parameter's axis. Passing a fraction such as 0.9999 (as above) asks the plot to cover that fraction of the samples for each parameter, so that a handful of extreme outliers don't stretch the axes.

If you want to make a corner plot of the nested-sampling results, the call is the same except for one additional parameter, the sample weights (weights=nest_res.weights); a sketch is given below.
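
Here is what that might look like, mirroring the MCMC corner plot above. The only substantive change is the extra weights argument; this assumes the nested-sampling result exposes samples, weights, and vparam_names, and that corner() accepts a weights keyword, as indicated above.

# Corner plot of the nested-sampling results; identical to the MCMC call
# above apart from the additional weights argument.
nest = nest_res.vparam_names
nest_samples = nest_res.samples

figure_nest = triangle.corner(nest_samples, labels=[nest[0], nest[1], nest[2], nest[3]],
                              weights=nest_res.weights,
                              truths=[model.get(nest[0]), model.get(nest[1]),
                                      model.get(nest[2]), model.get(nest[3])],
                              range=len(nest)*[0.9999],
                              show_titles=True, title_args={"fontsize": 12})

figure_nest.gca().annotate("nested sampling", xy=(0.5, 1.0), xycoords="figure fraction",
                           xytext=(0, -5), textcoords="offset points",
                           ha="center", va="top")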

Trace Plots

Trace plots are another way to visualize your samples, and they are sometimes useful for diagnosing how well your fit is running.

A trace plot is a plot of the value of each sample as a function of the order in which it was sampled. Essentially, a trace plot will tell you what values your fitting algorithm was sampling, and when.

Why is this useful? We want our sampler to eventually converge on a certain region of parameter space (specifically, we hope it's around the "right" answer), and a trace plot is an easy way to see a) how well the fit converged, and b) how quickly.

Here's an example of a trace plot:


In [16]:
# plots the first varied parameter of mcmc samples (in this case I know it's t0)
# as a function of array index
plt.plot(mcmc_samples[:,0])


Out[16]:
[<matplotlib.lines.Line2D at 0x10bdf36d0>]
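
If you'd like to inspect all of the varied parameters at once, a simple sketch (reusing the mcmc_samples, mcmc, and mcmc_ndim variables defined earlier) is to stack one trace per parameter:

# One trace plot per varied parameter, sharing the x-axis (the sample index).
fig, axes = plt.subplots(mcmc_ndim, 1, sharex=True, figsize=(8, 8))
for i, ax in enumerate(axes):
    ax.plot(mcmc_samples[:, i])
    ax.set_ylabel(mcmc[i])
axes[-1].set_xlabel('sample index')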
