This notebook was put together by Andrew Becker for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).
Q1) How do you fit a model to data?
Define a metric that represents the "goodness of fit" of the model to the data, e.g. $\chi^2$:

$\chi^2 = \sum_{i=1}^{N_{\rm data}} \left(\frac{\mathrm{data}_i - \mathrm{model}_i}{\mathrm{uncertainty}_i}\right)^2$

Then run an optimization algorithm whose sole purpose is to minimize (or maximize) this metric by tweaking the model parameters. This is the process of maximizing likelihood / minimizing $\chi^2$, since

Likelihood $\propto e^{-\chi^2/2}$
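As a concrete sketch of the quantity the optimizer works on, here is a minimal $\chi^2$ (and log-likelihood) function; the function and variable names are illustrative, not from the notebook:

```python
import numpy as np

def chi_squared(data, model, uncertainty):
    # chi^2 = sum_i ((data_i - model_i) / uncertainty_i)^2
    return np.sum(((data - model) / uncertainty) ** 2)

def log_likelihood(data, model, uncertainty):
    # up to an additive constant, ln(Likelihood) = -chi^2 / 2
    return -0.5 * chi_squared(data, model, uncertainty)
```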
Q2) How do you know when to stop?
As you add more complexity to the model, the goodness of fit will always improve (unless your code has a bug, and until $\chi^2 = 0$). Since each added parameter reduces $\chi^2$ by about 1 through chance alone, the rule of thumb is that the improvement in $\chi^2$ should be far greater than the number of terms you add.
For more statistical rigor (Numerical Recipes section 15.6): when adding N parameters to a model, the improvement $\Delta \chi^2_N$ is distributed as a $\chi^2$ distribution with N degrees of freedom. The table below gives $\Delta \chi^2_N$ as a function of confidence level p and N (a quick scipy check follows the table):
| p | $\Delta \chi^2$, N=1 | $\Delta \chi^2$, N=2 | $\Delta \chi^2$, N=3 |
|---|---|---|---|
| 68.3% | 1.00 | 2.30 | 3.53 |
| 95.4% | 4.00 | 6.17 | 8.02 |
| 99.73% | 9.00 | 11.8 | 14.2 |
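These thresholds can be reproduced from the $\chi^2$ quantile function; a quick check using `scipy.stats` (assuming it is installed alongside the other packages used below):

```python
import scipy.stats

# Delta chi^2 threshold for confidence level p (here 1, 2, 3 sigma)
# and N added parameters
for p in (0.6827, 0.9545, 0.9973):
    print(p, [round(scipy.stats.chi2.ppf(p, df=N), 2) for N in (1, 2, 3)])
# reproduces the table above, to small rounding differences in p
```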
Q3) How do you know when you've gone too far?
"Overfitting" is a common problem in model selection. You typically want a "parsimonious" model that captures the important information using a small number of parameters. This model is frequently selected by looking at the bias vs. variance trade-off.
Bias = $\langle \mathrm{data} - \mathrm{model} \rangle$

Variance = $\langle (\mathrm{data} - \mathrm{model} - \mathrm{Bias})^2 \rangle$
| | Low Variance | High Variance |
|---|---|---|
| Low Bias | accurate and precise: the goal | right on average, but scattered |
| High Bias | tightly clustered, but systematically off | systematically off and scattered |
When determining model complexity, there are other metrics to use than $\Delta \chi^2$, especially when comparing different classes of models, e.g. the mean squared error:

MSE = $\langle (\mathrm{data} - \mathrm{model})^2 \rangle$ = Bias$^2$ + Variance
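A toy numerical check of this decomposition (an illustrative sketch; the deliberately biased estimator and all numbers here are made up for the example):

```python
import numpy as np

rng = np.random.RandomState(0)
truth = 10.0

# Repeatedly "observe" the truth with noise and apply a biased estimator
estimates = np.array([0.9 * np.mean(truth + rng.normal(0, 1.0, size=50))
                      for _ in range(10000)])

bias = np.mean(estimates) - truth
variance = np.var(estimates)
mse = np.mean((estimates - truth) ** 2)
print(mse, bias**2 + variance)  # the two agree up to sampling noise
```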
There are principled methods that help you decide the order of your model by adding a penalty term to $\chi^2$. These include the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC):

BIC = $\chi^2$ + $N_{\rm term} \cdot \ln(N_{\rm data})$

AIC = $\chi^2$ + $2 \cdot N_{\rm term}$
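Written out as helper functions (the names are mine; the notebook computes these penalties inline further below):

```python
import numpy as np

def BIC(chi2, nterm, ndata):
    # chi^2 plus a penalty that grows with both model size and sample size
    return chi2 + nterm * np.log(ndata)

def AIC(chi2, nterm):
    # chi^2 plus a fixed penalty of 2 per parameter
    return chi2 + 2.0 * nterm
```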
In [1]:
    
%matplotlib inline
import astroML.datasets
import numpy as np
import matplotlib.pyplot as plt
    
In [2]:
    
lcs = astroML.datasets.fetch_LINEAR_sample()
    
In [3]:
    
print(lcs)
    
    
In [4]:
    
time, flux, dflux = lcs[18525697].T
    
In [5]:
    
plt.errorbar(time, flux, yerr=dflux, fmt="ro")
plt.gca().invert_yaxis()
plt.xlabel("MJD")
plt.ylabel("Mag")
plt.show()
    
    
In [6]:
    
from astroML.time_series import lomb_scargle
periods = np.logspace(-2, 0, 10000, base=10)
periodogram = lomb_scargle(time, flux, dflux, omega=2 * np.pi / periods, generalized=True)
    
In [7]:
    
plt.plot(periods, periodogram)
plt.semilogx()
plt.xlabel("Period (days)")
plt.ylabel("Power")
plt.show()
    
    
In [8]:
    
idx = np.argsort(periodogram)  # ascending, so idx[-1] is the peak
print(periods[idx[-1]], periodogram[idx[-1]])
    
    
In [9]:
    
period = periods[idx[-1]]
phase = time / period - time // period  # fractional part: the phase, in [0, 1)
print(min(phase), max(phase))
    
    
In [10]:
    
plt.errorbar(phase, flux, yerr=dflux, fmt="ro")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.show()
    
    
Model = $A_0 + A_1 \cos(2\pi \cdot \mathrm{phase} + \phi_1) + A_2 \cos(2\pi \cdot 2\,\mathrm{phase} + \phi_2) + \cdots + A_N \cos(2\pi \cdot N\,\mathrm{phase} + \phi_N)$

with model parameters $[A_0,\ A_1,\ \phi_1,\ A_2,\ \phi_2,\ \ldots,\ A_N,\ \phi_N]$
In [11]:
    
class Fourier(object):
    def __init__(self, phase, flux, dflux, nterms):
        self.phase = phase
        self.flux = flux
        self.dflux = dflux
        self.nterms = nterms
        assert self.nterms > 0

    def evaluate(self, phase, terms):
        # terms = [A0, A1, phi1, A2, phi2, ...]; length 2 * nterms - 1
        assert len(terms) == 2 * self.nterms - 1
        model = terms[0] * np.ones(len(phase))
        for i in range(self.nterms - 1):
            # harmonic i+1: amplitude terms[2*i + 1], phase offset terms[2*i + 2]
            model += terms[2*i + 1] * np.cos(2 * np.pi * (i+1) * phase + terms[2*i + 2])
        return model

    def chi2(self, args):
        # goodness of fit of this model to the stored light curve
        model = self.evaluate(self.phase, args)
        chi   = (model - self.flux) / self.dflux
        return np.sum(chi**2)
    
In [12]:
    
fourier = Fourier(phase, flux, dflux, 1)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10])  # nterms=1: a constant model at 10 mag
    
In [13]:
    
plt.plot(mphase, model, "k-")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Model mag")
plt.show()
    
    
In [14]:
    
fourier = Fourier(phase, flux, dflux, 2)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10, 0.1])  # raises AssertionError: nterms=2 requires 2*2-1 = 3 parameters
    
    
In [15]:
    
fourier = Fourier(phase, flux, dflux, 2)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10, 0.1, 0.0])
    
In [16]:
    
plt.plot(mphase, model, "k-")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Model mag")
plt.show()
    
    
In [17]:
    
fourier = Fourier(phase, flux, dflux, 2)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10, 0.1, np.pi])
plt.plot(mphase, model, "k-")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Model mag")
plt.show()
    
    
In [18]:
    
import scipy.optimize as opt
    
In [19]:
    
help(opt.fmin)
    
    
In [20]:
    
fourier = Fourier(phase, flux, dflux, 2)
guess   = [] # one entry per model parameter: here the mean magnitude, the amplitude, and the phase of minimum brightness
guess.append(np.mean(fourier.flux))  # A0: mean magnitude
guess.append(np.std(fourier.flux))   # A1: rough amplitude
    
In [21]:
    
print(np.argsort(fourier.flux)[-1])  # index of the faintest observation (largest magnitude)
    
    
In [22]:
    
print(fourier.flux[np.argsort(fourier.flux)[-1]])
    
    
In [23]:
    
print(fourier.phase[np.argsort(fourier.flux)[-1]])
    
    
In [24]:
    
guess.append(2 * np.pi * (1 - fourier.phase[np.argsort(fourier.flux)[-1]]))  # phi1: align the cosine peak with the observed phase of minimum brightness
    
In [25]:
    
print(guess)
    
    
In [26]:
    
opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)
    
In [27]:
    
result = opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)
best_fit = result[0]
print(best_fit)
    
    
In [28]:
    
plt.errorbar(fourier.phase, fourier.flux, yerr=fourier.dflux, fmt="ro")
plt.plot(mphase, fourier.evaluate(mphase, guess), "b--", label="Initial guess")
plt.plot(mphase, fourier.evaluate(mphase, best_fit), "g-", label="Best fit")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.legend()
plt.show()
    
    
$\chi^2_\nu = \frac{\chi^2}{N_{\rm pts} - N_{\rm terms} - 1}$
In [29]:
    
chi2_1 = result[1]
chi2_2 = fourier.chi2(best_fit)
print(chi2_1, chi2_2, chi2_1 / (len(flux) - len(best_fit) - 1))  # last value: reduced chi^2
    
    
In [30]:
    
fourier2 = Fourier(phase, flux, dflux, 3)
guess2   = [] # one entry per model parameter, now 2*3-1 = 5
guess2.append(np.mean(fourier2.flux))
guess2.append(np.std(fourier2.flux))
guess2.append(2 * np.pi * (1 - fourier2.phase[np.argsort(fourier2.flux)[-1]]))
guess2.append(0.0)
guess2.append(2 * np.pi * (1 - fourier2.phase[np.argsort(fourier2.flux)[-1]]))
result2 = opt.fmin_bfgs(fourier2.chi2, x0=[guess2], disp=0, full_output=1)
best_fit2 = result2[0]
print(best_fit2)
    
    
In [31]:
    
plt.errorbar(fourier2.phase, fourier2.flux, yerr=fourier2.dflux, fmt="ro")
plt.plot(mphase, fourier2.evaluate(mphase, guess2), "b--", label="Initial guess")
plt.plot(mphase, fourier2.evaluate(mphase, best_fit2), "g-", label="Best fit")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.show()
    
    
In [32]:
    
fourier5 = Fourier(phase, flux, dflux, 5)
guess5   = [] # one entry per model parameter, now 2*5-1 = 9
guess5.append(np.mean(fourier5.flux))
guess5.append(np.std(fourier5.flux))
guess5.append(2 * np.pi * (1 - fourier5.phase[np.argsort(fourier5.flux)[-1]]))
for i in range(2, 5):
    guess5.append(0.0)
    guess5.append(2 * np.pi * (1 - fourier5.phase[np.argsort(fourier5.flux)[-1]]))
result5 = opt.fmin_bfgs(fourier5.chi2, x0=[guess5], disp=0, full_output=1)
best_fit5 = result5[0]
print(best_fit5)
    
    
In [33]:
    
plt.errorbar(fourier5.phase, fourier5.flux, yerr=fourier5.dflux, fmt="ro")
plt.plot(mphase, fourier5.evaluate(mphase, guess5), "b--", label="Initial guess")
plt.plot(mphase, fourier5.evaluate(mphase, best_fit5), "g-", label="Best fit")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.show()
    
    
In [34]:
    
class Fourier2(Fourier):
    def __init__(self, phase, flux, dflux, nterms):
        Fourier.__init__(self, phase, flux, dflux, nterms)
                
    def makeGuess(self):
        # same starting-point heuristic as before: mean magnitude, then
        # (amplitude, phase-of-minimum) pairs for each harmonic
        guess = []
        guess.append(np.mean(self.flux))
        if self.nterms > 1:
            guess.append(np.std(self.flux))
            guess.append(2 * np.pi * (1 - self.phase[np.argsort(self.flux)[-1]]))
            for i in range(2, self.nterms):
                guess.append(0.0)
                guess.append(2 * np.pi * (1 - self.phase[np.argsort(self.flux)[-1]]))
        return guess
    
In [35]:
    
nterms = np.arange(1, 10)
chi2   = []
for nterm in nterms:
    fourier = Fourier2(phase, flux, dflux, nterm)
    guess   = fourier.makeGuess()
    print(guess)
    chi2.append(opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)[1])
chi2 = np.array(chi2)
plt.plot(nterms, chi2)
plt.xlabel("N Terms")
plt.ylabel("Chi2")
plt.show()
    
    
    
    
In [36]:
    
delta_chi2 = chi2[:-1] - chi2[1:]
plt.plot(nterms[1:], delta_chi2)
plt.xlabel("N terms")
plt.ylabel("Delta chi2")
plt.show()
# You will need to figure out the significance of all these delta_chi2 as part of your homework!
    
    
In [37]:
    
nterms     = np.arange(1, 10)
chi2       = []
BICpenalty = []
AICpenalty = []
for nterm in nterms:
    fourier = Fourier2(phase, flux, dflux, nterm)
    guess   = fourier.makeGuess()
    BICpenalty.append(len(guess) * np.log(len(fourier.phase)))
    AICpenalty.append(len(guess) * 2.0)
    chi2.append(opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)[1])
chi2 = np.array(chi2)
BICpenalty = np.array(BICpenalty)
AICpenalty = np.array(AICpenalty)
    
In [38]:
    
print(chi2)
print(BICpenalty)
print(AICpenalty)
    
    
In [39]:
    
plt.plot(nterms, chi2, label="chi2")
plt.plot(nterms, chi2 + BICpenalty, label="BIC")
plt.plot(nterms, chi2 + AICpenalty, label="AIC")
plt.xlabel("N Terms")
plt.ylabel("Chi2")
plt.semilogy()
plt.legend()
plt.show()
    
    
In [40]:
    
nterms     = np.arange(1, 30, 2)
chi2       = []
BICpenalty = []
AICpenalty = []
for nterm in nterms:
    fourier = Fourier2(phase, flux, dflux, nterm)
    guess   = fourier.makeGuess()
    BICpenalty.append(len(guess) * np.log(len(fourier.phase)))
    AICpenalty.append(len(guess) * 2.0)
    chi2.append(opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)[1])
chi2 = np.array(chi2)
BICpenalty = np.array(BICpenalty)
AICpenalty = np.array(AICpenalty)
    
In [41]:
    
plt.plot(nterms, chi2, label="chi2")
plt.plot(nterms, chi2 + BICpenalty, label="BIC")
plt.plot(nterms, chi2 + AICpenalty, label="AIC")
plt.xlabel("N Terms")
plt.ylabel("Chi2")
plt.semilogy()
plt.legend()
plt.show()
    
    
In [42]:
    
print(np.argsort(chi2 + BICpenalty)[0], np.argsort(chi2 + AICpenalty)[0])  # indices of the BIC and AIC minima
    
    
In [43]:
    
print(nterms[4], nterms[6])  # the numbers of terms preferred by BIC and AIC