This notebook was put together by Andrew Becker for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).
Q1) How do you fit a model to data?
Define some metric that represents the "goodness of fit" of the model to data (e.g. $\chi^2$).
$\chi^2 = \sum_{i=1}^{N_{\rm data}} \left(\frac{{\rm data}_i - {\rm model}_i}{{\rm uncertainty}_i}\right)^2$
Then run an optimization algorithm whose sole purpose is to minimize (or maximize) this value by tweaking the model parameters. This is the process of maximizing the likelihood / minimizing $\chi^2$.
Likelihood $\propto e^{-\chi^2/2}$
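For Gaussian uncertainties this metric is only a few lines of NumPy. A minimal sketch (the function name and toy numbers are illustrative, not part of the original notebook):

In [ ]:
import numpy as np

def chi_squared(data, model, sigma):
    # sum of squared, uncertainty-weighted residuals
    return np.sum(((data - model) / sigma)**2)

# toy data scattered about a constant model of 10.0
data = np.array([9.8, 10.3, 10.1, 9.7])
sigma = np.array([0.2, 0.2, 0.3, 0.2])
print(chi_squared(data, 10.0, sigma))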
Q2) How do you know when to stop?
As you add more complexity to the model, the goodness of fit will always improve (unless your code has a bug, and until $\chi^2 = 0$). The rule of thumb is that the improvement in $\chi^2$ should be far greater than the number of parameters you add; otherwise the extra complexity is not buying you anything.
For more statistical rigor (Numerical Recipes, section 15.6): when adding $N$ parameters to a model, $\Delta \chi^2_N$ is distributed as a $\chi^2$ distribution with $N$ degrees of freedom. The table below gives $\Delta \chi^2_N$ as a function of confidence level and $N$:
| p      | N=1                    | N=2                    | N=3                    |
|--------|------------------------|------------------------|------------------------|
| 68.3%  | $\Delta \chi^2 = 1.00$ | $\Delta \chi^2 = 2.30$ | $\Delta \chi^2 = 3.53$ |
| 95.4%  | $\Delta \chi^2 = 4.00$ | $\Delta \chi^2 = 6.17$ | $\Delta \chi^2 = 8.02$ |
| 99.73% | $\Delta \chi^2 = 9.00$ | $\Delta \chi^2 = 11.8$ | $\Delta \chi^2 = 14.2$ |
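These thresholds come straight from the quantile function of the $\chi^2$ distribution, so you can reproduce the table with `scipy.stats` (not otherwise used in this notebook):

In [ ]:
from scipy.stats import chi2 as chi2_dist

for p in (0.683, 0.954, 0.9973):
    # Delta chi^2 threshold at confidence level p, for N = 1, 2, 3 added parameters
    print(p, [round(chi2_dist.ppf(p, df=N), 2) for N in (1, 2, 3)])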
Q3) How do you know when you've gone too far?
"Overfitting" is a common problem in model selection. You typically want a "parsimonious" model that captures the important information using a small number of parameters. This model is frequently selected by looking at the bias vs. variance trade-off.
Bias = $\langle \mathrm{data} - \mathrm{model} \rangle$

Variance = $\langle (\mathrm{data} - \mathrm{model} - \mathrm{Bias})^2 \rangle$
[2×2 figure grid, images missing: rows are Low Bias / High Bias, columns are Low Variance / High Variance]
When determining model complexity, there are metrics other than $\Delta \chi^2$ to use, especially when comparing different classes of models. E.g. the mean squared error:
MSE = $\langle (\mathrm{data} - \mathrm{model})^2 \rangle$ = Bias$^2$ + Variance
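This identity is exact for any set of residuals (it is just the decomposition of a second moment into squared mean plus variance); a quick numerical check on synthetic residuals, with arbitrary toy numbers:

In [ ]:
import numpy as np

rng = np.random.RandomState(0)
resid = rng.normal(0.5, 2.0, 100000)    # toy (data - model) residuals
bias = np.mean(resid)
variance = np.mean((resid - bias)**2)
mse = np.mean(resid**2)
print(mse, bias**2 + variance)          # agree to floating-point precision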
There are principled methods that help you decide the complexity of your model by adding a penalty term to $\chi^2$. These include the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC):
BIC = $\chi^2 + N_{\rm terms} \cdot \ln N_{\rm data}$

AIC = $\chi^2 + 2 N_{\rm terms}$
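In code, these penalties amount to a couple of one-liners; a minimal sketch (the helper names `bic` and `aic` are mine, not from the notebook, and `nparams` counts the fitted parameters):

In [ ]:
import numpy as np

def bic(chi2, nparams, ndata):
    # chi^2 plus a penalty that grows with model size and (logarithmically) with sample size
    return chi2 + nparams * np.log(ndata)

def aic(chi2, nparams):
    # chi^2 plus a flat penalty of 2 per parameter
    return chi2 + 2.0 * nparams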
In [1]:
%matplotlib inline
import astroML.datasets
import numpy as np
import matplotlib.pyplot as plt
In [2]:
lcs = astroML.datasets.fetch_LINEAR_sample()
In [3]:
print(lcs)
In [4]:
time, flux, dflux = lcs[18525697].T   # columns: time (MJD), magnitude, magnitude uncertainty
In [5]:
plt.errorbar(time, flux, yerr=dflux, fmt="ro")
plt.gca().invert_yaxis()
plt.xlabel("MJD")
plt.ylabel("Mag")
plt.show()
In [6]:
from astroML.time_series import lomb_scargle
periods = np.logspace(-2, 0, 10000, base=10)   # trial periods from 0.01 to 1 day
# lomb_scargle expects angular frequencies: omega = 2 pi / period
periodogram = lomb_scargle(time, flux, dflux, omega=2 * np.pi / periods, generalized=True)
In [7]:
plt.plot(periods, periodogram)
plt.semilogx()
plt.xlabel("Period (days)")
plt.ylabel("Power")
plt.show()
In [8]:
idx = np.argsort(periodogram)
print(periods[idx[-1]], periodogram[idx[-1]])   # period with the highest power
In [9]:
period = periods[idx[-1]]
phase = time / period - time // period   # fold the light curve: fractional part of time / period
print(min(phase), max(phase))
In [10]:
plt.errorbar(phase, flux, yerr=dflux, fmt="ro")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.show()
Model = $A_0 + \sum_{n=1}^{N} A_n \cos(2 \pi n \cdot \mathrm{phase} + \phi_n)$

with model parameters $[\,A_0,\ A_1,\ \phi_1,\ A_2,\ \phi_2,\ \ldots,\ A_N,\ \phi_N\,]$.
In [11]:
class Fourier(object):
    def __init__(self, phase, flux, dflux, nterms):
        self.phase = phase
        self.flux = flux
        self.dflux = dflux
        self.nterms = nterms   # constant term plus (nterms - 1) harmonics
        assert(self.nterms > 0)

    def evaluate(self, phase, terms):
        # terms = [A0, A1, phi1, A2, phi2, ...], i.e. 2 * nterms - 1 parameters
        assert(len(terms) == 2 * self.nterms - 1)
        model = terms[0] * np.ones(len(phase))
        for i in range(self.nterms - 1):
            model += terms[2*i + 1] * np.cos(2 * np.pi * (i+1) * phase + terms[2*i + 2])
        return model

    def chi2(self, args):
        model = self.evaluate(self.phase, args)
        chi = (model - self.flux) / self.dflux
        return np.sum(chi**2)
In [12]:
fourier = Fourier(phase, flux, dflux, 1)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10])   # constant model: A0 = 10 mag
In [13]:
plt.plot(mphase, model, "k-")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Model mag")
plt.show()
In [14]:
fourier = Fourier(phase, flux, dflux, 2)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10, 0.1])   # raises AssertionError: nterms=2 requires 2*2-1 = 3 parameters
In [15]:
fourier = Fourier(phase, flux, dflux, 2)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10, 0.1, 0.0])   # A0 = 10, A1 = 0.1, phi1 = 0
In [16]:
plt.plot(mphase, model, "k-")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Model mag")
plt.show()
In [17]:
fourier = Fourier(phase, flux, dflux, 2)
mphase = np.arange(0, 1, 0.01)
model = fourier.evaluate(mphase, [10, 0.1, np.pi])
plt.plot(mphase, model, "k-")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Model mag")
plt.show()
In [18]:
import scipy.optimize as opt
In [19]:
help(opt.fmin)
In [20]:
fourier = Fourier(phase, flux, dflux, 2)
guess = []  # one entry per fitted parameter: here the mean magnitude, the amplitude, and the phase of minimum brightness
guess.append(np.mean(fourier.flux))
guess.append(np.std(fourier.flux))
In [21]:
print(np.argsort(fourier.flux)[-1])   # index of the faintest point
In [22]:
print(fourier.flux[np.argsort(fourier.flux)[-1]])
In [23]:
print(fourier.phase[np.argsort(fourier.flux)[-1]])
In [24]:
guess.append(2 * np.pi * (1 - fourier.phase[np.argsort(fourier.flux)[-1]]))
In [25]:
print(guess)
In [26]:
opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)
In [27]:
result = opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)
best_fit = result[0]   # full_output=1 returns (best-fit parameters, minimum chi^2, ...)
print(best_fit)
In [28]:
plt.errorbar(fourier.phase, fourier.flux, yerr=fourier.dflux, fmt="ro")
plt.plot(mphase, fourier.evaluate(mphase, guess), "b--", label="Initial guess")
plt.plot(mphase, fourier.evaluate(mphase, best_fit), "g-", label="Best fit")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.legend()
plt.show()
$\chi^2_\nu = \frac{\chi^2}{N_{\rm pts} - N_{\rm terms} - 1}$
In [29]:
chi2_1 = result[1]                # minimum chi^2 reported by the optimizer
chi2_2 = fourier.chi2(best_fit)   # the same value, recomputed from the best-fit parameters
print(chi2_1, chi2_2, chi2_1 / (len(flux) - len(best_fit) - 1))
In [30]:
fourier2 = Fourier(phase, flux, dflux, 3)
guess2 = []  # one entry per fitted parameter, now 2*3-1 = 5
guess2.append(np.mean(fourier2.flux))
guess2.append(np.std(fourier2.flux))
guess2.append(2 * np.pi * (1 - fourier2.phase[np.argsort(fourier2.flux)[-1]]))
guess2.append(0.0)
guess2.append(2 * np.pi * (1 - fourier2.phase[np.argsort(fourier2.flux)[-1]]))
result2 = opt.fmin_bfgs(fourier2.chi2, x0=[guess2], disp=0, full_output=1)
best_fit2 = result2[0]
print(best_fit2)
In [31]:
plt.errorbar(fourier2.phase, fourier2.flux, yerr=fourier2.dflux, fmt="ro")
plt.plot(mphase, fourier2.evaluate(mphase, guess2), "b--", label="Initial guess")
plt.plot(mphase, fourier2.evaluate(mphase, best_fit2), "g-", label="Best fit")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.show()
In [32]:
fourier5 = Fourier(phase, flux, dflux, 5)
guess5 = []  # one entry per fitted parameter, now 2*5-1 = 9
guess5.append(np.mean(fourier5.flux))
guess5.append(np.std(fourier5.flux))
guess5.append(2 * np.pi * (1 - fourier5.phase[np.argsort(fourier5.flux)[-1]]))
for i in range(2, 5):
    guess5.append(0.0)
    guess5.append(2 * np.pi * (1 - fourier5.phase[np.argsort(fourier5.flux)[-1]]))
result5 = opt.fmin_bfgs(fourier5.chi2, x0=[guess5], disp=0, full_output=1)
best_fit5 = result5[0]
print(best_fit5)
In [33]:
plt.errorbar(fourier5.phase, fourier5.flux, yerr=fourier5.dflux, fmt="ro")
plt.plot(mphase, fourier5.evaluate(mphase, guess5), "b--", label="Initial guess")
plt.plot(mphase, fourier5.evaluate(mphase, best_fit5), "g-", label="Best fit")
plt.gca().invert_yaxis()
plt.xlabel("Phase")
plt.ylabel("Mag")
plt.show()
In [34]:
class Fourier2(Fourier):
    def __init__(self, phase, flux, dflux, nterms):
        Fourier.__init__(self, phase, flux, dflux, nterms)

    def makeGuess(self):
        # starting point: the mean magnitude; then the scatter and the phase of
        # minimum brightness for the first harmonic; then zero-amplitude higher harmonics
        guess = []
        guess.append(np.mean(self.flux))
        if self.nterms > 1:
            guess.append(np.std(self.flux))
            guess.append(2 * np.pi * (1 - self.phase[np.argsort(self.flux)[-1]]))
            for i in range(2, self.nterms):
                guess.append(0.0)
                guess.append(2 * np.pi * (1 - self.phase[np.argsort(self.flux)[-1]]))
        return guess
In [35]:
nterms = np.arange(1, 10)
chi2 = []
for nterm in nterms:
    fourier = Fourier2(phase, flux, dflux, nterm)
    guess = fourier.makeGuess()
    print(guess)
    chi2.append(opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)[1])
chi2 = np.array(chi2)
plt.plot(nterms, chi2)
plt.xlabel("N Terms")
plt.ylabel("Chi2")
plt.show()
In [36]:
delta_chi2 = chi2[:-1] - chi2[1:]   # improvement in chi^2 from each added term
plt.plot(nterms[1:], delta_chi2)
plt.xlabel("N terms")
plt.ylabel("Delta chi2")
plt.show()
# You will need to figure out the significance of all these delta_chi2 as part of your homework!
In [37]:
nterms = np.arange(1, 10)
chi2 = []
BICpenalty = []
AICpenalty = []
for nterm in nterms:
    fourier = Fourier2(phase, flux, dflux, nterm)
    guess = fourier.makeGuess()
    BICpenalty.append(len(guess) * np.log(len(fourier.phase)))   # Nterms * ln(Ndata)
    AICpenalty.append(len(guess) * 2.0)                          # 2 * Nterms
    chi2.append(opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)[1])
chi2 = np.array(chi2)
BICpenalty = np.array(BICpenalty)
AICpenalty = np.array(AICpenalty)
In [38]:
print(chi2)
print(BICpenalty)
print(AICpenalty)
In [39]:
plt.plot(nterms, chi2, label="chi2")
plt.plot(nterms, chi2 + BICpenalty, label="BIC")
plt.plot(nterms, chi2 + AICpenalty, label="AIC")
plt.xlabel("N Terms")
plt.ylabel("Chi2")
plt.semilogy()
plt.legend()
plt.show()
In [40]:
nterms = np.arange(1, 30, 2)
chi2 = []
BICpenalty = []
AICpenalty = []
for nterm in nterms:
    fourier = Fourier2(phase, flux, dflux, nterm)
    guess = fourier.makeGuess()
    BICpenalty.append(len(guess) * np.log(len(fourier.phase)))
    AICpenalty.append(len(guess) * 2.0)
    chi2.append(opt.fmin_bfgs(fourier.chi2, x0=[guess], disp=0, full_output=1)[1])
chi2 = np.array(chi2)
BICpenalty = np.array(BICpenalty)
AICpenalty = np.array(AICpenalty)
In [41]:
plt.plot(nterms, chi2, label="chi2")
plt.plot(nterms, chi2 + BICpenalty, label="BIC")
plt.plot(nterms, chi2 + AICpenalty, label="AIC")
plt.xlabel("N Terms")
plt.ylabel("Chi2")
plt.semilogy()
plt.legend()
plt.show()
In [42]:
print(np.argsort(chi2 + BICpenalty)[0], np.argsort(chi2 + AICpenalty)[0])
In [43]:
print(nterms[4], nterms[6])   # the BIC- and AIC-preferred numbers of terms, from the indices above