An example of applying MLE with 2 free parameters only (erosion rate and exposure time), using the grid search method.
For more info about the method used, see the notebook Inference_Notes.
This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.
This notebook has the following external dependencies:
In [1]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
%matplotlib inline
The mathematical model is available in the models
Python module (see the notebook Models)
In [2]:
import models
The dataset is generated using the following true parameter values
In [3]:
# the true parameters
true_erosion = 1e-3
true_exposure = 1e5
# fixed parameters
fixed_density = 2.0
fixed_inheritance = 0.
# depths and sample size
depth_minmax = [50, 500]
N = 30
# perturbations
err_magnitude = 20.
err_variability = 5.
The gendata
Python module is used to generate the dataset (see the notebook Datasets).
In [4]:
import gendata
profile_data = gendata.generate_dataset(
models.C_10Be,
[true_erosion, true_exposure, fixed_density, fixed_inheritance],
zlimits=depth_minmax,
n=N,
err=[err_magnitude, err_variability]
)
profile_data
Out[4]:
The dataset is stored as a :class:pandas.DataFrame
object.
The grid search method is implemented in the gridsearch
module (see the notebook Grid-Search for more info).
In [5]:
import gridsearch
Create a new object for setup and results
In [6]:
gstest = gridsearch.CosmogenicInferenceGC(description='test case')
Set the data
In [7]:
gstest.set_profile_measured(
profile_data['depth'].values,
profile_data['C'].values,
profile_data['std'].values,
None,
)
Set the model
In [8]:
def C_10Be_2p(depth, erosion, exposure):
"""
A model for 10Be with 2 parameters.
"""
return models.C_10Be(depth, erosion, exposure,
fixed_density, fixed_inheritance)
gstest.set_profile_model(C_10Be_2p)
Define the parameters to fit and their search ranges / steps. The order must be the same than the order of the arguments of the function used for the model!
In [9]:
gstest.set_parameter(
'erosion_rate',
[0., 3e-3, 200j],
stats.uniform(loc=0, scale=3e-3).pdf
)
gstest.set_parameter(
'exposure_time',
[0., 2e5, 200j],
stats.uniform(loc=0, scale=2e5).pdf
)
Grid search setup summary
In [10]:
print gstest.setup_summary()
Perform Maximum likelihood estimation on the search grid
In [11]:
gstest.compute_mle()
Get the MLE (i.e., the parameter values at the maximum likelihood), in the same order than the definition of the parameters
In [12]:
gstest.mle
Out[12]:
Plot the profile log-likelihood for each parameter. The blue lines represent the difference between the profile log-likelihood and the maximum log-likelihood, The intersections between the blue line and the black lines define the confidence intervals at the given confidence levels (based on the likelihood ratio test). The red lines indicate the true values.
In [13]:
%matplotlib inline
def plot_proflike1d(cobj, pname, clevels=[0.68, 0.95, 0.997],
true_val=None, ax=None):
p = cobj.parameters[pname]
pindex = cobj.parameters.keys().index(pname)
x = cobj.grid[pindex].flatten()
proflike = cobj.proflike1d[pindex]
if ax is None:
ax = plt.subplot(111)
difflike = proflike - cobj.maxlike
ax.plot(x, difflike, label='profile loglike')
ccrit = gridsearch.profile_likelihood_crit(
cobj.proflike1d[pindex],
cobj.maxlike,
clevels=clevels
)
ccrit -= cobj.maxlike
for lev, cc in zip(clevels, ccrit):
l = ax.axhline(cc, color='k')
hpos = x.min() + (x.max() + x.min()) * 0.05
ax.text(hpos, cc, str(lev * 100),
size=9, color = l.get_color(),
ha="center", va="center",
bbox=dict(ec='1',fc='1'))
if true_val is not None:
ax.axvline(true_val, color='r')
plt.setp(ax, xlabel=pname,
ylabel='profile log-like - max log-like',
xlim=p['range'][0:2],
ylim=[ccrit[-1], 0.])
def plot_proflike1d_all(cobj, n_subplot_cols=2, **kwargs):
true_vals = [true_erosion, true_exposure]
n_subplots = len(cobj.parameters)
n_subplot_rows = int(math.ceil(1. *
n_subplots /
n_subplot_cols))
fig, aax = plt.subplots(nrows=n_subplot_rows,
ncols=n_subplot_cols,
**kwargs)
axes = aax.flatten()
fig.text(0.5, 0.975,
"Profile log-like: " + cobj.description,
horizontalalignment='center',
verticalalignment='top')
for i, pname in enumerate(cobj.parameters.keys()):
ax = axes[i]
plot_proflike1d(cobj, pname,
true_val=true_vals[i],
ax=ax)
plt.tight_layout()
plt.subplots_adjust(top=0.93)
plot_proflike1d_all(gstest, figsize=(8, 4))
Show the log-likelihood vs. erosion rate and exposure. Confidence regions are also shown (also based on the likelihood ratio test).
In [14]:
def plot_proflike2d(cobj, p1p2, ax=None,
cmap='Blues', show_colorbar=True):
pname1, pname2 = p1p2
idim = cobj.parameters.keys().index(pname2)
jdim = cobj.parameters.keys().index(pname1)
if ax is None:
ax = plt.subplot(111)
X, Y = np.meshgrid(cobj.grid[idim].flatten(),
cobj.grid[jdim].flatten())
difflike = cobj.proflike2d[idim][jdim] - cobj.maxlike
ccrit = gridsearch.profile_likelihood_crit(
cobj.proflike2d[idim][jdim],
cobj.maxlike,
clevels=[0.68, 0.95]
)
ccrit -= cobj.maxlike
contours = np.linspace(np.median(difflike),
0,
10)
P2D = ax.contourf(Y, X, difflike,
contours,
cmap=plt.get_cmap(cmap))
ci68 = ax.contour(Y, X, difflike,
[ccrit[0]], colors='w',
linestyles='solid')
plt.clabel(ci68, fontsize=8, inline=True,
fmt='68')
ci95 = ax.contour(Y, X, difflike,
[ccrit[1]], colors=['k'],
linestyles='solid')
plt.clabel(ci95, fontsize=8, inline=True,
fmt='95')
plt.setp(ax, xlabel=pname1, ylabel=pname2)
if show_colorbar:
plt.colorbar(P2D, ax=ax)
ax.axhline(true_exposure, color='r')
ax.axvline(true_erosion, color='r')
fig = plt.figure(figsize=(6, 6))
ax = plt.subplot(111)
plot_proflike2d(gstest, ('erosion_rate', 'exposure_time'), ax=ax)
Plot the measured concentrations and the predicted profile corresponding to the best fitted data model
In [15]:
sns.set_context('notebook')
depths = np.linspace(profile_data['depth'].min(),
profile_data['depth'].max(),
100)
Cm_fitted = models.C_10Be(depths,
gstest.mle[0], gstest.mle[1],
fixed_density,
fixed_inheritance)
Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
fixed_density, fixed_inheritance)
plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
-profile_data['depth'],
xerr=profile_data['std'],
fmt='o', markersize=4,
label='data')
plt.setp(plt.gca(),
xlabel='10Be concentration [atoms g-1]',
ylabel='-1 * depth [cm]',
xlim=[0, None], ylim=[None, 0])
plt.legend(loc='lower right')
Out[15]:
2 free parameters
As the sample size increases, keeping fixed the error magnitude, we can see that the shapes of the profile log-likelihoods tend to be closer to a parabola (asymptotic normality) and the confidence intervals narrow up. For very small sample sizes (< 10), the confidence intervals are larger and the assumption of the normality is not reasonable.
For small sample sizes, we can also see that the maximum likelihood estimator is not always close to the true parameter values (sometimes the true values lie outside the 65% - or even 95% - confidence intervals). The results change from one generated dataset to another, setting the same perturbation parameters. It shows that the maximum likelihood estimator is biased for finite sample sizes. This bias is visible even for sample sizes $\sim$ 100 to 500. With very large sample sizes, we can see that it converges towards the true values.
The shape of the likelihood and the confidence regions show the positive correlation (non-linear relationship) between erosion rates and exposure time.
In [ ]: