An example of applying MLE with 4 free parameters, using the grid search method.
For more info about the method used, see the notebook Inference_Notes.
This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.
This notebook has the following external dependencies:
In [1]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
%matplotlib inline
The mathematical model is available in the models
Python module (see the notebook Models)
In [2]:
import models
The dataset is generated using the following true parameter values
In [3]:
# the true parameters
true_erosion = 1e-3
true_exposure = 1e5
true_density = 2.0
true_inheritance = 0.
# depths and sample size
depth_minmax = [50, 500]
N = 30
# perturbations
err_magnitude = 20.
err_variability = 5.
The gendata
Python module is used to generate the dataset (see the notebook Datasets).
In [4]:
import gendata
profile_data = gendata.generate_dataset(
models.C_10Be,
[true_erosion, true_exposure, true_density, true_inheritance],
zlimits=depth_minmax,
n=N,
err=[err_magnitude, err_variability]
)
profile_data
Out[4]:
The dataset is stored as a :class:pandas.DataFrame
object.
The grid search method is implemented in the gridsearch
module (see the notebook Grid-Search for more info).
In [5]:
import gridsearch
Create a new object for setup and results
In [6]:
gstest = gridsearch.CosmogenicInferenceGC(description='test case')
Set the data
In [7]:
gstest.set_profile_measured(
profile_data['depth'].values,
profile_data['C'].values,
profile_data['std'].values,
None,
)
Set the model
In [8]:
gstest.set_profile_model(models.C_10Be)
Define the parameters to fit and their search ranges / steps. The order must be the same than the order of the arguments of the function used for the model!
In [9]:
gstest.set_parameter(
'erosion_rate',
[0., 3e-3, 70j],
stats.uniform(loc=0, scale=3e-3).pdf
)
gstest.set_parameter(
'exposure_time',
[0., 2e5, 70j],
stats.uniform(loc=0, scale=2e5).pdf
)
gstest.set_parameter(
'soil_density',
[1.8, 2.8, 60j],
stats.uniform(loc=1.8, scale=2.8).pdf
)
gstest.set_parameter(
'inheritance',
[0, 5e3, 60j],
stats.uniform(loc=0, scale=1e5).pdf
)
Grid search setup summary
In [10]:
print gstest.setup_summary()
Perform Maximum likelihood estimation on the search grid
In [11]:
gstest.compute_mle()
Get the MLE (i.e., the parameter values at the maximum likelihood), in the same order than the definition of the parameters
In [12]:
gstest.mle
Out[12]:
Plot the profile log-likelihood for each parameter. The blue lines represent the difference between the profile log-likelihood and the maximum log-likelihood, The intersections between the blue line and the black lines define the confidence intervals at the given confidence levels (based on the likelihood ratio test). The red lines indicate the true values.
In [13]:
%matplotlib inline
def plot_proflike1d(cobj, pname, clevels=[0.68, 0.95, 0.997],
true_val=None, ax=None):
p = cobj.parameters[pname]
pindex = cobj.parameters.keys().index(pname)
x = cobj.grid[pindex].flatten()
proflike = cobj.proflike1d[pindex]
if ax is None:
ax = plt.subplot(111)
difflike = proflike - cobj.maxlike
ax.plot(x, difflike, label='profile loglike')
ccrit = gridsearch.profile_likelihood_crit(
cobj.proflike1d[pindex],
cobj.maxlike,
clevels=clevels
)
ccrit -= cobj.maxlike
for lev, cc in zip(clevels, ccrit):
l = ax.axhline(cc, color='k')
hpos = x.min() + (x.max() + x.min()) * 0.05
ax.text(hpos, cc, str(lev * 100),
size=9, color = l.get_color(),
ha="center", va="center",
bbox=dict(ec='1',fc='1'))
if true_val is not None:
ax.axvline(true_val, color='r')
plt.setp(ax, xlabel=pname,
ylabel='profile log-like - max log-like',
xlim=p['range'][0:2],
ylim=[ccrit[-1], 0.])
def plot_proflike1d_all(cobj, n_subplot_cols=2, **kwargs):
true_vals = [true_erosion, true_exposure,
true_density, true_inheritance]
n_subplots = len(cobj.parameters)
n_subplot_rows = int(math.ceil(1. *
n_subplots /
n_subplot_cols))
fig, aax = plt.subplots(nrows=n_subplot_rows,
ncols=n_subplot_cols,
**kwargs)
axes = aax.flatten()
fig.text(0.5, 0.975,
"Profile log-like: " + cobj.description,
horizontalalignment='center',
verticalalignment='top')
for i, pname in enumerate(cobj.parameters.keys()):
ax = axes[i]
plot_proflike1d(cobj, pname,
true_val=true_vals[i],
ax=ax)
plt.tight_layout()
plt.subplots_adjust(top=0.93)
plot_proflike1d_all(gstest, figsize=(8, 6))
Show the profile log-likelihood for couples of parameters. Confidence regions are also shown (also based on the likelihood ratio test).
In [14]:
def plot_proflike2d(cobj, p1p2, ax=None,
cmap='Blues', show_colorbar=True):
pname1, pname2 = p1p2
idim = cobj.parameters.keys().index(pname2)
jdim = cobj.parameters.keys().index(pname1)
if ax is None:
ax = plt.subplot(111)
X, Y = np.meshgrid(cobj.grid[idim].flatten(),
cobj.grid[jdim].flatten())
difflike = cobj.proflike2d[idim][jdim] - cobj.maxlike
ccrit = gridsearch.profile_likelihood_crit(
cobj.proflike2d[idim][jdim],
cobj.maxlike,
clevels=[0.68, 0.95]
)
ccrit -= cobj.maxlike
contours = np.linspace(np.median(difflike),
0,
10)
P2D = ax.contourf(Y, X, difflike,
contours,
cmap=plt.get_cmap(cmap))
ci68 = ax.contour(Y, X, difflike,
[ccrit[0]], colors='w',
linestyles='solid')
plt.clabel(ci68, fontsize=8, inline=True,
fmt='68')
ci95 = ax.contour(Y, X, difflike,
[ccrit[1]], colors=['k'],
linestyles='solid')
plt.clabel(ci95, fontsize=8, inline=True,
fmt='95')
plt.setp(ax, xlabel=pname1, ylabel=pname2)
if show_colorbar:
plt.colorbar(P2D, ax=ax)
#ax.axhline(true_exposure, color='r')
#ax.axvline(true_erosion, color='r')
fig = plt.figure(figsize=(9, 9))
ax = plt.subplot(321)
plot_proflike2d(gstest, ('erosion_rate', 'exposure_time'), ax=ax)
ax2 = plt.subplot(322)
plot_proflike2d(gstest, ('exposure_time', 'inheritance'), ax=ax2)
ax3 = plt.subplot(323)
plot_proflike2d(gstest, ('erosion_rate', 'inheritance'), ax=ax3)
ax4 = plt.subplot(324)
plot_proflike2d(gstest, ('exposure_time', 'soil_density'), ax=ax4)
ax5 = plt.subplot(325)
plot_proflike2d(gstest, ('erosion_rate', 'soil_density'), ax=ax5)
ax6 = plt.subplot(326)
plot_proflike2d(gstest, ('soil_density', 'inheritance'), ax=ax6)
plt.tight_layout()
Plot the measured concentrations and the predicted profile corresponding to the best fitted data model
In [15]:
sns.set_context('notebook')
depths = np.linspace(profile_data['depth'].min(),
profile_data['depth'].max(),
100)
Cm_fitted = models.C_10Be(depths, *gstest.mle)
Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
true_density, true_inheritance)
plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
-profile_data['depth'],
xerr=profile_data['std'],
fmt='o', markersize=4,
label='data')
plt.setp(plt.gca(),
xlabel='10Be concentration [atoms g-1]',
ylabel='-1 * depth [cm]',
xlim=[0, None], ylim=[None, 0])
plt.legend(loc='lower right')
Out[15]:
4 free parameters
Using the Grid Search with 4 parameters, we can see that the number of points were the concentrations are calculated can quickly become insane (hundreds of millions of points, > 10 Go RAM). We are very limited by the ressources available (note that this implemenation is not optimized for memory, see the notebook Grid-Search).
Compared to the results with 2 free parameters only, using the same settings (error magnitude, sample size), we can see here that the confidence intervals are much more wide and that the shape of the likelihood is far from Gaussian.
The 2d profile log-likelihoods show negative correlations between inhertitance and both erosion rate and exposure time.
In [15]: