An example of applying the MLE approach with 4 free parameters, using the lmfit package. This package implements chi-square minimization for non-linear models, which is equivalent to MLE for this problem.
For more info about the method used, see the notebook Inference_Notes.
This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.
This notebook has the following external dependencies:
In [1]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lmfit
import seaborn as sns
%matplotlib inline
The mathematical model is available in the models
Python module (see the notebook Models)
In [2]:
import models
The dataset is generated using the following true parameter values
In [3]:
# the true parameters
true_erosion = 1e-3
true_exposure = 1e5
true_density = 2.0
true_inheritance = 0.
# depths and sample size
depth_minmax = [50, 500]
N = 30
# perturbations
err_magnitude = 20.
err_variability = 5.
The gendata
Python module is used to generate the dataset (see the notebook Datasets).
In [4]:
import gendata
profile_data = gendata.generate_dataset(
models.C_10Be,
[true_erosion, true_exposure, true_density, true_inheritance],
zlimits=depth_minmax,
n=N,
err=[err_magnitude, err_variability]
)
profile_data
Out[4]:
The dataset is stored as a :class:pandas.DataFrame
object.
The default optimization method used by lmfit is the “leastsq” function of the scipy's optimize package. This function is a wrapper around MINPACK’s lmdif and lmder algorithms. Using this method, we must provide the array of the normalized residuals rather than the chi-square.
In [5]:
def residuals(params):
"""
Normalized residuals (to be used
with the `lmfit` package).
"""
erosion = params['erosion_rate'].value
exposure = params['exposure_time'].value
density = params['soil_density'].value
inheritance = params['inheritance'].value
C_mod = models.C_10Be(profile_data['depth'],
erosion, exposure,
density, inheritance)
return (C_mod - profile_data['C']) / profile_data['std']
Define the parameters to fit, with an initial value (needed for optimization) and optionally bounds.
In [6]:
params = lmfit.Parameters()
params.add('erosion_rate', value=3e-3, min=0.)
params.add('exposure_time', value=1e4, min=0.)
params.add('soil_density', value=2.3)
params.add('inheritance', value=1e4, min=0.)
Fit the model
In [7]:
res = lmfit.minimize(residuals, params, ftol=1e-4, xtol=1e-4)
Report the results (the MLE values for each parameter and the estimation of standard errors / correlations based on the Gaussian approximation of the MLE distribution)
In [8]:
lmfit.report_fit(params)
Print some info and output messages from the optimization process
In [9]:
print res.success
print res.nfev # number of function evaluations during optimization
print res.lmdif_message
print res.message
Chi-square, degrees of freedom and reduced chi-square values for MLE
In [10]:
print res.chisqr
print res.nfree
print res.redchi
A more accurate, direct calculation of the confidence intervals (using a method similar to the profile likelihood). Note that -inf
values generally means that the CI limit has not been reached within the parameter space - if it is bounded by physically acceptable values - or within the range defined by MLE +/- 3 * standard error estimated by normal approximation (a warning messages appears because no root has been found during the inversion of the F-test).
In [11]:
ci, trace = lmfit.conf_interval(res, trace=True)
lmfit.report_ci(ci)
Calculate and Plot the $1-\alpha$ confidence regions (same method than the one used here above). Note that the interpolated contours of the confidence regions are approximative, depending on the (coarse) resolution of the grid used to calculate the p-values. The red lines indicate the true values.
In [14]:
x, y, cr = lmfit.conf_interval2d(res,'erosion_rate', 'exposure_time', 20, 30,
limits=((0., 2e-3), (0., 1.5e5)))
sns.set_style('darkgrid')
sns.set_context('notebook')
plt.contourf(x, y, cr, [0., 0.674, 0.95, 0.997, 1.],
cmap=plt.get_cmap('Blues'))
plt.colorbar()
plt.setp(plt.gca(),
xlabel='erosion rate [cm yr-1]',
ylabel='exposure time [yr]')
plt.axhline(true_exposure, color='r')
plt.axvline(true_erosion, color='r')
Out[14]:
Plot the measured concentrations and the predicted profile corresponding to the best fitted data model
In [13]:
depths = np.linspace(profile_data['depth'].min(),
profile_data['depth'].max(),
100)
Cm_fitted = models.C_10Be(depths,
res.values['erosion_rate'],
res.values['exposure_time'],
res.values['soil_density'],
res.values['inheritance'])
Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
true_density, true_inheritance)
plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
-profile_data['depth'],
xerr=profile_data['std'],
fmt='o', markersize=4,
label='data')
plt.setp(plt.gca(),
xlabel='10Be concentration [atoms g-1]',
ylabel='-1 * depth [cm]',
xlim=[0, None], ylim=[None, 0])
plt.legend(loc='lower right')
Out[13]:
4 free parameters
The results obtained here seem to be consistent with MLE with 4 parameters usign the Grid-Search method (see the notebook GS_test_4params).
In [13]: