Maximum Likelihood Estimation - Test case - 4 free parameters

An example of applying the MLE approach with 4 free parameters, using the lmfit package. This package implements chi-square minimization for non-linear models, which is equivalent to MLE here since the measurement errors are assumed to be Gaussian and independent.
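
Under that Gaussian-error assumption, maximizing the likelihood amounts to minimizing the chi-square, since

$-2 \ln L(\theta) = \sum_{i=1}^{N} \left( \frac{C_i^{mod}(\theta) - C_i}{\sigma_i} \right)^2 + \mathrm{const} = \chi^2(\theta) + \mathrm{const}$

where $C_i$ are the measured concentrations, $\sigma_i$ their standard deviations and $C_i^{mod}(\theta)$ the concentrations predicted by the model for the parameter set $\theta$.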

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies: numpy, pandas, matplotlib, lmfit and seaborn.


In [1]:
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lmfit
import seaborn as sns

%matplotlib inline

The Mathematical (Predictive) Model

The mathematical model is available in the models Python module (see the notebook Models)
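
The actual implementation is defined there; for orientation only, a commonly used form of the 10Be depth-profile model (spallation production only; the C_10Be function may include additional production pathways) is

$C(z) = C_{inh}\, e^{-\lambda t} + \frac{P_0\, e^{-\rho z / \Lambda}}{\lambda + \rho \epsilon / \Lambda} \left( 1 - e^{-t (\lambda + \rho \epsilon / \Lambda)} \right)$

where $z$ is depth, $\epsilon$ the erosion rate, $t$ the exposure time, $\rho$ the soil density, $C_{inh}$ the inherited concentration, $P_0$ the surface production rate, $\Lambda$ the attenuation length and $\lambda$ the 10Be decay constant.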


In [2]:
import models

The Data

The dataset is generated using the following true parameter values


In [3]:
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5
true_density = 2.0
true_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).
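
The actual implementation is in the Datasets notebook; the sketch below only illustrates the idea (the function name and the square-root scaling of the errors are assumptions made for illustration, not necessarily what gendata.generate_dataset does).

import numpy as np
import pandas as pd

def generate_dataset_sketch(model, params, zlimits=(50, 500), n=30, err=(20., 5.)):
    """Hypothetical sketch: sample a depth profile and perturb it with noise."""
    err_magnitude, err_variability = err

    depths = np.linspace(zlimits[0], zlimits[1], n)   # evenly spaced sample depths
    c_true = model(depths, *params)                   # noise-free concentrations

    # assumed error model: standard deviations roughly proportional to sqrt(C),
    # with a scale factor drawn around err_magnitude +/- err_variability
    std = np.abs(np.random.normal(err_magnitude, err_variability, n)) * np.sqrt(c_true)
    c_obs = np.random.normal(c_true, std)             # perturbed 'measured' values

    return pd.DataFrame({'depth': depths, 'C': c_obs, 'std': std})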


In [4]:
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, true_density, true_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data


Out[4]:
depth C std
0 50.000000 155469.618000 6752.276784
1 65.517241 113704.632492 8602.120938
2 81.034483 111258.289174 6001.163571
3 96.551724 83426.194061 6080.164885
4 112.068966 66420.617873 4789.861576
5 127.586207 66817.344573 4964.878276
6 143.103448 57030.047273 3971.537260
7 158.620690 50635.825827 4704.462239
8 174.137931 36542.835488 3194.466227
9 189.655172 31405.210837 3235.768950
10 205.172414 26026.062664 3540.110577
11 220.689655 20047.882312 6035.322431
12 236.206897 18769.662840 3115.630127
13 251.724138 16041.142504 2508.001181
14 267.241379 21590.400488 3365.163361
15 282.758621 18528.310813 2854.109052
16 298.275862 13845.989247 2608.262756
17 313.793103 10851.278371 2427.391173
18 329.310345 10476.673153 2222.335469
19 344.827586 14263.489929 3077.981922
20 360.344828 9995.469751 1160.166028
21 375.862069 8409.197794 1518.436935
22 391.379310 9309.767658 1644.762778
23 406.896552 6822.902575 995.175705
24 422.413793 9061.656878 1729.211031
25 437.931034 4918.669980 2196.955213
26 453.448276 9286.904748 1797.553571
27 468.965517 5425.861625 2331.584368
28 484.482759 7366.253562 1465.869598
29 500.000000 6406.698210 2417.516899

30 rows × 3 columns

The dataset is stored as a pandas.DataFrame object.

Fitting the model

The default optimization method used by lmfit is the leastsq function of SciPy's optimize package, which is a wrapper around MINPACK's lmdif and lmder algorithms. With this method, we must provide the array of normalized residuals rather than the chi-square value itself.


In [5]:
def residuals(params):
    """
    Normalized residuals (to be used
    with the `lmfit` package).
    
    """
    erosion = params['erosion_rate'].value
    exposure = params['exposure_time'].value
    density = params['soil_density'].value
    inheritance = params['inheritance'].value
    
    C_mod = models.C_10Be(profile_data['depth'],
                          erosion, exposure,
                          density, inheritance)
    
    return (C_mod - profile_data['C']) / profile_data['std']

Define the parameters to fit, with an initial value (needed for optimization) and optionally bounds.


In [6]:
params = lmfit.Parameters()
params.add('erosion_rate',   value=3e-3, min=0.)
params.add('exposure_time', value=1e4, min=0.)
params.add('soil_density', value=2.3)
params.add('inheritance', value=1e4, min=0.)

Fit the model


In [7]:
res = lmfit.minimize(residuals, params, ftol=1e-4, xtol=1e-4)

Report the results (the MLE values for each parameter and the estimates of the standard errors / correlations based on the Gaussian approximation of the MLE distribution)


In [8]:
lmfit.report_fit(params)


[[Variables]]
     erosion_rate:      5.509446e-09 +/- 0 (0.00%) initial =  0.003
     exposure_time:     55451.32 +/- 0 (0.00%) initial =  10000
     inheritance:       2565.013 +/- 0 (0.00%) initial =  10000
     soil_density:      1.996409 +/- 0 (0.00%) initial =  2.3
[[Correlations]] (unreported correlations are <  0.100)

Print some info and output messages from the optimization process


In [9]:
print res.success
print res.nfev      # number of function evaluations during optimization
print res.lmdif_message
print res.message


True
75
Both actual and predicted relative reductions in the sum of squares
  are at most 0.000100
%s. Could not estimate error-bars

Chi-square, degrees of freedom and reduced chi-square values for MLE
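
For reference, the reduced chi-square is the chi-square divided by the number of degrees of freedom, i.e. the number of data points $N$ minus the number of fitted parameters $P$:

$\chi^2_{red} = \frac{\chi^2}{N - P}$

With $N = 30$ and $P = 4$, this gives 26 degrees of freedom, consistent with the values printed below.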


In [10]:
print res.chisqr
print res.nfree
print res.redchi


25.1473660908
26
0.967206388108

A more accurate, direct calculation of the confidence intervals (using a method similar to the profile likelihood). Note that -inf values generally mean that the CI limit has not been reached within the parameter space (when it is bounded by physically acceptable values) or within the range defined by the MLE +/- 3 standard errors estimated from the normal approximation (a warning message appears because no root has been found during the inversion of the F-test).
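
For each trial value of a parameter, the method compares the fit with that parameter fixed against the best fit using an F-test; in the form used by lmfit,

$F(P_{fix}, N - P) = \left( \frac{\chi^2_f}{\chi^2_0} - 1 \right) \frac{N - P}{P_{fix}}$

where $\chi^2_0$ is the chi-square of the best fit, $\chi^2_f$ the chi-square with the parameter fixed, $N$ the number of data points, $P$ the number of fitted parameters and $P_{fix}$ the number of fixed parameters (1 here). The CI bound at a given confidence level is found by searching for the fixed parameter value whose F-test probability reaches that level.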


In [11]:
ci, trace = lmfit.conf_interval(res, trace=True)

lmfit.report_ci(ci)


                    99.70%       95.00%       67.40%        0.00%       67.40%       95.00%       99.70%
 exposure_time 46441.51155  49607.27047  52522.76205  55451.22873  58549.14787  62016.35688  66303.14015
  soil_density     1.72095      1.82071      1.90959      1.99641      2.08592      2.18351      2.30104
  erosion_rate        -inf         -inf         -inf      0.00000      0.00112      0.00127      0.00145
   inheritance   480.06781   1284.86837   1951.95287   2565.02117   3161.95518   3776.31242   4470.77124
/home/python/PythonEnvs/pygchem_py27_0/lib/python2.7/site-packages/lmfit/confidence.py:282: UserWarning: Warning, rel_change=0.0 < 0.01  at iteration 2 and prob(erosion_rate=-0.00199999449055) = 0.000127013518873 < max(sigmas).
  warn(errmsg)

Calculate and plot the $1-\alpha$ confidence regions (same method as the one used above). Note that the interpolated contours of the confidence regions are approximate and depend on the (coarse) resolution of the grid used to calculate the p-values. The red lines indicate the true values.


In [14]:
x, y, cr = lmfit.conf_interval2d(res,'erosion_rate', 'exposure_time', 20, 30,
                                 limits=((0., 2e-3), (0., 1.5e5)))

sns.set_style('darkgrid')
sns.set_context('notebook')

plt.contourf(x, y, cr, [0., 0.674, 0.95, 0.997, 1.],
             cmap=plt.get_cmap('Blues'))
plt.colorbar()
plt.setp(plt.gca(),
         xlabel='erosion rate [cm yr-1]',
         ylabel='exposure time [yr]')
plt.axhline(true_exposure, color='r')
plt.axvline(true_erosion, color='r')


Out[14]:
<matplotlib.lines.Line2D at 0x7f8b4e139910>

Plot the measured concentrations and the predicted profile corresponding to the best-fit model


In [13]:
depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths,
                          res.values['erosion_rate'],
                          res.values['exposure_time'],
                          res.values['soil_density'],
                          res.values['inheritance'])

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        true_density, true_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')


Out[13]:
<matplotlib.legend.Legend at 0x7f8b4e57d550>

Observations

4 free parameters

The results obtained here appear to be consistent with those of the MLE with 4 free parameters using the grid-search method (see the notebook GS_test_4params).
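
For reference, a minimal sketch of such a grid-search cross-check, evaluating the chi-square on a coarse 2-D grid of erosion rate and exposure time while holding the other two parameters at their MLE values (the grid bounds are arbitrary illustration choices, not those of GS_test_4params):

import numpy as np

# coarse 2-D grid over erosion rate and exposure time (illustrative bounds)
erosion_grid = np.linspace(1e-6, 2e-3, 51)
exposure_grid = np.linspace(1e4, 1.5e5, 51)

chi2_grid = np.empty((exposure_grid.size, erosion_grid.size))

for i, exposure in enumerate(exposure_grid):
    for j, erosion in enumerate(erosion_grid):
        C_mod = models.C_10Be(profile_data['depth'], erosion, exposure,
                              res.values['soil_density'],
                              res.values['inheritance'])
        chi2_grid[i, j] = np.sum(((C_mod - profile_data['C'])
                                  / profile_data['std']) ** 2)

# the minimum over the grid should be close to the chi-square of the lmfit fit
print chi2_grid.min(), res.chisqr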

