Maximum Likelihood Estimation - Test case - 2 free parameters

An example of applying the MLE approach with 2 free parameters (soil density and inheritance are fixed), using the lmfit package. This package implements chi-square minimization for non-linear models, which is equivalent to MLE for this problem.

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies:


In [1]:
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lmfit
import seaborn as sns

%matplotlib inline

The mathematical (Predictive) Model

The dataset is generated using the following true parameter values


In [2]:
import models

The Data

The gendata Python module is used to generate the dataset (see the notebook Datasets).


In [3]:
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5

# fixed parameters
fixed_density = 2.0
fixed_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).


In [4]:
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, fixed_density, fixed_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data


Out[4]:
depth C std
0 50.000000 156014.242708 4227.672408
1 65.517241 126583.204995 3571.375199
2 81.034483 95759.286561 5402.679723
3 96.551724 104846.238712 6145.589660
4 112.068966 71148.598973 7347.499093
5 127.586207 71506.417829 4754.698483
6 143.103448 60387.142752 4588.513302
7 158.620690 46929.931191 4566.520236
8 174.137931 43047.274717 5474.154989
9 189.655172 34214.363147 4037.187316
10 205.172414 27657.094132 4321.698254
11 220.689655 25680.920984 2964.600724
12 236.206897 28235.147199 3432.169702
13 251.724138 12861.687525 1874.088853
14 267.241379 17386.712152 1848.073489
15 282.758621 17339.599282 1723.271906
16 298.275862 17279.628470 3238.563171
17 313.793103 15496.604824 1819.729012
18 329.310345 15133.305987 2076.297850
19 344.827586 14230.335335 3032.005288
20 360.344828 10680.077347 1618.137610
21 375.862069 7753.571063 1235.788503
22 391.379310 11005.429020 2034.369312
23 406.896552 8409.211399 1870.467510
24 422.413793 6448.310732 1823.961096
25 437.931034 5710.982585 1545.511152
26 453.448276 8149.608919 2509.640326
27 468.965517 7198.849297 851.008453
28 484.482759 7960.813410 1743.290417
29 500.000000 7008.695301 1936.426733

30 rows × 3 columns

The dataset is stored as a :class:pandas.DataFrame object.

Fitting the model

The default optimization method used by lmfit is the “leastsq” function of the scipy's optimize package. This function is a wrapper around MINPACK’s lmdif and lmder algorithms. Using this method, we must provide the array of the normalized residuals rather than the chi-square.


In [5]:
def residuals(params):
    """
    Normalized residuals (to be used
    with the `lmfit` package).
    
    """
    erosion = params['erosion_rate'].value
    exposure = params['exposure_time'].value
    
    C_mod = models.C_10Be(profile_data['depth'],
                          erosion, exposure,
                          fixed_density, fixed_inheritance)
    
    return (C_mod - profile_data['C']) / profile_data['std']

Define the parameters to fit, with an initial value (needed for optimization) and optionally bounds.


In [6]:
params = lmfit.Parameters()
params.add('erosion_rate',   value=1e-3, min=0.)
params.add('exposure_time', value=1e5, min=0.)

Fit the model


In [7]:
res = lmfit.minimize(residuals, params, ftol=1e-4, xtol=1e-4)

Report the results (the MLE values for each parameter and the estimation of standard errors / correlations based on the Gaussian approximation of the MLE distribution)


In [8]:
lmfit.report_fit(params)


[[Variables]]
     erosion_rate:      0.00106598 +/- 0.0001087535 (10.20%) initial =  0.001
     exposure_time:     105020.3 +/- 9199.986 (8.76%) initial =  100000
[[Correlations]] (unreported correlations are <  0.100)
    C(erosion_rate, exposure_time)  =  0.918 

Print some info and output messages from the optimization process


In [9]:
print res.success
print res.nfev      # number of function evaluations during optimization
print res.lmdif_message
print res.message


True
10
Both actual and predicted relative reductions in the sum of squares
  are at most 0.000100 and the relative error between two consecutive iterates is at 
  most 0.000100
Tolerance seems to be too small.

Chi-square, degrees of freedom and reduced chi-square values for MLE


In [10]:
print res.chisqr
print res.nfree
print res.redchi


46.5425973864
28
1.66223562094

A more accurate, direct calculation of the confidence intervals (using a method similar to the profile likelihood). Note that -inf values generally means that the CI limit has not been reached within the parameter space - if it is bounded by physically acceptable values - or within the range defined by MLE +/- 3 * standard error estimated by normal approximation (a warning messages appears because no root has been found during the inversion of the F-test).


In [11]:
ci, trace = lmfit.conf_interval(res, trace=True)

lmfit.report_ci(ci)


                 99.70%    95.00%    67.40%     0.00%    67.40%    95.00%    99.70%
exposure_time76083.6145786562.9452295915.8421297123.05146114313.73312124274.72844135967.65107
 erosion_rate   0.00057   0.00079   0.00095   0.00107   0.00117   0.00126   0.00135

Calculate and Plot the $1-\alpha$ confidence regions (same method than the one used here above). Note that the interpolated contours of the confidence regions are approximative, depending on the (coarse) resolution of the grid used to calculate the p-values! The red lines indicate the true values.


In [14]:
x, y, cr = lmfit.conf_interval2d(res,'erosion_rate', 'exposure_time', 30, 30,
                                 limits=((0., 1.5e-3), (0., 1.3e5)))

sns.set_style('darkgrid')
sns.set_context('notebook')

plt.contourf(x, y, cr, [0., 0.674, 0.95, 0.997, 1.],
             cmap=plt.get_cmap('Blues'))
plt.colorbar()
plt.setp(plt.gca(),
         xlabel='erosion rate [cm yr-1]',
         ylabel='exposure time [yr]')
plt.axhline(true_exposure, color='r')
plt.axvline(true_erosion, color='r')


Out[14]:
<matplotlib.collections.LineCollection at 0x7f3b92d2b1d0>

Plot the measured concentrations and the predicted profile corresponding to the best fitted data model


In [13]:
depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths,
                          res.values['erosion_rate'],
                          res.values['exposure_time'],
                          fixed_density,
                          fixed_inheritance)

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        fixed_density, fixed_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')


Out[13]:
<matplotlib.legend.Legend at 0x7f3b9316e0d0>

Observations

2 free parameters

The results obtained here seem to be consistent with MLE with 2 parameters usign the Grid-Search method (see the notebook GS_test_2params).


In [ ]: