Maximum Likelihood Estimation - Test case - 2 free parameters

An example of applying the MLE approach with 2 free parameters (soil density and inheritance are fixed), using the lmfit package. This package implements chi-square minimization for non-linear models, which is equivalent to MLE for this problem.

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies:



In [1]:

    
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lmfit
import seaborn as sns

%matplotlib inline

The mathematical (Predictive) Model

The dataset is generated using the following true parameter values



In [2]:

    
import models

The Data

The gendata Python module is used to generate the dataset (see the notebook Datasets).



In [3]:

    
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5

# fixed parameters
fixed_density = 2.0
fixed_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).



In [4]:

    
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, fixed_density, fixed_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data









    Out[4]:






  
    
      
      depth
      C
      std
    
  
  
    
      0 
        50.000000
       156014.242708
       4227.672408
    
    
      1 
        65.517241
       126583.204995
       3571.375199
    
    
      2 
        81.034483
        95759.286561
       5402.679723
    
    
      3 
        96.551724
       104846.238712
       6145.589660
    
    
      4 
       112.068966
        71148.598973
       7347.499093
    
    
      5 
       127.586207
        71506.417829
       4754.698483
    
    
      6 
       143.103448
        60387.142752
       4588.513302
    
    
      7 
       158.620690
        46929.931191
       4566.520236
    
    
      8 
       174.137931
        43047.274717
       5474.154989
    
    
      9 
       189.655172
        34214.363147
       4037.187316
    
    
      10
       205.172414
        27657.094132
       4321.698254
    
    
      11
       220.689655
        25680.920984
       2964.600724
    
    
      12
       236.206897
        28235.147199
       3432.169702
    
    
      13
       251.724138
        12861.687525
       1874.088853
    
    
      14
       267.241379
        17386.712152
       1848.073489
    
    
      15
       282.758621
        17339.599282
       1723.271906
    
    
      16
       298.275862
        17279.628470
       3238.563171
    
    
      17
       313.793103
        15496.604824
       1819.729012
    
    
      18
       329.310345
        15133.305987
       2076.297850
    
    
      19
       344.827586
        14230.335335
       3032.005288
    
    
      20
       360.344828
        10680.077347
       1618.137610
    
    
      21
       375.862069
         7753.571063
       1235.788503
    
    
      22
       391.379310
        11005.429020
       2034.369312
    
    
      23
       406.896552
         8409.211399
       1870.467510
    
    
      24
       422.413793
         6448.310732
       1823.961096
    
    
      25
       437.931034
         5710.982585
       1545.511152
    
    
      26
       453.448276
         8149.608919
       2509.640326
    
    
      27
       468.965517
         7198.849297
        851.008453
    
    
      28
       484.482759
         7960.813410
       1743.290417
    
    
      29
       500.000000
         7008.695301
       1936.426733
    
  

30 rows × 3 columns

The dataset is stored as a :class:pandas.DataFrame object.

Fitting the model

The default optimization method used by lmfit is the “leastsq” function of the scipy's optimize package. This function is a wrapper around MINPACK’s lmdif and lmder algorithms. Using this method, we must provide the array of the normalized residuals rather than the chi-square.



In [5]:

    
def residuals(params):
    """
    Normalized residuals (to be used
    with the `lmfit` package).
    
    """
    erosion = params['erosion_rate'].value
    exposure = params['exposure_time'].value
    
    C_mod = models.C_10Be(profile_data['depth'],
                          erosion, exposure,
                          fixed_density, fixed_inheritance)
    
    return (C_mod - profile_data['C']) / profile_data['std']

Define the parameters to fit, with an initial value (needed for optimization) and optionally bounds.



In [6]:

    
params = lmfit.Parameters()
params.add('erosion_rate',   value=1e-3, min=0.)
params.add('exposure_time', value=1e5, min=0.)

Fit the model



In [7]:

    
res = lmfit.minimize(residuals, params, ftol=1e-4, xtol=1e-4)

Report the results (the MLE values for each parameter and the estimation of standard errors / correlations based on the Gaussian approximation of the MLE distribution)



In [8]:

    
lmfit.report_fit(params)









    



[[Variables]]
     erosion_rate:      0.00106598 +/- 0.0001087535 (10.20%) initial =  0.001
     exposure_time:     105020.3 +/- 9199.986 (8.76%) initial =  100000
[[Correlations]] (unreported correlations are <  0.100)
    C(erosion_rate, exposure_time)  =  0.918

Print some info and output messages from the optimization process



In [9]:

    
print res.success
print res.nfev      # number of function evaluations during optimization
print res.lmdif_message
print res.message









    



True
10
Both actual and predicted relative reductions in the sum of squares
  are at most 0.000100 and the relative error between two consecutive iterates is at 
  most 0.000100
Tolerance seems to be too small.

Chi-square, degrees of freedom and reduced chi-square values for MLE



In [10]:

    
print res.chisqr
print res.nfree
print res.redchi









    



46.5425973864
28
1.66223562094

A more accurate, direct calculation of the confidence intervals (using a method similar to the profile likelihood). Note that -inf values generally means that the CI limit has not been reached within the parameter space - if it is bounded by physically acceptable values - or within the range defined by MLE +/- 3 * standard error estimated by normal approximation (a warning messages appears because no root has been found during the inversion of the F-test).



In [11]:

    
ci, trace = lmfit.conf_interval(res, trace=True)

lmfit.report_ci(ci)









    



                 99.70%    95.00%    67.40%     0.00%    67.40%    95.00%    99.70%
exposure_time76083.6145786562.9452295915.8421297123.05146114313.73312124274.72844135967.65107
 erosion_rate   0.00057   0.00079   0.00095   0.00107   0.00117   0.00126   0.00135

Calculate and Plot the $1-\alpha$ confidence regions (same method than the one used here above). Note that the interpolated contours of the confidence regions are approximative, depending on the (coarse) resolution of the grid used to calculate the p-values! The red lines indicate the true values.



In [14]:

    
x, y, cr = lmfit.conf_interval2d(res,'erosion_rate', 'exposure_time', 30, 30,
                                 limits=((0., 1.5e-3), (0., 1.3e5)))

sns.set_style('darkgrid')
sns.set_context('notebook')

plt.contourf(x, y, cr, [0., 0.674, 0.95, 0.997, 1.],
             cmap=plt.get_cmap('Blues'))
plt.colorbar()
plt.setp(plt.gca(),
         xlabel='erosion rate [cm yr-1]',
         ylabel='exposure time [yr]')
plt.axhline(true_exposure, color='r')
plt.axvline(true_erosion, color='r')









    Out[14]:





<matplotlib.collections.LineCollection at 0x7f3b92d2b1d0>

Plot the measured concentrations and the predicted profile corresponding to the best fitted data model



In [13]:

    
depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths,
                          res.values['erosion_rate'],
                          res.values['exposure_time'],
                          fixed_density,
                          fixed_inheritance)

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        fixed_density, fixed_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')









    Out[13]:





<matplotlib.legend.Legend at 0x7f3b9316e0d0>

Observations

2 free parameters

The results obtained here seem to be consistent with MLE with 2 parameters usign the Grid-Search method (see the notebook GS_test_2params).



In [ ]:

	depth	C	std
0	50.000000	156014.242708	4227.672408
1	65.517241	126583.204995	3571.375199
2	81.034483	95759.286561	5402.679723
3	96.551724	104846.238712	6145.589660
4	112.068966	71148.598973	7347.499093
5	127.586207	71506.417829	4754.698483
6	143.103448	60387.142752	4588.513302
7	158.620690	46929.931191	4566.520236
8	174.137931	43047.274717	5474.154989
9	189.655172	34214.363147	4037.187316
10	205.172414	27657.094132	4321.698254
11	220.689655	25680.920984	2964.600724
12	236.206897	28235.147199	3432.169702
13	251.724138	12861.687525	1874.088853
14	267.241379	17386.712152	1848.073489
15	282.758621	17339.599282	1723.271906
16	298.275862	17279.628470	3238.563171
17	313.793103	15496.604824	1819.729012
18	329.310345	15133.305987	2076.297850
19	344.827586	14230.335335	3032.005288
20	360.344828	10680.077347	1618.137610
21	375.862069	7753.571063	1235.788503
22	391.379310	11005.429020	2034.369312
23	406.896552	8409.211399	1870.467510
24	422.413793	6448.310732	1823.961096
25	437.931034	5710.982585	1545.511152
26	453.448276	8149.608919	2509.640326
27	468.965517	7198.849297	851.008453
28	484.482759	7960.813410	1743.290417
29	500.000000	7008.695301	1936.426733