Maximum Likelihood Estimation - Test case - 4 free parameters

An example of applying the MLE approach with 4 free parameters, using the lmfit package. This package implements chi-square minimization for non-linear models, which is equivalent to MLE here since the measurement errors are assumed to be Gaussian and independent.
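
Under that Gaussian-error assumption, maximizing the likelihood amounts to minimizing the chi-square, since

$-2 \ln L(\theta) = \sum_{i=1}^{N} \left( \frac{C_i^{mod}(\theta) - C_i}{\sigma_i} \right)^2 + \mathrm{const} = \chi^2(\theta) + \mathrm{const}$

where $C_i$ are the measured concentrations, $\sigma_i$ their standard deviations and $C_i^{mod}(\theta)$ the concentrations predicted by the model for the parameter set $\theta$.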

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies: numpy, pandas, matplotlib, lmfit and seaborn.


In [1]:
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lmfit
import seaborn as sns

%matplotlib inline

The Mathematical (Predictive) Model

The mathematical model is available in the models Python module (see the notebook Models)
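
The actual implementation is defined there; for orientation only, a commonly used form of the 10Be depth-profile model (spallation production only; the C_10Be function may include additional production pathways) is

$C(z) = C_{inh}\, e^{-\lambda t} + \frac{P_0\, e^{-\rho z / \Lambda}}{\lambda + \rho \epsilon / \Lambda} \left( 1 - e^{-t (\lambda + \rho \epsilon / \Lambda)} \right)$

where $z$ is depth, $\epsilon$ the erosion rate, $t$ the exposure time, $\rho$ the soil density, $C_{inh}$ the inherited concentration, $P_0$ the surface production rate, $\Lambda$ the attenuation length and $\lambda$ the 10Be decay constant.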


In [2]:
import models

The Data

The dataset is generated using the following true parameter values


In [3]:
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5
true_density = 2.0
true_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).
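
The actual implementation is in the Datasets notebook; the sketch below only illustrates the idea (the function name and the square-root scaling of the errors are assumptions made for illustration, not necessarily what gendata.generate_dataset does).

import numpy as np
import pandas as pd

def generate_dataset_sketch(model, params, zlimits=(50, 500), n=30, err=(20., 5.)):
    """Hypothetical sketch: sample a depth profile and perturb it with noise."""
    err_magnitude, err_variability = err

    depths = np.linspace(zlimits[0], zlimits[1], n)   # evenly spaced sample depths
    c_true = model(depths, *params)                   # noise-free concentrations

    # assumed error model: standard deviations roughly proportional to sqrt(C),
    # with a scale factor drawn around err_magnitude +/- err_variability
    std = np.abs(np.random.normal(err_magnitude, err_variability, n)) * np.sqrt(c_true)
    c_obs = np.random.normal(c_true, std)             # perturbed 'measured' values

    return pd.DataFrame({'depth': depths, 'C': c_obs, 'std': std})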


In [4]:
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, true_density, true_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data


Out[4]:
depth C std
0 50.000000 155469.618000 6752.276784
1 65.517241 113704.632492 8602.120938
2 81.034483 111258.289174 6001.163571
3 96.551724 83426.194061 6080.164885
4 112.068966 66420.617873 4789.861576
5 127.586207 66817.344573 4964.878276
6 143.103448 57030.047273 3971.537260
7 158.620690 50635.825827 4704.462239
8 174.137931 36542.835488 3194.466227
9 189.655172 31405.210837 3235.768950
10 205.172414 26026.062664 3540.110577
11 220.689655 20047.882312 6035.322431
12 236.206897 18769.662840 3115.630127
13 251.724138 16041.142504 2508.001181
14 267.241379 21590.400488 3365.163361
15 282.758621 18528.310813 2854.109052
16 298.275862 13845.989247 2608.262756
17 313.793103 10851.278371 2427.391173
18 329.310345 10476.673153 2222.335469
19 344.827586 14263.489929 3077.981922
20 360.344828 9995.469751 1160.166028
21 375.862069 8409.197794 1518.436935
22 391.379310 9309.767658 1644.762778
23 406.896552 6822.902575 995.175705
24 422.413793 9061.656878 1729.211031
25 437.931034 4918.669980 2196.955213
26 453.448276 9286.904748 1797.553571
27 468.965517 5425.861625 2331.584368
28 484.482759 7366.253562 1465.869598
29 500.000000 6406.698210 2417.516899

30 rows × 3 columns

The dataset is stored as a pandas.DataFrame object.

Fitting the model

The default optimization method used by lmfit is the leastsq function of SciPy's optimize package, which is a wrapper around MINPACK's lmdif and lmder algorithms. With this method, we must provide the array of normalized residuals rather than the chi-square value itself.


In [5]:
def residuals(params):
    """
    Normalized residuals (to be used
    with the `lmfit` package).
    
    """
    erosion = params['erosion_rate'].value
    exposure = params['exposure_time'].value
    density = params['soil_density'].value
    inheritance = params['inheritance'].value
    
    C_mod = models.C_10Be(profile_data['depth'],
                          erosion, exposure,
                          density, inheritance)
    
    return (C_mod - profile_data['C']) / profile_data['std']

Define the parameters to fit, with an initial value (needed for optimization) and optionally bounds.


In [6]:
params = lmfit.Parameters()
params.add('erosion_rate',   value=3e-3, min=0.)
params.add('exposure_time', value=1e4, min=0.)
params.add('soil_density', value=2.3)
params.add('inheritance', value=1e4, min=0.)

Fit the model


In [7]:
res = lmfit.minimize(residuals, params, ftol=1e-4, xtol=1e-4)

Report the results (the MLE values for each parameter and the estimates of the standard errors / correlations based on the Gaussian approximation of the MLE distribution)


In [8]:
lmfit.report_fit(params)


[[Variables]]
     erosion_rate:      5.509446e-09 +/- 0 (0.00%) initial =  0.003
     exposure_time:     55451.32 +/- 0 (0.00%) initial =  10000
     inheritance:       2565.013 +/- 0 (0.00%) initial =  10000
     soil_density:      1.996409 +/- 0 (0.00%) initial =  2.3
[[Correlations]] (unreported correlations are <  0.100)

Print some info and output messages from the optimization process


In [9]:
print res.success
print res.nfev      # number of function evaluations during optimization
print res.lmdif_message
print res.message


True
75
Both actual and predicted relative reductions in the sum of squares
  are at most 0.000100
%s. Could not estimate error-bars

Chi-square, degrees of freedom and reduced chi-square values for MLE
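
For reference, the reduced chi-square is the chi-square divided by the number of degrees of freedom, i.e. the number of data points $N$ minus the number of fitted parameters $P$:

$\chi^2_{red} = \frac{\chi^2}{N - P}$

With $N = 30$ and $P = 4$, this gives 26 degrees of freedom, consistent with the values printed below.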


In [10]:
print res.chisqr
print res.nfree
print res.redchi


25.1473660908
26
0.967206388108

A more accurate, direct calculation of the confidence intervals (using a method similar to the profile likelihood). Note that -inf values generally mean that the CI limit has not been reached within the parameter space (when it is bounded by physically acceptable values) or within the range defined by the MLE +/- 3 standard errors estimated from the normal approximation (a warning message appears because no root has been found during the inversion of the F-test).
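
For each trial value of a parameter, the method compares the fit with that parameter fixed against the best fit using an F-test; in the form used by lmfit,

$F(P_{fix}, N - P) = \left( \frac{\chi^2_f}{\chi^2_0} - 1 \right) \frac{N - P}{P_{fix}}$

where $\chi^2_0$ is the chi-square of the best fit, $\chi^2_f$ the chi-square with the parameter fixed, $N$ the number of data points, $P$ the number of fitted parameters and $P_{fix}$ the number of fixed parameters (1 here). The CI bound at a given confidence level is found by searching for the fixed parameter value whose F-test probability reaches that level.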


In [11]:
ci, trace = lmfit.conf_interval(res, trace=True)

lmfit.report_ci(ci)


                    99.70%       95.00%       67.40%        0.00%       67.40%       95.00%       99.70%
 exposure_time 46441.51155  49607.27047  52522.76205  55451.22873  58549.14787  62016.35688  66303.14015
  soil_density     1.72095      1.82071      1.90959      1.99641      2.08592      2.18351      2.30104
  erosion_rate        -inf         -inf         -inf      0.00000      0.00112      0.00127      0.00145
   inheritance   480.06781   1284.86837   1951.95287   2565.02117   3161.95518   3776.31242   4470.77124
/home/python/PythonEnvs/pygchem_py27_0/lib/python2.7/site-packages/lmfit/confidence.py:282: UserWarning: Warning, rel_change=0.0 < 0.01  at iteration 2 and prob(erosion_rate=-0.00199999449055) = 0.000127013518873 < max(sigmas).
  warn(errmsg)

Calculate and plot the $1-\alpha$ confidence regions (same method as the one used above). Note that the interpolated contours of the confidence regions are approximate and depend on the (coarse) resolution of the grid used to calculate the p-values. The red lines indicate the true values.


In [14]:
x, y, cr = lmfit.conf_interval2d(res,'erosion_rate', 'exposure_time', 20, 30,
                                 limits=((0., 2e-3), (0., 1.5e5)))

sns.set_style('darkgrid')
sns.set_context('notebook')

plt.contourf(x, y, cr, [0., 0.674, 0.95, 0.997, 1.],
             cmap=plt.get_cmap('Blues'))
plt.colorbar()
plt.setp(plt.gca(),
         xlabel='erosion rate [cm yr-1]',
         ylabel='exposure time [yr]')
plt.axhline(true_exposure, color='r')
plt.axvline(true_erosion, color='r')


Out[14]:
<matplotlib.lines.Line2D at 0x7f8b4e139910>

Plot the measured concentrations and the predicted profile corresponding to the best-fit model


In [13]:
depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths,
                          res.values['erosion_rate'],
                          res.values['exposure_time'],
                          res.values['soil_density'],
                          res.values['inheritance'])

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        true_density, true_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')


Out[13]:
<matplotlib.legend.Legend at 0x7f8b4e57d550>

Observations

4 free parameters

The results obtained here appear to be consistent with those of the MLE with 4 free parameters using the grid-search method (see the notebook GS_test_4params).
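
For reference, a minimal sketch of such a grid-search cross-check, evaluating the chi-square on a coarse 2-D grid of erosion rate and exposure time while holding the other two parameters at their MLE values (the grid bounds are arbitrary illustration choices, not those of GS_test_4params):

import numpy as np

# coarse 2-D grid over erosion rate and exposure time (illustrative bounds)
erosion_grid = np.linspace(1e-6, 2e-3, 51)
exposure_grid = np.linspace(1e4, 1.5e5, 51)

chi2_grid = np.empty((exposure_grid.size, erosion_grid.size))

for i, exposure in enumerate(exposure_grid):
    for j, erosion in enumerate(erosion_grid):
        C_mod = models.C_10Be(profile_data['depth'], erosion, exposure,
                              res.values['soil_density'],
                              res.values['inheritance'])
        chi2_grid[i, j] = np.sum(((C_mod - profile_data['C'])
                                  / profile_data['std']) ** 2)

# the minimum over the grid should be close to the chi-square of the lmfit fit
print chi2_grid.min(), res.chisqr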

