Grid Search - Test case - 2 free parameters

An example of applying MLE with 2 free parameters only (erosion rate and exposure time), using the grid search method.

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies:



In [1]:

    
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns

%matplotlib inline

The mathematical (Predictive) Model

The mathematical model is available in the models Python module (see the notebook Models)



In [2]:

    
import models

The Data

The dataset is generated using the following true parameter values



In [3]:

    
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5

# fixed parameters
fixed_density = 2.0
fixed_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).



In [4]:

    
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, fixed_density, fixed_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data









    Out[4]:






  
    
      
      depth
      C
      std
    
  
  
    
      0 
        50.000000
       155977.908401
       7089.943059
    
    
      1 
        65.517241
       124417.222527
       7556.556188
    
    
      2 
        81.034483
       103784.107834
       5947.080217
    
    
      3 
        96.551724
        90589.550440
       5600.494598
    
    
      4 
       112.068966
        84495.110605
       6459.338508
    
    
      5 
       127.586207
        68094.675136
       3935.587768
    
    
      6 
       143.103448
        56243.581776
       5512.315485
    
    
      7 
       158.620690
        41172.090732
       3472.183237
    
    
      8 
       174.137931
        40208.032148
       3819.413371
    
    
      9 
       189.655172
        32389.742749
       1828.566268
    
    
      10
       205.172414
        28302.172991
       3306.757831
    
    
      11
       220.689655
        20248.003436
       3683.114280
    
    
      12
       236.206897
        21371.717110
       2610.280918
    
    
      13
       251.724138
        18847.053588
       2341.029296
    
    
      14
       267.241379
        22287.325468
       2947.374344
    
    
      15
       282.758621
        17714.355369
       1453.071392
    
    
      16
       298.275862
        15452.685560
       1388.449329
    
    
      17
       313.793103
        11105.194525
       2612.945488
    
    
      18
       329.310345
         6311.091736
       2867.388163
    
    
      19
       344.827586
         7791.938453
       1880.184986
    
    
      20
       360.344828
         9908.565314
       1850.770860
    
    
      21
       375.862069
        10438.750690
       2363.221780
    
    
      22
       391.379310
         9699.409217
       1338.286404
    
    
      23
       406.896552
         7693.693952
       1991.855197
    
    
      24
       422.413793
         6364.890931
       2661.509273
    
    
      25
       437.931034
         6573.955062
       1925.582892
    
    
      26
       453.448276
         8675.610268
       2023.326955
    
    
      27
       468.965517
         6741.533742
       1873.011451
    
    
      28
       484.482759
         3132.369619
       1658.034144
    
    
      29
       500.000000
         7269.100898
       2019.150967
    
  

30 rows × 3 columns

The dataset is stored as a :class:pandas.DataFrame object.

Fitting the model

The grid search method is implemented in the gridsearch module (see the notebook Grid-Search for more info).



In [5]:

    
import gridsearch

Create a new object for setup and results



In [6]:

    
gstest = gridsearch.CosmogenicInferenceGC(description='test case')

Set the data



In [7]:

    
gstest.set_profile_measured(
    profile_data['depth'].values,
    profile_data['C'].values,
    profile_data['std'].values,
    None,
)

Set the model



In [8]:

    
def C_10Be_2p(depth, erosion, exposure):
    """
    A model for 10Be with 2 parameters.
    """
    return models.C_10Be(depth, erosion, exposure,
                         fixed_density, fixed_inheritance)

gstest.set_profile_model(C_10Be_2p)

Define the parameters to fit and their search ranges / steps. The order must be the same than the order of the arguments of the function used for the model!



In [9]:

    
gstest.set_parameter(
    'erosion_rate',
    [0., 3e-3, 200j],
    stats.uniform(loc=0, scale=3e-3).pdf
)

gstest.set_parameter(
    'exposure_time',
    [0., 2e5, 200j],
    stats.uniform(loc=0, scale=2e5).pdf
)

Grid search setup summary



In [10]:

    
print gstest.setup_summary()









    



Modelling C profile (Bayes, Grid-Search)

DESCRIPTION:
test case

MEASURED PROFILE (30 samples):
                C       depth nucleide          std
0   155977.908401   50.000000     None  7089.943059
1   124417.222527   65.517241     None  7556.556188
2   103784.107834   81.034483     None  5947.080217
3    90589.550440   96.551724     None  5600.494598
4    84495.110605  112.068966     None  6459.338508
5    68094.675136  127.586207     None  3935.587768
6    56243.581776  143.103448     None  5512.315485
7    41172.090732  158.620690     None  3472.183237
8    40208.032148  174.137931     None  3819.413371
9    32389.742749  189.655172     None  1828.566268
10   28302.172991  205.172414     None  3306.757831
11   20248.003436  220.689655     None  3683.114280
12   21371.717110  236.206897     None  2610.280918
13   18847.053588  251.724138     None  2341.029296
14   22287.325468  267.241379     None  2947.374344
15   17714.355369  282.758621     None  1453.071392
16   15452.685560  298.275862     None  1388.449329
17   11105.194525  313.793103     None  2612.945488
18    6311.091736  329.310345     None  2867.388163
19    7791.938453  344.827586     None  1880.184986
20    9908.565314  360.344828     None  1850.770860
21   10438.750690  375.862069     None  2363.221780
22    9699.409217  391.379310     None  1338.286404
23    7693.693952  406.896552     None  1991.855197
24    6364.890931  422.413793     None  2661.509273
25    6573.955062  437.931034     None  1925.582892
26    8675.610268  453.448276     None  2023.326955
27    6741.533742  468.965517     None  1873.011451
28    3132.369619  484.482759     None  1658.034144
29    7269.100898  500.000000     None  2019.150967

[30 rows x 4 columns]

PROFILE MODEL:
C_10Be_2p
A model for 10Be with 2 parameters.

'UNKNOWN' PARAMETERS (2):
erosion_rate:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fc3cd329c90>>
	range: [0.0, 0.003, 200j]
exposure_time:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fc3cd329c50>>
	range: [0.0, 200000.0, 200j]

degrees of freedom: 28

GRID SEARCH:
nb. of nodes per parameter: [200, 200]
total nb. of nodes: 40000

Perform Maximum likelihood estimation on the search grid



In [11]:

    
gstest.compute_mle()

Get the MLE (i.e., the parameter values at the maximum likelihood), in the same order than the definition of the parameters



In [12]:

    
gstest.mle









    Out[12]:





[array([ 0.0009799]), array([ 97487.43718593])]

Plot the profile log-likelihood for each parameter. The blue lines represent the difference between the profile log-likelihood and the maximum log-likelihood, The intersections between the blue line and the black lines define the confidence intervals at the given confidence levels (based on the likelihood ratio test). The red lines indicate the true values.



In [13]:

    
%matplotlib inline

def plot_proflike1d(cobj, pname, clevels=[0.68, 0.95, 0.997],
                    true_val=None, ax=None):
    
    p = cobj.parameters[pname]
    pindex = cobj.parameters.keys().index(pname)
    
    x = cobj.grid[pindex].flatten()
    proflike = cobj.proflike1d[pindex]

    if ax is None:
        ax = plt.subplot(111)
    
    difflike = proflike - cobj.maxlike
    
    ax.plot(x, difflike, label='profile loglike')
    
    ccrit = gridsearch.profile_likelihood_crit(
        cobj.proflike1d[pindex],
        cobj.maxlike,
        clevels=clevels
    )
    ccrit -= cobj.maxlike
    
    for lev, cc in zip(clevels, ccrit):
        l = ax.axhline(cc, color='k')
        hpos = x.min() + (x.max() + x.min()) * 0.05
        ax.text(hpos, cc, str(lev * 100),
                size=9, color = l.get_color(),
                ha="center", va="center",
                bbox=dict(ec='1',fc='1'))
    
    if true_val is not None:
        ax.axvline(true_val, color='r')
    
    plt.setp(ax, xlabel=pname,
             ylabel='profile log-like - max log-like',
             xlim=p['range'][0:2],
             ylim=[ccrit[-1], 0.])
    

def plot_proflike1d_all(cobj, n_subplot_cols=2, **kwargs):
    
    true_vals = [true_erosion, true_exposure]
    
    n_subplots = len(cobj.parameters)
    n_subplot_rows = int(math.ceil(1. * 
                                   n_subplots /
                                   n_subplot_cols))
    
    fig, aax = plt.subplots(nrows=n_subplot_rows,
                            ncols=n_subplot_cols,
                            **kwargs)
    axes = aax.flatten()
    fig.text(0.5, 0.975,
             "Profile log-like: " + cobj.description,
             horizontalalignment='center',
             verticalalignment='top')
    
    for i, pname in enumerate(cobj.parameters.keys()):
        ax = axes[i]
        plot_proflike1d(cobj, pname,
                        true_val=true_vals[i],
                        ax=ax)
    
    plt.tight_layout()
    plt.subplots_adjust(top=0.93)


plot_proflike1d_all(gstest, figsize=(8, 4))

Show the log-likelihood vs. erosion rate and exposure. Confidence regions are also shown (also based on the likelihood ratio test).



In [14]:

    
def plot_proflike2d(cobj, p1p2, ax=None,
                    cmap='Blues', show_colorbar=True):
    
    pname1, pname2 = p1p2
    idim = cobj.parameters.keys().index(pname2)
    jdim = cobj.parameters.keys().index(pname1)
    
    if ax is None:
        ax = plt.subplot(111)
    
    X, Y = np.meshgrid(cobj.grid[idim].flatten(),
                       cobj.grid[jdim].flatten())
    
    difflike = cobj.proflike2d[idim][jdim] - cobj.maxlike
    
    ccrit = gridsearch.profile_likelihood_crit(
        cobj.proflike2d[idim][jdim],
        cobj.maxlike,
        clevels=[0.68, 0.95]
    )
    ccrit -= cobj.maxlike
    
    contours = np.linspace(np.median(difflike),
                           0,
                           10)
    
    P2D = ax.contourf(Y, X, difflike,
                      contours,
                      cmap=plt.get_cmap(cmap))
    
    ci68 = ax.contour(Y, X, difflike,
                      [ccrit[0]], colors='w',
                      linestyles='solid')
    plt.clabel(ci68, fontsize=8, inline=True,
               fmt='68')
    ci95 = ax.contour(Y, X, difflike,
                      [ccrit[1]], colors=['k'],
                      linestyles='solid')
    plt.clabel(ci95, fontsize=8, inline=True,
               fmt='95')
    
    plt.setp(ax, xlabel=pname1, ylabel=pname2)
    
    if show_colorbar:
        plt.colorbar(P2D, ax=ax)
    
    ax.axhline(true_exposure, color='r')
    ax.axvline(true_erosion, color='r')
    

fig = plt.figure(figsize=(6, 6))
ax = plt.subplot(111)
plot_proflike2d(gstest, ('erosion_rate', 'exposure_time'), ax=ax)

Plot the measured concentrations and the predicted profile corresponding to the best fitted data model



In [15]:

    
sns.set_context('notebook')

depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths,
                          gstest.mle[0], gstest.mle[1],
                          fixed_density,
                          fixed_inheritance)

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        fixed_density, fixed_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')









    Out[15]:





<matplotlib.legend.Legend at 0x7fc3cd01e6d0>

Observations

2 free parameters

As the sample size increases, keeping fixed the error magnitude, we can see that the shapes of the profile log-likelihoods tend to be closer to a parabola (asymptotic normality) and the confidence intervals narrow up. For very small sample sizes (< 10), the confidence intervals are larger and the assumption of the normality is not reasonable.

For small sample sizes, we can also see that the maximum likelihood estimator is not always close to the true parameter values (sometimes the true values lie outside the 65% - or even 95% - confidence intervals). The results change from one generated dataset to another, setting the same perturbation parameters. It shows that the maximum likelihood estimator is biased for finite sample sizes. This bias is visible even for sample sizes $\sim$ 100 to 500. With very large sample sizes, we can see that it converges towards the true values.

The shape of the likelihood and the confidence regions show the positive correlation (non-linear relationship) between erosion rates and exposure time.



In [ ]:

	depth	C	std
0	50.000000	155977.908401	7089.943059
1	65.517241	124417.222527	7556.556188
2	81.034483	103784.107834	5947.080217
3	96.551724	90589.550440	5600.494598
4	112.068966	84495.110605	6459.338508
5	127.586207	68094.675136	3935.587768
6	143.103448	56243.581776	5512.315485
7	158.620690	41172.090732	3472.183237
8	174.137931	40208.032148	3819.413371
9	189.655172	32389.742749	1828.566268
10	205.172414	28302.172991	3306.757831
11	220.689655	20248.003436	3683.114280
12	236.206897	21371.717110	2610.280918
13	251.724138	18847.053588	2341.029296
14	267.241379	22287.325468	2947.374344
15	282.758621	17714.355369	1453.071392
16	298.275862	15452.685560	1388.449329
17	313.793103	11105.194525	2612.945488
18	329.310345	6311.091736	2867.388163
19	344.827586	7791.938453	1880.184986
20	360.344828	9908.565314	1850.770860
21	375.862069	10438.750690	2363.221780
22	391.379310	9699.409217	1338.286404
23	406.896552	7693.693952	1991.855197
24	422.413793	6364.890931	2661.509273
25	437.931034	6573.955062	1925.582892
26	453.448276	8675.610268	2023.326955
27	468.965517	6741.533742	1873.011451
28	484.482759	3132.369619	1658.034144
29	500.000000	7269.100898	2019.150967