Grid Search - Test case - 4 free parameters

An example of applying MLE with 4 free parameters, using the grid search method.

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies:



In [1]:

    
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns

%matplotlib inline

The mathematical (Predictive) Model

The mathematical model is available in the models Python module (see the notebook Models)



In [2]:

    
import models

The Data

The dataset is generated using the following true parameter values



In [3]:

    
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5
true_density = 2.0
true_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).



In [4]:

    
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, true_density, true_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data









    Out[4]:






  
    
      
      depth
      C
      std
    
  
  
    
      0 
        50.000000
       156626.821774
       7749.543512
    
    
      1 
        65.517241
       123919.403917
       7213.175913
    
    
      2 
        81.034483
       104505.786724
       8111.885660
    
    
      3 
        96.551724
        80017.441588
       6567.957863
    
    
      4 
       112.068966
        78566.533375
       6140.961042
    
    
      5 
       127.586207
        65710.813824
       3053.194191
    
    
      6 
       143.103448
        56406.609998
       3650.970497
    
    
      7 
       158.620690
        45346.101535
       4150.799179
    
    
      8 
       174.137931
        35667.886342
       5079.609734
    
    
      9 
       189.655172
        35215.818412
       3910.516542
    
    
      10
       205.172414
        29470.408622
       3489.491532
    
    
      11
       220.689655
        38638.642543
       4188.867244
    
    
      12
       236.206897
        16892.514856
       2861.456051
    
    
      13
       251.724138
        23654.465350
       2821.076472
    
    
      14
       267.241379
        13643.944138
       1797.530788
    
    
      15
       282.758621
        13238.045446
       2272.686927
    
    
      16
       298.275862
        15318.485722
       2717.923964
    
    
      17
       313.793103
        14633.527178
       2279.484509
    
    
      18
       329.310345
         9992.866623
       2053.152058
    
    
      19
       344.827586
        10321.275957
       1996.442897
    
    
      20
       360.344828
        12174.971281
       1946.264602
    
    
      21
       375.862069
        10051.382591
       1998.556164
    
    
      22
       391.379310
         7210.385555
       2033.823386
    
    
      23
       406.896552
         8836.790935
       1607.732513
    
    
      24
       422.413793
         6119.038194
       2637.213019
    
    
      25
       437.931034
         6495.342737
       1656.264988
    
    
      26
       453.448276
         7944.926844
       1426.831749
    
    
      27
       468.965517
         6423.883991
       1888.676196
    
    
      28
       484.482759
         7901.360129
       2292.055723
    
    
      29
       500.000000
         4794.831139
       1357.388836
    
  

30 rows × 3 columns

The dataset is stored as a :class:pandas.DataFrame object.

Fitting the model

The grid search method is implemented in the gridsearch module (see the notebook Grid-Search for more info).



In [5]:

    
import gridsearch

Create a new object for setup and results



In [6]:

    
gstest = gridsearch.CosmogenicInferenceGC(description='test case')

Set the data



In [7]:

    
gstest.set_profile_measured(
    profile_data['depth'].values,
    profile_data['C'].values,
    profile_data['std'].values,
    None,
)

Set the model



In [8]:

    
gstest.set_profile_model(models.C_10Be)

Define the parameters to fit and their search ranges / steps. The order must be the same than the order of the arguments of the function used for the model!



In [9]:

    
gstest.set_parameter(
    'erosion_rate',
    [0., 3e-3, 70j],
    stats.uniform(loc=0, scale=3e-3).pdf
)

gstest.set_parameter(
    'exposure_time',
    [0., 2e5, 70j],
    stats.uniform(loc=0, scale=2e5).pdf
)

gstest.set_parameter(
    'soil_density',
    [1.8, 2.8, 60j],
    stats.uniform(loc=1.8, scale=2.8).pdf
)

gstest.set_parameter(
    'inheritance',
    [0, 5e3, 60j],
    stats.uniform(loc=0, scale=1e5).pdf
)

Grid search setup summary



In [10]:

    
print gstest.setup_summary()









    



Modelling C profile (Bayes, Grid-Search)

DESCRIPTION:
test case

MEASURED PROFILE (30 samples):
                C       depth nucleide          std
0   156626.821774   50.000000     None  7749.543512
1   123919.403917   65.517241     None  7213.175913
2   104505.786724   81.034483     None  8111.885660
3    80017.441588   96.551724     None  6567.957863
4    78566.533375  112.068966     None  6140.961042
5    65710.813824  127.586207     None  3053.194191
6    56406.609998  143.103448     None  3650.970497
7    45346.101535  158.620690     None  4150.799179
8    35667.886342  174.137931     None  5079.609734
9    35215.818412  189.655172     None  3910.516542
10   29470.408622  205.172414     None  3489.491532
11   38638.642543  220.689655     None  4188.867244
12   16892.514856  236.206897     None  2861.456051
13   23654.465350  251.724138     None  2821.076472
14   13643.944138  267.241379     None  1797.530788
15   13238.045446  282.758621     None  2272.686927
16   15318.485722  298.275862     None  2717.923964
17   14633.527178  313.793103     None  2279.484509
18    9992.866623  329.310345     None  2053.152058
19   10321.275957  344.827586     None  1996.442897
20   12174.971281  360.344828     None  1946.264602
21   10051.382591  375.862069     None  1998.556164
22    7210.385555  391.379310     None  2033.823386
23    8836.790935  406.896552     None  1607.732513
24    6119.038194  422.413793     None  2637.213019
25    6495.342737  437.931034     None  1656.264988
26    7944.926844  453.448276     None  1426.831749
27    6423.883991  468.965517     None  1888.676196
28    7901.360129  484.482759     None  2292.055723
29    4794.831139  500.000000     None  1357.388836

[30 rows x 4 columns]

PROFILE MODEL:
C_10Be
A model for 10Be nucleide concentration profiles.

Notes
-----
The following parameters are used
(Braucher et al. 2003)

10Be radioactive decay: log 2 / 1.36e6

Contribution of
- neutrons
    - production rate: 0.9785 * P_0
    - damping depth: 160
- slow muons
    - production rate: 0.015 * P_0
    - damping depth: 1500
- fast muons
    - production rate: 0.0065 * P_0
    - damping depth: 5300

See Also
--------
C_nucleide

'UNKNOWN' PARAMETERS (4):
erosion_rate:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b510>>
	range: [0.0, 0.003, 70j]
exposure_time:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b4d0>>
	range: [0.0, 200000.0, 70j]
soil_density:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b690>>
	range: [1.8, 2.8, 60j]
inheritance:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b6d0>>
	range: [0, 5000.0, 60j]

degrees of freedom: 26

GRID SEARCH:
nb. of nodes per parameter: [70, 70, 60, 60]
total nb. of nodes: 17640000

Perform Maximum likelihood estimation on the search grid



In [11]:

    
gstest.compute_mle()

Get the MLE (i.e., the parameter values at the maximum likelihood), in the same order than the definition of the parameters



In [12]:

    
gstest.mle









    Out[12]:





[array([ 0.00095652]),
 array([ 86956.52173913]),
 array([ 1.93559322]),
 array([ 84.74576271])]

Plot the profile log-likelihood for each parameter. The blue lines represent the difference between the profile log-likelihood and the maximum log-likelihood, The intersections between the blue line and the black lines define the confidence intervals at the given confidence levels (based on the likelihood ratio test). The red lines indicate the true values.



In [13]:

    
%matplotlib inline

def plot_proflike1d(cobj, pname, clevels=[0.68, 0.95, 0.997],
                    true_val=None, ax=None):
    
    p = cobj.parameters[pname]
    pindex = cobj.parameters.keys().index(pname)
    
    x = cobj.grid[pindex].flatten()
    proflike = cobj.proflike1d[pindex]

    if ax is None:
        ax = plt.subplot(111)
    
    difflike = proflike - cobj.maxlike
    
    ax.plot(x, difflike, label='profile loglike')
    
    ccrit = gridsearch.profile_likelihood_crit(
        cobj.proflike1d[pindex],
        cobj.maxlike,
        clevels=clevels
    )
    ccrit -= cobj.maxlike
    
    for lev, cc in zip(clevels, ccrit):
        l = ax.axhline(cc, color='k')
        hpos = x.min() + (x.max() + x.min()) * 0.05
        ax.text(hpos, cc, str(lev * 100),
                size=9, color = l.get_color(),
                ha="center", va="center",
                bbox=dict(ec='1',fc='1'))
    
    if true_val is not None:
        ax.axvline(true_val, color='r')
    
    plt.setp(ax, xlabel=pname,
             ylabel='profile log-like - max log-like',
             xlim=p['range'][0:2],
             ylim=[ccrit[-1], 0.])
    

def plot_proflike1d_all(cobj, n_subplot_cols=2, **kwargs):
    
    true_vals = [true_erosion, true_exposure,
                 true_density, true_inheritance]
    
    n_subplots = len(cobj.parameters)
    n_subplot_rows = int(math.ceil(1. * 
                                   n_subplots /
                                   n_subplot_cols))
    
    fig, aax = plt.subplots(nrows=n_subplot_rows,
                            ncols=n_subplot_cols,
                            **kwargs)
    axes = aax.flatten()
    fig.text(0.5, 0.975,
             "Profile log-like: " + cobj.description,
             horizontalalignment='center',
             verticalalignment='top')
    
    for i, pname in enumerate(cobj.parameters.keys()):
        ax = axes[i]
        plot_proflike1d(cobj, pname,
                        true_val=true_vals[i],
                        ax=ax)
    
    plt.tight_layout()
    plt.subplots_adjust(top=0.93)


plot_proflike1d_all(gstest, figsize=(8, 6))

Show the profile log-likelihood for couples of parameters. Confidence regions are also shown (also based on the likelihood ratio test).



In [14]:

    
def plot_proflike2d(cobj, p1p2, ax=None,
                    cmap='Blues', show_colorbar=True):
    
    pname1, pname2 = p1p2
    idim = cobj.parameters.keys().index(pname2)
    jdim = cobj.parameters.keys().index(pname1)
    
    if ax is None:
        ax = plt.subplot(111)
    
    X, Y = np.meshgrid(cobj.grid[idim].flatten(),
                       cobj.grid[jdim].flatten())
    
    difflike = cobj.proflike2d[idim][jdim] - cobj.maxlike
    
    ccrit = gridsearch.profile_likelihood_crit(
        cobj.proflike2d[idim][jdim],
        cobj.maxlike,
        clevels=[0.68, 0.95]
    )
    ccrit -= cobj.maxlike
    
    contours = np.linspace(np.median(difflike),
                           0,
                           10)
    
    P2D = ax.contourf(Y, X, difflike,
                      contours,
                      cmap=plt.get_cmap(cmap))
    
    ci68 = ax.contour(Y, X, difflike,
                      [ccrit[0]], colors='w',
                      linestyles='solid')
    plt.clabel(ci68, fontsize=8, inline=True,
               fmt='68')
    ci95 = ax.contour(Y, X, difflike,
                      [ccrit[1]], colors=['k'],
                      linestyles='solid')
    plt.clabel(ci95, fontsize=8, inline=True,
               fmt='95')
    
    plt.setp(ax, xlabel=pname1, ylabel=pname2)
    
    if show_colorbar:
        plt.colorbar(P2D, ax=ax)
    
    #ax.axhline(true_exposure, color='r')
    #ax.axvline(true_erosion, color='r')
    

fig = plt.figure(figsize=(9, 9))
ax = plt.subplot(321)
plot_proflike2d(gstest, ('erosion_rate', 'exposure_time'), ax=ax)
ax2 = plt.subplot(322)
plot_proflike2d(gstest, ('exposure_time', 'inheritance'), ax=ax2)
ax3 = plt.subplot(323)
plot_proflike2d(gstest, ('erosion_rate', 'inheritance'), ax=ax3)
ax4 = plt.subplot(324)
plot_proflike2d(gstest, ('exposure_time', 'soil_density'), ax=ax4)
ax5 = plt.subplot(325)
plot_proflike2d(gstest, ('erosion_rate', 'soil_density'), ax=ax5)
ax6 = plt.subplot(326)
plot_proflike2d(gstest, ('soil_density', 'inheritance'), ax=ax6)
plt.tight_layout()

Plot the measured concentrations and the predicted profile corresponding to the best fitted data model



In [15]:

    
sns.set_context('notebook')

depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths, *gstest.mle)

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        true_density, true_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')









    Out[15]:





<matplotlib.legend.Legend at 0x7f9bdeb46110>

Observations

4 free parameters

Using the Grid Search with 4 parameters, we can see that the number of points were the concentrations are calculated can quickly become insane (hundreds of millions of points, > 10 Go RAM). We are very limited by the ressources available (note that this implemenation is not optimized for memory, see the notebook Grid-Search).

Compared to the results with 2 free parameters only, using the same settings (error magnitude, sample size), we can see here that the confidence intervals are much more wide and that the shape of the likelihood is far from Gaussian.

The 2d profile log-likelihoods show negative correlations between inhertitance and both erosion rate and exposure time.



In [15]:

	depth	C	std
0	50.000000	156626.821774	7749.543512
1	65.517241	123919.403917	7213.175913
2	81.034483	104505.786724	8111.885660
3	96.551724	80017.441588	6567.957863
4	112.068966	78566.533375	6140.961042
5	127.586207	65710.813824	3053.194191
6	143.103448	56406.609998	3650.970497
7	158.620690	45346.101535	4150.799179
8	174.137931	35667.886342	5079.609734
9	189.655172	35215.818412	3910.516542
10	205.172414	29470.408622	3489.491532
11	220.689655	38638.642543	4188.867244
12	236.206897	16892.514856	2861.456051
13	251.724138	23654.465350	2821.076472
14	267.241379	13643.944138	1797.530788
15	282.758621	13238.045446	2272.686927
16	298.275862	15318.485722	2717.923964
17	313.793103	14633.527178	2279.484509
18	329.310345	9992.866623	2053.152058
19	344.827586	10321.275957	1996.442897
20	360.344828	12174.971281	1946.264602
21	375.862069	10051.382591	1998.556164
22	391.379310	7210.385555	2033.823386
23	406.896552	8836.790935	1607.732513
24	422.413793	6119.038194	2637.213019
25	437.931034	6495.342737	1656.264988
26	453.448276	7944.926844	1426.831749
27	468.965517	6423.883991	1888.676196
28	484.482759	7901.360129	2292.055723
29	500.000000	4794.831139	1357.388836