Grid Search - Test case - 4 free parameters

An example of applying MLE with 4 free parameters, using the grid search method.

For more info about the method used, see the notebook Inference_Notes.

This example (a test case) is based on a generic dataset of 10Be concentration vs. depth, which is drawn from a distribution with given "true" parameters.

This notebook has the following external dependencies:


In [1]:
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns

%matplotlib inline

The mathematical (Predictive) Model

The mathematical model is available in the models Python module (see the notebook Models)


In [2]:
import models

The Data

The dataset is generated using the following true parameter values


In [3]:
# the true parameters 
true_erosion = 1e-3
true_exposure = 1e5
true_density = 2.0
true_inheritance = 0.

# depths and sample size
depth_minmax = [50, 500]
N = 30

# perturbations
err_magnitude = 20.
err_variability = 5.

The gendata Python module is used to generate the dataset (see the notebook Datasets).


In [4]:
import gendata

profile_data = gendata.generate_dataset(
    models.C_10Be,
    [true_erosion, true_exposure, true_density, true_inheritance],
    zlimits=depth_minmax,
    n=N,
    err=[err_magnitude, err_variability]
)

profile_data


Out[4]:
depth C std
0 50.000000 156626.821774 7749.543512
1 65.517241 123919.403917 7213.175913
2 81.034483 104505.786724 8111.885660
3 96.551724 80017.441588 6567.957863
4 112.068966 78566.533375 6140.961042
5 127.586207 65710.813824 3053.194191
6 143.103448 56406.609998 3650.970497
7 158.620690 45346.101535 4150.799179
8 174.137931 35667.886342 5079.609734
9 189.655172 35215.818412 3910.516542
10 205.172414 29470.408622 3489.491532
11 220.689655 38638.642543 4188.867244
12 236.206897 16892.514856 2861.456051
13 251.724138 23654.465350 2821.076472
14 267.241379 13643.944138 1797.530788
15 282.758621 13238.045446 2272.686927
16 298.275862 15318.485722 2717.923964
17 313.793103 14633.527178 2279.484509
18 329.310345 9992.866623 2053.152058
19 344.827586 10321.275957 1996.442897
20 360.344828 12174.971281 1946.264602
21 375.862069 10051.382591 1998.556164
22 391.379310 7210.385555 2033.823386
23 406.896552 8836.790935 1607.732513
24 422.413793 6119.038194 2637.213019
25 437.931034 6495.342737 1656.264988
26 453.448276 7944.926844 1426.831749
27 468.965517 6423.883991 1888.676196
28 484.482759 7901.360129 2292.055723
29 500.000000 4794.831139 1357.388836

30 rows × 3 columns

The dataset is stored as a :class:pandas.DataFrame object.

Fitting the model

The grid search method is implemented in the gridsearch module (see the notebook Grid-Search for more info).


In [5]:
import gridsearch

Create a new object for setup and results


In [6]:
gstest = gridsearch.CosmogenicInferenceGC(description='test case')

Set the data


In [7]:
gstest.set_profile_measured(
    profile_data['depth'].values,
    profile_data['C'].values,
    profile_data['std'].values,
    None,
)

Set the model


In [8]:
gstest.set_profile_model(models.C_10Be)

Define the parameters to fit and their search ranges / steps. The order must be the same than the order of the arguments of the function used for the model!


In [9]:
gstest.set_parameter(
    'erosion_rate',
    [0., 3e-3, 70j],
    stats.uniform(loc=0, scale=3e-3).pdf
)

gstest.set_parameter(
    'exposure_time',
    [0., 2e5, 70j],
    stats.uniform(loc=0, scale=2e5).pdf
)

gstest.set_parameter(
    'soil_density',
    [1.8, 2.8, 60j],
    stats.uniform(loc=1.8, scale=2.8).pdf
)

gstest.set_parameter(
    'inheritance',
    [0, 5e3, 60j],
    stats.uniform(loc=0, scale=1e5).pdf
)

Grid search setup summary


In [10]:
print gstest.setup_summary()


Modelling C profile (Bayes, Grid-Search)

DESCRIPTION:
test case

MEASURED PROFILE (30 samples):
                C       depth nucleide          std
0   156626.821774   50.000000     None  7749.543512
1   123919.403917   65.517241     None  7213.175913
2   104505.786724   81.034483     None  8111.885660
3    80017.441588   96.551724     None  6567.957863
4    78566.533375  112.068966     None  6140.961042
5    65710.813824  127.586207     None  3053.194191
6    56406.609998  143.103448     None  3650.970497
7    45346.101535  158.620690     None  4150.799179
8    35667.886342  174.137931     None  5079.609734
9    35215.818412  189.655172     None  3910.516542
10   29470.408622  205.172414     None  3489.491532
11   38638.642543  220.689655     None  4188.867244
12   16892.514856  236.206897     None  2861.456051
13   23654.465350  251.724138     None  2821.076472
14   13643.944138  267.241379     None  1797.530788
15   13238.045446  282.758621     None  2272.686927
16   15318.485722  298.275862     None  2717.923964
17   14633.527178  313.793103     None  2279.484509
18    9992.866623  329.310345     None  2053.152058
19   10321.275957  344.827586     None  1996.442897
20   12174.971281  360.344828     None  1946.264602
21   10051.382591  375.862069     None  1998.556164
22    7210.385555  391.379310     None  2033.823386
23    8836.790935  406.896552     None  1607.732513
24    6119.038194  422.413793     None  2637.213019
25    6495.342737  437.931034     None  1656.264988
26    7944.926844  453.448276     None  1426.831749
27    6423.883991  468.965517     None  1888.676196
28    7901.360129  484.482759     None  2292.055723
29    4794.831139  500.000000     None  1357.388836

[30 rows x 4 columns]

PROFILE MODEL:
C_10Be
A model for 10Be nucleide concentration profiles.

Notes
-----
The following parameters are used
(Braucher et al. 2003)

10Be radioactive decay: log 2 / 1.36e6

Contribution of
- neutrons
    - production rate: 0.9785 * P_0
    - damping depth: 160
- slow muons
    - production rate: 0.015 * P_0
    - damping depth: 1500
- fast muons
    - production rate: 0.0065 * P_0
    - damping depth: 5300

See Also
--------
C_nucleide

'UNKNOWN' PARAMETERS (4):
erosion_rate:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b510>>
	range: [0.0, 0.003, 70j]
exposure_time:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b4d0>>
	range: [0.0, 200000.0, 70j]
soil_density:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b690>>
	range: [1.8, 2.8, 60j]
inheritance:
	prior: <bound method rv_frozen.pdf of <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9be0d4b6d0>>
	range: [0, 5000.0, 60j]

degrees of freedom: 26

GRID SEARCH:
nb. of nodes per parameter: [70, 70, 60, 60]
total nb. of nodes: 17640000


Perform Maximum likelihood estimation on the search grid


In [11]:
gstest.compute_mle()

Get the MLE (i.e., the parameter values at the maximum likelihood), in the same order than the definition of the parameters


In [12]:
gstest.mle


Out[12]:
[array([ 0.00095652]),
 array([ 86956.52173913]),
 array([ 1.93559322]),
 array([ 84.74576271])]

Plot the profile log-likelihood for each parameter. The blue lines represent the difference between the profile log-likelihood and the maximum log-likelihood, The intersections between the blue line and the black lines define the confidence intervals at the given confidence levels (based on the likelihood ratio test). The red lines indicate the true values.


In [13]:
%matplotlib inline

def plot_proflike1d(cobj, pname, clevels=[0.68, 0.95, 0.997],
                    true_val=None, ax=None):
    
    p = cobj.parameters[pname]
    pindex = cobj.parameters.keys().index(pname)
    
    x = cobj.grid[pindex].flatten()
    proflike = cobj.proflike1d[pindex]

    if ax is None:
        ax = plt.subplot(111)
    
    difflike = proflike - cobj.maxlike
    
    ax.plot(x, difflike, label='profile loglike')
    
    ccrit = gridsearch.profile_likelihood_crit(
        cobj.proflike1d[pindex],
        cobj.maxlike,
        clevels=clevels
    )
    ccrit -= cobj.maxlike
    
    for lev, cc in zip(clevels, ccrit):
        l = ax.axhline(cc, color='k')
        hpos = x.min() + (x.max() + x.min()) * 0.05
        ax.text(hpos, cc, str(lev * 100),
                size=9, color = l.get_color(),
                ha="center", va="center",
                bbox=dict(ec='1',fc='1'))
    
    if true_val is not None:
        ax.axvline(true_val, color='r')
    
    plt.setp(ax, xlabel=pname,
             ylabel='profile log-like - max log-like',
             xlim=p['range'][0:2],
             ylim=[ccrit[-1], 0.])
    

def plot_proflike1d_all(cobj, n_subplot_cols=2, **kwargs):
    
    true_vals = [true_erosion, true_exposure,
                 true_density, true_inheritance]
    
    n_subplots = len(cobj.parameters)
    n_subplot_rows = int(math.ceil(1. * 
                                   n_subplots /
                                   n_subplot_cols))
    
    fig, aax = plt.subplots(nrows=n_subplot_rows,
                            ncols=n_subplot_cols,
                            **kwargs)
    axes = aax.flatten()
    fig.text(0.5, 0.975,
             "Profile log-like: " + cobj.description,
             horizontalalignment='center',
             verticalalignment='top')
    
    for i, pname in enumerate(cobj.parameters.keys()):
        ax = axes[i]
        plot_proflike1d(cobj, pname,
                        true_val=true_vals[i],
                        ax=ax)
    
    plt.tight_layout()
    plt.subplots_adjust(top=0.93)


plot_proflike1d_all(gstest, figsize=(8, 6))


Show the profile log-likelihood for couples of parameters. Confidence regions are also shown (also based on the likelihood ratio test).


In [14]:
def plot_proflike2d(cobj, p1p2, ax=None,
                    cmap='Blues', show_colorbar=True):
    
    pname1, pname2 = p1p2
    idim = cobj.parameters.keys().index(pname2)
    jdim = cobj.parameters.keys().index(pname1)
    
    if ax is None:
        ax = plt.subplot(111)
    
    X, Y = np.meshgrid(cobj.grid[idim].flatten(),
                       cobj.grid[jdim].flatten())
    
    difflike = cobj.proflike2d[idim][jdim] - cobj.maxlike
    
    ccrit = gridsearch.profile_likelihood_crit(
        cobj.proflike2d[idim][jdim],
        cobj.maxlike,
        clevels=[0.68, 0.95]
    )
    ccrit -= cobj.maxlike
    
    contours = np.linspace(np.median(difflike),
                           0,
                           10)
    
    P2D = ax.contourf(Y, X, difflike,
                      contours,
                      cmap=plt.get_cmap(cmap))
    
    ci68 = ax.contour(Y, X, difflike,
                      [ccrit[0]], colors='w',
                      linestyles='solid')
    plt.clabel(ci68, fontsize=8, inline=True,
               fmt='68')
    ci95 = ax.contour(Y, X, difflike,
                      [ccrit[1]], colors=['k'],
                      linestyles='solid')
    plt.clabel(ci95, fontsize=8, inline=True,
               fmt='95')
    
    plt.setp(ax, xlabel=pname1, ylabel=pname2)
    
    if show_colorbar:
        plt.colorbar(P2D, ax=ax)
    
    #ax.axhline(true_exposure, color='r')
    #ax.axvline(true_erosion, color='r')
    

fig = plt.figure(figsize=(9, 9))
ax = plt.subplot(321)
plot_proflike2d(gstest, ('erosion_rate', 'exposure_time'), ax=ax)
ax2 = plt.subplot(322)
plot_proflike2d(gstest, ('exposure_time', 'inheritance'), ax=ax2)
ax3 = plt.subplot(323)
plot_proflike2d(gstest, ('erosion_rate', 'inheritance'), ax=ax3)
ax4 = plt.subplot(324)
plot_proflike2d(gstest, ('exposure_time', 'soil_density'), ax=ax4)
ax5 = plt.subplot(325)
plot_proflike2d(gstest, ('erosion_rate', 'soil_density'), ax=ax5)
ax6 = plt.subplot(326)
plot_proflike2d(gstest, ('soil_density', 'inheritance'), ax=ax6)
plt.tight_layout()


Plot the measured concentrations and the predicted profile corresponding to the best fitted data model


In [15]:
sns.set_context('notebook')

depths = np.linspace(profile_data['depth'].min(),
                     profile_data['depth'].max(),
                     100)
Cm_fitted = models.C_10Be(depths, *gstest.mle)

Cm_true = models.C_10Be(depths, true_erosion, true_exposure,
                        true_density, true_inheritance)

plt.figure()
plt.plot(Cm_fitted, -depths, label='best-fitted model')
plt.plot(Cm_true, -depths, color='r', label='true model')
plt.errorbar(profile_data['C'],
             -profile_data['depth'],
             xerr=profile_data['std'],
             fmt='o', markersize=4,
             label='data')
plt.setp(plt.gca(),
         xlabel='10Be concentration [atoms g-1]',
         ylabel='-1 * depth [cm]',
         xlim=[0, None], ylim=[None, 0])

plt.legend(loc='lower right')


Out[15]:
<matplotlib.legend.Legend at 0x7f9bdeb46110>

Observations

4 free parameters

Using the Grid Search with 4 parameters, we can see that the number of points were the concentrations are calculated can quickly become insane (hundreds of millions of points, > 10 Go RAM). We are very limited by the ressources available (note that this implemenation is not optimized for memory, see the notebook Grid-Search).

Compared to the results with 2 free parameters only, using the same settings (error magnitude, sample size), we can see here that the confidence intervals are much more wide and that the shape of the likelihood is far from Gaussian.

The 2d profile log-likelihoods show negative correlations between inhertitance and both erosion rate and exposure time.


In [15]: