Tools for creating generic datasets for testing, plus some real datasets.
Generate nuclide concentration vs. depth values from a model and add random perturbations.
Realistic perturbations each have a stochastic component and a deterministic component (one that depends on the concentration value). Each perturbation is thus drawn from a Gaussian with mean $\mu_p = 0$ and standard deviation $\sigma_p$, where $\sigma_p$ is itself drawn from another Gaussian:
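$$\sigma_p \sim \mathcal{N}(\mu, \sigma), \qquad \mu = \sqrt{C} \cdot \mathrm{err\_magnitude}, \qquad \sigma = \sqrt{C} \cdot \mathrm{err\_variability}$$

The generated dataset is returned in a :class:`pandas.DataFrame` object. The following code is saved as the Python module `gendata` so that it can be re-used in other notebooks.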
In [1]:
%%writefile gendata.py
"""
Create generic datasets of
nuclide concentrations vs. depth.
"""
import numpy as np
import pandas as pd
import models
def generate_dataset(model, model_args, model_kwargs=None,
                     zlimits=(50, 500), n=10,
                     err=(20., 5.)):
    """
    Create a generic dataset of nuclide concentrations
    vs. depth (for testing).

    Parameters
    ----------
    model : callable
        the model to use for generating the data
    model_args : list, tuple
        arguments to pass to `model`
    model_kwargs : dict
        keyword arguments to pass to `model`
    zlimits : [float, float]
        depths min and max values
    n : int
        sample size
    err : float or [float, float]
        fixed error (one value given) or
        error magnitude and error variability
        (two values given, see below)

    Returns
    -------
    :class:`pandas.DataFrame` object

    Notes
    -----
    The returned dataset corresponds to
    concentration values predicted by
    the model + random perturbations.

    When one value is given for `err`, the
    perturbations are all generated using a
    Gaussian of mu=0 and sigma=fixed error.

    When two values are given for `err`, each
    perturbation is generated using a Gaussian
    of mu=0 and sigma given by another Gaussian:

      mu = sqrt(concentration) * error magnitude
      sigma = sqrt(concentration) * error variability

    """
    zmin, zmax = zlimits
    model_kwargs = model_kwargs or dict()

    # evenly spaced sample depths
    depths = np.linspace(zmin, zmax, n)

    profile_data = pd.DataFrame()
    profile_data['depth'] = depths
    # concentrations predicted by the model
    profile_data['C'] = model(profile_data['depth'],
                              *model_args,
                              **model_kwargs)

    try:
        # two values given: draw one standard deviation per sample
        err_magn, err_var = err
        err_mu = err_magn * np.sqrt(profile_data['C'])
        err_sigma = err_var * np.sqrt(profile_data['C'])
        profile_data['std'] = np.array(
            [np.random.normal(loc=mu, scale=sigma)
             for mu, sigma in zip(err_mu, err_sigma)]
        )
    except TypeError:
        # one value given: same standard deviation for all samples
        profile_data['std'] = np.ones_like(depths) * err

    # add a zero-mean Gaussian perturbation to each concentration
    error = np.array([np.random.normal(scale=std)
                      for std in profile_data['std']])
    profile_data['C'] += error

    return profile_data
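The two-level noise scheme can also be written without the per-sample list comprehensions, since NumPy's normal sampler accepts array-valued `loc` and `scale`. The following is a minimal, self-contained sketch and not the `gendata` module itself: `exp_model`, `perturb` and all numeric values are hypothetical stand-ins chosen for illustration, and the `np.abs` guard is an added assumption (a Gaussian draw for the standard deviation can come out negative, which the sampler rejects as a scale).

```python
import numpy as np
import pandas as pd


def exp_model(depth, c_surface, attenuation):
    """Hypothetical stand-in model: exponential decrease with depth."""
    return c_surface * np.exp(-depth / attenuation)


def perturb(depths, conc, err_magn, err_var, rng):
    """Two-level Gaussian noise drawn with array-valued loc/scale."""
    root_c = np.sqrt(conc)
    # one standard deviation per sample, drawn from N(mu, sigma)
    std = rng.normal(loc=err_magn * root_c, scale=err_var * root_c)
    # assumption: fold rare negative draws back to positive values
    std = np.abs(std)
    # zero-mean perturbation with a per-sample standard deviation
    noisy = conc + rng.normal(loc=0.0, scale=std)
    return pd.DataFrame({"depth": depths, "C": noisy, "std": std})


rng = np.random.default_rng(42)  # seeded for reproducibility
depths = np.linspace(50, 500, 10)
conc = exp_model(depths, c_surface=5e5, attenuation=150.0)
profile = perturb(depths, conc, err_magn=20.0, err_var=5.0, rng=rng)
print(profile)
```

Passing an explicit, seeded generator keeps test datasets reproducible from one notebook run to the next.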
Create a folder to save the dataset files.
In [2]:
%mkdir profiles_data
Dataset collected in the upper Amblève valley (NE Belgium). See Rixhon et al., 2011.
In [3]:
%%writefile profiles_data/lodomez_10Be_profile_data.csv
"sample" "depth" "depth_g-cm-2" "C" "std" "nuclide"
"s01" 250 451 43005 1695 "10Be"
"s02" 200 361 94800 2024 "10Be"
"s03" 165 298 148569 3621 "10Be"
"s04" 100 181 269566 5038 "10Be"
"s05" 50 90 432800 11714 "10Be"
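A file in this layout can be read back with pandas. The sketch below is only illustrative: it parses an in-memory copy of the Lodomez rows so that it runs on its own, and it assumes the fields are whitespace-separated. The derived columns `density` and `rel_err` are hypothetical names, and reading the ratio of the two depth columns as a bulk density in g/cm³ assumes that `depth` is given in cm, which the file does not state.

```python
import io

import pandas as pd

# Lodomez profile rows, copied from the data file above
raw = '''"sample" "depth" "depth_g-cm-2" "C" "std" "nuclide"
"s01" 250 451 43005 1695 "10Be"
"s02" 200 361 94800 2024 "10Be"
"s03" 165 298 148569 3621 "10Be"
"s04" 100 181 269566 5038 "10Be"
"s05" 50 90 432800 11714 "10Be"
'''

# whitespace-separated fields; double quotes are stripped by the parser
profile = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# ratio of mass depth to depth (a bulk density if depth is in cm)
profile["density"] = profile["depth_g-cm-2"] / profile["depth"]
# relative measurement uncertainty of each sample
profile["rel_err"] = profile["std"] / profile["C"]

print(profile)
```

To read the saved file instead, replace `io.StringIO(raw)` with the path `profiles_data/lodomez_10Be_profile_data.csv`.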
In [4]:
%%writefile profiles_data/lodomez_10Be_settings.yaml
# 10Be surface production rate
P_0: 6.13
# sample site latitude [degrees]
latitude: 50.39
# sample site altitude [meters]
altitude: 283.0
# pressure [hPa]
pressure: 979.711
Dataset collected in the lower Amblève valley (NE Belgium). See Rixhon et al., 2011.
In [5]:
%%writefile profiles_data/belleroche_10Be_profile_data.csv
"sample" "depth" "depth_g-cm-2" "C" "std" "nuclide"
"s01" 300 643 46216 1728 "10Be"
"s02" 200 429 64965 2275 "10Be"
"s03" 150 322 128570 2766 "10Be"
"s04" 100 214 191825 3303 "10Be"
"s05" 60 129 323967 5454 "10Be"
In [6]:
%%writefile profiles_data/belleroche_10Be_settings.yaml
# 10Be surface production rate
P_0: 5.3
# sample site latitude [degrees]
latitude: 50.48
# sample site altitude [meters]
altitude: 153.0
# pressure [hPa]
pressure: 995.004
Dataset collected in the Ourthe valley (NE Belgium). See Rixhon et al., 2011.
In [7]:
%%writefile profiles_data/colonster_10Be_profile_data.csv
"sample" "depth" "depth_g-cm-2" "C" "std" "nuclide"
"s01" 450 886 118424 6928 "10Be"
"s02" 400 788 81698 5991 "10Be"
"s03" 350 689 133908 4949 "10Be"
"s04" 300 591 133243 8756 "10Be"
"s06" 200 394 255119 9940 "10Be"
"s07" 175 345 333152 11792 "10Be"
"s08" 150 295 387154 14811 "10Be"
"s09" 125 246 436636 17066 "10Be"
"s10" 100 197 710515 24670 "10Be"
In [8]:
%%writefile profiles_data/colonster_10Be_settings.yaml
# 10Be surface production rate
P_0: 5.21
# sample site latitude [degrees]
latitude: 50.58
# sample site altitude [meters]
altitude: 134.0
# pressure [hPa]
pressure: 997.256
Dataset collected in the Meuse valley (NE Belgium). See Rixhon et al., 2011.
In [9]:
%%writefile profiles_data/romont_10Be_26Al_profile_data.csv
"sample" "depth" "depth_g-cm-2" "C" "std" "nuclide"
"s01" 750 1500 193732 5114 "10Be"
"s02" 650 1300 261858 6039 "10Be"
"s03" 550 1100 136098 13147 "10Be"
"s04" 450 900 186859 5153 "10Be"
"s05" 350 700 333915 7973 "10Be"
"s06" 310 620 654394 10387 "10Be"
"s07" 750 1500 702042 33437 "26Al"
"s08" 650 1300 992018 49251 "26Al"
"s09" 550 1100 467655 39998 "26Al"
"s10" 450 900 923354 45139 "26Al"
"s11" 350 700 1489555 126714 "26Al"
"s12" 310 620 2573447 301870 "26Al"
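Because this file stacks two nuclides in a single table, it can be convenient to reshape it so that each depth has one column per nuclide. The sketch below shows one possible way to do this with `DataFrame.pivot`, again on an in-memory copy of the rows. The column name `ratio_26Al_10Be` is a hypothetical choice, and the ratio is computed only to illustrate working with the paired samples.

```python
import io

import pandas as pd

# Romont profile rows, copied from the data file above
raw = '''"sample" "depth" "depth_g-cm-2" "C" "std" "nuclide"
"s01" 750 1500 193732 5114 "10Be"
"s02" 650 1300 261858 6039 "10Be"
"s03" 550 1100 136098 13147 "10Be"
"s04" 450 900 186859 5153 "10Be"
"s05" 350 700 333915 7973 "10Be"
"s06" 310 620 654394 10387 "10Be"
"s07" 750 1500 702042 33437 "26Al"
"s08" 650 1300 992018 49251 "26Al"
"s09" 550 1100 467655 39998 "26Al"
"s10" 450 900 923354 45139 "26Al"
"s11" 350 700 1489555 126714 "26Al"
"s12" 310 620 2573447 301870 "26Al"
'''

profile = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# one row per depth, one concentration column per nuclide
wide = profile.pivot(index="depth", columns="nuclide", values="C")
# illustrative derived quantity for the paired samples
wide["ratio_26Al_10Be"] = wide["26Al"] / wide["10Be"]

print(wide.sort_index())
```

Selecting a single nuclide from the long table works with a boolean mask such as `profile[profile["nuclide"] == "10Be"]`.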
In [10]:
%%writefile profiles_data/romont_10Be_26Al_settings.yaml
# 10Be surface production rate
P_0: 5.09
# sample site latitude [degrees]
latitude: 50.78
# sample site altitude [meters]
altitude: 109.0
# pressure [hPa]
pressure: 1000.224
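The `pressure` entries in the four settings files are close to what the standard-atmosphere barometric formula gives for the listed altitudes. The notebook does not say how the values were obtained, so this link is an assumption. The sketch below uses the usual standard-atmosphere constants, and `standard_pressure` is a hypothetical helper name.

```python
# Standard-atmosphere constants
P_SEA_LEVEL = 1013.25  # sea-level pressure [hPa]
T_SEA_LEVEL = 288.15   # sea-level temperature [K]
LAPSE_RATE = 0.0065    # temperature lapse rate [K/m]
EXPONENT = 5.25588     # g*M / (R*L), dimensionless


def standard_pressure(altitude):
    """Pressure [hPa] at `altitude` [m] in the standard atmosphere."""
    return P_SEA_LEVEL * (1.0 - LAPSE_RATE * altitude / T_SEA_LEVEL) ** EXPONENT


# (altitude [m], pressure [hPa]) pairs copied from the settings files
sites = {
    "lodomez": (283.0, 979.711),
    "belleroche": (153.0, 995.004),
    "colonster": (134.0, 997.256),
    "romont": (109.0, 1000.224),
}

for name, (altitude, listed) in sites.items():
    computed = standard_pressure(altitude)
    print(f"{name}: listed {listed:.3f} hPa, computed {computed:.3f} hPa")
```

If the assumption holds, the same helper could fill in the `pressure` entry for a new site from its altitude alone.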