What is the probability that core PSD rises by >2x as a function of maximum seed PSD

This uses the data from Boyd et al 2016 Figure 5 left.

Boyd, A. J., H. E. Spence, C‐L. Huang, G. D. Reeves, D. N. Baker, D. L. Turner, S. G. Claudepierre, J. F. Fennell, J. B. Blake, and Y. Y. Shprits. "Statistical properties of the radiation belt seed population." Journal of Geophysical Research: Space Physics (2016).

http://onlinelibrary.wiley.com/doi/10.1002/2016JA022652/full

Method

Using a statistical model of the relationship between the change of Core PSD being >2x.

We model the number of Core PSD events being >2x as a random sample from a binomial distribution, where a success is >x2 and failure is <2x.

We are to belive that the increase is dependent on the log of the maximum seed poopulation.

$ change \sim Bin(n,p) \\ logit(p) = \alpha + \beta x \\ a \sim N(0,5) \\ \beta \sim N(0,10) $

where we set vague priors for $\alpha$ and $\beta$, the parameters for the logistic model.

This is the same technique used in the estimation of deaths due to a concentration of a chemical.


In [1]:
# http://onlinelibrary.wiley.com/doi/10.1002/2016JA022652/epdf


import pymc3
from pprint import pprint
import numpy as np
import matplotlib.pyplot as plt
import spacepy.plot as spp
import seaborn as sns
import tqdm
sns.set(font_scale=1.5)


Bad key "axes.color_cycle" on line 17 in
/Users/balarsen/Library/Python/3.6/lib/python/site-packages/spacepy/data/spacepy.mplstyle.
You probably need to get an updated matplotlibrc file from
http://github.com/matplotlib/matplotlib/blob/master/matplotlibrc.template
or from the matplotlib source distribution

Data we manually extracted by clicking on the plot


In [2]:
# fogure 5 left

data = np.asarray([1.3510754E-5,0.19167218,
1.3095075E-5, 0.35914207,
1.7969156E-5, 0.5315465,
2.2680122E-5, 1.088768,
2.658636E-5, 0.8725469,
2.1551059E-5, 0.37572044,
2.4141058E-5, 0.32416967,
2.7690547E-5, 0.15037514,
3.138562E-5, 0.9466703,
4.2754244E-5, 1.2086405,
4.828105E-5, 0.7589996,
5.0161325E-5, 0.5774985,
3.9698883E-5, 0.5401454,
6.525233E-5, 1.1917208,
9.534515E-5, 0.41458386,
9.037694E-5, 0.6177862,
8.269145E-5, 0.22119954,
7.3339E-5, 0.13988428,
8.5937325E-5, 0.14095742,
1.252545E-4, 0.21971767,
1.4911826E-4, 0.14004362,
1.745245E-4, 0.2889438,
3.0742266E-4, 0.29788056,
3.4140234E-4, 0.49602887,
6.10676E-4, 0.44113842,
6.0555077E-4, 0.7453803,
3.8783226E-4, 0.81389874,
2.741248E-4, 0.68632746,
2.680259E-4, 0.62807685,
2.121458E-4, 0.54966176,
1.79686E-4, 0.5414554,
1.5327778E-4, 0.69589466,
2.2869675E-4, 0.686128,
2.1195684E-4, 0.9356477,
1.1749988E-4, 1.7908369,
2.4633156E-4, 1.4154592,
3.088986E-4, 1.547241,
5.2398414E-4, 1.5600363,
6.370312E-4, 2.7159016,
4.570722E-4, 2.4118235,
4.6060426E-4, 2.207232,
3.7848152E-4, 2.3236525,
3.2301247E-4, 2.2388134,
2.8843267E-4, 2.2219303,
3.0629092E-4, 2.6927452,
2.1488604E-4, 2.124584,
1.3252324E-4, 2.3717327,
9.4346244E-5, 2.459642,
1.4278099E-4, 4.158814,
2.0521968E-4, 3.285094,
2.9920225E-4, 4.2570686,
3.4014977E-4, 4.4509373,
6.175848E-4, 4.389852,
6.759791E-4, 5.089608,
7.91336E-4, 9.125814,
4.80572E-4, 12.163501,
4.4243436E-4, 9.816401,
3.7477125E-4, 9.182468,
3.2482846E-4, 7.141151,
2.8556274E-4, 9.594481,
2.1116872E-4, 8.6475115,
2.8320463E-4, 14.946182,
3.6879437E-4, 16.459846,
5.023681E-4, 21.327564,
3.4694388E-4, 24.708984,
3.7682228E-4, 32.004616,
5.4524664E-4, 42.090816,
])
data = data.reshape((-1,2))
seed = np.log10(data[:,0])
change = data[:,1]
change_bin = (change > 2).astype(int)
# need to compute a probability of change in max seed bins
seed_bins = np.linspace(np.log10(1e-5),np.log10(1e-3), 10)
seed_digs = np.digitize(seed, seed_bins)
seed_bins_centers = np.diff(seed_bins)/2 + seed_bins[:-1]

print('seed_bins_centers', len(seed_bins_centers), seed_bins_centers)
print(seed_digs)
n_pts = []
for v in np.unique(seed_digs):
    n_pts.append((v, (seed_digs==v).sum() ))
print('n_pts', n_pts)
successes = []
for ind,v in n_pts:
    mask = seed_digs == ind
    successes.append(change_bin[mask].sum())
    

print('successes', len(successes), successes)

n_pts = np.asarray([v[1] for v in n_pts])
print('n_pts', len(n_pts), n_pts)
seed_bins_centers = seed_bins_centers[seed_digs.min()-1:]


seed_bins_centers 9 [-4.88888889 -4.66666667 -4.44444444 -4.22222222 -4.         -3.77777778
 -3.55555556 -3.33333333 -3.11111111]
[1 1 2 2 2 2 2 2 3 3 4 4 3 4 5 5 5 4 5 5 6 6 7 7 9 9 8 7 7 6 6 6 7 6 5 7 7
 8 9 8 8 8 7 7 7 6 6 5 6 6 7 7 9 9 9 8 8 8 7 7 6 7 8 8 7 8 8]
n_pts [(1, 2), (2, 6), (3, 3), (4, 4), (5, 7), (6, 11), (7, 16), (8, 12), (9, 6)]
successes 9 [0, 0, 0, 0, 1, 5, 9, 10, 4]
n_pts 9 [ 2  6  3  4  7 11 16 12  6]

Oberved Data:

  • n_pts : the number of points in each of the 10 log spaced Maximum Seed PSD bins
  • successes : the number of events in each Maximum Seed PSD bins where the increase was >2x
  • seed_bins_centers : the centers of the Maximum Seed PSD bins

Setup the Bayesian model accorind to the description above. Run the Markov chain monte carlo (MCMC) to sample the posterior distributions for $\alpha$ and $\beta$


In [5]:
for i, j in zip(successes, n_pts):
    print(i,j)


0 2
0 6
0 3
0 4
1 7
5 11
9 16
10 12
4 6

In [3]:
# define priors
# these are wide uninformative priors

with pymc3.Model() as model:
    alpha = pymc3.Uniform('alpha', 0,40)
    beta = pymc3.Uniform('beta', 0,40)

    # define likelihood
    p = pymc3.math.invlogit(alpha + beta*seed_bins_centers)
    print('n_pts', n_pts, 'p', p, 'successes', successes, )

    y = pymc3.Binomial('y_obs', n=n_pts, p=p, observed=successes)

    # inference
    trace = pymc3.sample(iter=501000, njobs=7, burn=1000, burn_till_tuned=True, thin=80)


n_pts [ 2  6  3  4  7 11 16 12  6] p Elemwise{add,no_inplace}.0 successes [0, 0, 0, 0, 1, 5, 9, 10, 4]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (7 chains in 7 jobs)
NUTS: [beta, alpha]
Sampling 7 chains: 100%|██████████| 7000/7000 [00:07<00:00, 916.58draws/s] 
The acceptance probability does not match the target. It is 0.8967986069084861, but should be close to 0.8. Try to increase the number of tuning steps.
The acceptance probability does not match the target. It is 0.8918099041047935, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 25% for some parameters.

Make diagnostic plots of the posteriour distributions as created using MCMC.


In [4]:
pymc3.traceplot(trace, combined=True);



In [5]:
pymc3.summary(trace)


Out[5]:
mean sd mc_error hpd_2.5 hpd_97.5 n_eff Rhat
alpha 15.037148 3.919277 0.141247 6.608678 22.288026 617.087353 1.002546
beta 4.156929 1.080334 0.038981 1.862529 6.195809 613.758261 1.002663

In [6]:
xp = np.linspace(seed_bins_centers.min(), seed_bins_centers.max(), 100)
a = trace['alpha'].mean()
b = trace['beta'].mean()
plt.plot(xp, pymc3.math.invlogit(a + b*xp).eval())
plt.scatter(seed_bins_centers, successes/n_pts, s=50);
plt.xlabel('Log Maximum Seed PSD')
plt.ylabel('Chance of >2x change\n in Core PSD')
plt.gca().ticklabel_format(useOffset=False)


This is the answer. For the collected data there is a weak cutoff in the Maximum Seed PSD where the probability increases. This speaks to the "necessary but not sufficient" nature of the relationship. Other parameters can be added to the model to improve in the "necessary and sufficient" direction as they are theorized.

Also the deviation between the data and the model at low seed PSD speaks to this being more of a cutoff at the low end than a system with the more seed PSD the greater chance of a Core PSD change. However the model fits well at higher seed PSD which may well inform that once the threshold is met the probability does scale as a Logit system with binomial probability.


In [8]:
# one should be able to get estimates of the line uncertainity
ilu = np.empty((1000, len(xp)), dtype=float)
for ii, v in tqdm.tqdm(enumerate(np.random.randint(0, len(trace), 1000))):
    ilu[ii] = [pymc3.math.invlogit(trace['alpha'][v] + trace['beta'][v]*xx).eval() for xx in xp]


0it [00:00, ?it/s]
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-8-e0cf67a5a23a> in <module>()
      2 ilu = np.empty((1000, len(xp)), dtype=float)
      3 for ii, v in tqdm.tqdm(enumerate(np.random.randint(0, len(trace), 1000))):
----> 4     ilu[ii] = [pymc3.math.invlogit(trace['alpha'][v] + trace['beta'][v]*xx).eval() for xx in xp]

<ipython-input-8-e0cf67a5a23a> in <listcomp>(.0)
      2 ilu = np.empty((1000, len(xp)), dtype=float)
      3 for ii, v in tqdm.tqdm(enumerate(np.random.randint(0, len(trace), 1000))):
----> 4     ilu[ii] = [pymc3.math.invlogit(trace['alpha'][v] + trace['beta'][v]*xx).eval() for xx in xp]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymc3/math.py in invlogit(x, eps)
    126 def invlogit(x, eps=sys.float_info.epsilon):
    127     """The inverse of the logit function, 1 / (1 + exp(-x))."""
--> 128     return (1. - 2. * eps) / (1. + tt.exp(-x)) + eps
    129 
    130 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    668                 # compute output value once with test inputs to validate graph
    669                 thunk = node.op.make_thunk(node, storage_map, compute_map,
--> 670                                            no_recycling=[])
    671                 thunk.inputs = [storage_map[v] for v in node.inputs]
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/op.py in make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
    953             try:
    954                 return self.make_c_thunk(node, storage_map, compute_map,
--> 955                                          no_recycling)
    956             except (NotImplementedError, utils.MethodNotDefined):
    957                 # We requested the c code, so don't catch the error.

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/op.py in make_c_thunk(self, node, storage_map, compute_map, no_recycling)
    856         _logger.debug('Trying CLinker.make_thunk')
    857         outputs = cl.make_thunk(input_storage=node_input_storage,
--> 858                                 output_storage=node_output_storage)
    859         thunk, node_input_filters, node_output_filters = outputs
    860 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cc.py in make_thunk(self, input_storage, output_storage, storage_map, keep_lock)
   1215         cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
   1216             input_storage, output_storage, storage_map,
-> 1217             keep_lock=keep_lock)
   1218 
   1219         res = _CThunk(cthunk, init_tasks, tasks, error_storage, module)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cc.py in __compile__(self, input_storage, output_storage, storage_map, keep_lock)
   1155                                             output_storage,
   1156                                             storage_map,
-> 1157                                             keep_lock=keep_lock)
   1158         return (thunk,
   1159                 module,

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cc.py in cthunk_factory(self, error_storage, in_storage, out_storage, storage_map, keep_lock)
   1618                 node.op.prepare_node(node, storage_map, None, 'c')
   1619             module = get_module_cache().module_from_key(
-> 1620                 key=key, lnk=self, keep_lock=keep_lock)
   1621 
   1622         vars = self.inputs + self.outputs + self.orphans

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in module_from_key(self, key, lnk, keep_lock)
   1145         # Is the source code already in the cache?
   1146         module_hash = get_module_hash(src_code, key)
-> 1147         module = self._get_from_hash(module_hash, key, keep_lock=keep_lock)
   1148         if module is not None:
   1149             return module

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in _get_from_hash(self, module_hash, key, keep_lock)
   1055                 if (key[0] and not key_broken and
   1056                         self.check_for_broken_eq):
-> 1057                     self.check_key(key, key_data.key_pkl)
   1058             self._update_mappings(key, key_data, module.__file__, check_in_keys=not key_broken)
   1059             return module

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in check_key(self, key, key_pkl)
   1238                 time.sleep(2)
   1239 
-> 1240         found = sum(key == other_key for other_key in key_data.keys)
   1241         msg = ''
   1242         if found == 0:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in <genexpr>(.0)
   1238                 time.sleep(2)
   1239 
-> 1240         found = sum(key == other_key for other_key in key_data.keys)
   1241         msg = ''
   1242         if found == 0:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/utils.py in __eq__(self, other)
    196                     return (type(self) == type(other) and
    197                             tuple(getattr(self, a) for a in props) ==
--> 198                             tuple(getattr(other, a) for a in props))
    199                 dct['__eq__'] = __eq__
    200 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_collections_abc.py in __eq__(self, other)
    685         if not isinstance(other, Mapping):
    686             return NotImplemented
--> 687         return dict(self.items()) == dict(other.items())
    688 
    689     __reversed__ = None

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_collections_abc.py in items(self)
    676     def items(self):
    677         "D.items() -> a set-like object providing a view on D's items"
--> 678         return ItemsView(self)
    679 
    680     def values(self):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_collections_abc.py in __init__(self, mapping)
    696     __slots__ = '_mapping',
    697 
--> 698     def __init__(self, mapping):
    699         self._mapping = mapping
    700 

KeyboardInterrupt: 

In [18]:
[pymc3.math.invlogit(trace['alpha'][v] + trace['beta'][v]*xp[i]).eval() for i in range(len(xp))]


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-65eb161bd28b> in <module>()
----> 1 [pymc3.math.invlogit(trace['alpha'][v] + trace['beta'][v]*xp[i]).eval() for i in range(len(xp))]

<ipython-input-18-65eb161bd28b> in <listcomp>(.0)
----> 1 [pymc3.math.invlogit(trace['alpha'][v] + trace['beta'][v]*xp[i]).eval() for i in range(len(xp))]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pymc3/math.py in invlogit(x, eps)
    126 def invlogit(x, eps=sys.float_info.epsilon):
    127     """The inverse of the logit function, 1 / (1 + exp(-x))."""
--> 128     return (1. - 2. * eps) / (1. + tt.exp(-x)) + eps
    129 
    130 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    668                 # compute output value once with test inputs to validate graph
    669                 thunk = node.op.make_thunk(node, storage_map, compute_map,
--> 670                                            no_recycling=[])
    671                 thunk.inputs = [storage_map[v] for v in node.inputs]
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/op.py in make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
    953             try:
    954                 return self.make_c_thunk(node, storage_map, compute_map,
--> 955                                          no_recycling)
    956             except (NotImplementedError, utils.MethodNotDefined):
    957                 # We requested the c code, so don't catch the error.

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/op.py in make_c_thunk(self, node, storage_map, compute_map, no_recycling)
    856         _logger.debug('Trying CLinker.make_thunk')
    857         outputs = cl.make_thunk(input_storage=node_input_storage,
--> 858                                 output_storage=node_output_storage)
    859         thunk, node_input_filters, node_output_filters = outputs
    860 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cc.py in make_thunk(self, input_storage, output_storage, storage_map, keep_lock)
   1215         cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
   1216             input_storage, output_storage, storage_map,
-> 1217             keep_lock=keep_lock)
   1218 
   1219         res = _CThunk(cthunk, init_tasks, tasks, error_storage, module)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cc.py in __compile__(self, input_storage, output_storage, storage_map, keep_lock)
   1155                                             output_storage,
   1156                                             storage_map,
-> 1157                                             keep_lock=keep_lock)
   1158         return (thunk,
   1159                 module,

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cc.py in cthunk_factory(self, error_storage, in_storage, out_storage, storage_map, keep_lock)
   1618                 node.op.prepare_node(node, storage_map, None, 'c')
   1619             module = get_module_cache().module_from_key(
-> 1620                 key=key, lnk=self, keep_lock=keep_lock)
   1621 
   1622         vars = self.inputs + self.outputs + self.orphans

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in module_from_key(self, key, lnk, keep_lock)
   1145         # Is the source code already in the cache?
   1146         module_hash = get_module_hash(src_code, key)
-> 1147         module = self._get_from_hash(module_hash, key, keep_lock=keep_lock)
   1148         if module is not None:
   1149             return module

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in _get_from_hash(self, module_hash, key, keep_lock)
   1045             with compilelock.lock_ctx(keep_lock=keep_lock):
   1046                 try:
-> 1047                     key_data.add_key(key, save_pkl=bool(key[0]))
   1048                     key_broken = False
   1049                 except pickle.PicklingError:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/theano/gof/cmodule.py in add_key(self, key, save_pkl)
    506 
    507         """
--> 508         assert key not in self.keys
    509         self.keys.add(key)
    510         if save_pkl:

AssertionError: 

In [ ]:


In [ ]:
xp = np.linspace(seed_bins_centers.min(), seed_bins_centers.max(), 100)
for v in ilu:
    plt.plot(xp, v, alpha=.01, c='r')

    
xp = np.linspace(seed_bins_centers.min(), seed_bins_centers.max(), 100)
a = trace['alpha'].mean()
b = trace['beta'].mean()
plt.plot(xp, pymc3.math.invlogit(a + b*xp).eval())
plt.scatter(seed_bins_centers, successes/n_pts, s=50);
plt.xlabel('Log Maximum Seed PSD')
plt.ylabel('Chance of >2x change\n in Core PSD')
plt.gca().ticklabel_format(useOffset=False)

Same as pervios figure with the red lines overlayed as 100 joint draws from the model posterior in order to show spread.


In [ ]:


In [ ]: