I have started 11 IPython engines in the terminal with the following command:

ipcluster start -n 11 &

import dadi on all engines


In [1]:
from ipyparallel import Client

cl = Client()

cl.ids


Out[1]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [2]:
# clear the namespace in engines
cl.clear()


Out[2]:
<tornado.concurrent.Future at 0x7facc0736a50>

In [3]:
%px %who


[stdout:0] Interactive namespace is empty.
[stdout:1] Interactive namespace is empty.
[stdout:2] Interactive namespace is empty.
[stdout:3] Interactive namespace is empty.
[stdout:4] Interactive namespace is empty.
[stdout:5] Interactive namespace is empty.
[stdout:6] Interactive namespace is empty.
[stdout:7] Interactive namespace is empty.
[stdout:8] Interactive namespace is empty.
[stdout:9] Interactive namespace is empty.
[stdout:10] Interactive namespace is empty.

In [4]:
%%px --local

# run whole cell on all engines a well as in the local IPython session

import numpy as np

import sys

sys.path.insert(0, '/home/claudius/Downloads/dadi')

import dadi

load 1D folded spectra


In [5]:
%%px --local

# import 1D spectrum of ery on all engines:
fs_ery = dadi.Spectrum.from_file('dadiExercises/ERY.FOLDED.sfs.dadi_format')

# import 1D spectrum of ery on all engines:
fs_par = dadi.Spectrum.from_file('dadiExercises/PAR.FOLDED.sfs.dadi_format')

In [7]:
fs_par


Out[7]:
Spectrum([-- 8409.25697 12008.479728 5476.826032 3090.791731 2755.941992 1804.37826
 357.199759 2552.273719 832.442184 873.647652 0.001111 2420.101426 0.010094
 910.498821 0.725863 1506.959669 232.289352 523.880826 -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- --], folded=True, pop_ids=['par'])

In [6]:
%%px --local

ns = fs_ery.sample_sizes # both populations have the same sample size

fs_ery.pop_ids = ['ery']
fs_par.pop_ids = ['par']

# setting the smallest grid size slightly larger than the largest population sample size (36)
pts_l = [50, 60, 70]

define range of initial parameters


In [7]:
%psource dadi.Demographics1D.two_epoch

The built-in two epoch model defines a piecewise history, where at some time $T$ in the past the ancestral population instantaneously changed in size to the contemporary population size, which has ratio of $\nu$ to the ancestral population size.

Note, that $\nu$ and $T$ are confounded parameters, meaning that different combinations of $\nu$ and $T$ can lead to very similar site frequency spectra. For instance, a strong but recent bottleneck can produce a similar SFS to an ancient but weak bottleneck.


In [7]:
%%px --local

# create link to function that specifies a simple growth or decline model
func = dadi.Demographics1D.two_epoch

# create extrapolating version of the function
func_ex = dadi.Numerics.make_extrap_log_func(func)

# set lower and upper bounds to nu and T
upper_bound = [1e3, 4]
lower_bound = [1e-3, 1e-4]

In [8]:
# create range of starting values for nu and T, evenly distributed in log space
p0_nu = np.logspace(-3, 3, base=10.0, num=14)
p0_T = np.logspace(-4, np.log10(4), base=10, num=14)

In [9]:
print p0_nu[0], p0_nu[-1]
print p0_T[0], p0_T[-1]


0.001 1000.0
0.0001 4.0

Note, this also includes parameters at the boundaries I set above.


In [10]:
print "There are %d parameter combinations." % (14*14)


There are 196 parameter combinations.

In [11]:
%px %who


[stdout:0] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:1] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:2] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:3] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:4] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:5] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:6] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:7] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:8] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:9] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 
[stdout:10] 
dadi	 fs_ery	 fs_par	 func	 func_ex	 lower_bound	 np	 ns	 pts_l	 
sys	 upper_bound	 

define parallel functions


In [12]:
# create load balanced view of engines

lbview = cl.load_balanced_view()

In [13]:
%who


Client	 cl	 dadi	 fs_ery	 fs_par	 func	 func_ex	 lbview	 lower_bound	 
np	 ns	 p0_T	 p0_nu	 pts_l	 sys	 upper_bound	 

In [14]:
def run_dadi(p_init): # for the function to be called with map, it needs to have one input variable
    """
    p_init: initial parameter values to run optimisation from
    """
    if perturb == True:
        p_init = dadi.Misc.perturb_params(p_init, fold=fold, 
                                      upper_bound=upper_bound, lower_bound=lower_bound)
        # note upper_bound and lower_bound variables are expected to be in the namespace of each engine
    # run optimisation of paramters
    popt = dadi_opt_func(p0=p_init, data=sfs, model_func=func_ex, pts=pts_l, \
                                   lower_bound=lower_bound, upper_bound=upper_bound, \
                                   verbose=verbose, maxiter=maxiter, full_output=full_output)
    return popt

run optimisation


In [15]:
from itertools import product

In [16]:
%%px

# set up global variables on engines required for run_dadi function call

dadi_opt_func = dadi.Inference.optimize_log # uses gradient based BFGS algorithm
sfs = fs_ery # use ERY spectrum
perturb = False
fold = 1
maxiter = 10 # run a maximum of 10 iterations
verbose = 0
full_output = True # need to have full output to get the warnflags (see below)

In [17]:
%px who


[stdout:0] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:1] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:2] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:3] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:4] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:5] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:6] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:7] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:8] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:9] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 

[stdout:10] 
dadi	 dadi_opt_func	 fold	 fs_ery	 fs_par	 full_output	 func	 func_ex	 lower_bound	 
maxiter	 np	 ns	 perturb	 pts_l	 sfs	 sys	 upper_bound	 verbose	 


In [31]:
# run optimnisations with all combinations of starting values

#ar_ery = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [34]:
ar_ery.progress


Out[34]:
196

There were 196 combinations of starting values.


In [35]:
ar_ery.ready()


Out[35]:
True

In [36]:
ar_ery.wall_time


Out[36]:
114.139403

In [25]:
from collections import defaultdict

In [38]:
warnflag = defaultdict(int)

for res in ar_ery:
    if res[6] == 1:
        warnflag[1] +=1
    elif res[6] == 2:
        warnflag[2] += 1
    elif res[6] == 0:
        warnflag[0] += 1
    else:
        warnflag[999] +=1

In [39]:
print "success", warnflag[0]
print "Maximum number of iterations exceeded.", warnflag[1]
print "Gradient and/or function calls not changing.", warnflag[2]
print "unknown flag", warnflag[999]


success 21
Maximum number of iterations exceeded. 28
Gradient and/or function calls not changing. 147
unknown flag 0

In [21]:
from scipy.optimize import fmin_bfgs

In [22]:
# this is the function that dadi calls

?fmin_bfgs

# the warnflag has index 6 in the returned array

Only 21 of the 196 starting parameter combinations lead to a successfull optimisation. I could increase that by 28 if I allowed enough iterations. However, I don't know what to do about the 147 combinations for which the gradient didn't change. Maybe they are in a very flat part of the likelihood surface?


In [45]:
from itertools import izip

In [48]:
for p_init, popt in izip(product(p0_nu, p0_T), ar_ery):
    if popt[6] == 0:
        print p_init, popt[0], popt[6]


(0.001, 0.030062999042780213) [ 0.001     0.030063] 0
(0.001, 0.067926137161718539) [ 0.001       0.06792614] 0
(0.001, 0.15347637483362347) [ 0.001       0.15347637] 0
(0.001, 0.34677369590428986) [ 0.001      0.3467737] 0
(0.001, 0.78352121817758968) [ 0.001       0.78352122] 0
(0.001, 1.770334678164093) [  1.00000000e-03   1.77033468e+00] 0
(0.0028942661247167516, 0.15347637483362347) [ 0.00289427  0.15347637] 0
(0.0028942661247167516, 0.34677369590428986) [ 0.00289427  0.3467737 ] 0
(0.0028942661247167516, 0.78352121817758968) [ 0.00289427  0.78352122] 0
(0.0028942661247167516, 1.770334678164093) [ 0.00289427  1.77033468] 0
(0.008376776400682925, 0.013305392433761934) [ 0.00172635  1.29841674] 0
(0.008376776400682925, 0.34677369590428986) [ 0.00837678  0.3467737 ] 0
(0.008376776400682925, 0.78352121817758968) [ 0.00837678  0.78352122] 0
(0.008376776400682925, 1.770334678164093) [ 0.00837678  1.77033468] 0
(0.024244620170823284, 0.030062999042780213) [ 0.01737541  0.80792208] 0
(0.024244620170823284, 0.067926137161718539) [ 0.00438425  1.01053785] 0
(0.024244620170823284, 0.78352121817758968) [ 0.02424462  0.78352122] 0
(0.024244620170823284, 1.770334678164093) [ 0.02424462  1.77033468] 0
(0.070170382867038292, 1.770334678164093) [ 0.07017038  1.77033468] 0
(4.9238826317067419, 0.78352121817758968) [ 0.05039376  1.66891358] 0
(14.251026703029993, 0.78352121817758968) [ 0.00347941  0.13659009] 0

Of the runs that did return an optimisation result, most just return the initial parameter values.


In [20]:
?dadi.Inference.optimize_lbfgsb

In [ ]:
?dadi.Inference.optimize_grid

In [21]:
?dadi.Inference.optimize_log

In [22]:
?dadi.Inference.optimize_log_fmin

In [24]:
?dadi.Inference.optimize_log_powell

I have decided to change to the Nelder-Mead optimisation algorithm.


In [25]:
%%px

# use Nelder-Mead algorithm
dadi_opt_func = dadi.Inference.optimize_log_fmin

In [72]:
# do same optimisation as above, but now with Nelder-Mead algorithm

#ar_ery = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [82]:
ar_ery.progress


Out[82]:
196

In [90]:
ar_ery.wall_time


Out[90]:
577.460715

Note, that Nelder-Mead is much slower.


In [23]:
%psource dadi.Inference.optimize_log_fmin

In [ ]:
warnflag = defaultdict(int)

for res in ar_ery:
    if res[4] == 1:
        warnflag[1] +=1
    elif res[4] == 2:
        warnflag[2] += 1
    elif res[4] == 0:
        warnflag[0] += 1
    else:
        warnflag[999] +=1
        
print "success", warnflag[0]
print "Maximum number of function evaluations made.", warnflag[1]
print "Maximum number of iterations reached.", warnflag[2]
print "unknown flag", warnflag[999]

None of the runs reached convergence and returned a result.


In [91]:
cl[:]['maxiter']


Out[91]:
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

maxiter still set to 10 on all engines.


In [92]:
%%px

# increase maximum number of iterations
maxiter = 100

In [93]:
# run optimisation as above, but now with up to 100 iterations per optimisation

#ar_ery = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [97]:
ar_ery.progress


Out[97]:
196

In [103]:
ar_ery.wall_time/60


Out[103]:
17.11536005

This took 17 minutes to finish.


In [146]:
def get_flag_count(ar, NM=True)
    """
    ar: asyncresult object from BFGS or Nelder-Mead optimisation
    """
    if NM: # if ar from Nelder-Mead
        i = 4 # the warnflag is reported at index position 4 in the output array
    else: # ar from BFGS optimisation
        i = 6
    
    warnflag = defaultdict(int)

    for res in ar:
        if res[i] == 1:
            warnflag[1] +=1
        elif res[i] == 2:
            warnflag[2] += 1
        elif res[i] == 0:
            warnflag[0] += 1
        else:
            warnflag[999] +=1
    if NM:
        print "success", warnflag[0]
        print "Maximum number of function evaluations made.", warnflag[1]
        print "Maximum number of iterations reached.", warnflag[2]
        print "unknown flag", warnflag[999]
    else:
        print "success", warnflag[0]
        print "Maximum number of iterations exceeded.", warnflag[1]
        print "Gradient and/or function calls not changing.", warnflag[2]
        print "unknown flag", warnflag[999]


success 155
Maximum number of function evaluations made. 0
Maximum number of iterations reached. 41
unknown flag 0

In [ ]:
get_flag_count(ar, NM=True)

This looks much better. I don't want to increase maxiter even more, due to the already long running time.


In [102]:
for p_init, popt in izip(product(p0_nu, p0_T), ar_ery):
    print p_init, popt[0], popt[4]


(0.001, 0.0001) [ 0.001   0.0001] 0
(0.001, 0.00022594597786155119) [ 0.0046087  0.0001   ] 0
(0.001, 0.00051051584911812533) [ 0.00460935  0.0001    ] 0
(0.001, 0.0011534900274281497) [ 0.0046101  0.0001   ] 0
(0.001, 0.002606264322008005) [ 0.0010587   0.01041429] 0
(0.001, 0.0058887494080177146) [ 0.00109955  0.01077432] 0
(0.001, 0.013305392433761934) [ 0.00114933  0.01121111] 0
(0.001, 0.030062999042780213) [ 0.00220919  0.02010065] 0
(0.001, 0.067926137161718539) [ 0.00561338  0.04580983] 0
(0.001, 0.15347637483362347) [ 0.00153996  0.14817624] 0
(0.001, 0.34677369590428986) [ 0.00152288  0.30668712] 0
(0.001, 0.78352121817758968) [ 0.00139997  0.78033341] 0
(0.001, 1.770334678164093) [  1.38237223e-03   1.77369523e+00] 0
(0.001, 4.0) [  1.00000000e-03   4.00000000e+00] 0
(0.0028942661247167516, 0.0001) [ 0.00460909  0.0001    ] 0
(0.0028942661247167516, 0.00022594597786155119) [ 0.00461009  0.0001    ] 0
(0.0028942661247167516, 0.00051051584911812533) [ 0.00460658  0.0001    ] 0
(0.0028942661247167516, 0.0011534900274281497) [ 0.00460937  0.0001    ] 0
(0.0028942661247167516, 0.002606264322008005) [ 0.00460888  0.0001    ] 0
(0.0028942661247167516, 0.0058887494080177146) [ 0.00132047  0.01269659] 0
(0.0028942661247167516, 0.013305392433761934) [ 0.00181888  0.01690434] 0
(0.0028942661247167516, 0.030062999042780213) [ 0.00305913  0.02683358] 0
(0.0028942661247167516, 0.067926137161718539) [ 0.00592643  0.04804047] 0
(0.0028942661247167516, 0.15347637483362347) [ 0.00161322  0.17280243] 0
(0.0028942661247167516, 0.34677369590428986) [ 0.00161299  0.35778062] 0
(0.0028942661247167516, 0.78352121817758968) [ 0.00161322  0.78711341] 0
(0.0028942661247167516, 1.770334678164093) [  1.61321521e-03   1.67317177e+00] 0
(0.0028942661247167516, 4.0) [  1.61298502e-03   3.63581873e+00] 0
(0.008376776400682925, 0.0001) [ 0.00460976  0.0001    ] 0
(0.008376776400682925, 0.00022594597786155119) [ 0.00460781  0.0001    ] 0
(0.008376776400682925, 0.00051051584911812533) [ 0.004611  0.0001  ] 0
(0.008376776400682925, 0.0011534900274281497) [ 0.00461008  0.0001    ] 0
(0.008376776400682925, 0.002606264322008005) [ 0.00461037  0.0001    ] 0
(0.008376776400682925, 0.0058887494080177146) [ 0.00460997  0.0001    ] 0
(0.008376776400682925, 0.013305392433761934) [ 0.00385172  0.03289365] 0
(0.008376776400682925, 0.030062999042780213) [ 0.00458696  0.03836625] 0
(0.008376776400682925, 0.067926137161718539) [ 0.00824806  0.06410873] 0
(0.008376776400682925, 0.15347637483362347) [ 0.01690938  0.11911749] 0
(0.008376776400682925, 0.34677369590428986) [ 0.00659524  0.35548912] 0
(0.008376776400682925, 0.78352121817758968) [ 0.00659166  0.7931554 ] 0
(0.008376776400682925, 1.770334678164093) [ 0.00659563  1.7454361 ] 0
(0.008376776400682925, 4.0) [ 0.00659524  3.99567018] 0
(0.024244620170823284, 0.0001) [ 0.0046098  0.0001   ] 0
(0.024244620170823284, 0.00022594597786155119) [ 0.00460968  0.0001    ] 0
(0.024244620170823284, 0.00051051584911812533) [ 0.00460963  0.0001    ] 0
(0.024244620170823284, 0.0011534900274281497) [ 0.00459288  0.0001    ] 0
(0.024244620170823284, 0.002606264322008005) [ 0.00460991  0.0001    ] 0
(0.024244620170823284, 0.0058887494080177146) [ 0.0046099  0.0001   ] 0
(0.024244620170823284, 0.013305392433761934) [ 0.00460929  0.0001    ] 0
(0.024244620170823284, 0.030062999042780213) [ 0.00972447  0.07396519] 0
(0.024244620170823284, 0.067926137161718539) [ 0.01435976  0.10354574] 0
(0.024244620170823284, 0.15347637483362347) [ 0.02121163  0.14451351] 0
(0.024244620170823284, 0.34677369590428986) [ 0.04033533  0.24803044] 2
(0.024244620170823284, 0.78352121817758968) [ 0.11919076  0.59328204] 2
(0.024244620170823284, 1.770334678164093) [ 0.01524634  1.72145173] 0
(0.024244620170823284, 4.0) [ 0.01264512  3.4223801 ] 0
(0.070170382867038292, 0.0001) [ 0.00460958  0.0001    ] 0
(0.070170382867038292, 0.00022594597786155119) [ 0.00460893  0.0001    ] 0
(0.070170382867038292, 0.00051051584911812533) [ 0.00460931  0.0001    ] 0
(0.070170382867038292, 0.0011534900274281497) [ 0.00461038  0.0001    ] 0
(0.070170382867038292, 0.002606264322008005) [ 0.00460984  0.0001    ] 0
(0.070170382867038292, 0.0058887494080177146) [ 0.0046097  0.0001   ] 0
(0.070170382867038292, 0.013305392433761934) [ 0.00461111  0.0001    ] 0
(0.070170382867038292, 0.030062999042780213) [ 0.00460973  0.0001    ] 0
(0.070170382867038292, 0.067926137161718539) [ 0.02812575  0.18346831] 2
(0.070170382867038292, 0.15347637483362347) [ 0.03356945  0.21283811] 2
(0.070170382867038292, 0.34677369590428986) [ 0.05075328  0.29985432] 2
(0.070170382867038292, 0.78352121817758968) [ 0.12105528  0.60043451] 2
(0.070170382867038292, 1.770334678164093) [ 0.60519695  1.54151396] 2
(0.070170382867038292, 4.0) [ 0.06933788  3.8868293 ] 0
(0.20309176209047369, 0.0001) [ 0.00460969  0.0001    ] 0
(0.20309176209047369, 0.00022594597786155119) [ 0.00461108  0.0001    ] 0
(0.20309176209047369, 0.00051051584911812533) [ 0.00460986  0.0001    ] 0
(0.20309176209047369, 0.0011534900274281497) [ 0.00460958  0.0001    ] 0
(0.20309176209047369, 0.002606264322008005) [ 0.00460949  0.0001    ] 0
(0.20309176209047369, 0.0058887494080177146) [ 0.00461013  0.0001    ] 0
(0.20309176209047369, 0.013305392433761934) [ 0.00460974  0.0001    ] 0
(0.20309176209047369, 0.030062999042780213) [ 0.00460967  0.0001    ] 0
(0.20309176209047369, 0.067926137161718539) [ 0.00461159  0.0001    ] 0
(0.20309176209047369, 0.15347637483362347) [ 0.00461314  0.0001    ] 0
(0.20309176209047369, 0.34677369590428986) [ 0.08532162  0.4565059 ] 2
(0.20309176209047369, 0.78352121817758968) [ 0.13396667  0.6488751 ] 2
(0.20309176209047369, 1.770334678164093) [ 0.46341921  0.01674003] 2
(0.20309176209047369, 4.0) [ 0.00460987  0.0001    ] 2
(0.5878016072274912, 0.0001) [ 0.00460956  0.0001    ] 0
(0.5878016072274912, 0.00022594597786155119) [ 0.00460971  0.0001    ] 0
(0.5878016072274912, 0.00051051584911812533) [ 0.00460994  0.0001    ] 0
(0.5878016072274912, 0.0011534900274281497) [ 0.00460966  0.0001    ] 0
(0.5878016072274912, 0.002606264322008005) [ 0.00460996  0.0001    ] 0
(0.5878016072274912, 0.0058887494080177146) [ 0.00460972  0.0001    ] 0
(0.5878016072274912, 0.013305392433761934) [ 0.00460984  0.0001    ] 0
(0.5878016072274912, 0.030062999042780213) [ 0.00460995  0.0001    ] 0
(0.5878016072274912, 0.067926137161718539) [ 0.00460976  0.0001    ] 0
(0.5878016072274912, 0.15347637483362347) [ 0.0046094  0.0001   ] 0
(0.5878016072274912, 0.34677369590428986) [ 0.00460943  0.0001    ] 0
(0.5878016072274912, 0.78352121817758968) [ 0.00460858  0.0001    ] 0
(0.5878016072274912, 1.770334678164093) [ 0.92977709  0.10031859] 2
(0.5878016072274912, 4.0) [ 0.72886285  1.44876146] 2
(1.7012542798525891, 0.0001) [ 0.00461018  0.0001    ] 0
(1.7012542798525891, 0.00022594597786155119) [ 0.00460951  0.0001    ] 0
(1.7012542798525891, 0.00051051584911812533) [ 0.00460997  0.0001    ] 0
(1.7012542798525891, 0.0011534900274281497) [ 0.00460992  0.0001    ] 2
(1.7012542798525891, 0.002606264322008005) [ 0.00460978  0.0001    ] 0
(1.7012542798525891, 0.0058887494080177146) [ 0.00460938  0.0001    ] 0
(1.7012542798525891, 0.013305392433761934) [ 0.00461005  0.0001    ] 0
(1.7012542798525891, 0.030062999042780213) [ 0.00460988  0.0001    ] 0
(1.7012542798525891, 0.067926137161718539) [ 0.00460943  0.0001    ] 0
(1.7012542798525891, 0.15347637483362347) [ 0.00461027  0.0001    ] 0
(1.7012542798525891, 0.34677369590428986) [ 0.00460916  0.0001    ] 2
(1.7012542798525891, 0.78352121817758968) [ 0.00461028  0.0001    ] 2
(1.7012542798525891, 1.770334678164093) [ 0.0046096  0.0001   ] 0
(1.7012542798525891, 4.0) [ 0.41860894  1.3824916 ] 2
(4.9238826317067419, 0.0001) [ 0.00460697  0.0001    ] 0
(4.9238826317067419, 0.00022594597786155119) [ 0.00460943  0.0001    ] 0
(4.9238826317067419, 0.00051051584911812533) [ 0.00461355  0.0001    ] 2
(4.9238826317067419, 0.0011534900274281497) [ 0.00460984  0.0001    ] 0
(4.9238826317067419, 0.002606264322008005) [ 0.0045938  0.0001   ] 2
(4.9238826317067419, 0.0058887494080177146) [ 0.00462099  0.0001    ] 2
(4.9238826317067419, 0.013305392433761934) [ 0.00460959  0.0001    ] 0
(4.9238826317067419, 0.030062999042780213) [ 0.00460947  0.0001    ] 0
(4.9238826317067419, 0.067926137161718539) [ 0.00461048  0.0001    ] 0
(4.9238826317067419, 0.15347637483362347) [ 0.00460579  0.0001    ] 0
(4.9238826317067419, 0.34677369590428986) [ 0.00460935  0.0001    ] 0
(4.9238826317067419, 0.78352121817758968) [ 0.00466086  0.00010066] 2
(4.9238826317067419, 1.770334678164093) [ 0.00471884  0.0001025 ] 2
(4.9238826317067419, 4.0) [ 0.6954071   1.49326078] 2
(14.251026703029993, 0.0001) [ 0.0046093  0.0001   ] 0
(14.251026703029993, 0.00022594597786155119) [ 0.00460974  0.0001    ] 0
(14.251026703029993, 0.00051051584911812533) [ 0.00460972  0.0001    ] 2
(14.251026703029993, 0.0011534900274281497) [ 0.0046099  0.0001   ] 0
(14.251026703029993, 0.002606264322008005) [ 0.00460921  0.0001    ] 0
(14.251026703029993, 0.0058887494080177146) [ 0.00460734  0.0001    ] 0
(14.251026703029993, 0.013305392433761934) [ 0.00461017  0.0001    ] 0
(14.251026703029993, 0.030062999042780213) [ 0.00461259  0.0001    ] 0
(14.251026703029993, 0.067926137161718539) [ 0.00460922  0.0001    ] 0
(14.251026703029993, 0.15347637483362347) [ 0.00460994  0.0001    ] 0
(14.251026703029993, 0.34677369590428986) [ 0.00460992  0.0001    ] 0
(14.251026703029993, 0.78352121817758968) [ 0.09037031  0.47781   ] 2
(14.251026703029993, 1.770334678164093) [ 0.80092763  1.27287394] 2
(14.251026703029993, 4.0) [ 0.3856807   1.32675829] 2
(41.246263829013564, 0.0001) [ 0.00460919  0.0001    ] 0
(41.246263829013564, 0.00022594597786155119) [ 0.00462563  0.00010035] 0
(41.246263829013564, 0.00051051584911812533) [ 0.00460956  0.00010001] 2
(41.246263829013564, 0.0011534900274281497) [ 0.00460957  0.0001    ] 0
(41.246263829013564, 0.002606264322008005) [ 0.00461025  0.0001    ] 0
(41.246263829013564, 0.0058887494080177146) [ 0.00460862  0.0001    ] 0
(41.246263829013564, 0.013305392433761934) [ 0.00460966  0.0001    ] 0
(41.246263829013564, 0.030062999042780213) [ 0.00460971  0.0001    ] 0
(41.246263829013564, 0.067926137161718539) [ 0.00460971  0.0001    ] 0
(41.246263829013564, 0.15347637483362347) [ 0.00461003  0.0001    ] 0
(41.246263829013564, 0.34677369590428986) [ 0.00458619  0.0001    ] 0
(41.246263829013564, 0.78352121817758968) [ 0.09569484  0.49991478] 2
(41.246263829013564, 1.770334678164093) [ 0.03666728  0.00085197] 2
(41.246263829013564, 4.0) [ 0.07022516  0.00147391] 2
(119.37766417144383, 0.0001) [ 0.0046094  0.0001   ] 0
(119.37766417144383, 0.00022594597786155119) [ 0.00461883  0.00010019] 0
(119.37766417144383, 0.00051051584911812533) [ 0.00461174  0.00010005] 0
(119.37766417144383, 0.0011534900274281497) [ 0.00460987  0.00010001] 2
(119.37766417144383, 0.002606264322008005) [ 0.00461266  0.00010007] 0
(119.37766417144383, 0.0058887494080177146) [ 0.00459816  0.0001    ] 0
(119.37766417144383, 0.013305392433761934) [ 0.00460983  0.0001    ] 0
(119.37766417144383, 0.030062999042780213) [ 0.00460965  0.0001    ] 0
(119.37766417144383, 0.067926137161718539) [ 0.00464787  0.00010001] 0
(119.37766417144383, 0.15347637483362347) [ 0.00460959  0.0001    ] 0
(119.37766417144383, 0.34677369590428986) [ 0.00842437  0.06529905] 0
(119.37766417144383, 0.78352121817758968) [ 0.11279454  0.56849984] 0
(119.37766417144383, 1.770334678164093) [ 0.04750126  0.00112095] 2
(119.37766417144383, 4.0) [ 0.00461446  0.00010008] 2
(345.51072945922181, 0.0001) [ 0.00460985  0.0001    ] 0
(345.51072945922181, 0.00022594597786155119) [ 0.00461696  0.00010017] 0
(345.51072945922181, 0.00051051584911812533) [ 0.00461403  0.0001001 ] 0
(345.51072945922181, 0.0011534900274281497) [  6.12964514e+02   1.00000014e-04] 0
(345.51072945922181, 0.002606264322008005) [  5.53225162e+02   1.00000401e-04] 0
(345.51072945922181, 0.0058887494080177146) [ 0.00462517  0.00010034] 0
(345.51072945922181, 0.013305392433761934) [ 0.00460936  0.0001    ] 2
(345.51072945922181, 0.030062999042780213) [ 0.00461457  0.0001    ] 0
(345.51072945922181, 0.067926137161718539) [ 0.0046093  0.0001   ] 0
(345.51072945922181, 0.15347637483362347) [ 0.00460986  0.0001    ] 0
(345.51072945922181, 0.34677369590428986) [ 0.01006346  0.07619456] 0
(345.51072945922181, 0.78352121817758968) [ 0.09896109  0.51329225] 2
(345.51072945922181, 1.770334678164093) [ 0.40720314  1.3640523 ] 2
(345.51072945922181, 4.0) [ 0.33220095  1.22029137] 2
(1000.0, 0.0001) [  1.00000000e+03   1.00000000e-04] 0
(1000.0, 0.00022594597786155119) [ 0.00462707  0.00010038] 0
(1000.0, 0.00051051584911812533) [ 0.00461721  0.00010017] 0
(1000.0, 0.0011534900274281497) [ 0.00463091  0.00010046] 0
(1000.0, 0.002606264322008005) [ 0.00460933  0.0001    ] 0
(1000.0, 0.0058887494080177146) [  5.36821850e+02   1.00000061e-04] 0
(1000.0, 0.013305392433761934) [  6.76235824e+02   1.00000186e-04] 0
(1000.0, 0.030062999042780213) [ 0.00460882  0.0001    ] 2
(1000.0, 0.067926137161718539) [ 0.0046097  0.0001   ] 0
(1000.0, 0.15347637483362347) [ 0.00460935  0.0001    ] 0
(1000.0, 0.34677369590428986) [ 0.01342882  0.09774743] 0
(1000.0, 0.78352121817758968) [ 0.09970475  0.51631597] 2
(1000.0, 1.770334678164093) [ 0.46758516  0.02348675] 2
(1000.0, 4.0) [1000.    4.] 0

check convergence


In [130]:
# get optimised parameters (if optimisation was successfull)
popt = [p[0] for p in ar_ery if p[4] == 0]
len(popt)


Out[130]:
155

get log likelihood values


In [26]:
# create parallel function with load balancing

@lbview.parallel(block=True)
def get_ll(p):
    """
    p: parameter combination
    
    First, calculates the best-fit model SFS given paramter combination p.
    Then returns the log likelihood of the expected SFS given the observed SFS.
    """
    expected_sfs = func_ex(p, ns, pts_l)
    return dadi.Inference.ll_multinom(expected_sfs, sfs)
    # expected_sfs does not need to be folded, ll_multinom does that automatically
    # make sure that sfs points to right spectrum

In [27]:
cl[0]['sfs']


Out[27]:
Spectrum([-- 7833.03869 7414.699839 4109.279415 3614.717256 3095.973324 2031.460887
 1584.656928 2583.652317 1142.075255 1052.346021 1765.773415 1255.138799
 1072.516527 1417.916128 395.75047 1947.087637 367.072082 966.622924 -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --], folded=True, pop_ids=['ery'])

In [155]:
ar_ery.get()


Out[155]:
(array([ 0.001 ,  0.0001]), 3729.0988442929811, 52, 202, 0)

In [156]:
# get log likelihood values for all parameters returned

ll = get_ll.map([p[0] for p in ar_ery])

In [157]:
len(ll)


Out[157]:
196

In [168]:
# get all starting parameter combinations, corresponding optimised parameters and log likelihood values
# if the optimisation was successful:

rows = [(p0[0], p0[1], xopt[0][0], xopt[0][1], logL) for p0, xopt, logL in izip(product(p0_nu, p0_T), ar_ery, ll) if xopt[4] == 0]

In [169]:
len(rows)


Out[169]:
155

In [137]:
import pandas as pd

In [174]:
df = pd.DataFrame(data=rows, columns=['nu_0', 'T_0', 'nu_opt', 'T_opt', 'logL'])

In [175]:
df.sort_values(by='logL', ascending=False)


Out[175]:
nu_0 T_0 nu_opt T_opt logL
119 41.246264 0.030063 0.004610 0.000100 -2041.093532
28 0.008377 0.000100 0.004610 0.000100 -2041.093532
59 0.070170 0.005889 0.004610 0.000100 -2041.093532
142 345.510729 0.153476 0.004610 0.000100 -2041.093532
102 4.923883 0.346774 0.004609 0.000100 -2041.093532
141 345.510729 0.067926 0.004609 0.000100 -2041.093532
56 0.070170 0.000511 0.004609 0.000100 -2041.093532
69 0.203092 0.013305 0.004610 0.000100 -2041.093532
44 0.024245 0.000511 0.004610 0.000100 -2041.093533
115 41.246264 0.001153 0.004610 0.000100 -2041.093533
61 0.070170 0.030063 0.004610 0.000100 -2041.093533
131 119.377664 0.153476 0.004610 0.000100 -2041.093533
104 14.251027 0.000226 0.004610 0.000100 -2041.093533
110 14.251027 0.067926 0.004609 0.000100 -2041.093534
118 41.246264 0.013305 0.004610 0.000100 -2041.093534
97 4.923883 0.001153 0.004610 0.000100 -2041.093534
99 4.923883 0.030063 0.004609 0.000100 -2041.093534
54 0.070170 0.000100 0.004610 0.000100 -2041.093534
92 1.701254 0.067926 0.004609 0.000100 -2041.093534
112 14.251027 0.346774 0.004610 0.000100 -2041.093534
152 1000.000000 0.153476 0.004609 0.000100 -2041.093534
17 0.002894 0.001153 0.004609 0.000100 -2041.093534
75 0.587802 0.000511 0.004610 0.000100 -2041.093534
58 0.070170 0.002606 0.004610 0.000100 -2041.093534
148 1000.000000 0.002606 0.004609 0.000100 -2041.093534
98 4.923883 0.013305 0.004610 0.000100 -2041.093535
88 1.701254 0.002606 0.004610 0.000100 -2041.093535
128 119.377664 0.013305 0.004610 0.000100 -2041.093535
134 345.510729 0.000100 0.004610 0.000100 -2041.093535
105 14.251027 0.001153 0.004610 0.000100 -2041.093535
... ... ... ... ... ...
143 345.510729 0.346774 0.010063 0.076195 -2450.294044
153 1000.000000 0.346774 0.013429 0.097747 -2450.294044
50 0.024245 0.067926 0.014360 0.103546 -2450.294044
37 0.008377 0.153476 0.016909 0.119117 -2450.294044
51 0.024245 0.153476 0.021212 0.144514 -2450.294044
133 119.377664 0.783521 0.112795 0.568500 -2450.294046
11 0.001000 0.783521 0.001400 0.780333 -2460.844592
62 0.070170 4.000000 0.069338 3.886829 -2460.844592
12 0.001000 1.770335 0.001382 1.773695 -2460.844592
10 0.001000 0.346774 0.001523 0.306687 -2460.844592
52 0.024245 1.770335 0.015246 1.721452 -2460.844592
39 0.008377 0.783521 0.006592 0.793155 -2460.844592
9 0.001000 0.153476 0.001540 0.148176 -2460.844592
25 0.002894 0.783521 0.001613 0.787113 -2460.844592
24 0.002894 0.346774 0.001613 0.357781 -2460.844592
27 0.002894 4.000000 0.001613 3.635819 -2460.844592
23 0.002894 0.153476 0.001613 0.172802 -2460.844592
26 0.002894 1.770335 0.001613 1.673172 -2460.844592
40 0.008377 1.770335 0.006596 1.745436 -2460.844592
53 0.024245 4.000000 0.012645 3.422380 -2460.844592
41 0.008377 4.000000 0.006595 3.995670 -2460.844592
38 0.008377 0.346774 0.006595 0.355489 -2460.844592
13 0.001000 4.000000 0.001000 4.000000 -2460.844592
149 1000.000000 0.005889 536.821850 0.000100 -2466.079549
138 345.510729 0.002606 553.225162 0.000100 -2466.079743
137 345.510729 0.001153 612.964514 0.000100 -2466.080284
150 1000.000000 0.013305 676.235824 0.000100 -2466.080780
144 1000.000000 0.000100 1000.000000 0.000100 -2466.082292
0 0.001000 0.000100 0.001000 0.000100 -3729.098844
154 1000.000000 4.000000 1000.000000 4.000000 -75667.889059

155 rows × 5 columns

Dadi wants to infer a very drastic bottleneck in the very recent past. It hits the lower bound on T that I set above.


In [28]:
%%px

# further reduce lower bound on T (on all engines)

lower_bound = [1e-4, 1e-6]

In [29]:
# check maxiter setting on engines

cl[:]['maxiter']


Out[29]:
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

In [30]:
cl[:]['lower_bound']


Out[30]:
[[0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06]]

In [31]:
cl[:]['dadi_opt_func']


Out[31]:
[<function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>,
 <function dadi.Inference.optimize_log_fmin>]

Still set to use Nelder-Mead algorithm.


In [32]:
cl[0]['sfs']


Out[32]:
Spectrum([-- 7833.03869 7414.699839 4109.279415 3614.717256 3095.973324 2031.460887
 1584.656928 2583.652317 1142.075255 1052.346021 1765.773415 1255.138799
 1072.516527 1417.916128 395.75047 1947.087637 367.072082 966.622924 -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --], folded=True, pop_ids=['ery'])

In [184]:
# rerun optimisation from above with reduced lower bound for T

#ar_ery = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [185]:
ar_ery.progress


Out[185]:
196

In [192]:
ar_ery.wall_time/60


Out[192]:
62.39776715

It took more than 1 hour to run this on 11 cores!


In [32]:
import dill

In [207]:
# save optimsiation results to file
with open("1D_two_epoch_opt_res_ERY.dill", "w") as fh:
    dill.dump(list(ar_ery.get()), fh) # unfortunately, dill cannot save the ar_ery object

In [189]:
get_flag_count(ar_ery)


success 145
Maximum number of function evaluations made. 0
Maximum number of iterations reached. 51
unknown flag 0

The optimisations starting from 51 initial parameter combinations did not converge before the maximum number of iterations (100) was reached.


In [ ]:
# get log likelihood values for all parameters returned
ll = get_ll.map([p[0] for p in ar_ery])

In [ ]:
# create rows for dataframe
rows = [(p0[0], p0[1], xopt[0][0], xopt[0][1], logL) for p0, xopt, logL in izip(product(p0_nu, p0_T), ar_ery, ll) if xopt[4] == 0]

In [ ]:
# create dataframe
df = pd.DataFrame(data=rows, columns=['nu_0', 'T_0', 'nu_opt', 'T_opt', 'logL'])

In [198]:
# look and see
df.sort_values(by='logL', ascending=False).head(40)


Out[198]:
nu_0 T_0 nu_opt T_opt logL
81 0.587802 0.153476 0.0001 0.000002 -2040.492673
86 1.701254 0.013305 0.0001 0.000002 -2040.492673
102 14.251027 0.000226 0.0001 0.000002 -2040.492673
95 4.923883 0.005889 0.0001 0.000002 -2040.492673
123 119.377664 0.005889 0.0001 0.000002 -2040.492673
58 0.070170 0.005889 0.0001 0.000002 -2040.492673
16 0.002894 0.000511 0.0001 0.000002 -2040.492673
76 0.587802 0.002606 0.0001 0.000002 -2040.492673
66 0.203092 0.002606 0.0001 0.000002 -2040.492673
132 345.510729 0.030063 0.0001 0.000002 -2040.492673
91 4.923883 0.000226 0.0001 0.000002 -2040.492673
18 0.002894 0.002606 0.0001 0.000002 -2040.492673
28 0.008377 0.000100 0.0001 0.000002 -2040.492673
113 41.246264 0.000511 0.0001 0.000002 -2040.492673
82 0.587802 0.346774 0.0001 0.000002 -2040.492673
57 0.070170 0.002606 0.0001 0.000002 -2040.492673
69 0.203092 0.030063 0.0001 0.000002 -2040.492673
94 4.923883 0.002606 0.0001 0.000002 -2040.492673
104 14.251027 0.001153 0.0001 0.000002 -2040.492673
78 0.587802 0.013305 0.0001 0.000002 -2040.492673
67 0.203092 0.005889 0.0001 0.000002 -2040.492673
87 1.701254 0.030063 0.0001 0.000002 -2040.492673
97 4.923883 0.030063 0.0001 0.000002 -2040.492673
115 41.246264 0.002606 0.0001 0.000002 -2040.492673
92 4.923883 0.000511 0.0001 0.000002 -2040.492673
108 14.251027 0.030063 0.0001 0.000002 -2040.492673
84 1.701254 0.000226 0.0001 0.000002 -2040.492674
111 14.251027 0.346774 0.0001 0.000002 -2040.492674
45 0.024245 0.002606 0.0001 0.000002 -2040.492674
125 119.377664 0.030063 0.0001 0.000002 -2040.492674
109 14.251027 0.067926 0.0001 0.000002 -2040.492674
71 0.203092 0.153476 0.0001 0.000002 -2040.492674
44 0.024245 0.001153 0.0001 0.000002 -2040.492674
142 1000.000000 0.067926 0.0001 0.000002 -2040.492674
79 0.587802 0.030063 0.0001 0.000002 -2040.492674
77 0.587802 0.005889 0.0001 0.000002 -2040.492674
103 14.251027 0.000511 0.0001 0.000002 -2040.492674
100 4.923883 0.346774 0.0001 0.000002 -2040.492674
59 0.070170 0.013305 0.0001 0.000002 -2040.492674
114 41.246264 0.001153 0.0001 0.000002 -2040.492674

In [199]:
cl[0]['lower_bound']


Out[199]:
[0.0001, 1e-06]

The optimisations with the highest likelihood are hitting the lower bound of parameter values that I've set. The time parameter $T$ that dadi infers with the highest likelihood is too low to make sense. It needs to be multiplied by $2N_{ref}$ to get the time in generations. With any reasonable effective ancient population size, this implies a very recent event. The event is inferred to be a very strong bottleneck. I don't think that these values can possibly make any sense.

residiual plots


In [222]:
p_opt = np.array( df.iloc[0, 2:4] )
p_opt


Out[222]:
array([  1.00010340e-04,   2.16061780e-06])

In [223]:
# calculate the best-fit model SFS given the paramter combination with the highest likelihood
expected_sfs = func_ex(p_opt, ns, pts_l)
expected_sfs.fold()


Out[223]:
Spectrum([-- 0.7545818109466478 0.4887066151704145 0.3548213941036278
 0.2788035583144265 0.23136799503938965 0.1995615141413847
 0.17706123031893858 0.1605073185422468 0.14798161545738503
 0.13832185896991653 0.13078894635825167 0.12489466723234886
 0.12030736601629187 0.11679777589049832 0.11420672212630598
 0.1124253640623551 0.11138301857255054 0.055519929554490866 -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- --], folded=True, pop_ids=None)

In [226]:
import matplotlib.pyplot as plt

%matplotlib inline

plt.rcParams['figure.figsize'] = [12.0, 10.0]
print(p_opt)
dadi.Plotting.plot_1d_comp_multinom(expected_sfs.fold()[:19], fs_ery[:19], residual='linear')


[  1.00010340e-04   2.16061780e-06]

Does this model provide a better fit than a standard neutral model?


In [227]:
# create link to function that specifies a standard neutral model
snm = dadi.Demographics1D.snm
# make extrapolating function
snm_ex = dadi.Numerics.make_extrap_log_func(snm)

In [228]:
?snm

In [238]:
# calculate the best-fit model SFS given paramter combination
expected_sfs = snm_ex(0, ns, pts_l)
# calculate the log likelihood of the expected SFS given the observed SFS
# no need to fold the model, is done automatically if data is folded
dadi.Inference.ll_multinom(expected_sfs, fs_ery)


Out[238]:
-2460.8445858070236

In [239]:
# calculate the best-fit model SFS given paramter combination
expected_sfs = func_ex(p_opt, ns, pts_l)
# calculate the log likelihood of the expected SFS given the observed SFS
# no need to fold the model, is done automatically if data is folded
dadi.Inference.ll_multinom(expected_sfs, fs_ery)


Out[239]:
-2040.4926752518243

Compared to a standard neutral model, the two epoch model with the best fitting parameter values does provide a much better fit to the data spectrum. However, it also requires two (instead of zero) parameters and the inferred parameter values cannot be trusted since they are hitting the boundary of sensible values.


In [193]:
%%px

dadi_opt_func = dadi.Inference.optimize_log # BFGS

In [195]:
ar_ery_BFGS = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [200]:
ar_ery_BFGS.progress


Out[200]:
196

In [203]:
get_flag_count(ar_ery_BFGS, NM=False)


success 27
Maximum number of iterations exceeded. 0
Gradient and/or function calls not changing. 169
unknown flag 0

Clearly, the BFGS algorithm cannot be used for this problem. It is not limited by the maximum number of iterations, but by the lack of a gradient in the likelihood function.

PAR


In [209]:
%%px

# set up global variables on engines requires for run_dadi function call

dadi_opt_func = dadi.Inference.optimize_log # BFGS
sfs = fs_par # use spectrum from PAR in optimisation
perturb = False
fold = 1
maxiter = 10 # keep it low for now
verbose = 0
full_output = True

In [208]:
# create range of starting values for nu and T, evenly distributed in log space
p0_nu = np.logspace(-3, 3, base=10.0, num=10)
p0_T = np.logspace(-4, np.log10(4), base=10, num=10)

In [210]:
ar_par_BFGS = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [213]:
ar_par_BFGS.progress


Out[213]:
100

In [214]:
get_flag_count(ar_par_BFGS, NM=False)


success 14
Maximum number of iterations exceeded. 19
Gradient and/or function calls not changing. 67
unknown flag 0

Again, the BFGS algorithm cannot be used. The majority of initial parameter combinations don't converge because of a lack of gradient in the likelihood surface.


In [215]:
%%px

dadi_opt_func = dadi.Inference.optimize_log_fmin # Nelder-Mead
maxiter = 100 # provide plenty of iterations

In [216]:
ar_par_NM = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [240]:
ar_par_NM.progress


Out[240]:
100

In [241]:
ar_par_NM.elapsed/60


Out[241]:
40.8876008

This took 40 minutes to complete. Most of the time running on one parameter combination.


In [242]:
get_flag_count(ar_par_NM, NM=True)


success 98
Maximum number of function evaluations made. 0
Maximum number of iterations reached. 2
unknown flag 0

In [244]:
with open("1D_two_epoch_opt_res_PAR.dill", "w") as fh:
    dill.dump(list(ar_par_NM.get()), fh)

Which initial parameter combination did not converge, i. e. for which did the maximum number of iterations (100) not suffice? These are also likely to be the ones that have run so long.


In [248]:
for p0, xopt in izip(product(p0_nu, p0_T), ar_par_NM):
    if xopt[4] == 2:
        print p0


(2.154434690031882, 0.0034199518933533931)
(1000.0, 0.011100946155696227)

In [ ]:
# get log likelihood values for all parameters returned
ll = get_ll.map([p[0] for p in ar_par_NM])

In [249]:
# create rows for dataframe
rows = [(p0[0], p0[1], xopt[0][0], xopt[0][1], logL) for p0, xopt, logL in izip(product(p0_nu, p0_T), ar_par_NM, ll) if xopt[4] == 0]

In [251]:
df = pd.DataFrame(data=rows, columns=['nu_0', 'T_0', 'nu_opt', 'T_opt', 'logL'])

In [252]:
df.sort_values(by='logL', ascending=False)


Out[252]:
nu_0 T_0 nu_opt T_opt logL
78 46.415888 4.000000 5.541314 4.000000 -8435.706819
68 10.000000 4.000000 5.541188 3.999998 -8435.706834
55 2.154435 0.116961 5.541445 3.999994 -8435.706838
66 10.000000 0.379647 5.542152 3.999999 -8435.706844
88 215.443469 4.000000 5.541140 3.999994 -8435.706852
86 215.443469 0.379647 5.541279 3.999991 -8435.706857
57 2.154435 1.232310 5.541516 3.999990 -8435.706858
87 215.443469 1.232310 5.541448 3.999988 -8435.706864
77 46.415888 1.232310 5.540994 3.999992 -8435.706878
58 2.154435 4.000000 5.541669 3.999985 -8435.706880
47 0.464159 0.379647 5.540767 3.999994 -8435.706894
95 1000.000000 0.379647 5.542070 3.999982 -8435.706914
96 1000.000000 1.232310 5.541333 3.999977 -8435.706916
56 2.154435 0.379647 5.542569 3.999991 -8435.706938
76 46.415888 0.379647 5.540035 3.999998 -8435.707033
67 10.000000 1.232310 5.554024 3.999993 -8435.720513
34 0.100000 0.011101 0.000142 0.000001 -9702.663060
23 0.021544 0.003420 0.000142 0.000001 -9702.663060
75 46.415888 0.116961 0.000142 0.000001 -9702.663060
60 10.000000 0.000325 0.000142 0.000001 -9702.663060
46 0.464159 0.116961 0.000142 0.000001 -9702.663060
72 46.415888 0.003420 0.000142 0.000001 -9702.663060
42 0.464159 0.001054 0.000142 0.000001 -9702.663060
93 1000.000000 0.036033 0.000142 0.000001 -9702.663060
10 0.004642 0.000100 0.000142 0.000001 -9702.663060
91 1000.000000 0.001054 0.000142 0.000001 -9702.663060
35 0.100000 0.036033 0.000142 0.000001 -9702.663060
2 0.001000 0.001054 0.000142 0.000001 -9702.663060
51 2.154435 0.000325 0.000142 0.000001 -9702.663060
79 215.443469 0.000100 0.000142 0.000001 -9702.663060
... ... ... ... ... ...
32 0.100000 0.001054 0.000117 0.000001 -9703.796796
30 0.100000 0.000100 0.000117 0.000001 -9703.796810
49 0.464159 4.000000 0.032468 3.094647 -9825.179957
25 0.021544 0.036033 0.000392 1.338353 -9825.179957
14 0.004642 0.011101 0.000291 0.164734 -9825.179957
3 0.001000 0.003420 0.000125 0.017114 -9825.179957
4 0.001000 0.011101 0.000312 0.020193 -9825.179957
48 0.464159 1.232310 0.008205 1.510185 -9825.179957
36 0.100000 0.116961 0.008146 1.929358 -9825.179957
37 0.100000 0.379647 0.008146 1.345287 -9825.179957
9 0.001000 4.000000 0.000894 3.984855 -9825.179957
7 0.001000 0.379647 0.001044 0.371688 -9825.179957
6 0.001000 0.116961 0.001044 0.108644 -9825.179957
26 0.021544 0.116961 0.004735 0.272936 -9825.179957
8 0.001000 1.232310 0.001044 1.243624 -9825.179957
27 0.021544 0.379647 0.006268 0.551400 -9825.179957
18 0.004642 1.232310 0.004414 1.235935 -9825.179957
17 0.004642 0.379647 0.004414 0.374511 -9825.179957
39 0.100000 4.000000 0.083837 3.813855 -9825.179957
29 0.021544 4.000000 0.012079 3.971402 -9825.179957
28 0.021544 1.232310 0.012079 1.201627 -9825.179957
38 0.100000 1.232310 0.030650 1.262050 -9825.179957
19 0.004642 4.000000 0.004411 3.992426 -9825.179957
15 0.004642 0.036033 0.000822 0.096582 -9825.179957
5 0.001000 0.036033 0.000479 0.047682 -9825.179957
16 0.004642 0.116961 0.002126 0.171127 -9825.179957
82 215.443469 0.003420 300.605096 0.000001 -9825.222496
92 1000.000000 0.003420 337.016302 0.000001 -9825.222512
81 215.443469 0.001054 637.204074 0.000001 -9825.222571
97 1000.000000 4.000000 1000.000000 4.000000 -70780.206638

98 rows × 5 columns

The optimal parameter combination returned is hitting the upper parameter bound that I set.


In [253]:
cl[0]['upper_bound']


Out[253]:
[1000.0, 4]

In [255]:
%%px

upper_bound = [1e3, 5]

In [256]:
ar_par_NM = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [276]:
ar_par_NM.progress


Out[276]:
100

In [277]:
get_flag_count(ar_par_NM, NM=True)


success 98
Maximum number of function evaluations made. 0
Maximum number of iterations reached. 2
unknown flag 0

In [278]:
for r in ar_par_NM:
    if r[4] == 2:
        print r[0]


[  1.42092098e-04   1.00001372e-06]
[  1.42128126e-04   1.00003804e-06]

In [305]:
x  = [(p0, xopt[0], m['completed'] - m['started']) for p0, xopt, m in izip(product(p0_nu, p0_T), ar_par_NM, ar_par_NM.metadata)]
y = sorted(x, key=lambda x: x[2])
for p in y:
    print "nu_0 = %.6f, T_0 = %.6f \t nu = %.6f, T = %.6f" % (p[0][0], p[0][1], p[1][0], p[1][1]), "\t", p[2]


nu_0 = 1000.000000, T_0 = 0.003420 	 nu = 337.016302, T = 0.000001 	0:00:01.253315
nu_0 = 215.443469, T_0 = 0.003420 	 nu = 300.605096, T = 0.000001 	0:00:01.337191
nu_0 = 215.443469, T_0 = 0.001054 	 nu = 637.204074, T = 0.000001 	0:00:01.406946
nu_0 = 10.000000, T_0 = 4.000000 	 nu = 6.607394, T = 4.999988 	0:00:01.608731
nu_0 = 0.100000, T_0 = 0.001054 	 nu = 0.000117, T = 0.000001 	0:00:01.870468
nu_0 = 215.443469, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:01.964483
nu_0 = 1000.000000, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:01.998725
nu_0 = 0.021544, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:02.077759
nu_0 = 1000.000000, T_0 = 0.036033 	 nu = 0.000142, T = 0.000001 	0:00:02.078339
nu_0 = 10.000000, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:02.104663
nu_0 = 46.415888, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:02.125874
nu_0 = 0.021544, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:02.170370
nu_0 = 10.000000, T_0 = 0.036033 	 nu = 0.000142, T = 0.000001 	0:00:02.193898
nu_0 = 0.100000, T_0 = 0.000325 	 nu = 0.000117, T = 0.000001 	0:00:02.199616
nu_0 = 0.021544, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.255127
nu_0 = 1000.000000, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:02.261108
nu_0 = 215.443469, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.301531
nu_0 = 215.443469, T_0 = 0.000325 	 nu = 0.000320, T = 0.000002 	0:00:02.317016
nu_0 = 0.004642, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:02.335408
nu_0 = 46.415888, T_0 = 0.036033 	 nu = 0.000142, T = 0.000001 	0:00:02.339114
nu_0 = 0.100000, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:02.368938
nu_0 = 0.021544, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:02.370462
nu_0 = 0.004642, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.398309
nu_0 = 46.415888, T_0 = 4.000000 	 nu = 6.611146, T = 4.999997 	0:00:02.409867
nu_0 = 0.100000, T_0 = 0.000100 	 nu = 0.000117, T = 0.000001 	0:00:02.412759
nu_0 = 0.004642, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:02.421509
nu_0 = 0.464159, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:02.445728
nu_0 = 0.004642, T_0 = 0.003420 	 nu = 0.000117, T = 0.000001 	0:00:02.470861
nu_0 = 0.464159, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.478097
nu_0 = 46.415888, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:02.503926
nu_0 = 10.000000, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:02.528135
nu_0 = 10.000000, T_0 = 0.116961 	 nu = 0.000142, T = 0.000001 	0:00:02.530787
nu_0 = 215.443469, T_0 = 0.116961 	 nu = 0.000142, T = 0.000001 	0:00:02.538487
nu_0 = 0.021544, T_0 = 0.003420 	 nu = 0.000142, T = 0.000001 	0:00:02.577833
nu_0 = 2.154435, T_0 = 4.000000 	 nu = 6.608428, T = 4.999986 	0:00:02.603281
nu_0 = 215.443469, T_0 = 4.000000 	 nu = 6.607767, T = 4.999980 	0:00:02.609589
nu_0 = 0.100000, T_0 = 0.003420 	 nu = 0.000117, T = 0.000001 	0:00:02.631288
nu_0 = 0.001000, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.646596
nu_0 = 2.154435, T_0 = 0.379647 	 nu = 6.608082, T = 4.999998 	0:00:02.660362
nu_0 = 46.415888, T_0 = 0.003420 	 nu = 0.000142, T = 0.000001 	0:00:02.672557
nu_0 = 0.001000, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:02.673205
nu_0 = 10.000000, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.675552
nu_0 = 2.154435, T_0 = 0.116961 	 nu = 6.607484, T = 4.999997 	0:00:02.679635
nu_0 = 2.154435, T_0 = 0.036033 	 nu = 0.000142, T = 0.000001 	0:00:02.691305
nu_0 = 1000.000000, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:02.699920
nu_0 = 0.464159, T_0 = 0.036033 	 nu = 0.000142, T = 0.000001 	0:00:02.728592
nu_0 = 0.464159, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:02.741115
nu_0 = 46.415888, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:02.744636
nu_0 = 0.464159, T_0 = 0.003420 	 nu = 0.000142, T = 0.000001 	0:00:02.770753
nu_0 = 10.000000, T_0 = 0.003420 	 nu = 0.000142, T = 0.000001 	0:00:02.775887
nu_0 = 2.154435, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:02.790348
nu_0 = 46.415888, T_0 = 0.116961 	 nu = 0.000142, T = 0.000001 	0:00:02.806571
nu_0 = 1000.000000, T_0 = 0.116961 	 nu = 0.000142, T = 0.000001 	0:00:02.849591
nu_0 = 10.000000, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:02.853166
nu_0 = 215.443469, T_0 = 0.036033 	 nu = 0.000117, T = 0.000001 	0:00:02.855650
nu_0 = 0.100000, T_0 = 0.036033 	 nu = 0.000142, T = 0.000001 	0:00:02.916466
nu_0 = 10.000000, T_0 = 0.379647 	 nu = 6.608280, T = 4.999981 	0:00:02.923626
nu_0 = 0.464159, T_0 = 0.116961 	 nu = 0.000142, T = 0.000001 	0:00:02.994529
nu_0 = 0.464159, T_0 = 0.379647 	 nu = 6.607515, T = 4.999997 	0:00:03.010559
nu_0 = 46.415888, T_0 = 0.379647 	 nu = 6.607876, T = 4.999993 	0:00:03.029864
nu_0 = 2.154435, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:03.032885
nu_0 = 46.415888, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:03.038171
nu_0 = 2.154435, T_0 = 0.000100 	 nu = 0.000142, T = 0.000001 	0:00:03.049132
nu_0 = 1000.000000, T_0 = 0.011101 	 nu = 0.000142, T = 0.000001 	0:00:03.195081
nu_0 = 2.154435, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:03.222151
nu_0 = 1000.000000, T_0 = 4.000000 	 nu = 6.609853, T = 4.999999 	0:00:03.223168
nu_0 = 215.443469, T_0 = 0.379647 	 nu = 6.608089, T = 4.999993 	0:00:03.252147
nu_0 = 0.001000, T_0 = 0.001054 	 nu = 0.000142, T = 0.000001 	0:00:03.347438
nu_0 = 2.154435, T_0 = 1.232310 	 nu = 6.606892, T = 4.999969 	0:00:03.349424
nu_0 = 0.464159, T_0 = 0.000325 	 nu = 0.000142, T = 0.000001 	0:00:03.746889
nu_0 = 10.000000, T_0 = 1.232310 	 nu = 6.608133, T = 4.999993 	0:00:03.775315
nu_0 = 46.415888, T_0 = 1.232310 	 nu = 6.607869, T = 4.999995 	0:00:04.010776
nu_0 = 1000.000000, T_0 = 0.379647 	 nu = 6.607693, T = 4.999995 	0:00:04.207155
nu_0 = 215.443469, T_0 = 1.232310 	 nu = 6.607281, T = 4.999999 	0:00:04.399794
nu_0 = 2.154435, T_0 = 0.003420 	 nu = 0.000142, T = 0.000001 	0:00:04.590224
nu_0 = 1000.000000, T_0 = 1.232310 	 nu = 6.608342, T = 4.999996 	0:00:04.906379
nu_0 = 0.100000, T_0 = 4.000000 	 nu = 0.113828, T = 4.305707 	0:00:18.968231
nu_0 = 0.100000, T_0 = 1.232310 	 nu = 0.030650, T = 1.262050 	0:00:24.213188
nu_0 = 0.464159, T_0 = 4.000000 	 nu = 0.092965, T = 3.877113 	0:00:36.225329
nu_0 = 0.001000, T_0 = 0.011101 	 nu = 0.000312, T = 0.020193 	0:00:39.882887
nu_0 = 0.004642, T_0 = 0.379647 	 nu = 0.004414, T = 0.374511 	0:00:46.091178
nu_0 = 0.021544, T_0 = 0.116961 	 nu = 0.004735, T = 0.272936 	0:00:46.196308
nu_0 = 0.021544, T_0 = 1.232310 	 nu = 0.012079, T = 1.201627 	0:00:47.771483
nu_0 = 0.004642, T_0 = 0.116961 	 nu = 0.002126, T = 0.171127 	0:00:51.641603
nu_0 = 0.021544, T_0 = 0.379647 	 nu = 0.006268, T = 0.551400 	0:00:51.952621
nu_0 = 0.001000, T_0 = 0.116961 	 nu = 0.001044, T = 0.108644 	0:01:03.256574
nu_0 = 0.001000, T_0 = 0.036033 	 nu = 0.000479, T = 0.047682 	0:01:08.184646
nu_0 = 0.004642, T_0 = 0.036033 	 nu = 0.000822, T = 0.096582 	0:01:11.755987
nu_0 = 0.001000, T_0 = 0.003420 	 nu = 0.000125, T = 0.017114 	0:01:28.135864
nu_0 = 0.464159, T_0 = 1.232310 	 nu = 0.008205, T = 1.510185 	0:01:47.457700
nu_0 = 0.100000, T_0 = 0.379647 	 nu = 0.008146, T = 1.345287 	0:01:59.204310
nu_0 = 0.100000, T_0 = 0.116961 	 nu = 0.008146, T = 1.929358 	0:02:23.976614
nu_0 = 0.004642, T_0 = 1.232310 	 nu = 0.004414, T = 1.235935 	0:02:27.344255
nu_0 = 0.021544, T_0 = 4.000000 	 nu = 0.012079, T = 4.224868 	0:03:40.539961
nu_0 = 0.001000, T_0 = 0.379647 	 nu = 0.001044, T = 0.371688 	0:03:48.886615
nu_0 = 0.004642, T_0 = 0.011101 	 nu = 0.000291, T = 0.164734 	0:05:45.092611
nu_0 = 0.004642, T_0 = 4.000000 	 nu = 0.004413, T = 4.222831 	0:07:40.583773
nu_0 = 0.001000, T_0 = 1.232310 	 nu = 0.001044, T = 1.243624 	0:11:03.838684
nu_0 = 0.001000, T_0 = 4.000000 	 nu = 0.001044, T = 4.195350 	0:36:37.453178
nu_0 = 0.021544, T_0 = 0.036033 	 nu = 0.000392, T = 1.338353 	0:40:45.051349

This has printed out the initial parameter combinations, the final parameter values and the processing for each combination. There were two jobs that have taken a very long time to complete. Note, that these are not the ones which have exceeded maxiter (see above). Neither of the two long lasting jobs is hitting the boundaries of the parameter space that can be searched.


In [294]:
cl[:]['upper_bound']


Out[294]:
[[1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5],
 [1000.0, 5]]

In [295]:
cl[:]['lower_bound']


Out[295]:
[[0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06],
 [0.0001, 1e-06]]

In [296]:
# get log likelihood values for all parameters returned
ll = get_ll.map([p[0] for p in ar_par_NM])

In [297]:
# create rows for dataframe, without the parameter values on non-convergent runs
rows = [(p0[0], p0[1], xopt[0][0], xopt[0][1], logL) for p0, xopt, logL in izip(product(p0_nu, p0_T), ar_par_NM, ll) if xopt[4] == 0]

In [298]:
df = pd.DataFrame(data=rows, columns=['nu_0', 'T_0', 'nu_opt', 'T_opt', 'logL'])

In [299]:
df.sort_values(by='logL', ascending=False)


Out[299]:
nu_0 T_0 nu_opt T_opt logL
47 0.464159 0.379647 6.607515 4.999997 -8432.350812
55 2.154435 0.116961 6.607484 4.999997 -8432.350814
87 215.443469 1.232310 6.607281 4.999999 -8432.350814
95 1000.000000 0.379647 6.607693 4.999995 -8432.350817
77 46.415888 1.232310 6.607869 4.999995 -8432.350821
56 2.154435 0.379647 6.608082 4.999998 -8432.350821
76 46.415888 0.379647 6.607876 4.999993 -8432.350826
86 215.443469 0.379647 6.608089 4.999993 -8432.350835
67 10.000000 1.232310 6.608133 4.999993 -8432.350836
68 10.000000 4.000000 6.607394 4.999988 -8432.350839
96 1000.000000 1.232310 6.608342 4.999996 -8432.350844
88 215.443469 4.000000 6.607767 4.999980 -8432.350857
66 10.000000 0.379647 6.608280 4.999981 -8432.350877
58 2.154435 4.000000 6.608428 4.999986 -8432.350879
57 2.154435 1.232310 6.606892 4.999969 -8432.350915
97 1000.000000 4.000000 6.609853 4.999999 -8432.351105
78 46.415888 4.000000 6.611146 4.999997 -8432.351568
34 0.100000 0.011101 0.000142 0.000001 -9702.663060
23 0.021544 0.003420 0.000142 0.000001 -9702.663060
75 46.415888 0.116961 0.000142 0.000001 -9702.663060
60 10.000000 0.000325 0.000142 0.000001 -9702.663060
46 0.464159 0.116961 0.000142 0.000001 -9702.663060
72 46.415888 0.003420 0.000142 0.000001 -9702.663060
42 0.464159 0.001054 0.000142 0.000001 -9702.663060
93 1000.000000 0.036033 0.000142 0.000001 -9702.663060
10 0.004642 0.000100 0.000142 0.000001 -9702.663060
91 1000.000000 0.001054 0.000142 0.000001 -9702.663060
35 0.100000 0.036033 0.000142 0.000001 -9702.663060
2 0.001000 0.001054 0.000142 0.000001 -9702.663060
51 2.154435 0.000325 0.000142 0.000001 -9702.663060
... ... ... ... ... ...
13 0.004642 0.003420 0.000117 0.000001 -9703.796795
32 0.100000 0.001054 0.000117 0.000001 -9703.796796
30 0.100000 0.000100 0.000117 0.000001 -9703.796810
19 0.004642 4.000000 0.004413 4.222831 -9825.179957
25 0.021544 0.036033 0.000392 1.338353 -9825.179957
14 0.004642 0.011101 0.000291 0.164734 -9825.179957
3 0.001000 0.003420 0.000125 0.017114 -9825.179957
4 0.001000 0.011101 0.000312 0.020193 -9825.179957
48 0.464159 1.232310 0.008205 1.510185 -9825.179957
36 0.100000 0.116961 0.008146 1.929358 -9825.179957
37 0.100000 0.379647 0.008146 1.345287 -9825.179957
9 0.001000 4.000000 0.001044 4.195350 -9825.179957
7 0.001000 0.379647 0.001044 0.371688 -9825.179957
6 0.001000 0.116961 0.001044 0.108644 -9825.179957
26 0.021544 0.116961 0.004735 0.272936 -9825.179957
49 0.464159 4.000000 0.092965 3.877113 -9825.179957
8 0.001000 1.232310 0.001044 1.243624 -9825.179957
27 0.021544 0.379647 0.006268 0.551400 -9825.179957
18 0.004642 1.232310 0.004414 1.235935 -9825.179957
17 0.004642 0.379647 0.004414 0.374511 -9825.179957
28 0.021544 1.232310 0.012079 1.201627 -9825.179957
29 0.021544 4.000000 0.012079 4.224868 -9825.179957
38 0.100000 1.232310 0.030650 1.262050 -9825.179957
15 0.004642 0.036033 0.000822 0.096582 -9825.179957
5 0.001000 0.036033 0.000479 0.047682 -9825.179957
16 0.004642 0.116961 0.002126 0.171127 -9825.179957
39 0.100000 4.000000 0.113828 4.305707 -9825.179957
82 215.443469 0.003420 300.605096 0.000001 -9825.222496
92 1000.000000 0.003420 337.016302 0.000001 -9825.222512
81 215.443469 0.001054 637.204074 0.000001 -9825.222571

98 rows × 5 columns

What is the maximal reasonable value of $T$?

The most likely parameter combinations are hitting the upper parameter bound for the time parameter again. I am not sure whether an even higher $T$ can reasonably be assumed.

The expected time to the most recent common ancestor (MRCA) in a neutral genealogy is:

$$ E\big[T_{MRCA}\big] = 2 \Big(1-\frac{1}{n}\Big) $$

measured in $2N_e$ generations. Note, that $T_{MRCA}$ is close to its large sample size limit of 2 already for moderate sample sizes.


In [37]:
# the above formula for the sample sizes here yields:
2 * (1-1.0/ns[0])


Out[37]:
1.9444444444444444

See figure 3.4, p. 79, in Wakeley2009 for the distribution of the $T_{MRCA}$. For $n=36$ it has a mode close 1.2 and an expected value of 1.94. Values for $T_{MRCA}$ greater than 4 are very unlikely given a standard coalescent model, but may be more likely under models including population expansion or gene flow from another population.


In [45]:
%%px

upper_bound = [1e3, 9]

In [320]:
cl[0]['upper_bound']


Out[320]:
[1000.0, 9]

In [321]:
cl[0]['lower_bound']


Out[321]:
[0.0001, 1e-06]

In [310]:
ar_par_NM = lbview.map(run_dadi, product(p0_nu, p0_T), block=False, order=True)

In [311]:
ar_par_NM.progress


Out[311]:
98

In [312]:
get_flag_count(ar_par_NM, NM=True)


success 98
Maximum number of function evaluations made. 0
Maximum number of iterations reached. 2
unknown flag 0

Removing starting parameter combinations that lead to extremely long running times


In [322]:
for p0, r in izip(product(p0_nu, p0_T), ar_par_NM):
    if r[4] == 2: # if run hit maxiter limit
        print p0, r[0]


(2.154434690031882, 0.0034199518933533931) [  1.42092098e-04   1.00001372e-06]
(1000.0, 0.011100946155696227) [  1.42128126e-04   1.00003804e-06]

In [326]:
x  = [(p0, xopt[0], m['completed'] - m['started']) for p0, xopt, m in izip(product(p0_nu, p0_T), ar_par_NM, ar_par_NM.metadata)]
y = sorted(x, key=lambda x: x[2], reverse=True)

for i, p in enumerate(y):
    if i > 1: break
    print "nu_0 = %.6f, T_0 = %.6f \t nu = %.6f, T = %.6f" % (p[0][0], p[0][1], p[1][0], p[1][1]), "\t", p[2]


nu_0 = 0.021544, T_0 = 0.036033 	 nu = 0.000392, T = 1.338353 	0:40:26.477521
nu_0 = 0.001000, T_0 = 4.000000 	 nu = 0.001044, T = 4.195350 	0:36:49.002706

I want to get these starting values out, since they take too long to complete.


In [335]:
excl = [ex[0] for ex in y[:2]]
excl[0] == excl[0]


Out[335]:
True

In [342]:
combs = [comb for comb in product(p0_nu, p0_T) if not (comb == excl[0] or comb == excl[1])]
print excl[1] in combs
print excl[0] in combs
print combs[22] in combs


False
False
True

Ok, I think the new parameter array does not include the sets that lead to such extreme running times.


In [343]:
ar_par_NM = lbview.map(run_dadi, combs, block=False, order=True)

In [347]:
ar_par_NM.progress


Out[347]:
98

In [349]:
ar_par_NM.elapsed/60


Out[349]:
11.455080816666667

11 instead of 40 minutes.


In [348]:
get_flag_count(ar_par_NM, NM=True)


success 96
Maximum number of function evaluations made. 0
Maximum number of iterations reached. 2
unknown flag 0

In [350]:
# get log likelihood values for all parameters returned
ll = get_ll.map([p[0] for p in ar_par_NM])

In [351]:
# create rows for dataframe, without the parameter values on non-convergent runs
rows = [(p0[0], p0[1], xopt[0][0], xopt[0][1], logL) for p0, xopt, logL in izip(product(p0_nu, p0_T), ar_par_NM, ll) if xopt[4] == 0]

In [352]:
df = pd.DataFrame(data=rows, columns=['nu_0', 'T_0', 'nu_opt', 'T_opt', 'logL'])

In [353]:
df.sort_values(by='logL', ascending=False)


Out[353]:
nu_0 T_0 nu_opt T_opt logL
55 2.154435 0.116961 10.916014 8.999996 -8427.371011
74 46.415888 0.036033 10.916156 8.999985 -8427.371017
64 10.000000 0.036033 10.916202 8.999970 -8427.371027
65 10.000000 0.116961 10.916538 8.999964 -8427.371034
94 1000.000000 0.116961 10.915734 8.999962 -8427.371034
86 215.443469 0.379647 10.916784 8.999967 -8427.371038
95 1000.000000 0.379647 10.915593 8.999947 -8427.371046
45 0.464159 0.036033 10.915737 8.999939 -8427.371048
76 46.415888 0.379647 10.916366 8.999931 -8427.371052
53 2.154435 0.011101 10.915707 8.999904 -8427.371070
56 2.154435 0.379647 10.918372 8.999940 -8427.371162
93 1000.000000 0.036033 10.919474 8.999965 -8427.371283
54 2.154435 0.036033 10.912450 8.999985 -8427.371323
75 46.415888 0.116961 10.906295 8.999925 -8427.373200
84 215.443469 0.036033 10.901372 8.999991 -8427.375911
85 215.443469 0.116961 10.898892 8.999995 -8427.377693
66 10.000000 0.379647 10.943858 8.999929 -8427.385887
32 0.100000 0.001054 0.000142 0.000001 -9702.663060
22 0.021544 0.001054 0.000142 0.000001 -9702.663060
73 46.415888 0.011101 0.000142 0.000001 -9702.663060
58 2.154435 4.000000 0.000142 0.000001 -9702.663060
44 0.464159 0.011101 0.000142 0.000001 -9702.663060
70 46.415888 0.000325 0.000142 0.000001 -9702.663060
40 0.464159 0.000100 0.000142 0.000001 -9702.663060
91 1000.000000 0.003420 0.000142 0.000001 -9702.663060
9 0.001000 4.000000 0.000142 0.000001 -9702.663060
89 1000.000000 0.000100 0.000142 0.000001 -9702.663060
33 0.100000 0.003420 0.000142 0.000001 -9702.663060
2 0.001000 0.001054 0.000142 0.000001 -9702.663060
49 0.464159 4.000000 0.000142 0.000001 -9702.663060
... ... ... ... ... ...
29 0.021544 4.000000 0.000117 0.000001 -9703.796795
82 215.443469 0.003420 0.000117 0.000001 -9703.796795
12 0.004642 0.001054 0.000117 0.000001 -9703.796795
30 0.100000 0.000100 0.000117 0.000001 -9703.796796
28 0.021544 1.232310 0.000117 0.000001 -9703.796810
18 0.004642 1.232310 0.004413 4.222831 -9825.179957
13 0.004642 0.003420 0.000291 0.164734 -9825.179957
3 0.001000 0.003420 0.000125 0.017114 -9825.179957
4 0.001000 0.011101 0.000312 0.020193 -9825.179957
46 0.464159 0.116961 0.008205 1.510185 -9825.179957
34 0.100000 0.011101 0.008146 1.929358 -9825.179957
35 0.100000 0.036033 0.008146 1.345287 -9825.179957
7 0.001000 0.379647 0.001044 0.371688 -9825.179957
47 0.464159 0.379647 0.150241 6.421112 -9825.179957
6 0.001000 0.116961 0.001044 0.108644 -9825.179957
24 0.021544 0.011101 0.004735 0.272936 -9825.179957
8 0.001000 1.232310 0.001044 1.243624 -9825.179957
25 0.021544 0.036033 0.006268 0.551400 -9825.179957
17 0.004642 0.379647 0.004414 1.235935 -9825.179957
16 0.004642 0.116961 0.004414 0.374511 -9825.179957
26 0.021544 0.116961 0.012079 1.201627 -9825.179957
27 0.021544 0.379647 0.012079 4.224868 -9825.179957
36 0.100000 0.116961 0.030650 1.262050 -9825.179957
14 0.004642 0.011101 0.000822 0.096582 -9825.179957
5 0.001000 0.036033 0.000479 0.047682 -9825.179957
15 0.004642 0.036033 0.002126 0.171127 -9825.179957
37 0.100000 0.379647 0.113828 4.305707 -9825.179957
80 215.443469 0.000325 300.605096 0.000001 -9825.222496
90 1000.000000 0.000325 337.016302 0.000001 -9825.222512
79 215.443469 0.000100 637.204074 0.000001 -9825.222571

96 rows × 5 columns

Still hitting the boundaries of $T$. Something does not work here.

I am not sure whether it is worth continuing.

Ryan Gutenkunst:

This indicates that dadi is having trouble fitting your data. One possibility is that the history of the population includes important events that aren’t in your models. Another possibility is that your data is biased in ways that aren’t in your models. For example, maybe your missing calls for rare alleles.

dadi forum thread on parameters hitting the boundary

The data is clearly biased or at least very noisy. However, the models that only include a single event, one event of growth or decline, might also be too simplistic. So, I think before I go on and try to further clean the data, I should try the two remaining more complex 1D models that come with dadi: bottlegrowth and three_epoch. If I cannot reliably estimate parameter values for those models, then I think it is most likely a data issue.


In [34]:
?dadi.Demographics1D.bottlegrowth

In [35]:
?dadi.Demographics1D.three_epoch

These two models shall be analysed in separete notebooks.


In [ ]: