Load dadi module


In [11]:
%pwd


Out[11]:
u'/data3/claudius/Big_Data/DADI/dadiExercises'

In [12]:
import dadi


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-12-3ba56d7f7d9f> in <module>()
----> 1 import dadi

ImportError: No module named dadi

I have not installed dadi globally on huluvu. Instead, I left it in my Downloads directory '/home/claudius/Downloads/dadi'. In order for Python to find that module, I need to add that directory to the PYTHONPATH variable.


In [13]:
import sys

sys.path # get the PYTHONPATH variable


Out[13]:
['',
 '/usr/local/anaconda2/lib/python27.zip',
 '/usr/local/anaconda2/lib/python2.7',
 '/usr/local/anaconda2/lib/python2.7/plat-linux2',
 '/usr/local/anaconda2/lib/python2.7/lib-tk',
 '/usr/local/anaconda2/lib/python2.7/lib-old',
 '/usr/local/anaconda2/lib/python2.7/lib-dynload',
 '/usr/local/anaconda2/lib/python2.7/site-packages/Sphinx-1.3.5-py2.7.egg',
 '/usr/local/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg',
 '/usr/local/anaconda2/lib/python2.7/site-packages',
 '/usr/local/anaconda2/lib/python2.7/site-packages/IPython/extensions',
 '/home/claudius/.ipython']

In [14]:
sys.path.insert(0, '/home/claudius/Downloads/dadi') # add path to dadi at beginning of list
sys.path


Out[14]:
['/home/claudius/Downloads/dadi',
 '',
 '/usr/local/anaconda2/lib/python27.zip',
 '/usr/local/anaconda2/lib/python2.7',
 '/usr/local/anaconda2/lib/python2.7/plat-linux2',
 '/usr/local/anaconda2/lib/python2.7/lib-tk',
 '/usr/local/anaconda2/lib/python2.7/lib-old',
 '/usr/local/anaconda2/lib/python2.7/lib-dynload',
 '/usr/local/anaconda2/lib/python2.7/site-packages/Sphinx-1.3.5-py2.7.egg',
 '/usr/local/anaconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg',
 '/usr/local/anaconda2/lib/python2.7/site-packages',
 '/usr/local/anaconda2/lib/python2.7/site-packages/IPython/extensions',
 '/home/claudius/.ipython']

In [15]:
import dadi, numpy

Import custom models

I am going to analyise and example data set that comes with dadi.


In [16]:
% ll /home/claudius/Downloads/dadi/examples/YRI_CEU/


total 48
drwxrwxr-x 2 claudius  4096 Feb 17 11:39 bootstraps/
-rw-rw-r-- 1 claudius  2622 Feb 17 11:39 demographic_models.py
-rw-rw-r-- 1 claudius  3099 Mar 15 09:55 demographic_models.pyc
-rw-rw-r-- 1 claudius  7900 Feb 17 11:39 YRI_CEU.fs
-rw-rw-r-- 1 claudius 20094 Feb 17 11:39 YRI_CEU.png
-rw-rw-r-- 1 claudius  7991 Feb 17 11:39 YRI_CEU.py

The Python script YRI_CEU.py contains example code for the analysis of these two human population samples.


In [17]:
! head /home/claudius/Downloads/dadi/examples/YRI_CEU/demographic_models.py


"""
Custom demographic model for our example.
"""
import numpy
import dadi

def prior_onegrow_mig((nu1F, nu2B, nu2F, m, Tp, T), (n1,n2), pts):
    """
    Model with growth, split, bottleneck in pop2, exp recovery, migration

The file demographic_models.py contains function definitions for custom demographic models.


In [18]:
%quickref

In [19]:
# insert path where file with model functions resides
sys.path.insert(0, '/home/claudius/Downloads/dadi/examples/YRI_CEU/')

# load custom demographic model functions into current namespace
import demographic_models

Load the data


In [20]:
% ll /home/claudius/Downloads/dadi/examples/YRI_CEU/


total 48
drwxrwxr-x 2 claudius  4096 Feb 17 11:39 bootstraps/
-rw-rw-r-- 1 claudius  2622 Feb 17 11:39 demographic_models.py
-rw-rw-r-- 1 claudius  3099 Mar 15 09:55 demographic_models.pyc
-rw-rw-r-- 1 claudius  7900 Feb 17 11:39 YRI_CEU.fs
-rw-rw-r-- 1 claudius 20094 Feb 17 11:39 YRI_CEU.png
-rw-rw-r-- 1 claudius  7991 Feb 17 11:39 YRI_CEU.py

In [21]:
! cat /home/claudius/Downloads/dadi/examples/YRI_CEU/YRI_CEU.fs


21 21 
nan 1505.721985213899 430.6082121707036 223.6280617335962 133.0413413088248 88.52601752754229 65.90636430764097 49.0000242243171 36.49370238010697 26.43817069812249 19.86214952334856 16.95766334931866 16.06033599115347 14.82456456819106 11.885542008532 7.744463446292672 3.973746809686413 1.752864147390121 1.039759934086901 0.6697982528333183 0.1963211568193445 3842.972452752669 185.2037721164112 123.4164792735226 90.49888906277334 72.05253345848469 58.71765542505418 50.45147448860831 45.74962646203179 40.59569879454973 33.72860415487282 27.06095055215234 21.93829919585865 17.97017460840617 14.33963679654744 10.67612301648367 7.120790728421752 4.250393316903829 2.287466768975575 1.417442955558849 0.7840316983663741 0.236461131352045 1427.24152011533 149.1684657560176 90.61702913246906 63.14754531564202 50.31042470038281 43.56301509782539 37.83904845428278 31.65921604234235 25.63678907480523 20.6385886565992 17.14857769945113 14.7336652336435 12.45748306969472 9.907251221398617 7.474982749431014 5.577501746627377 4.252153327376249 3.3014738588955 2.341074734132574 1.795040710499667 1.00497986488745 693.9041902591357 98.80759427222931 73.11016577214154 60.40443790708419 51.06224760864516 42.14277023359546 35.29635754566603 29.76770381013423 24.86512790970849 20.57398050161875 17.19938687451179 14.75744573882049 12.94492415183872 11.44786167789135 10.19955436273318 9.059725484023517 7.886206445200252 6.642967668971703 5.311214244889122 4.633928729372417 2.526794548670105 387.905627831532 71.27257552662333 60.7075693377231 52.03894849996019 42.64247107584721 35.46837892588759 30.98837419043325 27.11392613246915 23.29639181284838 20.28879551281858 18.11779403296798 16.0567652694227 13.5976623538085 10.95348581190216 8.92419491329643 8.246571129994891 8.581267979573381 8.620387199762721 7.716575214964498 6.422134559797929 3.128633703796372 225.7194602388819 48.63588809190185 42.38453028408858 38.05826780295489 34.99893508775119 31.16319591073135 27.15081303694445 23.41188145302951 20.21543064931167 18.09208313089486 17.14122045772298 16.41498601375998 14.78036099155702 12.29618968838261 10.01612760705411 8.668988469095385 8.026620831079891 7.377239657863501 6.357755313839079 5.381000885210134 3.475744511301924 144.5671717444917 34.2595127067771 28.09231629083403 24.92541508738459 24.88724216533347 24.35140500181145 22.67105939016338 20.59581798654765 18.89539659882208 17.65965370812203 16.61576526034574 15.46223333829421 14.10112653892374 12.82436020950018 12.03984386049929 11.51375717091653 10.58444291893916 9.36176978046349 7.937655779630191 5.889370376863262 4.917604808993234 105.3809063525426 26.71377666010118 23.37671893687104 21.27773982423625 20.4892894362151 20.35078652205766 19.91015418172012 19.50815092151845 19.35284896599626 19.15649937033029 18.56812472050961 17.48076397925753 16.286978569335 15.33361844507773 14.51787876057105 13.4373727463923 12.14293482268636 10.98976940866798 9.641682062373775 8.019229528195126 7.291877528338262 69.72334833662757 20.84785685173947 19.89383595991543 18.38823821880222 18.07207892227668 18.83016412231759 19.13801146057779 18.99539618043494 18.73071308623727 18.40836426467492 18.00404938350643 17.47206353296881 16.87099812440586 16.2294003089408 15.31363884656897 13.86060381143099 12.23375952022583 10.70987505908203 10.00198170848196 10.10300713948303 8.826508439532361 38.84697248938986 18.92637928255087 17.75337702040008 16.59451127802977 17.3885003522587 18.93341172158678 18.80795290389708 17.54496098098954 16.15966435118247 15.13269710816555 14.73991176937894 14.88217591921757 15.10320390932899 14.82564913312228 13.62828871141593 11.64653331567918 9.545711981928886 8.928990859181251 12.08947424485043 15.24571228238315 11.71913591897063 24.83299495952542 15.66756455504185 15.11775851606936 14.56443497959524 16.03323296725234 17.9234843134304 17.72391286716443 15.88702521586929 13.66706058112701 11.87217619599476 10.83443266807746 10.5442059057596 10.81465797539894 11.28845700370321 11.4979618127692 11.15354398133768 10.36685422453786 11.20120661903375 15.8474785240485 19.65375132853475 15.50884000241786 17.20743835841973 10.58942013369867 10.03271015261551 9.881057845311126 10.99802991823494 12.198986317855 12.22463711335031 11.67246393176848 11.30939123096482 11.12764583004743 10.73339407145717 10.11134260346993 9.735532054737909 10.04675460008552 10.87800722664643 11.45434734480484 11.47541115603192 12.51513365134035 15.09846176398627 17.13911223557356 18.67954908643252 12.14219048545315 5.794921141072335 6.18181731285368 7.033624063617901 8.157025154512448 9.274961698772385 9.700192063302014 9.890771904180106 10.46249735179633 11.0585545502395 10.93585194908209 10.15909673165344 9.468860929391644 9.313873061814258 9.557115077278029 9.893984953512829 10.70977127214634 12.83669443078263 14.83344489935912 13.62960493577467 21.30814716511689 6.384628393466505 2.289091587737791 2.628089282173947 4.008703180248165 5.211143811890773 6.002820433243259 6.239612727782098 6.538444956075939 7.441698278973794 8.542052953322033 9.053917579478002 8.930442927227844 8.706121961446325 8.691119035878282 8.931752776310793 9.320024124233768 9.93032250467342 10.94717570472498 13.30078712437888 13.3453613214671 27.49766589079091 1.858763251917963 1.565814520767503 1.398824917992742 2.232267056965382 2.796866873799119 3.073935594565152 3.247196562658599 3.801440745625767 4.838502593144923 5.93680867754493 6.780966140621023 7.467060679716882 8.156283850300817 8.767734659457652 9.277869936804077 9.813625112220295 10.35949273544799 10.93885742872739 13.05185406086008 15.04009430912032 34.81398463490486 0.9916076252191814 1.379424693942604 1.323381917663045 1.68630883668213 1.907323921783669 2.128169130177916 2.392719728670378 2.818200863281716 3.311680711796868 3.759286008974851 4.420512067649504 5.689228463996113 7.55246030961858 9.606386443438396 11.23482819203283 12.00993059380971 12.34873979088552 12.88688646704671 14.17474940290141 16.33502250763544 42.97697804041894 0.4905033331181143 1.315645459499866 1.97155905147955 1.851779002548052 1.483264721516415 1.542834871150056 1.887466062674028 2.403009315331162 2.990408832094462 3.588534156993468 4.341677197578071 5.518103250667916 7.187069723666123 9.008783167498366 10.20479922609122 10.27985071253046 9.634801060276464 9.722180993290813 10.40911269600711 14.96766801016373 51.65495531336602 0.7696648874352177 2.617807335027857 3.416106695099775 2.443743440887043 1.841713680415238 2.221592228134253 2.69112007086947 3.003221743885019 3.291371721450353 3.467062772766969 3.479382653691287 3.573205968427403 4.02642861736036 4.895404737013315 5.973376140113663 6.873616777824589 7.260777320842492 7.447492679606445 9.106309120327651 17.48748385327377 56.3212139091749 0.3411279231917071 1.429626869307465 2.083870449475914 1.438548654227283 0.886852450890049 1.022587205935562 1.290456974273852 1.632064340276644 2.125863482462951 2.520423235347011 2.774605915793468 3.230414667106667 3.911572248050143 4.419269055087501 4.884172285825914 6.203651766032385 8.529534733580046 10.2149032478127 11.98999491843549 19.27429720415496 86.91728292186653 0.04162232099073295 0.1979752197065984 0.4442568136559377 0.9552129927410965 1.611263796549541 1.908790949886483 1.598845726468509 1.500502550190393 2.480788748247619 4.616978720610427 7.432175869312534 10.06610045846519 11.29344496022791 10.51788998813611 8.881545307591182 8.379783917393826 9.3554666836539 11.35492054687524 12.64176986065523 13.25885941266464 24.86749549002576 0.00142542619288829 0.008199131273653577 0.04967063150621476 0.2172034416747167 0.4808970718783391 0.7163939587209509 0.7806630501995213 0.7282528398815221 0.8381267830525052 1.172652920503962 1.772579989251333 2.757671180498789 3.998977598212415 5.144221241973772 5.669048129322261 5.396496810684106 5.298702620501583 7.870127798875664 11.42281363510372 5.139663135750236 nan

This a 2D SFS format as understood by dadi. It pertains to the two human populations YRI and CEU. 10 Individuals of each population have been sampled. The first 21 numbers should be [YRI: 0, CEU: 0-20]. The following 21 numbers should be [YRI: 1, CEU: 0-20] and so on.


In [22]:
# read in the unfolded 2D SFS from file

data = dadi.Spectrum.from_file('/home/claudius/Downloads/dadi/examples/YRI_CEU/YRI_CEU.fs')

In [23]:
ns = data.sample_sizes
ns


Out[23]:
array([20, 20])

Number of samples (or sample size) refers to the number of sampled gene copies.


In [24]:
# flatten the array and get its length

len([ elem for row in data.data for elem in row ])


Out[24]:
441

In [25]:
21*21


Out[25]:
441

The Spectrum is a 21 $\times$ 21 matrix.


In [ ]:
%quickref

In [ ]:
# print the docstring of the fs object

%pdoc data

In [ ]:
%pinfo data

In [ ]:
# this prints the source code for the object

%psource data

Set grid size for extrapolation


In [26]:
pts_l = [40, 50, 60]

dadi will solve the partial differential equation at these three grid sizes and then extrapolate to an infinitely fine grid.

Create demographic model function

The demographic model we'll is defined in the function demographic_models.prior_onegrow_mig. Let's have a look at its definition:


In [ ]:
%psource demographic_models.prior_onegrow_mig

The function first records the required grid size. It then initialises the phi distribution with this grid. It then specifies a stepwise change in population size of the ancestral population. This initial ancestral population is implicitly taken as the reference population. The population size parameters nu1F, nu2F and nu2B are relativ to the population size of this initial ancestral population, which is set to 1. That means, if nu1F is greater 1, the model specifies a stepwise increase in population size. The population stays at this size for a time Tp, which is specified in $2N_{ref}$ generation. Next the model function specifies a population split. One of the daughter populations has the same population size as the ancestral population (African presumably). The other daughter population starts at a population size of nu2B and then exponentially increases in size to nu2F. During this time of divergence T, the two populations exchange gene copies at a rate m in each direction. The function Spectrum.from_phi then solves the partial differential equation given the spcified model and given the specified parameter values. Finally the expected SFS given the model is returned.


In [27]:
func = demographic_models.prior_onegrow_mig

Next we turn the model function into a version that can do the extrapolation.


In [28]:
func_ex = dadi.Numerics.make_extrap_log_func(func)

The func_ex is the function we are going to use for optimisation.

Set parameter bounds and initial values

The model function takes a list of 6 parameters as its first argument. See the docstring for their description.


In [151]:
%psource func

It is necessary to confine the search space space to reasonable values.


In [152]:
upper_bounds = [100, 100, 100, 10, 3, 3]

This specifies that the maximal size that the ancestral population can grow to (nu1F) is 100$\times$ its initial size. Similarly, the maximal time the ancestral population stays at this size before it splits into two populations is 3$\times 2N_{ref}$. Note that this time is 3 times the expected time to the MRCA for a sample of $n$ gene copies when $n$ is very large and under the standard neutral model (see p. 76 in Wakeley2009).


In [153]:
lower_bounds = [1e-2, 1e-2, 1e-2, 0, 0, 0]

The lower bound of the population size parameters is set to 1/100th of the reference population and the lower bounds of migration rate and time parameters is set to 0.


In [154]:
# define starting values for model parameters

p0 = [2, 0.1, 2, 1, 0.2, 0.2]

Since the optimisation algorithms are not guaranteed to find the global optimum, it is important to run several optimisations for each data set, each with different starting values.


In [155]:
%psource dadi.Misc.perturb_params

In [156]:
p0 = dadi.Misc.perturb_params(p0, upper_bound=upper_bounds, lower_bound=lower_bounds, fold=1.5)

The naming of the function argument for the "number of factors to disturb by" is very unfortunate in this context. Anyway, a higher value leads to greater perturbation.


In [92]:
p0


Out[92]:
array([ 4.38444096,  0.23312917,  5.07436093,  0.86716405,  0.20956699,
        0.33477026])

Optimisation


In [93]:
popt_1 = dadi.Inference.optimize_log(p0, data, func_ex, pts_l, \
                                   lower_bound=lower_bounds, upper_bound=upper_bounds, \
                                  verbose=10, maxiter=10)


570     , -2536.09    , array([ 4.38883    ,  0.233129   ,  5.07436    ,  0.867164   ,  0.209567   ,  0.33477    ])
600     , -2532.63    , array([ 4.38214    ,  0.232991   ,  5.07166    ,  0.866447   ,  0.209578   ,  0.33504    ])
610     , -1946.7     , array([ 1.2064     ,  0.129687   ,  1.70966    ,  1.53314    ,  0.26994    ,  0.0902284  ])
620     , -1482.67    , array([ 2.18852    ,  0.158952   ,  2.77612    ,  0.96188    ,  0.237819   ,  0.176358   ])
630     , -1323.83    , array([ 1.67268    ,  0.0715631  ,  1.2161     ,  2.19428    ,  0.281751   ,  0.168007   ])
640     , -1579.8     , array([ 3.39283    ,  0.0604752  ,  0.457756   ,  2.69739    ,  0.330078   ,  0.102105   ])
650     , -1225.08    , array([ 1.7647     ,  0.121421   ,  0.457845   ,  2.35081    ,  0.337931   ,  0.207296   ])
660     , -1139.55    , array([ 1.70197    ,  0.128196   ,  0.618936   ,  1.87368    ,  0.250954   ,  0.211124   ])
680     , -1139.68    , array([ 1.70197    ,  0.128068   ,  0.618936   ,  1.87368    ,  0.250954   ,  0.211335   ])
690     , -1130.58    , array([ 1.70914    ,  0.136341   ,  0.632009   ,  1.78758    ,  0.294513   ,  0.218843   ])
700     , -1120.07    , array([ 1.75003    ,  0.13867    ,  0.676864   ,  1.68941    ,  0.280609   ,  0.214729   ])
710     , -1103.82    , array([ 1.80837    ,  0.14654    ,  0.742903   ,  1.51346    ,  0.270241   ,  0.204867   ])

In [94]:
p_names = ("nu1F", "nu2B", "nu2F", "m", "Tp", "T")

for n,p in zip(p_names, popt_1):
    print str(n) + "\t" + "{0:.3f}".format(p)


nu1F	1.808
nu2B	0.147
nu2F	0.743
m	1.512
Tp	0.270
T	0.205

Let's see how robust these estimates are to different starting values.


In [95]:
# define starting values for model parameters
p0 = [2, 0.1, 2, 1, 0.2, 0.2]

# create new starting values for parameters
p0 = dadi.Misc.perturb_params(p0, upper_bound=upper_bounds, lower_bound=lower_bounds, fold=1.5)

# run optimisation with 10 iterations
popt_2 = dadi.Inference.optimize_log(p0, data, func_ex, pts_l, \
                                   lower_bound=lower_bounds, upper_bound=upper_bounds, \
                                  verbose=10, maxiter=10)


720     , -1432.73    , array([ 1.28998    ,  0.236668   ,  0.744773   ,  0.57889    ,  0.0927914  ,  0.183182   ])
730     , -6475.48    , array([ 11.8848    ,  0.305247   ,  1.00396    ,  1.49127    ,  0.0937302  ,  0.0357079  ])
740     , -1205.02    , array([ 1.76042    ,  0.245497   ,  0.776575   ,  0.660902   ,  0.0929223  ,  0.145698   ])
750     , -1730.52    , array([ 3.69003    ,  0.139004   ,  0.537123   ,  0.78903    ,  0.0954514  ,  0.190401   ])
760     , -1151.96    , array([ 2.07857    ,  0.215904   ,  0.714896   ,  0.687566   ,  0.0934841  ,  0.154873   ])
770     , -1806.5     , array([ 1.24802    ,  0.0179934  ,  1.27224    ,  0.736441   ,  0.187557   ,  0.0443897  ])
780     , -1093.96    , array([ 1.76513    ,  0.115195   ,  0.806599   ,  1.28262    ,  0.100547   ,  0.139114   ])
790     , -1089.65    , array([ 1.79586    ,  0.101064   ,  0.957184   ,  1.22443    ,  0.112958   ,  0.1387     ])
800     , -1081.75    , array([ 1.77492    ,  0.094468   ,  1.00101    ,  1.22527    ,  0.183273   ,  0.135162   ])
810     , -1072.2     , array([ 1.88729    ,  0.104621   ,  1.1083     ,  1.04733    ,  0.311631   ,  0.138624   ])
820     , -1070.37    , array([ 1.86569    ,  0.0962615  ,  1.17758    ,  1.04969    ,  0.338358   ,  0.133231   ])
830     , -1068.32    , array([ 1.85163    ,  0.0831581  ,  1.37485    ,  1.0023     ,  0.388724   ,  0.121723   ])

In [104]:
for n, p1, p2 in zip(p_names, popt_1, popt_2):
    print str(n) + "\t" + "{0:.3f}".format(p1) + "\t" + "{0:.3f}".format(p2)


nu1F	1.808	1.852
nu2B	0.147	0.083
nu2F	0.743	1.375
m	1.512	1.001
Tp	0.270	0.389
T	0.205	0.122

In [105]:
# define starting values for model parameters
p0 = [2, 0.1, 2, 1, 0.2, 0.2]

# create new starting values for parameters
p0 = dadi.Misc.perturb_params(p0, upper_bound=upper_bounds, lower_bound=lower_bounds, fold=1.5)

# run optimisation with 10 iterations
popt_3 = dadi.Inference.optimize_log(p0, data, func_ex, pts_l, \
                                   lower_bound=lower_bounds, upper_bound=upper_bounds, \
                                  verbose=10, maxiter=10)


840     , -2039.83    , array([ 1.08122    ,  0.0953247  ,  4.78716    ,  0.927671   ,  0.212284   ,  0.0829189  ])
860     , -2038.36    , array([ 1.08162    ,  0.0953409  ,  4.78568    ,  0.927425   ,  0.212284   ,  0.0829868  ])
870     , -2018.4     , array([ 1.08419    ,  0.0947323  ,  4.77609    ,  0.926748   ,  0.212287   ,  0.08343    ])
890     , -2002.56    , array([ 1.08631    ,  0.0943108  ,  4.76819    ,  0.924501   ,  0.212289   ,  0.0837976  ])
910     , -2002.81    , array([ 1.08633    ,  0.0943059  ,  4.7681     ,  0.92541    ,  0.212289   ,  0.0838018  ])
920     , -2000.74    , array([ 1.08636    ,  0.094301   ,  4.76801    ,  0.92447    ,  0.212289   ,  0.08389    ])
930     , -1333.7     , array([ 1.23535    ,  0.0703565  ,  4.27674    ,  0.841712   ,  0.212438   ,  0.111862   ])
940     , -1576.74    , array([ 5.5507     ,  0.0405354  ,  2.97162    ,  0.714863   ,  0.221907   ,  0.0733492  ])
950     , -1110.6     , array([ 2.09232    ,  0.0579661  ,  3.76412    ,  0.795646   ,  0.215712   ,  0.0964734  ])
960     , -2228.77    , array([ 0.803005   ,  0.0104992  ,  0.98419    ,  0.33719    ,  0.459089   ,  0.0163747  ])
970     , -1135.1     , array([ 1.91769    ,  0.0469363  ,  2.548      ,  1.28718    ,  0.679511   ,  0.0861933  ])
980     , -1085.79    , array([ 1.93523    ,  0.0497513  ,  3.14518    ,  0.858465   ,  0.304411   ,  0.0845585  ])
990     , -1094.73    , array([ 2.03043    ,  0.0649827  ,  2.66501    ,  0.62473    ,  0.524269   ,  0.0930475  ])
1000    , -1083.75    , array([ 1.96056    ,  0.0534443  ,  3.00719    ,  0.787449   ,  0.352696   ,  0.0868648  ])
1010    , -1074.88    , array([ 1.94283    ,  0.0634955  ,  2.13517    ,  0.8199     ,  0.344517   ,  0.0950093  ])
1020    , -1071.5     , array([ 1.90667    ,  0.05239    ,  3.39704    ,  0.785715   ,  0.386217   ,  0.0963756  ])
1030    , -1066.71    , array([ 1.86858    ,  0.0707231  ,  1.8327     ,  0.875372   ,  0.366706   ,  0.108916   ])
1040    , -1066.48    , array([ 1.87888    ,  0.0676156  ,  1.95166    ,  0.902399   ,  0.36663    ,  0.107433   ])

In [106]:
for n, p1, p2, p3 in zip(p_names, popt_1, popt_2, popt_3):
    print str(n) + "\t" + "{0:.3f}".format(p1) + "\t" + "{0:.3f}".format(p2) + "\t" + "{0:.3f}".format(p3)


nu1F	1.808	1.852	1.879
nu2B	0.147	0.083	0.068
nu2F	0.743	1.375	1.952
m	1.512	1.001	0.902
Tp	0.270	0.389	0.367
T	0.205	0.122	0.107

With just 10 iterations, the optimisation does not seem to converge for all parameters.

Analysis of optimisation result


In [85]:
# best fit parameter values (from YRY_CEU.py)
popt = [1.881, 0.0710, 1.845, 0.911, 0.355, 0.111]

In [86]:
# get best-fit model AFS
model = func_ex(popt, ns, pts_l)

In [31]:
model


Out[31]:
Spectrum([[-- 0.5500028375862792 0.1582342462251485 0.08111228441069211
  0.05195616321584283 0.03669333698313781 0.02712919137525195
  0.02049804356783755 0.01562996091422075 0.011939503126304358
  0.009092633952266832 0.006878109687486422 0.005151382378808764
  0.0038075869003987256 0.002767386161549131 0.0019689832904157783
  0.0013633091859598827 0.0009109409533608609 0.000579906456718332
  0.000343153775407661 0.0001707010945071514]
 [1.394851315659119 0.07077251880004307 0.03987508540586783
  0.03064253093469252 0.025310969606988105 0.02131135859568262
  0.01800107870648523 0.015158329489597762 0.01268556254685001
  0.010529470617871572 0.008654539870729663 0.00703306175940633
  0.0056410482872343015 0.004456509062767751 0.003458749709729287
  0.0026281447157671016 0.00194615343342964 0.0013954741768305013
  0.0009600925757413339 0.0006233647886480195 0.0003514967874778031]
 [0.5106503852726605 0.04974969557551684 0.028988571815442312
  0.022961537123092138 0.019568053684112505 0.01699549762339343
  0.01480585882446581 0.012860082606259206 0.011104950858580744
  0.00951653810809083 0.008081949981057962 0.006792559002299909
  0.005641263493000985 0.004621330695352947 0.0037259339300399264
  0.002948041611577773 0.0022805450980671717 0.0017166262713989796
  0.001250188258831367 0.0008736371417260157 0.0005497728873155609]
 [0.25142608884420115 0.03754366955718058 0.022948360100757424
  0.018620185313344754 0.016204040945761475 0.014363852676214827
  0.012772499105722531 0.011328259954798052 0.009994707508022026
  0.008757740638314271 0.007611664856851728 0.006554024377674834
  0.005583490020805345 0.004698950110438741 0.003899140414114033
  0.003182579855229304 0.0025477895364470833 0.0019939417035899527
  0.001522016865388681 0.001133345993038157 0.000789329998833677]
 [0.13996751548396735 0.028715461213225265 0.018480053643043268
  0.015376259229441728 0.013657649493345437 0.01234209908388596
  0.011184767255669357 0.010110633187682154 0.009094497086885636
  0.008128281273840804 0.007210370953268032 0.0063416468612970435
  0.00552386045926882 0.0047589371279247675 0.004048712383567575
  0.003394957765710788 0.0027997747795549255 0.002266722432973156
  0.0018032944471829662 0.0014220999650110299 0.0010905642299915248]
 [0.0833524959727232 0.022031685654988788 0.014932247261192345
  0.012752131225183079 0.01156541842335784 0.01065514484052938
  0.009838763496776082 0.00906088954994153 0.008304027210947935
  0.007563822469447461 0.006840932976354145 0.006137991614683314
  0.005458362446356145 0.004805616645858581 0.004183368617073878
  0.003595411933545598 0.003046354448375956 0.0025434332847237553
  0.0021010574798511926 0.0017474366449322883 0.0014712029533389093]
 [0.05175284821434903 0.016894959493504363 0.012051079206570153
  0.010569520076622842 0.00979155312798396 0.009198269172756849
  0.008654203081678148 0.00811831136057167 0.007578204439695944
  0.007031574194964661 0.006480038839055835 0.005926836089283657
  0.005375880906645708 0.004831384631208161 0.0042977864797812525
  0.003780017860436501 0.0032844453303898617 0.002821604624945181
  0.002413721460442264 0.0021107560049446633 0.0019486742406828422]
 [0.03301263565810132 0.012918782705872942 0.009688906672295295
  0.008731068371465095 0.008264577671068338 0.00791770221798869
  0.0075904419864099735 0.0072521621089399425 0.006893723650119084
  0.006513696655981455 0.006113748994402732 0.005696914600044178
  0.005266894042627671 0.0048277964557021 0.004384166140059461
  0.003941395565230145 0.0035070456674256736 0.0030947504979781694
  0.002735790765588945 0.002509871622446805 0.0025415847498003207]
 [0.021423029386448977 0.009828783041431196 0.0077433287930035585
  0.007172106494238276 0.006938833674830245 0.006780421726804623
  0.006623487623176292 0.006444931835365322 0.006237603275472509
  0.006000203399300604 0.005733879020583855 0.005440944383431948
  0.005124378985319444 0.0047876774387120265 0.004434972919322774
  0.0040716190586031166 0.003705955741578803 0.0033546436970571442
  0.0030594965837923383 0.0029397792555050815 0.003270510702538119]
 [0.014038100371101555 0.007423457817881856 0.006138243299126708
  0.0058458998508211435 0.00578248382033741 0.0057644407985758634
  0.0057382890467796025 0.005686402670060309 0.00560281392301612
  0.005485942608070663 0.005336122197208703 0.005154678784542447
  0.004943589489481252 0.004705428179134591 0.004443587277854131
  0.004163045857996155 0.0038726418928885665 0.0035920860817960512
  0.0033753332710017257 0.003392667137116032 0.004158468022142906]
 [0.009231432329881903 0.005552103952839182 0.0048151928416332816
  0.004717270527754282 0.004772400013859527 0.0048546615601546324
  0.004925400884344362 0.004971017617581576 0.004986267398804501
  0.0049691307481369755 0.004919055647901643 0.004836314646602641
  0.004721793154490965 0.004577019935797827 0.004404486899974504
  0.004208615201047255 0.0039985811882783 0.0037972264949864692
  0.003672239885200923 0.0038576165515780874 0.0052312671921611684]
 [0.006057760833568796 0.004100363827923463 0.0037282066158101108
  0.0037589558561247167 0.003891291851616891 0.004040554707081068
  0.0041791063124632455 0.004296380053807524 0.004387644927632501
  0.004450471469078573 0.004483516585290324 0.004486094169893334
  0.004458066772283886 0.00439994966194897 0.004313325151315902
  0.004202010684571272 0.004075465120100514 0.003959753503393079
  0.003937684281221288 0.004320160102512512 0.006517842206933972]
 [0.003944755896253562 0.002980239592760596 0.0028403292274517336
  0.002949202849334718 0.0031258133514044228 0.0033146190737305997
  0.0034961538318632156 0.0036622238496817005 0.0038085724549467635
  0.003932516291731465 0.004032132703827229 0.004105990540730566
  0.004153121811875908 0.004173187465767204 0.004166978951692148
  0.004137782375223953 0.004095363999227358 0.004069059460075665
  0.004157734143297144 0.004761750448859524 0.008050586810266106]
 [0.0025340284676312504 0.002123076861770402 0.0021211910962454735
  0.002270094246012335 0.002465234678760025 0.0026712830388910297
  0.0028748355779969984 0.003069640589121421 0.0032519815521984932
  0.003419151039870481 0.003568929439659874 0.003699432129285016
  0.003809140359576926 0.0038971160674832776 0.003963579085755515
  0.004011449709689544 0.004050877335168293 0.004114402098859152
  0.004317141969011838 0.005159151204190925 0.009865711951039184]
 [0.0015947574991661332 0.0014746066606359887 0.0015452996885075702
  0.0017063777780679987 0.0019005027829161684 0.0021061108169291597
  0.0023143018228331896 0.0025204845949197896 0.0027215980928329522
  0.002915167613964784 0.003098987561212558 0.003271049445891547
  0.0034296137200934347 0.003573460207238016 0.0037025248437309245
  0.003819589667220888 0.00393527643028055 0.004085071640300384
  0.004399456829296408 0.005483751832409549 0.012003630163024843]
 [0.0009747538385346045 0.0009914319146402057 0.001090849067936191
  0.001244661273444739 0.0014235886784298222 0.0016152367112824738
  0.0018140545953365156 0.0020169145943186137 0.0022215302418722507
  0.00242590264587014 0.0026281402866042045 0.002826439715761756
  0.0030191825496727756 0.0032052075777591307 0.0033844863054783257
  0.0035599146427918783 0.003742644467658172 0.00397056997646154
  0.004387174612797641 0.005700805171046035 0.014509370127882207]
 [0.0005717348089798108 0.0006385258624231319 0.000738900276729604
  0.0008728733803123436 0.0010270465590768022 0.0011949708843640486
  0.001373578497626376 0.0015610419089717473 0.00175593527051025
  0.0019569303202986287 0.0021627033925128857 0.0023719473556313986
  0.0025834804975079563 0.0027965260583126803 0.0030114011791839454
  0.0032313486123267906 0.0034680241228658036 0.0037608137180794514
  0.004261944731204509 0.005768593330770965 0.01743303893041768]
 [0.0003159299588190484 0.00038743435327178425 0.00047282207235767017
  0.0005799126806588283 0.0007037241940787438 0.0008415272475610159
  0.0009920675746838718 0.0011546505722336347 0.0013287362450754674
  0.001513785848500943 0.0017092206455817398 0.0019144494723893012
  0.002128978605223414 0.002352684166122969 0.0025864842367759565
  0.0028341318027349217 0.0031076153570651866 0.00344641572874328
  0.004004902944903376 0.005637584248958929 0.02083041062787055]
 [0.00015924964934482678 0.00021496946361025774 0.0002779122070148772
  0.00035542312674224984 0.00044656573702264924 0.0005508113257598179
  0.0006681791221840031 0.0007989120204700274 0.0009433045281575105
  0.0011016336096238773 0.0012741449997431514 0.0014610851514972674
  0.0016628025069256283 0.0018799969362016636 0.0021143387049375076
  0.002370124730159144 0.0026592839019067573 0.003019393129447792
  0.0035975535509262502 0.005250019278492036 0.024764117656194448]
 [6.838029144491628e-05 0.000102267366764123 0.00014116002739550544
  0.00018965432153951692 0.00024843734567756904 0.00031814805431479184
  0.0003996246217131336 0.0004938225255404987 0.0006017548259999002
  0.0007244654603090152 0.0008630266994465848 0.0010185651907734642
  0.001192341586434441 0.0013859527684071393 0.001601844896157123
  0.0018447010188425278 0.002125654805171466 0.0024775512802279725
  0.00302735202065503 0.00454458229286822 0.029310078566545092]
 [2.0356702550698528e-05 3.423452848006883e-05 5.1265833131007154e-05
  7.350112141792758e-05 0.00010194372794678439 0.00013765069557252206
  0.0001818654427541338 0.00023603282357071407 0.0003018139965155904
  0.0003811135216158995 0.000476122100850237 0.0005893840776947163
  0.0007239125192661192 0.0008834080807558893 0.0010727282447644106
  0.0012990338326761737 0.0015750546723351858 0.0019304253944992004
  0.002464186903292395 0.0037472179492354096 --]], folded=False, pop_ids=None)

In [32]:
model.data.sum()


Out[32]:
11.141569469942173

I do not understand what is in this model spectrum. I thought it would be expected proportions, so that the sum across the spectrum would be 1. I think these are expected counts (not proportions) assuming a $\theta$ of 1.


In [87]:
# Log likelihood of the data given the model
ll = dadi.Inference.ll_multinom(model, data)
ll


Out[87]:
-1066.3460755932974

In [34]:
%psource dadi.Inference.ll_multinom

In [35]:
# the optimal value of theta0 given the model
theta0 = dadi.Inference.optimal_sfs_scaling(model, data)
theta0


Out[35]:
2749.285796480809

In [36]:
import pylab

%matplotlib inline

pylab.rcParams['figure.figsize'] = [12.0, 10.0]

In [37]:
# plot a comparison of the model SFS with the SFS from the data

dadi.Plotting.plot_2d_comp_multinom(model, data, vmin=1, resid_range=3, pop_ids=("YRI", "CEU"))



In [162]:
# print the docstring of the function

%pdoc dadi.Plotting.plot_2d_comp_multinom

Simulation from estimated model

The following requires that ms is installed.


In [38]:
# generate the core of a ms command with the optimised model parameter values

mscore = demographic_models.prior_onegrow_mig_mscore(popt)

In [39]:
# generate full ms command

mscommand = dadi.Misc.ms_command(1., ns, mscore, int(1e5))

Note, the ms command specifies a $\theta$ of 1 for better efficiency. The simulated spectra can be rescaled later with the theta0 from above.


In [40]:
mscommand


Out[40]:
'ms 40 100000 -t 1.000000 -I 2 20 20 -n 1 1.881000 -n 2 1.845000 -eg 0 2 58.694679 -ma x 1.822000 1.822000 x -ej 0.055500 2 1 -en 0.233000 1 1'

In [41]:
import os

return_code = os.system('{0} > test.msout'.format(mscommand))

In [42]:
% ll


total 33660
lrwxrwxrwx 1 claudius       53 Feb 17 15:37 ERY.FOLDED.sfs -> /data3/claudius/Big_Data/ANGSD/SFS/ERY/ERY.FOLDED.sfs
-rw-rw-r-- 1 claudius      462 Mar 15 12:48 ERY.FOLDED.sfs.dadi_format
-rw-rw-r-- 1 claudius      462 Mar 15 12:45 ERY.FOLDED.sfs.dadi_format~
lrwxrwxrwx 1 claudius       37 Feb 18 17:46 EryPar.unfolded.2dsfs -> ../../ANGSD/FST/EryPar.unfolded.2dsfs
-rw-rw-r-- 1 claudius    13051 Feb 18 19:00 EryPar.unfolded.2dsfs.dadi_format
-rw-rw-r-- 1 claudius    13051 Feb 18 18:31 EryPar.unfolded.2dsfs.dadi_format~
drwxrwxr-x 5 claudius     4096 Feb 17 13:45 examples/
-rw-rw-r-- 1 claudius   137691 Mar 19 11:19 example_YRI_CEU.ipynb
-rw-rw-r-- 1 claudius   619518 Mar 17 16:18 First_Steps_with_dadi.ipynb
-rw-rw-r-- 1 claudius     1012 Mar 16 09:54 new.bib
lrwxrwxrwx 1 claudius       53 Feb 17 15:37 PAR.FOLDED.sfs -> /data3/claudius/Big_Data/ANGSD/SFS/PAR/PAR.FOLDED.sfs
-rw-rw-r-- 1 claudius      412 Feb 17 16:29 PAR.FOLDED.sfs.dadi_format
-rw-rw-r-- 1 claudius      218 Feb 17 15:51 PAR.FOLDED.sfs.dadi_format~
-rw-rw-r-- 1 claudius       18 Mar 19 11:20 seedms
-rw-rw-r-- 1 claudius 33643874 Mar 19 11:20 test.msout

In [43]:
msdata = dadi.Spectrum.from_ms_file('test.msout')

In [44]:
dadi.Plotting.plot_2d_comp_multinom(model, theta0*msdata, vmin=1, pop_ids=['YRI', 'CEU'])


The spectrum simulated with ms (averaged across iterations, I believe) is almost identical to the model spectrum. This confirms that $\delta$a$\delta$i's deterministic approximation is very good. One could now compare the ms simulated spectra to the observed spectrum.

Parameter uncertainty

In order to obtain confidence intervals for the parameter estimates, one needs to create conventional bootstraps over unlinked loci, i. e. over contigs instead of nucleotide sites. From these bootstrapped data sets one can generate site frequency spectra and estimate model parameters as for the full observed data. However, this is computationally expensive. A more efficient alternative is calculating the Godambe Information Matrix (GIM) from the bootstrapped data sets (see Coffman2016 for details).


In [48]:
# the examples directory contains site frequency spectra from bootstraps

% ll examples/YRI_CEU/bootstraps/


total 1200
-rw-rw-r-- 1 claudius 8813 Feb 17 13:45 00.fs
-rw-rw-r-- 1 claudius 8819 Feb 17 13:45 01.fs
-rw-rw-r-- 1 claudius 8808 Feb 17 13:45 02.fs
-rw-rw-r-- 1 claudius 8794 Feb 17 13:45 03.fs
-rw-rw-r-- 1 claudius 8829 Feb 17 13:45 04.fs
-rw-rw-r-- 1 claudius 8824 Feb 17 13:45 05.fs
-rw-rw-r-- 1 claudius 8830 Feb 17 13:45 06.fs
-rw-rw-r-- 1 claudius 8816 Feb 17 13:45 07.fs
-rw-rw-r-- 1 claudius 8799 Feb 17 13:45 08.fs
-rw-rw-r-- 1 claudius 8833 Feb 17 13:45 09.fs
-rw-rw-r-- 1 claudius 8820 Feb 17 13:45 10.fs
-rw-rw-r-- 1 claudius 8833 Feb 17 13:45 11.fs
-rw-rw-r-- 1 claudius 8812 Feb 17 13:45 12.fs
-rw-rw-r-- 1 claudius 8809 Feb 17 13:45 13.fs
-rw-rw-r-- 1 claudius 8822 Feb 17 13:45 14.fs
-rw-rw-r-- 1 claudius 8838 Feb 17 13:45 15.fs
-rw-rw-r-- 1 claudius 8812 Feb 17 13:45 16.fs
-rw-rw-r-- 1 claudius 8831 Feb 17 13:45 17.fs
-rw-rw-r-- 1 claudius 8817 Feb 17 13:45 18.fs
-rw-rw-r-- 1 claudius 8826 Feb 17 13:45 19.fs
-rw-rw-r-- 1 claudius 8832 Feb 17 13:45 20.fs
-rw-rw-r-- 1 claudius 8820 Feb 17 13:45 21.fs
-rw-rw-r-- 1 claudius 8822 Feb 17 13:45 22.fs
-rw-rw-r-- 1 claudius 8825 Feb 17 13:45 23.fs
-rw-rw-r-- 1 claudius 8817 Feb 17 13:45 24.fs
-rw-rw-r-- 1 claudius 8820 Feb 17 13:45 25.fs
-rw-rw-r-- 1 claudius 8799 Feb 17 13:45 26.fs
-rw-rw-r-- 1 claudius 8808 Feb 17 13:45 27.fs
-rw-rw-r-- 1 claudius 8841 Feb 17 13:45 28.fs
-rw-rw-r-- 1 claudius 8818 Feb 17 13:45 29.fs
-rw-rw-r-- 1 claudius 8810 Feb 17 13:45 30.fs
-rw-rw-r-- 1 claudius 8814 Feb 17 13:45 31.fs
-rw-rw-r-- 1 claudius 8807 Feb 17 13:45 32.fs
-rw-rw-r-- 1 claudius 8848 Feb 17 13:45 33.fs
-rw-rw-r-- 1 claudius 8814 Feb 17 13:45 34.fs
-rw-rw-r-- 1 claudius 8814 Feb 17 13:45 35.fs
-rw-rw-r-- 1 claudius 8797 Feb 17 13:45 36.fs
-rw-rw-r-- 1 claudius 8850 Feb 17 13:45 37.fs
-rw-rw-r-- 1 claudius 8792 Feb 17 13:45 38.fs
-rw-rw-r-- 1 claudius 8799 Feb 17 13:45 39.fs
-rw-rw-r-- 1 claudius 8831 Feb 17 13:45 40.fs
-rw-rw-r-- 1 claudius 8797 Feb 17 13:45 41.fs
-rw-rw-r-- 1 claudius 8818 Feb 17 13:45 42.fs
-rw-rw-r-- 1 claudius 8814 Feb 17 13:45 43.fs
-rw-rw-r-- 1 claudius 8820 Feb 17 13:45 44.fs
-rw-rw-r-- 1 claudius 8801 Feb 17 13:45 45.fs
-rw-rw-r-- 1 claudius 8826 Feb 17 13:45 46.fs
-rw-rw-r-- 1 claudius 8806 Feb 17 13:45 47.fs
-rw-rw-r-- 1 claudius 8810 Feb 17 13:45 48.fs
-rw-rw-r-- 1 claudius 8810 Feb 17 13:45 49.fs
-rw-rw-r-- 1 claudius 8817 Feb 17 13:45 50.fs
-rw-rw-r-- 1 claudius 8824 Feb 17 13:45 51.fs
-rw-rw-r-- 1 claudius 8839 Feb 17 13:45 52.fs
-rw-rw-r-- 1 claudius 8832 Feb 17 13:45 53.fs
-rw-rw-r-- 1 claudius 8832 Feb 17 13:45 54.fs
-rw-rw-r-- 1 claudius 8837 Feb 17 13:45 55.fs
-rw-rw-r-- 1 claudius 8824 Feb 17 13:45 56.fs
-rw-rw-r-- 1 claudius 8805 Feb 17 13:45 57.fs
-rw-rw-r-- 1 claudius 8819 Feb 17 13:45 58.fs
-rw-rw-r-- 1 claudius 8816 Feb 17 13:45 59.fs
-rw-rw-r-- 1 claudius 8812 Feb 17 13:45 60.fs
-rw-rw-r-- 1 claudius 8843 Feb 17 13:45 61.fs
-rw-rw-r-- 1 claudius 8810 Feb 17 13:45 62.fs
-rw-rw-r-- 1 claudius 8819 Feb 17 13:45 63.fs
-rw-rw-r-- 1 claudius 8821 Feb 17 13:45 64.fs
-rw-rw-r-- 1 claudius 8807 Feb 17 13:45 65.fs
-rw-rw-r-- 1 claudius 8842 Feb 17 13:45 66.fs
-rw-rw-r-- 1 claudius 8825 Feb 17 13:45 67.fs
-rw-rw-r-- 1 claudius 8833 Feb 17 13:45 68.fs
-rw-rw-r-- 1 claudius 8811 Feb 17 13:45 69.fs
-rw-rw-r-- 1 claudius 8832 Feb 17 13:45 70.fs
-rw-rw-r-- 1 claudius 8850 Feb 17 13:45 71.fs
-rw-rw-r-- 1 claudius 8844 Feb 17 13:45 72.fs
-rw-rw-r-- 1 claudius 8828 Feb 17 13:45 73.fs
-rw-rw-r-- 1 claudius 8790 Feb 17 13:45 74.fs
-rw-rw-r-- 1 claudius 8840 Feb 17 13:45 75.fs
-rw-rw-r-- 1 claudius 8809 Feb 17 13:45 76.fs
-rw-rw-r-- 1 claudius 8822 Feb 17 13:45 77.fs
-rw-rw-r-- 1 claudius 8818 Feb 17 13:45 78.fs
-rw-rw-r-- 1 claudius 8810 Feb 17 13:45 79.fs
-rw-rw-r-- 1 claudius 8833 Feb 17 13:45 80.fs
-rw-rw-r-- 1 claudius 8814 Feb 17 13:45 81.fs
-rw-rw-r-- 1 claudius 8837 Feb 17 13:45 82.fs
-rw-rw-r-- 1 claudius 8821 Feb 17 13:45 83.fs
-rw-rw-r-- 1 claudius 8802 Feb 17 13:45 84.fs
-rw-rw-r-- 1 claudius 8833 Feb 17 13:45 85.fs
-rw-rw-r-- 1 claudius 8807 Feb 17 13:45 86.fs
-rw-rw-r-- 1 claudius 8791 Feb 17 13:45 87.fs
-rw-rw-r-- 1 claudius 8828 Feb 17 13:45 88.fs
-rw-rw-r-- 1 claudius 8815 Feb 17 13:45 89.fs
-rw-rw-r-- 1 claudius 8825 Feb 17 13:45 90.fs
-rw-rw-r-- 1 claudius 8823 Feb 17 13:45 91.fs
-rw-rw-r-- 1 claudius 8833 Feb 17 13:45 92.fs
-rw-rw-r-- 1 claudius 8836 Feb 17 13:45 93.fs
-rw-rw-r-- 1 claudius 8816 Feb 17 13:45 94.fs
-rw-rw-r-- 1 claudius 8794 Feb 17 13:45 95.fs
-rw-rw-r-- 1 claudius 8830 Feb 17 13:45 96.fs
-rw-rw-r-- 1 claudius 8832 Feb 17 13:45 97.fs
-rw-rw-r-- 1 claudius 8831 Feb 17 13:45 98.fs
-rw-rw-r-- 1 claudius 8837 Feb 17 13:45 99.fs

In [58]:
# load spectra from bootstraps of the data into an array

all_boot = [ dadi.Spectrum.from_file('examples/YRI_CEU/bootstraps/{0:02d}.fs'.format(i)) for i in range(100) ]

In [56]:
print ['{0:02d}.fs'.format(i) for i in range(100)]


['00.fs', '01.fs', '02.fs', '03.fs', '04.fs', '05.fs', '06.fs', '07.fs', '08.fs', '09.fs', '10.fs', '11.fs', '12.fs', '13.fs', '14.fs', '15.fs', '16.fs', '17.fs', '18.fs', '19.fs', '20.fs', '21.fs', '22.fs', '23.fs', '24.fs', '25.fs', '26.fs', '27.fs', '28.fs', '29.fs', '30.fs', '31.fs', '32.fs', '33.fs', '34.fs', '35.fs', '36.fs', '37.fs', '38.fs', '39.fs', '40.fs', '41.fs', '42.fs', '43.fs', '44.fs', '45.fs', '46.fs', '47.fs', '48.fs', '49.fs', '50.fs', '51.fs', '52.fs', '53.fs', '54.fs', '55.fs', '56.fs', '57.fs', '58.fs', '59.fs', '60.fs', '61.fs', '62.fs', '63.fs', '64.fs', '65.fs', '66.fs', '67.fs', '68.fs', '69.fs', '70.fs', '71.fs', '72.fs', '73.fs', '74.fs', '75.fs', '76.fs', '77.fs', '78.fs', '79.fs', '80.fs', '81.fs', '82.fs', '83.fs', '84.fs', '85.fs', '86.fs', '87.fs', '88.fs', '89.fs', '90.fs', '91.fs', '92.fs', '93.fs', '94.fs', '95.fs', '96.fs', '97.fs', '98.fs', '99.fs']

In [64]:
%%time

uncerts = dadi.Godambe.GIM_uncert(func_ex, pts_l, all_boot, popt, data, multinom=True)


CPU times: user 23.4 s, sys: 3.81 ms, total: 23.4 s
Wall time: 23.4 s

In [65]:
print 'Estimated parameter standard deviations from GIM: {0}'.format(uncerts)


Estimated parameter standard deviations from GIM: [  2.37374973e-01   1.18487380e-02   7.57283821e-01   2.53603366e-01
   2.47282936e-01   1.71900105e-02   4.08528213e+02]

Folded data


In [67]:
# These are the optimal parameters when the spectrum is folded. They can be
# found simply by passing data.fold() to the above call to optimize_log. 

popt_fold =  numpy.array([1.907,  0.073,  1.830,  0.899,  0.425,  0.113])

In [68]:
# get standard deviations for model parameters

uncerts_folded = dadi.Godambe.GIM_uncert(func_ex, pts_l, all_boot, popt_fold, data.fold(), multinom=True)

In [72]:
print 'Folding increases parameter uncertainties by factors of: {}'.format(uncerts_folded/uncerts)


Folding increases parameter uncertainties by factors of: [ 2.38220944  1.73803606  1.52260491  1.61800623  3.9824292   2.17113759
  2.75118888]

Outgroup information greatly increases power!

Likelihood Ratio Test (LRT) between models

The following will compare the model with migration with a model without migration, thus testing whether the inferred migration rate is significantly different from 0.


In [74]:
# the model without migration is also defined in the demographic_models script

func_nomig = demographic_models.prior_onegrow_nomig
func_ex_nomig = dadi.Numerics.make_extrap_log_func(func_nomig)

In [75]:
# these are the best-fit parameters for the model without migration, 
# as provided in YRI_CEU.py

popt_nomig = numpy.array([ 1.897,  0.0388,  9.677,  0.395,  0.070])

In [76]:
# get the expected AFS from the model without migration

model_nomig = func_ex_nomig(popt_nomig, ns, pts_l)

In [91]:
# get the likelihood of the data given the model without migration

ll_nomig = dadi.Inference.ll_multinom(model_nomig, data)


Out[91]:
-1066.3460755932974

In [93]:
print 'The log likelihood of the model with migration was: {0:.1f}'.format(ll)
print 'The log likelihodd of the model without migration is: {0:.1f}'.format(ll_nomig)


The log likelihood of the model with migration was: -1066.3
The log likelihodd of the model without migration is: -1146.1

The more complex model with migration (one parameter more) has a greater likelihood as expected. But is that difference significant or just due to better being able to fit noise in the data?


In [79]:
p_lrt = popt
p_lrt[3] = 0
print p_lrt
print popt
# the first line just creates a reference, not a copy


[1.881, 0.071, 1.845, 0, 0.355, 0.111]
[1.881, 0.071, 1.845, 0, 0.355, 0.111]

In [80]:
# best fit parameter values for the model with migration (from YRY_CEU.py)
popt = [1.881, 0.0710, 1.845, 0.911, 0.355, 0.111]

In [82]:
p_lrt = popt[:] # copy parameter list
p_lrt[3] = 0
print p_lrt
print popt


[1.881, 0.071, 1.845, 0, 0.355, 0.111]
[1.881, 0.071, 1.845, 0.911, 0.355, 0.111]

Need to calculate an adjustment factor, maybe correcting for linkage (see Coffman2016).


In [84]:
adj = dadi.Godambe.LRT_adjust(func_ex, pts_l, all_boot, p_lrt, data, nested_indices=[3], multinom=True)

In [88]:
D_adj = adj * 2 * (ll - ll_nomig)

In [89]:
print 'Adjusted D statistic: {0:.4f}'.format(D_adj)


Adjusted D statistic: 0.2811

Verbatim from YRI_CEU.py: "Because this is test of a parameter on the boundary of parameter space (m cannot be less than zero), our null distribution is an even proportion of chi^2 distributions with 0 and 1 d.o.f. To evaluate the p-value, we use the point percent function for a weighted sum of chi^2 dists."

See also the manual and Coffman2016.


In [90]:
pval = dadi.Godambe.sum_chi2_ppf(D_adj, weights=(0.5, 0.5))

In [94]:
print 'p-val for rejecting the no-migration model: {0:.4f}'.format(pval)


p-val for rejecting the no-migration model: 0.2980

Adding the migration parameter does not significantly improve the fit of the model to the data. According to this data (and analysis), gene flow between YRI and CEU has not significantly affected the distribution of genetic variation as summarised in the joint SFS.


In [ ]: