In [1]:

    
from ipyparallel import Client

cl = Client()

cl.ids









    Out[1]:





[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]



In [2]:

    
%%px --local

# run whole cell on all engines a well as in the local IPython session

import numpy as np

import sys

sys.path.insert(0, '/home/claudius/Downloads/dadi')

import dadi



In [3]:

    
%ll dadiExercises/









    



total 33676
lrwxrwxrwx 1 claudius       53 Feb 17 15:37 ERY.FOLDED.sfs -> /data3/claudius/Big_Data/ANGSD/SFS/ERY/ERY.FOLDED.sfs
-rw-rw-r-- 1 claudius      499 Mar 24 14:04 ERY.FOLDED.sfs.dadi_format
-rw-rw-r-- 1 claudius      499 Mar 24 14:02 ERY.FOLDED.sfs.dadi_format~
lrwxrwxrwx 1 claudius       37 Feb 18 17:46 EryPar.unfolded.2dsfs -> ../../ANGSD/FST/EryPar.unfolded.2dsfs
-rw-rw-r-- 1 claudius    13051 Feb 18 19:00 EryPar.unfolded.2dsfs.dadi_format
-rw-rw-r-- 1 claudius    13051 Feb 18 18:31 EryPar.unfolded.2dsfs.dadi_format~
drwxrwxr-x 5 claudius     4096 Feb 17 13:45 examples/
-rw-rw-r-- 1 claudius   155251 Mar 22 12:37 example_YRI_CEU.ipynb
-rw-rw-r-- 1 claudius   619699 Apr 20 10:19 First_Steps_with_dadi.ipynb
-rw-rw-r-- 1 claudius     1012 Mar 16 09:54 new.bib
lrwxrwxrwx 1 claudius       53 Feb 17 15:37 PAR.FOLDED.sfs -> /data3/claudius/Big_Data/ANGSD/SFS/PAR/PAR.FOLDED.sfs
-rw-rw-r-- 1 claudius      486 Mar 24 20:08 PAR.FOLDED.sfs.dadi_format
-rw-rw-r-- 1 claudius      450 Mar 24 20:08 PAR.FOLDED.sfs.dadi_format~
-rw-rw-r-- 1 claudius       18 Mar 19 11:20 seedms
-rw-rw-r-- 1 claudius 33643874 Mar 19 11:20 test.msout



In [4]:

    
%less dadiExercises/EryPar.unfolded.2dsfs.dadi_format



In [5]:

    
# import 2D unfolded spectrum

sfs2d_unfolded = dadi.Spectrum.from_file('dadiExercises/EryPar.unfolded.2dsfs.dadi_format')



In [23]:

    
%page sfs2d_unfolded



In [7]:

    
sfs2d_unfolded.sample_sizes









    Out[7]:





array([36, 36])



In [8]:

    
# add population labels
sfs2d_unfolded.pop_ids = ["ery", "par"]

For the estimation of the 2D SFS, realSFS has only taken sites that had data from at least 9 individuals in each population (see assembly.sh, lines 1423 onwards).



In [9]:

    
sfs2d_unfolded.S()









    Out[9]:





60573.584426000001

The 2D spectrum contains counts from 60k sites that are variable in par or ery or both.



In [10]:

    
import pylab

%matplotlib inline



In [22]:

    
# note this needs to be in the same cell as the dadi plotting function call to take effect
pylab.rcParams['font.size'] = 14.0
pylab.rcParams['figure.figsize'] = [12.0, 10.0]

dadi.Plotting.plot_single_2d_sfs(sfs2d_unfolded, vmin=1, cmap=pylab.cm.jet)









    Out[22]:





<matplotlib.colorbar.Colorbar at 0x7fb92998fd50>



In [24]:

    
%psource dadi.Plotting.plot_single_2d_sfs

More colormaps



In [26]:

    
sfs2d_folded = sfs2d_unfolded.fold()



In [27]:

    
# plot the folded GLOBAL minor allele frequency spectrum

dadi.Plotting.plot_single_2d_sfs(sfs2d_folded, vmin=1, cmap=pylab.cm.jet)









    Out[27]:





<matplotlib.colorbar.Colorbar at 0x7fb9296ee9d0>



In [43]:

    
# setting the smallest grid size slightly larger than the largest population sample size (36)
pts_l = [40, 50, 60]

The fitting of parameters for various 1D models to the SFS's of par and ery has indicated the following:

ery has undergone a population size increase by >20 fold (between about 1-2 $\times2N_{ref}$ generations ago) and later (<1 $\times2N_{ref}$ generations ago) a decrease to about 15% of the ancient populations size
par has undergone only one size change to <10% of the ancient population size, this is inferred to have happened in the distant past, about 2-4 ($\times 2N_{ref}$) generations ago

I think it would be good to incorporate this information in the specification of a more complex 2D model.



In [36]:

    
%pinfo dadi.Demographics2D.split_mig

There are a couple of built-in models that I could use, but I think I need a custom model here that includes the information from the 1D model fitting.

I would like to write a model function that specifies an ancient split between ery and par, then a population decline in par that lasts until the present and later an exponential growth in ery that is more recently followed by a population decline.

An alternative model to test would be a population decline in the ancestral population, followed by the split between, later population increase in ery which is more recently followed by a population decline.



In [49]:

    
def split_1grow_2decline_1decline_nomig((nu1s, nu2s, nu2f, nu1b, nu1f, Ts, T2, Tb, Tf), (n1, n2), pts):
    """
    model function: specifies an ancient split, followed by growth in pop1 and 
                    decline in pop2, later also decline in pop1
    
    nu1s: rel. size of pop1 after split
    nu2s: rel. size of pop2 after split
    nu2f: final rel. size for pop2
    nu1b: rel. size of pop1 after first size change
    nu1f: final rel. size of pop1
    Ts: time betweem population split and size change in pop2
    T2: time between size change in pop2 and first size change in pop1
    Tb: time between first and second size change in pop1
    Tf: time between second size change in pop1 and present
    
    The population split happend Tf+Tb+T2+Ts (x2N) generations in the past.
    
    n1,n2: sample sizes
    pts: number of grid points to use in extrapolation
    """
    
    # define grid
    xx = yy = dadi.Numerics.default_grid(pts)
    
    # phi for the equilibrium ancestral pop
    phi = dadi.PhiManip.phi_1D(xx)
    
    # population split into pop1 and pop2
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)
    
    # stepwise change in size for pop1 and pop2 after split
    phi = dadi.Integration.two_pops(phi, xx, Ts, nu2=nu2s, nu1=nu1s, m12=0, m21=0)
    # stepwise change in size for pop2 only
    phi = dadi.Integration.two_pops(phi, xx, T2, nu2=nu2f, nu1=nu1s, m12=0, m21=0)
    # stepwise change in size for pop1 only
    phi = dadi.Integration.two_pops(phi, xx, Tb, nu2=nu2f, nu1=nu1b, m12=0, m21=0)
    # stepwise change in size for pop1 only
    phi = dadi.Integration.two_pops(phi, xx, Tf, nu2=nu2f, nu1=nu1f, m12=0, m21=0)
    
    # calculate spectrum
    sfs = dadi.Spectrum.from_phi(phi, (n1, n2), (xx, yy))
    return sfs

I wonder which population dadi assumes to be pop1. In the sfs2d spectrum object, ery is pop1 and par is pop2.



In [39]:

    
?dadi.PhiManip.phi_1D_to_2D



In [50]:

    
# create link to function that specifies the model
func = split_1grow_2decline_1decline_nomig

# create extrapolating version of the model function
func_ex = dadi.Numerics.make_extrap_log_func(func)



In [44]:

    
?split_1grow_2decline_1decline_nomig



In [87]:

    
nu1s = 0.5
nu2s = 0.5
nu2f = 0.05
nu1b = 40
nu1f = 0.15
Ts = 0.1
T2 = 0.1
Tb = 0.1
Tf = 0.1



In [45]:

    
sfs2d_folded.sample_sizes









    Out[45]:





array([36, 36])



In [88]:

    
model_spectrum = func_ex((nu1s, nu2s, nu2f, nu1b, nu1f, Ts, T2, Tb, Tf), sfs2d_folded.sample_sizes, pts_l)



In [89]:

    
theta = dadi.Inference.optimal_sfs_scaling(model_spectrum.fold(), sfs2d_folded)



In [90]:

    
theta









    Out[90]:





22521.516493041683



In [91]:

    
dadi.Plotting.plot_2d_comp_multinom(model_spectrum.fold(), sfs2d_folded, vmin=1)

I think this indicates that I need to allow the split to be more recent.



In [ ]:

Table of Contents