In [1]:
from ipyparallel import Client
cl = Client()
cl.ids
Out[1]:
In [2]:
%%px --local
# run whole cell on all engines a well as in the local IPython session
import numpy as np
import sys
sys.path.insert(0, '/home/claudius/Downloads/dadi')
import dadi
In [3]:
%ll dadiExercises/
In [4]:
%less dadiExercises/EryPar.unfolded.2dsfs.dadi_format
In [5]:
# import 2D unfolded spectrum
sfs2d_unfolded = dadi.Spectrum.from_file('dadiExercises/EryPar.unfolded.2dsfs.dadi_format')
In [23]:
%page sfs2d_unfolded
In [7]:
sfs2d_unfolded.sample_sizes
Out[7]:
In [8]:
# add population labels
sfs2d_unfolded.pop_ids = ["ery", "par"]
For the estimation of the 2D SFS, realSFS
has only taken sites that had data from at least 9 individuals in each population (see assembly.sh
, lines 1423 onwards).
In [9]:
sfs2d_unfolded.S()
Out[9]:
The 2D spectrum contains counts from 60k sites that are variable in par or ery or both.
In [10]:
import pylab
%matplotlib inline
In [22]:
# note this needs to be in the same cell as the dadi plotting function call to take effect
pylab.rcParams['font.size'] = 14.0
pylab.rcParams['figure.figsize'] = [12.0, 10.0]
dadi.Plotting.plot_single_2d_sfs(sfs2d_unfolded, vmin=1, cmap=pylab.cm.jet)
Out[22]:
In [24]:
%psource dadi.Plotting.plot_single_2d_sfs
In [26]:
sfs2d_folded = sfs2d_unfolded.fold()
In [27]:
# plot the folded GLOBAL minor allele frequency spectrum
dadi.Plotting.plot_single_2d_sfs(sfs2d_folded, vmin=1, cmap=pylab.cm.jet)
Out[27]:
In [43]:
# setting the smallest grid size slightly larger than the largest population sample size (36)
pts_l = [40, 50, 60]
The fitting of parameters for various 1D models to the SFS's of par and ery has indicated the following:
I think it would be good to incorporate this information in the specification of a more complex 2D model.
In [36]:
%pinfo dadi.Demographics2D.split_mig
There are a couple of built-in models that I could use, but I think I need a custom model here that includes the information from the 1D model fitting.
I would like to write a model function that specifies an ancient split between ery and par, then a population decline in par that lasts until the present and later an exponential growth in ery that is more recently followed by a population decline.
An alternative model to test would be a population decline in the ancestral population, followed by the split between, later population increase in ery which is more recently followed by a population decline.
In [49]:
def split_1grow_2decline_1decline_nomig((nu1s, nu2s, nu2f, nu1b, nu1f, Ts, T2, Tb, Tf), (n1, n2), pts):
"""
model function: specifies an ancient split, followed by growth in pop1 and
decline in pop2, later also decline in pop1
nu1s: rel. size of pop1 after split
nu2s: rel. size of pop2 after split
nu2f: final rel. size for pop2
nu1b: rel. size of pop1 after first size change
nu1f: final rel. size of pop1
Ts: time betweem population split and size change in pop2
T2: time between size change in pop2 and first size change in pop1
Tb: time between first and second size change in pop1
Tf: time between second size change in pop1 and present
The population split happend Tf+Tb+T2+Ts (x2N) generations in the past.
n1,n2: sample sizes
pts: number of grid points to use in extrapolation
"""
# define grid
xx = yy = dadi.Numerics.default_grid(pts)
# phi for the equilibrium ancestral pop
phi = dadi.PhiManip.phi_1D(xx)
# population split into pop1 and pop2
phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)
# stepwise change in size for pop1 and pop2 after split
phi = dadi.Integration.two_pops(phi, xx, Ts, nu2=nu2s, nu1=nu1s, m12=0, m21=0)
# stepwise change in size for pop2 only
phi = dadi.Integration.two_pops(phi, xx, T2, nu2=nu2f, nu1=nu1s, m12=0, m21=0)
# stepwise change in size for pop1 only
phi = dadi.Integration.two_pops(phi, xx, Tb, nu2=nu2f, nu1=nu1b, m12=0, m21=0)
# stepwise change in size for pop1 only
phi = dadi.Integration.two_pops(phi, xx, Tf, nu2=nu2f, nu1=nu1f, m12=0, m21=0)
# calculate spectrum
sfs = dadi.Spectrum.from_phi(phi, (n1, n2), (xx, yy))
return sfs
I wonder which population dadi assumes to be pop1. In the sfs2d
spectrum object, ery is pop1 and par is pop2.
In [39]:
?dadi.PhiManip.phi_1D_to_2D
In [50]:
# create link to function that specifies the model
func = split_1grow_2decline_1decline_nomig
# create extrapolating version of the model function
func_ex = dadi.Numerics.make_extrap_log_func(func)
In [44]:
?split_1grow_2decline_1decline_nomig
In [87]:
nu1s = 0.5
nu2s = 0.5
nu2f = 0.05
nu1b = 40
nu1f = 0.15
Ts = 0.1
T2 = 0.1
Tb = 0.1
Tf = 0.1
In [45]:
sfs2d_folded.sample_sizes
Out[45]:
In [88]:
model_spectrum = func_ex((nu1s, nu2s, nu2f, nu1b, nu1f, Ts, T2, Tb, Tf), sfs2d_folded.sample_sizes, pts_l)
In [89]:
theta = dadi.Inference.optimal_sfs_scaling(model_spectrum.fold(), sfs2d_folded)
In [90]:
theta
Out[90]:
In [91]:
dadi.Plotting.plot_2d_comp_multinom(model_spectrum.fold(), sfs2d_folded, vmin=1)
I think this indicates that I need to allow the split to be more recent.
In [ ]: