Determine whether reference 12C genomic DNA could be used to determine 13C incorporation

  • Compositional data issue:
    • Only ratios matter, so finding the mode in euclidean space from compositional data may be imposible
      • See example below
    • This may be able to be circumvented by marker genomic DNA.
      • If 12C genomic DNA is added pre-fractionation, then variance from this marker distribution would give an indication of the BD of other taxa

Init


In [1]:
%load_ext rpy2.ipython

In [16]:
%%R
library(dplyr)
library(tidyr)
library(ggplot2)
library(compositions)
library(coenocliner)
#library(robCompositions)


/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: Loading required package: tensorA

  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: 
Attaching package: ‘tensorA’


  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: The following object is masked from ‘package:base’:

    norm


  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: Loading required package: energy

  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: Loading required package: bayesm

  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: Welcome to compositions, a package for compositional data analysis.
Find an intro with "? compositions"


  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: 
Attaching package: ‘compositions’


  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: The following objects are masked from ‘package:stats’:

    cor, cov, dist, var


  res = super(Function, self).__call__(*new_args, **new_kwargs)
/opt/anaconda/lib/python2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: The following objects are masked from ‘package:base’:

    %*%, scale, scale.default


  res = super(Function, self).__call__(*new_args, **new_kwargs)

Creating communities

Community template


In [3]:
%%R 
set.seed(2)
M <- 3                                      # number of species
ming <- 1.60                                # gradient minimum...
maxg <- 1.80                                # ...and maximum
meang = mean(c(ming, maxg))
locs <- seq(ming, maxg, length = 30)        # gradient locations
opt  <- rnorm(M, mean=meang, sd=0)          # runif(M, min = ming, max = maxg)   # species optima
tol  <- rep(0.025, M)                       # species tolerances
h    <- ceiling(rlnorm(M, meanlog = 12))    # max abundances
pars <- cbind(opt = opt, tol = tol, h = h)  # put in a matrix

In [4]:
%%R -w 600 -h 350
mu1 <- coenocline(locs, responseModel = "gaussian", params = pars,
                    countModel = "poisson") #, countParams = list(alpha = 0.5))


matplot(locs, mu1, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Template community')


control & treatment communities


In [5]:
%%R -w 600 -h 350

n.frac = 19

start.cont = 8
comm.cont = mu1[start.cont:(start.cont+n.frac),]

start.treat = 1
comm.treat = mu1[start.treat:(start.treat+n.frac),]

locs.p = locs[start.cont:(start.cont+n.frac)]


matplot(locs.p, comm.cont, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Control')
matplot(locs.p, comm.treat, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Treatment')


Convert to relative abundances


In [6]:
%%R -w 600 -h 350

to.rel = function(x){
    y = sum(x)
    if(y == 0){
        return(x)
    } else {
        return(x / y)
    }
}

comm.cont.rel = t(apply(comm.cont, 1, to.rel))
comm.treat.rel = t(apply(comm.treat, 1, to.rel))


matplot(locs.p, comm.cont.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Control')
matplot(locs.p, comm.treat.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Treatment')


Using a 12C-marker


In [49]:
%%R -w 600 -h 350

n.frac = 19

start.cont = 8
comm.cont = mu1[start.cont:(start.cont+n.frac),]

start.treat = 1
comm.treat = mu1[start.treat:(start.treat+n.frac),]
comm.treat[,2] = comm.cont[,2]

locs.p = locs[start.cont:(start.cont+n.frac)]


matplot(locs.p, comm.cont, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Control')
matplot(locs.p, comm.treat, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Treatment')



In [50]:
%%R -w 600 -h 350

comm.cont.rel = t(apply(comm.cont, 1, to.rel))
comm.treat.rel = t(apply(comm.treat, 1, to.rel))


matplot(locs.p, comm.cont.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Control')
matplot(locs.p, comm.treat.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Treatment')


Marker taxon not spanning into heavy fractions


In [51]:
%%R -w 600 -h 350

m = 120000

comm.cont[,2] = comm.cont[,2] - m
comm.cont[comm.cont < 0] = 0

comm.treat[,2] = comm.treat[,2] - m
comm.treat[comm.treat < 0] = 0

matplot(locs.p, comm.cont, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Control')
matplot(locs.p, comm.treat, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Treatment')



In [53]:
%%R -w 600 -h 350

comm.cont.rel = t(apply(comm.cont, 1, to.rel))
comm.treat.rel = t(apply(comm.treat, 1, to.rel))


matplot(locs.p, comm.cont.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Control')
matplot(locs.p, comm.treat.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Treatment')


Compositional variation


In [65]:
%%R -w 600 -h 350

n.frac = 19

start.cont = 8
comm.cont = mu1[start.cont:(start.cont+n.frac),]

start.treat = 1
comm.treat = mu1[start.treat:(start.treat+n.frac),]
comm.treat[,2] = comm.cont[,2]

locs.p = locs[start.cont:(start.cont+n.frac)]


matplot(locs.p, comm.cont, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Control')
matplot(locs.p, comm.treat, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Treatment')



In [67]:
%%R -w 600 -h 350

comm.cont.rel = t(apply(comm.cont, 1, to.rel))
comm.treat.rel = t(apply(comm.treat, 1, to.rel))


matplot(locs.p, comm.cont.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Control')
matplot(locs.p, comm.treat.rel, lty = "solid", type = "p", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Treatment')



In [68]:
%%R
comm.cont.rel.acomp = acomp(comm.cont.rel)
x = variation(comm.cont.rel.acomp)[-2,2]
x %>% print

comm.treat.rel.acomp = acomp(comm.treat.rel)
x = variation(comm.treat.rel.acomp)[-2,2]
x %>% print


[1] 4.197845e-04 2.993141e-05
[1] 10.142181  9.938133

BD_shift ~ Aitchson_variation

  • how BD shift relates to Aitchison distance
  • Method:
    • simulate 2 communities
      • comm1: taxa with euclidean mode at same location
      • comm2: taxa with differing BD shifts from comm1
        • contains a reference taxon (also found in comm1)
    • calculate variation (vs marker) for both communities
    • calculate delta_var: (var_treatment - var_control)
    • plot BD_shift ~ delta_var
      • where BD_shift = delta mode (control vs treatment) in euclidean space

Control taxa all same GC

Making communities


In [718]:
%%R
n.taxa = 500
n.fracs = 500

In [719]:
%%R 
set.seed(2)
M <- n.taxa                                 # number of species
ming <- 1.60                                # gradient minimum...
maxg <- 1.80                                # ...and maximum
meang = mean(c(ming, maxg))
locs <- seq(ming, maxg, length = n.fracs)   # gradient locations
opt  <- rnorm(M, mean=meang, sd=0)          # runif(M, min = ming, max = maxg)   # species optima
tol  <- rep(0.025, M)                       # species tolerances
h    <- ceiling(rlnorm(M, meanlog = 12))    # max abundances
pars <- cbind(opt = opt, tol = tol, h = h)  # put in a matrix

In [720]:
%%R -w 600 -h 350
mu1 = coenocline(locs, responseModel = "gaussian", params = pars,
                    countModel = "poisson") #, countParams = list(alpha = 0.5))

matplot(locs, mu1, lty = "solid", type = "l", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Control community')



In [721]:
%%R
# just using 1st taxon as reference
ref = 1

In [722]:
%%R 
# treatment community
set.seed(2)
M <- n.taxa                                      # number of species
opt.treat = opt + runif(M, min=0, max=0.036)
pars <- cbind(opt = opt.treat, tol = tol, h = h)  # put in a matrix

In [723]:
%%R -w 600 -h 350
mu2 = coenocline(locs, responseModel = "gaussian", params = pars, countModel = "poisson")

mu2[,ref] = mu1[,ref]

matplot(locs, mu2, lty = "solid", type = "l", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Treatment community')


Convert to relative abundances


In [724]:
%%R -w 600 -h 350
mu1.rel = t(apply(mu1, 1, to.rel))
mu2.rel = t(apply(mu2, 1, to.rel))


matplot(locs, mu1.rel, lty = "solid", type = "l", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Control')
matplot(locs, mu2.rel, lty = "solid", type = "l", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Relative abundance", main='Treatment')


Difference in euclidean mode BD


In [725]:
%%R

mode.BD = function(x, loc){
    loc[which(x == max(x))][1]
}

mu1.mode = apply(mu1, 2, mode.BD, loc=locs)
mu2.mode = apply(mu2, 2, mode.BD, loc=locs)


BD.shift = mu2.mode - mu1.mode
hist(BD.shift, breaks=20)


Calculating Aitchison variance to marker reference


In [726]:
%%R
mu1.rel.acomp = acomp(mu1.rel)
mu2.rel.acomp = acomp(mu2.rel)

mu1.var = variation(mu1.rel.acomp)[ref,]
mu2.var = variation(mu2.rel.acomp)[ref,]

delta.var = mu2.var - mu1.var
hist(delta.var, breaks=20)


Plotting delta_BD ~ delta_var


In [727]:
%%R -w 600 -h 350
df.shift = data.frame('taxon' = 1:ncol(mu1), 'BD.shift' = BD.shift, 'delta.var' = delta.var) %>%
    filter(taxon != ref)

ggplot(df.shift, aes(delta.var, BD.shift)) +
    geom_point() +
    geom_smooth() +
    labs(x='delta Aitchison variance', y='delta BD') +
    theme_bw() +
    theme(
        text = element_text(size=16)
    )


Control taxa w/ varying G+C


In [728]:
%%R
n.taxa = 500
n.fracs = 500

In [729]:
%%R 
set.seed(2)
M <- n.taxa                                 # number of species
ming <- 1.60                                # gradient minimum...
maxg <- 1.80                                # ...and maximum
meang = mean(c(ming, maxg))
locs <- seq(ming, maxg, length = n.fracs)   # gradient locations
opt.cont  <- rnorm(M, mean=meang, sd=0.01)          # runif(M, min = ming, max = maxg)   # species optima
#opt.cont[which(opt.cont == min(opt.cont))[1]] = min(opt.cont) - 0.01
tol  <- rep(0.025, M)                       # species tolerances
h    <- ceiling(rlnorm(M, meanlog = 12))    # max abundances
pars <- cbind(opt = opt.cont, tol = tol, h = h)  # put in a matrix

In [730]:
%%R -w 600 -h 350
mu1 = coenocline(locs, responseModel = "gaussian", params = pars,
                    countModel = "poisson") #, countParams = list(alpha = 0.5))

matplot(locs, mu1, lty = "solid", type = "l", pch = 1:10, cex = 0.8,
        xlab = "BD", ylab = "Abundance", main='Control community')