Goal

Question: how is incorporator identification accuracy affected by the percent isotope incorporation of taxa?

• Using genome dataset created in the "dataset" notebook

• Simulates isotope dilution or short incubations

• Method
• 25% taxa incorporate
• incorporation % same for all incorporators
• incorporation % treatments: 10, 20, 40, 60, 80, 100%
• Total treatments: 6
• ALSO, testing the use of differing BD ranges on sensitivity/specificity

User variables

``````

In [174]:

workDir = '/home/nick/notebook/SIPSim/dev/bac_genome1210/'
genomeDir = '/home/nick/notebook/SIPSim/dev/bac_genome1210/genomes/'
R_dir = '/home/nick/notebook/SIPSim/lib/R/'

``````

Init

``````

In [175]:

import glob
import itertools
from os.path import abspath
import nestly

``````
``````

In [176]:

``````
``````

``````
``````

In [177]:

%%R
library(ggplot2)
library(dplyr)
library(tidyr)
library(gridExtra)

``````

Using nestly: different incorporation percentages

``````

In [178]:

# building tree structure
nest = nestly.Nest()

## varying params
### perc incorporation
### BD range
BD_min = np.arange(1.67, 1.77, 0.02).tolist()
BD_max = [x + 0.04 for x in BD_min]
f = lambda x: {'BD_range': str(x[0]) + '-' + str(x[1]),
'BD_min':x[0],
'BD_max':x[1]}
BD_range = [f(x) for x in itertools.product(BD_min, BD_max)
if x[0] < x[1]]
#### contains BD_min & BD_max

## set params

## input/output files

# building directory tree
buildDir = os.path.join(workDir, 'percIncorpUnif_difBD')
nest.build(buildDir)

``````
``````

In [179]:

bashFile = os.path.join(workDir, 'SIPSimRun.sh')

``````
``````

In [180]:

%%writefile \$bashFile
#!/bin/bash

# copying files from perc_incorp_unif
cp {perc_incorp_dir}{percIncorp}/{otu_file} .
cp {perc_incorp_dir}{percIncorp}/{BD_shift} .

#-- R analysis --#
export PATH={R_dir}:\$PATH
# running DeSeq2 and making confusion matrix on predicting incorporators
## making phyloseq object from OTU table
phyloseq_make.r \
{otu_file} \
> {otu_file}.physeq

## filtering phyloseq object to just taxa/samples of interest
phyloseq_edit.r \
{otu_file}.physeq \
--BD_min {BD_min} --BD_max {BD_max} \
> {otu_file}_filt.physeq
## making ordination
phyloseq_ordination.r \
--log2 {log2} \
--hypo greater \
{otu_file}_filt.physeq \
{otu_file}_bray-NMDS.pdf
## DESeq2
phyloseq_DESeq2.r \
{otu_file}_filt.physeq \
> {otu_file}_DESeq2
## Confusion matrix
DESeq2_confuseMtx.r \
{BD_shift} \
{otu_file}_DESeq2 \

``````
``````

Overwriting /home/nick/notebook/SIPSim/dev/bac_genome1210/SIPSimRun.sh

``````
``````

In [181]:

!chmod 775 \$bashFile

``````
``````

In [ ]:

!cd \$workDir; \
nestrun -j 30 --template-file \$bashFile -d percIncorpUnif_difBD --log-file log.txt

``````
``````

2015-07-17 10:57:08,940 * INFO * Template: ./SIPSimRun.sh
2015-07-17 10:57:08,942 * INFO * [25676] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.67-1.79
2015-07-17 10:57:08,944 * INFO * [25677] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.73-1.77
2015-07-17 10:57:08,947 * INFO * [25680] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.69-1.77
2015-07-17 10:57:08,950 * INFO * [25684] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.71-1.75
2015-07-17 10:57:08,953 * INFO * [25688] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.75-1.81
2015-07-17 10:57:08,955 * INFO * [25693] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.71-1.73
2015-07-17 10:57:08,957 * INFO * [25700] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.73-1.79
2015-07-17 10:57:08,959 * INFO * [25705] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.69-1.71
2015-07-17 10:57:08,961 * INFO * [25719] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.77-1.81
2015-07-17 10:57:08,963 * INFO * [25731] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.69-1.73
2015-07-17 10:57:08,965 * INFO * [25736] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.69-1.79
2015-07-17 10:57:08,967 * INFO * [25749] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.67-1.77
2015-07-17 10:57:08,968 * INFO * [25759] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.75-1.79
2015-07-17 10:57:08,970 * INFO * [25766] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.69-1.75
2015-07-17 10:57:08,971 * INFO * [25773] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.67-1.75
2015-07-17 10:57:08,973 * INFO * [25787] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.75-1.77
2015-07-17 10:57:08,975 * INFO * [25798] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.67-1.81
2015-07-17 10:57:08,977 * INFO * [25804] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.73-1.75
2015-07-17 10:57:08,978 * INFO * [25812] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.71-1.77
2015-07-17 10:57:08,980 * INFO * [25832] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.69-1.81
2015-07-17 10:57:08,982 * INFO * [25843] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.67-1.71
2015-07-17 10:57:08,983 * INFO * [25857] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.73-1.81
2015-07-17 10:57:08,985 * INFO * [25867] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.71-1.81
2015-07-17 10:57:08,987 * INFO * [25876] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.71-1.79
2015-07-17 10:57:08,989 * INFO * [25888] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.67-1.73
2015-07-17 10:57:08,991 * INFO * [25899] Started ./SIPSimRun.sh in percIncorpUnif_difBD/80/1.77-1.79
2015-07-17 10:57:08,993 * INFO * [25909] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.67-1.79
2015-07-17 10:57:08,995 * INFO * [25917] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.73-1.77
2015-07-17 10:57:08,997 * INFO * [25924] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.69-1.77
2015-07-17 10:57:08,999 * INFO * [25934] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.71-1.75
2015-07-17 10:58:09,722 * INFO * [25684] percIncorpUnif_difBD/80/1.71-1.75 Finished with 0
2015-07-17 10:58:09,724 * INFO * [27662] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.75-1.81
2015-07-17 10:58:11,421 * INFO * [25705] percIncorpUnif_difBD/80/1.69-1.71 Finished with 0
2015-07-17 10:58:11,423 * INFO * [27730] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.71-1.73
2015-07-17 10:58:12,250 * INFO * [25804] percIncorpUnif_difBD/80/1.73-1.75 Finished with 0
2015-07-17 10:58:12,252 * INFO * [27742] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.73-1.79
2015-07-17 10:58:14,306 * INFO * [25731] percIncorpUnif_difBD/80/1.69-1.73 Finished with 0
2015-07-17 10:58:14,308 * INFO * [27779] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.69-1.71
2015-07-17 10:58:15,965 * INFO * [25876] percIncorpUnif_difBD/80/1.71-1.79 Finished with 0
2015-07-17 10:58:15,967 * INFO * [27821] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.77-1.81
2015-07-17 10:58:16,811 * INFO * [25867] percIncorpUnif_difBD/80/1.71-1.81 Finished with 0
2015-07-17 10:58:16,813 * INFO * [27848] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.69-1.73
2015-07-17 10:58:17,197 * INFO * [25749] percIncorpUnif_difBD/80/1.67-1.77 Finished with 0
2015-07-17 10:58:17,199 * INFO * [27860] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.69-1.79
2015-07-17 10:58:17,552 * INFO * [25736] percIncorpUnif_difBD/80/1.69-1.79 Finished with 0
2015-07-17 10:58:17,554 * INFO * [27872] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.67-1.77
2015-07-17 10:58:17,718 * INFO * [25677] percIncorpUnif_difBD/80/1.73-1.77 Finished with 0
2015-07-17 10:58:17,721 * INFO * [27884] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.75-1.79
2015-07-17 10:58:18,102 * INFO * [25843] percIncorpUnif_difBD/80/1.67-1.71 Finished with 0
2015-07-17 10:58:18,105 * INFO * [27896] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.69-1.75
2015-07-17 10:58:18,529 * INFO * [25787] percIncorpUnif_difBD/80/1.75-1.77 Finished with 0
2015-07-17 10:58:18,532 * INFO * [27914] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.67-1.75
2015-07-17 10:58:18,971 * INFO * [25700] percIncorpUnif_difBD/80/1.73-1.79 Finished with 0
2015-07-17 10:58:18,973 * INFO * [27932] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.75-1.77
2015-07-17 10:58:19,043 * INFO * [25759] percIncorpUnif_difBD/80/1.75-1.79 Finished with 0
2015-07-17 10:58:19,046 * INFO * [27944] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.67-1.81
2015-07-17 10:58:20,212 * INFO * [25719] percIncorpUnif_difBD/80/1.77-1.81 Finished with 0
2015-07-17 10:58:20,215 * INFO * [27986] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.73-1.75
2015-07-17 10:58:21,556 * INFO * [25899] percIncorpUnif_difBD/80/1.77-1.79 Finished with 0
2015-07-17 10:58:21,558 * INFO * [28016] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.71-1.77
2015-07-17 10:58:21,736 * INFO * [25832] percIncorpUnif_difBD/80/1.69-1.81 Finished with 0
2015-07-17 10:58:21,738 * INFO * [28040] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.69-1.81
2015-07-17 10:58:21,858 * INFO * [25693] percIncorpUnif_difBD/80/1.71-1.73 Finished with 0
2015-07-17 10:58:21,861 * INFO * [28052] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.67-1.71
2015-07-17 10:58:22,253 * INFO * [25676] percIncorpUnif_difBD/80/1.67-1.79 Finished with 0
2015-07-17 10:58:22,256 * INFO * [28064] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.73-1.81
2015-07-17 10:58:22,358 * INFO * [25934] percIncorpUnif_difBD/20/1.71-1.75 Finished with 0
2015-07-17 10:58:22,360 * INFO * [28077] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.71-1.81
2015-07-17 10:58:22,666 * INFO * [25917] percIncorpUnif_difBD/20/1.73-1.77 Finished with 0
2015-07-17 10:58:22,668 * INFO * [28100] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.71-1.79
2015-07-17 10:58:23,436 * INFO * [25766] percIncorpUnif_difBD/80/1.69-1.75 Finished with 0
2015-07-17 10:58:23,438 * INFO * [28128] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.67-1.73
2015-07-17 10:58:23,584 * INFO * [25909] percIncorpUnif_difBD/20/1.67-1.79 Finished with 0
2015-07-17 10:58:23,586 * INFO * [28140] Started ./SIPSimRun.sh in percIncorpUnif_difBD/20/1.77-1.79
2015-07-17 10:58:23,933 * INFO * [25680] percIncorpUnif_difBD/80/1.69-1.77 Finished with 0
2015-07-17 10:58:23,936 * INFO * [28158] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.67-1.79
2015-07-17 10:58:24,245 * INFO * [25773] percIncorpUnif_difBD/80/1.67-1.75 Finished with 0
2015-07-17 10:58:24,248 * INFO * [28176] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.73-1.77
2015-07-17 10:58:24,803 * INFO * [25688] percIncorpUnif_difBD/80/1.75-1.81 Finished with 0
2015-07-17 10:58:24,806 * INFO * [28194] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.69-1.77
2015-07-17 10:58:25,441 * INFO * [25888] percIncorpUnif_difBD/80/1.67-1.73 Finished with 0
2015-07-17 10:58:25,443 * INFO * [28218] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.71-1.75
2015-07-17 10:58:25,805 * INFO * [25924] percIncorpUnif_difBD/20/1.69-1.77 Finished with 0
2015-07-17 10:58:25,808 * INFO * [28230] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.75-1.81
2015-07-17 10:58:26,868 * INFO * [25857] percIncorpUnif_difBD/80/1.73-1.81 Finished with 0
2015-07-17 10:58:26,870 * INFO * [28254] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.71-1.73
2015-07-17 10:58:27,023 * INFO * [25812] percIncorpUnif_difBD/80/1.71-1.77 Finished with 0
2015-07-17 10:58:27,026 * INFO * [28266] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.73-1.79
2015-07-17 10:58:28,784 * INFO * [25798] percIncorpUnif_difBD/80/1.67-1.81 Finished with 0
2015-07-17 10:58:28,787 * INFO * [28306] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.69-1.71
2015-07-17 10:59:17,725 * INFO * [27742] percIncorpUnif_difBD/20/1.73-1.79 Finished with 0
2015-07-17 10:59:17,728 * INFO * [29794] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.77-1.81
2015-07-17 10:59:18,590 * INFO * [27662] percIncorpUnif_difBD/20/1.75-1.81 Finished with 0
2015-07-17 10:59:18,592 * INFO * [29840] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.69-1.73
2015-07-17 10:59:21,659 * INFO * [27986] percIncorpUnif_difBD/20/1.73-1.75 Finished with 0
2015-07-17 10:59:21,662 * INFO * [29939] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.69-1.79
2015-07-17 10:59:22,053 * INFO * [27730] percIncorpUnif_difBD/20/1.71-1.73 Finished with 0
2015-07-17 10:59:22,056 * INFO * [29958] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.67-1.77
2015-07-17 10:59:23,154 * INFO * [27779] percIncorpUnif_difBD/20/1.69-1.71 Finished with 0
2015-07-17 10:59:23,157 * INFO * [29996] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.75-1.79
2015-07-17 10:59:24,902 * INFO * [27884] percIncorpUnif_difBD/20/1.75-1.79 Finished with 0
2015-07-17 10:59:24,905 * INFO * [30025] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.69-1.75
2015-07-17 10:59:27,007 * INFO * [27821] percIncorpUnif_difBD/20/1.77-1.81 Finished with 0
2015-07-17 10:59:27,010 * INFO * [30065] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.67-1.75
2015-07-17 10:59:27,866 * INFO * [27914] percIncorpUnif_difBD/20/1.67-1.75 Finished with 0
2015-07-17 10:59:27,869 * INFO * [30089] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.75-1.77
2015-07-17 10:59:28,085 * INFO * [27944] percIncorpUnif_difBD/20/1.67-1.81 Finished with 0
2015-07-17 10:59:28,087 * INFO * [30101] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.67-1.81
2015-07-17 10:59:28,618 * INFO * [27848] percIncorpUnif_difBD/20/1.69-1.73 Finished with 0
2015-07-17 10:59:28,621 * INFO * [30113] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.73-1.75
2015-07-17 10:59:29,386 * INFO * [27860] percIncorpUnif_difBD/20/1.69-1.79 Finished with 0
2015-07-17 10:59:29,389 * INFO * [30125] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.71-1.77
2015-07-17 10:59:29,653 * INFO * [28040] percIncorpUnif_difBD/20/1.69-1.81 Finished with 0
2015-07-17 10:59:29,656 * INFO * [30149] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.69-1.81
2015-07-17 10:59:31,115 * INFO * [28128] percIncorpUnif_difBD/20/1.67-1.73 Finished with 0
2015-07-17 10:59:31,117 * INFO * [30197] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.67-1.71
2015-07-17 10:59:32,030 * INFO * [27872] percIncorpUnif_difBD/20/1.67-1.77 Finished with 0
2015-07-17 10:59:32,033 * INFO * [30215] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.73-1.81
2015-07-17 10:59:32,065 * INFO * [27932] percIncorpUnif_difBD/20/1.75-1.77 Finished with 0
2015-07-17 10:59:32,067 * INFO * [30227] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.71-1.81
2015-07-17 10:59:32,322 * INFO * [28140] percIncorpUnif_difBD/20/1.77-1.79 Finished with 0
2015-07-17 10:59:32,323 * INFO * [30248] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.71-1.79
2015-07-17 10:59:33,124 * INFO * [28052] percIncorpUnif_difBD/20/1.67-1.71 Finished with 0
2015-07-17 10:59:33,126 * INFO * [30275] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.67-1.73
2015-07-17 10:59:33,562 * INFO * [28176] percIncorpUnif_difBD/0/1.73-1.77 Finished with 0
2015-07-17 10:59:33,564 * INFO * [30299] Started ./SIPSimRun.sh in percIncorpUnif_difBD/0/1.77-1.79
2015-07-17 10:59:34,135 * INFO * [28077] percIncorpUnif_difBD/20/1.71-1.81 Finished with 0
2015-07-17 10:59:34,138 * INFO * [30317] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.67-1.79
2015-07-17 10:59:34,824 * INFO * [28016] percIncorpUnif_difBD/20/1.71-1.77 Finished with 0
2015-07-17 10:59:34,827 * INFO * [30329] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.73-1.77
2015-07-17 10:59:35,244 * INFO * [27896] percIncorpUnif_difBD/20/1.69-1.75 Finished with 0
2015-07-17 10:59:35,247 * INFO * [30363] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.69-1.77
2015-07-17 10:59:35,479 * INFO * [28254] percIncorpUnif_difBD/0/1.71-1.73 Finished with 0
2015-07-17 10:59:35,481 * INFO * [30388] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.71-1.75
2015-07-17 10:59:35,513 * INFO * [28230] percIncorpUnif_difBD/0/1.75-1.81 Finished with 0
2015-07-17 10:59:35,515 * INFO * [30399] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.75-1.81
2015-07-17 10:59:36,128 * INFO * [28218] percIncorpUnif_difBD/0/1.71-1.75 Finished with 0
2015-07-17 10:59:36,131 * INFO * [30423] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.71-1.73
2015-07-17 10:59:37,588 * INFO * [28306] percIncorpUnif_difBD/0/1.69-1.71 Finished with 0
2015-07-17 10:59:37,591 * INFO * [30451] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.73-1.79
2015-07-17 10:59:38,038 * INFO * [28194] percIncorpUnif_difBD/0/1.69-1.77 Finished with 0
2015-07-17 10:59:38,041 * INFO * [30463] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.69-1.71
2015-07-17 10:59:38,155 * INFO * [28064] percIncorpUnif_difBD/20/1.73-1.81 Finished with 0
2015-07-17 10:59:38,158 * INFO * [30475] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.77-1.81
2015-07-17 10:59:38,254 * INFO * [28100] percIncorpUnif_difBD/20/1.71-1.79 Finished with 0
2015-07-17 10:59:38,256 * INFO * [30490] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.69-1.73
2015-07-17 10:59:38,768 * INFO * [28158] percIncorpUnif_difBD/0/1.67-1.79 Finished with 0
2015-07-17 10:59:38,770 * INFO * [30515] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.69-1.79
2015-07-17 10:59:41,071 * INFO * [28266] percIncorpUnif_difBD/0/1.73-1.79 Finished with 0
2015-07-17 10:59:41,074 * INFO * [30573] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.67-1.77
2015-07-17 11:00:25,232 * INFO * [29794] percIncorpUnif_difBD/0/1.77-1.81 Finished with 0
2015-07-17 11:00:25,234 * INFO * [31991] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.75-1.79
2015-07-17 11:00:30,199 * INFO * [29840] percIncorpUnif_difBD/0/1.69-1.73 Finished with 0
2015-07-17 11:00:30,200 * INFO * [32131] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.69-1.75
2015-07-17 11:00:31,759 * INFO * [30089] percIncorpUnif_difBD/0/1.75-1.77 Finished with 0
2015-07-17 11:00:31,761 * INFO * [32163] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.67-1.75
2015-07-17 11:00:35,291 * INFO * [29939] percIncorpUnif_difBD/0/1.69-1.79 Finished with 0
2015-07-17 11:00:35,294 * INFO * [32213] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.75-1.77
2015-07-17 11:00:37,672 * INFO * [30227] percIncorpUnif_difBD/0/1.71-1.81 Finished with 0
2015-07-17 11:00:37,674 * INFO * [32283] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.67-1.81
2015-07-17 11:00:38,700 * INFO * [30101] percIncorpUnif_difBD/0/1.67-1.81 Finished with 0
2015-07-17 11:00:38,703 * INFO * [32313] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.73-1.75
2015-07-17 11:00:38,955 * INFO * [29996] percIncorpUnif_difBD/0/1.75-1.79 Finished with 0
2015-07-17 11:00:38,958 * INFO * [32331] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.71-1.77
2015-07-17 11:00:39,257 * INFO * [30388] percIncorpUnif_difBD/60/1.71-1.75 Finished with 0
2015-07-17 11:00:39,260 * INFO * [32353] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.69-1.81
2015-07-17 11:00:40,154 * INFO * [30025] percIncorpUnif_difBD/0/1.69-1.75 Finished with 0
2015-07-17 11:00:40,156 * INFO * [32380] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.67-1.71
2015-07-17 11:00:40,604 * INFO * [30197] percIncorpUnif_difBD/0/1.67-1.71 Finished with 0
2015-07-17 11:00:40,607 * INFO * [32407] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.73-1.81
2015-07-17 11:00:40,654 * INFO * [30125] percIncorpUnif_difBD/0/1.71-1.77 Finished with 0
2015-07-17 11:00:40,656 * INFO * [32419] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.71-1.81
2015-07-17 11:00:41,211 * INFO * [30065] percIncorpUnif_difBD/0/1.67-1.75 Finished with 0
2015-07-17 11:00:41,213 * INFO * [32459] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.71-1.79
2015-07-17 11:00:41,352 * INFO * [30113] percIncorpUnif_difBD/0/1.73-1.75 Finished with 0
2015-07-17 11:00:41,354 * INFO * [32471] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.67-1.73
2015-07-17 11:00:42,482 * INFO * [29958] percIncorpUnif_difBD/0/1.67-1.77 Finished with 0
2015-07-17 11:00:42,484 * INFO * [32495] Started ./SIPSimRun.sh in percIncorpUnif_difBD/60/1.77-1.79
2015-07-17 11:00:42,561 * INFO * [30149] percIncorpUnif_difBD/0/1.69-1.81 Finished with 0
2015-07-17 11:00:42,563 * INFO * [32507] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.67-1.79
2015-07-17 11:00:42,815 * INFO * [30329] percIncorpUnif_difBD/60/1.73-1.77 Finished with 0
2015-07-17 11:00:42,818 * INFO * [32519] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.73-1.77
2015-07-17 11:00:43,218 * INFO * [30215] percIncorpUnif_difBD/0/1.73-1.81 Finished with 0
2015-07-17 11:00:43,221 * INFO * [32531] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.69-1.77
2015-07-17 11:00:43,762 * INFO * [30299] percIncorpUnif_difBD/0/1.77-1.79 Finished with 0
2015-07-17 11:00:43,766 * INFO * [32549] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.71-1.75
2015-07-17 11:00:44,124 * INFO * [30463] percIncorpUnif_difBD/60/1.69-1.71 Finished with 0
2015-07-17 11:00:44,127 * INFO * [32561] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.75-1.81
2015-07-17 11:00:44,468 * INFO * [30275] percIncorpUnif_difBD/0/1.67-1.73 Finished with 0
2015-07-17 11:00:44,470 * INFO * [32576] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.71-1.73
2015-07-17 11:00:44,995 * INFO * [30423] percIncorpUnif_difBD/60/1.71-1.73 Finished with 0
2015-07-17 11:00:44,998 * INFO * [32591] Started ./SIPSimRun.sh in percIncorpUnif_difBD/100/1.73-1.79

``````
``````

In [ ]:

# aggregating confusion matrix data
## table
!cd \$workDir; \
nestagg delim \
-d percIncorpUnif_difBD \
-k percIncorp,BD_min,BD_max \
-o ./percIncorpUnif_difBD/DESeq2-cMtx_table.csv \
DESeq2-cMtx_table.csv

## overall
!cd \$workDir; \
nestagg delim \
-d percIncorpUnif_difBD \
-k percIncorp,BD_min,BD_max \
-o ./percIncorpUnif_difBD/DESeq2-cMtx_overall.csv \
DESeq2-cMtx_overall.csv

## byClass
!cd \$workDir; \
nestagg delim \
-d percIncorpUnif_difBD \
-k percIncorp,BD_min,BD_max \
-o ./percIncorpUnif_difBD/DESeq2-cMtx_byClass.csv \
DESeq2-cMtx_byClass.csv

``````

Plotting results

``````

In [ ]:

%%R -i workDir -w 600 -h 600
setwd(workDir)

byClass\$byClass[is.na(byClass\$byClass)] = 0

cat.str = function(x,y, col=':'){
x = as.character(x)
y = as.character(y)
z = paste(c(x,y),collapse=col)
return(z)
}

byClass = byClass %>%
mutate(BD_range = mapply(cat.str, BD_min, BD_max)) %>%
mutate(BD_min = as.character(BD_min),
BD_max = as.character(BD_max))

``````
``````

In [ ]:

%%R -w 800 -h 750

col2keep = c('Balanced Accuracy', 'Sensitivity','Specificity')
byClass.f = byClass %>%
filter(X %in% col2keep) %>%
mutate(BD_min = as.numeric(BD_min),
BD_max = as.numeric(BD_max),
byClass = as.numeric(byClass),
byClass.inv = 1 - byClass)

byClass.fs = byClass.f %>%
group_by(X, percIncorp) %>%
summarize(byClass.max = max(byClass))

just.true = function(x){
if(x == TRUE){
return(1)
} else{
return(NA)
}
}

byClass.j = inner_join(byClass.f, byClass.fs, c('X' = 'X',
'percIncorp' = 'percIncorp')) %>%
mutate(max_val = as.numeric(byClass == byClass.max))
byClass.jf = byClass.j %>%
filter(max_val == 1)

x.breaks = unique(byClass.j\$BD_min)

p = ggplot(byClass.j, aes(BD_min, BD_max, fill=byClass.inv)) +
geom_tile() +
geom_point(data=byClass.jf, aes(color='red')) +
scale_x_continuous(breaks=x.breaks) +
labs(x='Minimum BD', y='Maximum BD') +
facet_grid(percIncorp ~ X) +
theme_bw() +
theme(
text=element_text(size=16),
axis.text.x=element_text(angle=90, hjust=1, vjust=0.5)
)
p

``````
``````

In [ ]:

%%R -w 800 -h 750

col2keep = c('Balanced Accuracy', 'Sensitivity','Specificity')
byClass.f = byClass %>%
filter(X %in% col2keep) %>%
mutate(BD_min = as.numeric(BD_min),
BD_max = as.numeric(BD_max),
byClass = as.numeric(byClass),
byClass.inv = 1 - byClass)

byClass.fs = byClass.f %>%
group_by(X, percIncorp) %>%
summarize(byClass.max = max(byClass))

just.true = function(x){
if(x == TRUE){
return(1)
} else{
return(NA)
}
}

byClass.j = inner_join(byClass.f, byClass.fs, c('X' = 'X',
'percIncorp' = 'percIncorp')) %>%
mutate(max_val = as.numeric(byClass == byClass.max),
byClass.txt = round(byClass, 2))
byClass.jf = byClass.j %>%
filter(max_val == 1)

x.breaks = unique(byClass.j\$BD_min)

p = ggplot(byClass.j, aes(BD_min, BD_max, fill=byClass.inv)) +
geom_tile() +
geom_text(data=byClass.jf, aes(label=byClass.txt), color=c('white')) +
scale_x_continuous(breaks=x.breaks) +
labs(x='Minimum Buoyant Density', y='Maximum Buoyant Density') +
facet_grid(percIncorp ~ X) +
theme_bw() +
theme(
text=element_text(size=16),
axis.text.x=element_text(angle=90, hjust=1, vjust=0.5)
)
p

``````
``````

In [ ]:

%%R -w 800 -h 750

col2keep = c('Balanced Accuracy', 'Sensitivity','Specificity')
byClass.f = byClass %>%
filter(X %in% col2keep) %>%
mutate(BD_min = as.numeric(BD_min),
BD_max = as.numeric(BD_max),
byClass = as.numeric(byClass),
byClass.inv = 1 - byClass)

byClass.fs = byClass.f %>%
group_by(X, percIncorp) %>%
summarize(byClass.max = max(byClass))

just.true = function(x){
if(x == TRUE){
return(1)
} else{
return(NA)
}
}

byClass.j = inner_join(byClass.f, byClass.fs, c('X' = 'X',
'percIncorp' = 'percIncorp')) %>%
mutate(max_val = as.numeric(byClass == byClass.max),
byClass.txt = round(byClass, 2))
byClass.jf = byClass.j %>%
filter(max_val == 1)

x.breaks = unique(byClass.j\$BD_min)

p = ggplot(byClass.j, aes(BD_min, BD_max, fill=byClass.inv)) +
geom_tile() +
geom_text(aes(label=byClass.txt), color=c('white'), size=4) +
geom_text(data=byClass.jf, aes(label=byClass.txt), color=c('red'), size=4) +
scale_x_continuous(breaks=x.breaks) +
labs(x='Minimum Buoyant Density', y='Maximum Buoyant Density') +
facet_grid(percIncorp ~ X) +
theme_bw() +
theme(
text=element_text(size=16),
axis.text.x=element_text(angle=90, hjust=1, vjust=0.5)
)
p

``````
``````

In [ ]:

``````

Sandbox

Enrichment of TP for abundant incorporators?

• What is the abundance distribution of TP and FP?
• Are more abundant incorporators being detected more than low abundant taxa
``````

In [200]:

%%R -i workDir

setwd(workDir)

``````
``````

In [201]:

%%R
# OTU total counts
tbl.otu.sum = tbl.otu %>%
group_by(library, taxon) %>%
summarize(total_count = sum(count))

``````
``````

Source: local data frame [6 x 3]
Groups: library

library                                taxon total_count
1       1       Acaryochloris_marina_MBIC11017           0
2       1       Acetobacterium_woodii_DSM_1030           0
3       1 Acetobacter_pasteurianus_IFO_3283-12         751
4       1    Acetohalobium_arabaticum_DSM_5501           9
5       1         Acholeplasma_laidlawii_PG-8A         238
6       1        Achromobacter_xylosoxidans_A8          17

``````
``````

In [202]:

%%R
#
label.tp.fn = function(known, pred){
if(is.na(known) | is.na(pred)){
return(NA)
} else
if(known==TRUE & pred==TRUE){
return('TP')
} else
if(known==TRUE & pred==FALSE){
return('FN')
} else {
return(NA)
}
}

tbl.c.tp.fn = tbl.c %>%
mutate(tp.fn = mapply(label.tp.fn, incorp.known, incorp.pred)) %>%
filter(! is.na(tp.fn))

tbl.tp.fn = inner_join(tbl.c.tp.fn, tbl.otu.sum, c('taxon' = 'taxon'))

``````
``````

lib1 lib2                              taxon  BD_shift incorp.known  baseMean
1   NA    2 Actinobacillus_equuli_subsp_equuli 0.9999269         TRUE 1.8646755
2   NA    2 Actinobacillus_equuli_subsp_equuli 0.9999269         TRUE 1.8646755
3   NA    2 Candidatus_Moranella_endobia_PCVAL 0.9997449         TRUE 0.4688403
4   NA    2 Candidatus_Moranella_endobia_PCVAL 0.9997449         TRUE 0.4688403
5   NA    2     Alkaliphilus_oremlandii_OhILAs 0.9997434         TRUE 0.3118965
6   NA    2     Alkaliphilus_oremlandii_OhILAs 0.9997434         TRUE 0.3118965
log2FoldChange    lfcSE       stat       pvalue       padj            p
1     5.37154211 1.267839 4.23676891 2.267593e-05 0.00111679 2.677315e-05
2     5.37154211 1.267839 4.23676891 2.267593e-05 0.00111679 2.677315e-05
3     0.04533956 1.501490 0.03019637 9.759104e-01 1.00000000 5.542099e-01
4     0.04533956 1.501490 0.03019637 9.759104e-01 1.00000000 5.542099e-01
5     5.33532814 1.665671 3.20311065 1.359517e-03 0.02434772 1.132750e-03
6     5.33532814 1.665671 3.20311065 1.359517e-03 0.02434772 1.132750e-03
1 0.001318578        TRUE    TP       1        1728
2 0.001318578        TRUE    TP       2        1554
3 1.000000000       FALSE    FN       1         371
4 1.000000000       FALSE    FN       2        1981
5 0.022315185        TRUE    TP       1         183
6 0.022315185        TRUE    TP       2         247

``````
``````

In [203]:

%%R
# how many TP & FN?
tbl.tp.fn %>%
group_by(library, tp.fn) %>%
summarize(n = n())

``````
``````

Source: local data frame [4 x 3]
Groups: library

library tp.fn  n
1       1    FN 63
2       1    TP  9
3       2    FN 63
4       2    TP  9

``````
``````

In [204]:

%%R -h 350
tbl.tp.fn\$library = as.character(tbl.tp.fn\$library)
ggplot(tbl.tp.fn, aes(library, total_count, color=tp.fn)) +
geom_boxplot() +
labs(y='Total count') +
theme(
text = element_text(size=18)
)

``````
``````

``````
``````

In [209]:

%%R -h 700 -w 900
tbl.tp.fn\$library = as.character(tbl.tp.fn\$library)
ggplot(tbl.tp.fn, aes(taxon, total_count, group=taxon, color=library)) +
geom_point(size=3) +
geom_line() +
scale_y_log10() +
labs(y='Total count') +
facet_grid(. ~ tp.fn, scales='free_x') +
theme(
text = element_text(size=18),
axis.text.x = element_text(angle=90, hjust=1)
)

``````
``````

``````

Notes:

• FNs can be abundant
• TP can have a lower abundance in the 'treatment' library

Just looking at the 'heavy' fractions

``````

In [210]:

%%R
# OTU total counts

heavy.cut = 1.71

tbl.otu.sum = tbl.otu %>%
filter(! grepl('inf', fraction)) %>%
separate(fraction, into = c('BD_min','BD_max'), sep='-', convert=TRUE) %>%
filter(BD_min >= heavy.cut & BD_max <= 2) %>%
group_by(library, taxon) %>%
summarize(total_count = sum(count))

``````
``````

Source: local data frame [6 x 3]
Groups: library

library                                taxon total_count
1       1       Acaryochloris_marina_MBIC11017           0
2       1       Acetobacterium_woodii_DSM_1030           0
3       1 Acetobacter_pasteurianus_IFO_3283-12         564
4       1    Acetohalobium_arabaticum_DSM_5501           0
5       1         Acholeplasma_laidlawii_PG-8A           0
6       1        Achromobacter_xylosoxidans_A8          17

``````
``````

In [211]:

%%R

tbl.tp.fn = inner_join(tbl.c.tp.fn, tbl.otu.sum, c('taxon' = 'taxon'))

``````
``````

lib1 lib2                              taxon  BD_shift incorp.known  baseMean
1   NA    2 Actinobacillus_equuli_subsp_equuli 0.9999269         TRUE 1.8646755
2   NA    2 Actinobacillus_equuli_subsp_equuli 0.9999269         TRUE 1.8646755
3   NA    2 Candidatus_Moranella_endobia_PCVAL 0.9997449         TRUE 0.4688403
4   NA    2 Candidatus_Moranella_endobia_PCVAL 0.9997449         TRUE 0.4688403
5   NA    2     Alkaliphilus_oremlandii_OhILAs 0.9997434         TRUE 0.3118965
6   NA    2     Alkaliphilus_oremlandii_OhILAs 0.9997434         TRUE 0.3118965
log2FoldChange    lfcSE       stat       pvalue       padj            p
1     5.37154211 1.267839 4.23676891 2.267593e-05 0.00111679 2.677315e-05
2     5.37154211 1.267839 4.23676891 2.267593e-05 0.00111679 2.677315e-05
3     0.04533956 1.501490 0.03019637 9.759104e-01 1.00000000 5.542099e-01
4     0.04533956 1.501490 0.03019637 9.759104e-01 1.00000000 5.542099e-01
5     5.33532814 1.665671 3.20311065 1.359517e-03 0.02434772 1.132750e-03
6     5.33532814 1.665671 3.20311065 1.359517e-03 0.02434772 1.132750e-03
1 0.001318578        TRUE    TP       1        1383
2 0.001318578        TRUE    TP       2        1554
3 1.000000000       FALSE    FN       1          30
4 1.000000000       FALSE    FN       2        1981
5 0.022315185        TRUE    TP       1           0
6 0.022315185        TRUE    TP       2         246

``````
``````

In [212]:

%%R -h 350
tbl.tp.fn\$library = as.character(tbl.tp.fn\$library)
ggplot(tbl.tp.fn, aes(library, total_count, color=tp.fn)) +
geom_boxplot() +
labs(y='Total count') +
theme(
text = element_text(size=18)
)

``````
``````

``````
``````

In [215]:

%%R -h 700 -w 900
tbl.tp.fn\$library = as.character(tbl.tp.fn\$library)
ggplot(tbl.tp.fn, aes(taxon, total_count, group=taxon, color=library)) +
geom_point(size=3) +
geom_line() +
scale_y_log10() +
labs(y='Total count') +
facet_grid(. ~ tp.fn, scales='free_x') +
theme(
text = element_text(size=18),
axis.text.x = element_text(angle=90, hjust=1)
)

``````
``````

``````
``````

In [ ]:

%%R -i workDir -w 1000 -h 450
setwd(workDir)

tbl.otu = tbl.otu %>%
filter(!grepl('inf', fraction, ignore.case=T)) %>%
separate(fraction, into = c('BD_min','BD_max'), sep='-', convert=TRUE) %>%

tbl.j = inner_join(tbl.otu, tbl.ds, c('taxon' = 'taxon'))

``````

Notes:

• The TPs (for the most part) are dramatically different in abundance between control and treatment

Sandbox

``````

In [ ]:

``````
``````

In [73]:

# building tree structure
from os.path import abspath
nest = nestly.Nest()

##  values
vals = [str(x) for x in range(1,5)]

## input files

buildDir = '/home/nick/t/nestly/' #os.path.join(workDir, 'vals')
nest.build(buildDir)

``````
``````

In [74]:

%%writefile /home/nick/t/example.sh
#!/bin/bash

export TIME='elapsed,maxmem,exitstatus\n%e,%M,%x'

echo {--np} > {--np}_test.txt

``````
``````

Writing /home/nick/t/example.sh

``````
``````

In [75]:

!cd /home/nick/t/; \
chmod 777 example.sh

``````
``````

In [76]:

!cd /home/nick/t/; \
nestrun -j 2 --template-file example.sh -d nestly

``````
``````

2015-06-24 12:13:25,688 * INFO * Template: ./example.sh
2015-06-24 12:13:25,690 * INFO * [204263] Started ./example.sh in nestly/3
2015-06-24 12:13:25,692 * INFO * [204264] Started ./example.sh in nestly/2
2015-06-24 12:13:25,692 * INFO * [204263] nestly/3 Finished with 0
2015-06-24 12:13:25,694 * INFO * [204265] Started ./example.sh in nestly/1
2015-06-24 12:13:25,694 * INFO * [204264] nestly/2 Finished with 0
2015-06-24 12:13:25,696 * INFO * [204266] Started ./example.sh in nestly/4
2015-06-24 12:13:25,696 * INFO * [204265] nestly/1 Finished with 0
2015-06-24 12:13:25,696 * INFO * [204266] nestly/4 Finished with 0

``````
``````

In [ ]:

``````
``````

In [ ]:

``````