Description:

  • dev dataset
  • just using 10 random bacteria genomes
  • dataset copied from wPDF_py
  • re-coding how diffusion is modeled
    • modeling diffusion as a function of fragment length, but altering how it is implemented for speed

Workflow:

  • Fragment GC distributions
  • Community abundance and incorporation
    • Simulate communities for each gradient fraction
  • Simulate isotope incorp
  • Creating OTU tables

In [1]:
workDir = '/home/nick/notebook/SIPSim/dev/bac_genome10/'
SIPSimExe = '/home/nick/notebook/SIPSim/SIPSim'

In [2]:
import os
import sys
import numpy as np
import pandas as pd
import subprocess

In [3]:
%load_ext rpy2.ipython
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [4]:
%%R
library(ggplot2)
library(dplyr)
library(tidyr)


Attaching package: ‘dplyr’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Indexing genomes

  • needed for in silico PCR to simulate amplicon-fragments

In [8]:
! cd $workDir; \
    $SIPSimExe indexGenomes genomes/genomes10.txt \
    --fp ./genomes/ --np 10 > index_log.txt


Indexing: "Roseobacter_litoralis_Och_149"
Indexing: "Idiomarina_loihiensis_GSL_199"
Indexing: "Bacillus_cereus_F837_76"
Indexing: "Micrococcus_luteus_NCTC_2665"
Indexing: "Geobacter_lovleyi_SZ"
Indexing: "Cyanobium_gracile_PCC_6307"
Indexing: "Leuconostoc_citreum_KM20"
Indexing: "Escherichia_coli_APEC_O78"
Indexing: "Xanthomonas_axonopodis_Xac29-1"
Indexing: "Nitrosomonas_europaea_ATCC_19718"

Simulating fragments and calculating G+C


In [285]:
%%bash -s "$workDir" "$SIPSimExe"
# amplicon fragments

cd $1

$2 fragGC genomes/genomes10.txt \
    --fp ./genomes/ --fr 515Fm-927Rm.fna \
    --np 10 > genome10_ampFragGC.txt


Processing: "Roseobacter_litoralis_Och_149"
Processing: "Idiomarina_loihiensis_GSL_199"
Processing: "Bacillus_cereus_F837_76"
Processing: "Micrococcus_luteus_NCTC_2665"
Processing: "Geobacter_lovleyi_SZ"
Processing: "Cyanobium_gracile_PCC_6307"
Processing: "Leuconostoc_citreum_KM20"
Processing: "Escherichia_coli_APEC_O78"
Processing: "Xanthomonas_axonopodis_Xac29-1"
Processing: "Nitrosomonas_europaea_ATCC_19718"

In [286]:
%%bash -s "$workDir" "$SIPSimExe"
# shotgun fragments

cd $1

$2 fragGC genomes/genomes10.txt \
    --fp ./genomes/ --np 10 \
    > genome10_shotFragGC.txt


Processing: "Roseobacter_litoralis_Och_149"
Processing: "Idiomarina_loihiensis_GSL_199"
Processing: "Bacillus_cereus_F837_76"
Processing: "Micrococcus_luteus_NCTC_2665"
Processing: "Geobacter_lovleyi_SZ"
Processing: "Cyanobium_gracile_PCC_6307"
Processing: "Leuconostoc_citreum_KM20"
Processing: "Escherichia_coli_APEC_O78"
Processing: "Xanthomonas_axonopodis_Xac29-1"
Processing: "Nitrosomonas_europaea_ATCC_19718"

Simulating gradient communities


In [287]:
%%bash -s "$workDir" "$SIPSimExe"

cd $1

$2 gradientComms genomes/genomes10.txt \
    --fp ./genomes/  --pf grinder_profile \
    > genome10_comm_n3.txt


Overall diversity = 10 genomes
Percent shared   = 100.0 % (10 genomes)
Percent permuted = 100.0 % (10 top genomes)

Simulating isotope incorporation


In [288]:
import os

# making config file
config = """
[library 1]
  # baseline: no incorp
  
  [[intraPopDist 1]]
  distribution = uniform
  weight = 1

    [[[start]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0

    [[[end]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0

[library 2]
  # split intra-populations
  ## to get some taxa with split; use inter-pop mixture for 2nd intra-pop mu, where mixture is highly uneven
  
  [[intraPopDist 1]]
  distribution = normal
  weight = 0.5

    [[[mu]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 90
        sigma = 2

    [[[sigma]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 5
        sigma = 2

  [[intraPopDist 2]]
  distribution = normal
  weight = 0.5

    [[[mu]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 5
        sigma = 2

    [[[sigma]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 5
        sigma = 2


[library 3]
  # split inter-pop distribution (some approx. full; others none)
  
  [[intraPopDist 1]]
  distribution = normal
  weight = 1

    [[[mu]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 90
        sigma = 2

      # these taxa in the community get no incorp
      [[[[interPopDist 2]]]]
        distribution = uniform
        start = 0
        end = 0

    [[[sigma]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 5
        sigma = 2
        

"""

outfile = os.path.join(workDir, 'genome10_n3.config')

outf = open(outfile, 'wb')
outf.write(config)
outf.close()

In [289]:
import os

# making config file
config = """
[library 1]
  # baseline: no incorp
  
  [[intraPopDist 1]]
  distribution = uniform
  weight = 1

    [[[start]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0

    [[[end]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0

[library 2]
  # full incorp
  
  [[intraPopDist 1]]
  distribution = uniform
  weight = 1

    [[[start]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 100
        end = 100

    [[[end]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 100
        end = 100

[library 3]
  # split inter-pop distribution (some approx. full; others none)
  
  [[intraPopDist 1]]
  distribution = normal
  weight = 1

    [[[mu]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 90
        sigma = 2

      # these taxa in the community get no incorp
      [[[[interPopDist 2]]]]
        distribution = uniform
        start = 0
        end = 0

    [[[sigma]]]

      [[[[interPopDist 1]]]]
        distribution = normal
        mu = 5
        sigma = 2
        

"""

outfile = os.path.join(workDir, 'genome10_n3.config')

outf = open(outfile, 'wb')
outf.write(config)
outf.close()

In [290]:
%%bash -s "$workDir" "$SIPSimExe"

cd $1

$2 isoIncorp genome10_comm_n3.txt genome10_n3.config > genome10_comm_n3_incorp.txt

Plotting incorporation in R


In [291]:
%%R
library(ggplot2)
library(dplyr)

In [292]:
%%R -i workDir

infile = paste(c(workDir, 'genome10_comm_n3_incorp.txt'), collapse='/')

tbl = read.csv(infile, sep='\t')

In [293]:
%%R -w 700 

tbl$taxon_name = reorder(tbl$taxon_name, tbl$param_value, max)

ggplot(tbl, aes(taxon_name, param_value, color=param)) +
    geom_point() +
    facet_grid(param ~ library, scales='free_y') +
    theme(
        text = element_text(size=16),
        axis.text.x = element_text(angle=90, hjust=1),
        axis.title.x = element_blank(),
        axis.title.y = element_blank()
    )


Simulating gradient fractions


In [294]:
%%bash -s "$workDir" "$SIPSimExe"

cd $1

$2 fractions genome10_comm_n3.txt > genome10_comm_n3_fracs.txt

In [295]:
%%R -i workDir

infile = paste(c(workDir, 'genome10_comm_n3_fracs.txt'), collapse='/')

tbl = read.csv(infile, sep='\t')

In [296]:
%%R -w 700 

#tbl$taxon_name = reorder(tbl$taxon_name, tbl$param_value, max)
tbl$library = as.character(tbl$library)

ggplot(tbl, aes(library, fraction_size)) +
    geom_boxplot()



In [297]:
%%R -h 300
tbl_sum = group_by(tbl, library) %>%
    summarize( n_fracs = n())

ggplot(tbl_sum, aes(library, n_fracs)) +
    geom_bar(stat='identity')


Creating OTU table & plotting simulated fragment info


In [298]:
%%bash -s "$workDir" "$SIPSimExe"

cd $1

$2 OTU_table genome10_ampFragGC.txt genome10_comm_n3.txt \
    genome10_comm_n3_incorp.txt genome10_comm_n3_fracs.txt \
    --abs_abund 1e4 --log genome10_OTU_abnd1e4_log.txt \
    > genome10_OTU_abnd1e4.txt


Processing library: "1"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   1130
    Time elapsed:  0.0 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   1098
    Time elapsed:  0.0 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   1068
    Time elapsed:  0.0 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   1039
    Time elapsed:  0.0 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   1010
    Time elapsed:  0.0 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   983
    Time elapsed:  0.0 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   955
    Time elapsed:  0.0 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   929
    Time elapsed:  0.0 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   904
    Time elapsed:  0.0 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   879
    Time elapsed:  0.0 sec
Processing library: "2"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   518
    Time elapsed:  0.0 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   1713
    Time elapsed:  0.0 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   1150
    Time elapsed:  0.0 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   425
    Time elapsed:  0.0 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   1404
    Time elapsed:  0.0 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   633
    Time elapsed:  0.0 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   2090
    Time elapsed:  0.0 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   772
    Time elapsed:  0.0 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   942
    Time elapsed:  0.0 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   348
    Time elapsed:  0.0 sec
Processing library: "3"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   657
    Time elapsed:  0.0 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   1671
    Time elapsed:  0.0 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   452
    Time elapsed:  0.0 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   545
    Time elapsed:  0.0 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   954
    Time elapsed:  0.0 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   792
    Time elapsed:  0.0 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   375
    Time elapsed:  0.0 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   1150
    Time elapsed:  0.0 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   1386
    Time elapsed:  0.0 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   2014
    Time elapsed:  0.0 sec
Fragment info written to: genome10_OTU_abnd1e4_log.txt

Plotting out fragment info


In [299]:
%%R -i workDir

# loading file
infile = paste(c(workDir, 'genome10_OTU_abnd1e4_log.txt'), collapse='/')
tbl = read.csv(infile, sep='\t')

# reformat
tbl$taxon = gsub("(.*?_.*?)_(.+)", "\\1\n\\2", tbl$taxon)
tbl$fragment_length = tbl$fragment_length / 1000

In [300]:
%%R -w 600 -h 1000
# plot

ggplot(tbl, aes(fragment_length, fragment_GC)) +
    geom_density2d() +
    facet_grid(taxon ~ library) +
    labs(x="fragment length [kb]", y="fragment G+C") +
    theme(
        text = element_text(size=18),
        axis.text.x = element_text(angle=90)
    )



In [301]:
%%R -w 1000 -h 700
# plot

tbl.l1 = tbl %>% filter(library == 1)

ggplot(tbl.l1, aes(fragment_length, fragment_GC)) +
    geom_density2d() +
    facet_wrap( ~ taxon) +
    labs(x="fragment length [kb]", y="fragment G+C") +
    theme(
        text = element_text(size=18),
        axis.text.x = element_text(angle=90)
    )


Creating community of larger total abundance


In [302]:
%%bash -s "$workDir" "$SIPSimExe"

cd $1

$2 OTU_table genome10_shotFragGC.txt genome10_comm_n3.txt \
    genome10_comm_n3_incorp.txt genome10_comm_n3_fracs.txt \
    --abs_abund 1e5 > genome10_OTU_abnd1e6.txt


Processing library: "1"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   11300
    Time elapsed:  0.2 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   10989
    Time elapsed:  0.1 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   10687
    Time elapsed:  0.1 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   10393
    Time elapsed:  0.1 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   10107
    Time elapsed:  0.1 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   9830
    Time elapsed:  0.1 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   9559
    Time elapsed:  0.1 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   9296
    Time elapsed:  0.1 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   9041
    Time elapsed:  0.1 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   8792
    Time elapsed:  0.1 sec
Processing library: "2"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   5188
    Time elapsed:  0.1 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   17134
    Time elapsed:  0.2 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   11505
    Time elapsed:  0.1 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   4251
    Time elapsed:  0.1 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   14040
    Time elapsed:  0.2 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   6331
    Time elapsed:  0.1 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   20909
    Time elapsed:  0.3 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   7726
    Time elapsed:  0.1 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   9428
    Time elapsed:  0.1 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   3483
    Time elapsed:  0.0 sec
Processing library: "3"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   6571
    Time elapsed:  0.1 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   16714
    Time elapsed:  0.2 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   4523
    Time elapsed:  0.1 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   5452
    Time elapsed:  0.1 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   9546
    Time elapsed:  0.1 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   7920
    Time elapsed:  0.1 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   3753
    Time elapsed:  0.1 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   11505
    Time elapsed:  0.2 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   13867
    Time elapsed:  0.2 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   20145
    Time elapsed:  0.3 sec

Plotting out values


In [5]:
%%R -i workDir

# loading file
infile = paste(c(workDir, 'genome10_OTU_abnd1e6.txt'), collapse='/')
tbl = read.csv(infile, sep='\t')

In [6]:
%%R

# formatting table
tbl$BD_min = gsub('-.+', '', tbl$fractions)
tbl$BD_min = as.numeric(tbl$BD_min)
tbl$BD_max = gsub('.+-', '', tbl$fractions)
tbl$BD_max = as.numeric(tbl$BD_max)

In [7]:
%%R

# summarizing counts (should be approx. total abundance)
tbl %>%
    group_by(library) %>%
    summarize(sum(count))


Source: local data frame [3 x 2]

  library sum(count)
1       1     999995
2       2     999995
3       3     999994

In [8]:
%%R -w 800
# plotting absolute abundances

## BD for G+C of 0 or 100
BD.GCp0 = 0 * 0.098 + 1.66
BD.GCp100 = 1 * 0.098 + 1.66

## plot
ggplot(tbl, aes(BD_min, count, fill=taxon, group=taxon)) +
    geom_area(stat='identity', alpha=0.5, position='dodge') +
    geom_vline(xintercept=c(BD.GCp0, BD.GCp100), linetype='dashed', alpha=0.5) +
    facet_grid(library ~ .) +
    labs(x='Buoyant density') +
    theme( text = element_text(size=16) )



In [9]:
%%R -w 800

# plotting relative abundances
ggplot(tbl, aes(BD_min, count, fill=taxon, group=taxon)) +
    geom_area(stat='identity', alpha=0.8, position='fill') +
    geom_vline(xintercept=c(BD.GCp0, BD.GCp100), linetype='dashed', alpha=0.5) +
    facet_grid(library ~ .) +
    labs(x='Buoyant density') +
    theme( text = element_text(size=16) )


Notes:

  • Limited 'noise'; taxa not present in (nearly) all fractions as seen empirically
    • This could be caused by:
      • Too low of total abundance?
      • Smaller fragments present, which have high diffusion

Example plot


In [48]:
%%R -w 500 -h 400

taxa.sel = unique(tbl$taxon)[1:5]
tbl.tmp = filter(tbl, library == 1, taxon %in% taxa.sel, count > 0)

x = as.character(tbl.tmp$BD_min)
tbl.tmp$BD_min = factor(x, levels=x[order(tbl.tmp$BD_min)])

# plotting relative abundances
ggplot(tbl.tmp, aes(BD_min, count, fill=taxon, group=taxon)) +
    geom_bar(stat='identity', position='fill') +
    labs(x='Buoyant density', y='Relative abundance') +
    coord_flip() +
    theme_bw() +
    theme( 
        text = element_text(size=16),
        axis.text.y = element_blank()
    )



Simulating community with some smaller fragments (no min. size limit)


In [359]:
!cd $workDir; \
    $SIPSimExe fragGC genomes/genomes10.txt \
    --fp ./genomes/ \
    --fr 515Fm-927Rm.fna \
    --fld skewed-normal,9000,2500,-5  \
    --flr 500,None \
    --np 10 \
    > genome10_ampFragGC_difSkew.txt


Processing: "Roseobacter_litoralis_Och_149"
Processing: "Idiomarina_loihiensis_GSL_199"
Processing: "Bacillus_cereus_F837_76"
Processing: "Micrococcus_luteus_NCTC_2665"
Processing: "Geobacter_lovleyi_SZ"
Processing: "Cyanobium_gracile_PCC_6307"
Processing: "Leuconostoc_citreum_KM20"
Processing: "Escherichia_coli_APEC_O78"
Processing: "Xanthomonas_axonopodis_Xac29-1"
Processing: "Nitrosomonas_europaea_ATCC_19718"

Plotting fragment sizes


In [360]:
%%R -i workDir

setwd(workDir)
infile = 'genome10_ampFragGC_difSkew.txt'

tbl = read.delim(infile, sep='\t')

tbl$frag_len = abs(tbl$fragEnd - tbl$fragStart) + 1

In [361]:
%%R -h 350 -w 650

# stats
print(length(tbl[tbl$frag_len < 4000, 'frag_len']) / nrow(tbl))

# plot
ggplot(tbl, aes(frag_len)) +
    geom_histogram(binwidth=100) +
    geom_vline(xintercept=4000, linetype='dashed', alpha=0.5) +
    theme(
        text = element_text(size=18)
    )


[1] 0.04442

Simulating community


In [336]:
%%bash -s "$workDir" "$SIPSimExe"

cd $1

$2 OTU_table genome10_ampFragGC_difSkew.txt genome10_comm_n3.txt \
    genome10_comm_n3_incorp.txt genome10_comm_n3_fracs.txt \
    --abs_abund 1e8 > genome10_OTU_abnd1e8.txt


Processing library: "1"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   11300354
    Time elapsed:  150.0 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   10989669
    Time elapsed:  144.4 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   10687525
    Time elapsed:  141.2 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   10393689
    Time elapsed:  136.2 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   10107931
    Time elapsed:  131.4 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   9830030
    Time elapsed:  128.2 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   9559769
    Time elapsed:  125.2 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   9296938
    Time elapsed:  121.2 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   9041334
    Time elapsed:  119.6 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   8792757
    Time elapsed:  114.7 sec
Processing library: "2"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   5188078
    Time elapsed:  67.9 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   17134534
    Time elapsed:  225.9 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   11505806
    Time elapsed:  150.2 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   4251370
    Time elapsed:  54.9 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   14040891
    Time elapsed:  185.1 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   6331173
    Time elapsed:  83.1 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   20909802
    Time elapsed:  276.7 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   7726126
    Time elapsed:  101.0 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   9428431
    Time elapsed:  123.7 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   3483784
    Time elapsed:  45.4 sec
Processing library: "3"
  Processing taxon: "Bacillus_cereus_F837_76"
    N-fragments:   6571355
    Time elapsed:  94.9 sec
  Processing taxon: "Roseobacter_litoralis_Och_149"
    N-fragments:   16714389
    Time elapsed:  242.0 sec
  Processing taxon: "Xanthomonas_axonopodis_Xac29-1"
    N-fragments:   4523563
    Time elapsed:  65.6 sec
  Processing taxon: "Escherichia_coli_APEC_O78"
    N-fragments:   5452150
    Time elapsed:  79.1 sec
  Processing taxon: "Leuconostoc_citreum_KM20"
    N-fragments:   9546170
    Time elapsed:  138.8 sec
  Processing taxon: "Cyanobium_gracile_PCC_6307"
    N-fragments:   7920307
    Time elapsed:  114.1 sec
  Processing taxon: "Geobacter_lovleyi_SZ"
    N-fragments:   3753130
    Time elapsed:  54.0 sec
  Processing taxon: "Nitrosomonas_europaea_ATCC_19718"
    N-fragments:   11505785
    Time elapsed:  168.2 sec
  Processing taxon: "Micrococcus_luteus_NCTC_2665"
    N-fragments:   13867666
    Time elapsed:  199.5 sec
  Processing taxon: "Idiomarina_loihiensis_GSL_199"
    N-fragments:   20145479
    Time elapsed:  292.6 sec

Plotting out distributions


In [342]:
%%R -i workDir

# loading file
infile = paste(c(workDir, 'genome10_OTU_abnd1e8.txt'), collapse='/')
tbl = read.csv(infile, sep='\t')

In [343]:
%%R

# formatting table
tbl$BD_min = gsub('-.+', '', tbl$fractions)
tbl$BD_min = as.numeric(tbl$BD_min)
tbl$BD_max = gsub('.+-', '', tbl$fractions)
tbl$BD_max = as.numeric(tbl$BD_max)

In [344]:
%%R

# summarizing counts (should be approx. total abundance)
tbl %>%
    group_by(library) %>%
    summarize(sum(count))


Source: local data frame [3 x 2]

  library sum(count)
1       1   99999996
2       2   99999995
3       3   99999994

In [345]:
%%R -w 800
# plotting absolute abundances

## BD for G+C of 0 or 100
BD.GCp0 = 0 * 0.098 + 1.66
BD.GCp100 = 1 * 0.098 + 1.66

## plot
ggplot(tbl, aes(BD_min, count, fill=taxon, group=taxon)) +
    geom_area(stat='identity', alpha=0.5, position='dodge') +
    geom_vline(xintercept=c(BD.GCp0, BD.GCp100), linetype='dashed', alpha=0.5) +
    facet_grid(library ~ .) +
    labs(x='Buoyant density') +
    theme( text = element_text(size=16) )



In [346]:
%%R -w 800

# plotting relative abundances
ggplot(tbl, aes(BD_min, count, fill=taxon, group=taxon)) +
    geom_area(stat='identity', alpha=0.8, position='fill') +
    geom_vline(xintercept=c(BD.GCp0, BD.GCp100), linetype='dashed', alpha=0.5) +
    facet_grid(library ~ .) +
    labs(x='Buoyant density') +
    theme( text = element_text(size=16) )







In [ ]:


In [ ]:


In [367]:
%%R -i workDir

infile = paste(c(workDir, 'genome10_OTU_abnd1e6.txt'), collapse='/')

tbl = read.csv(infile, sep='\t', row.names=1)
tbl$taxon_name = rownames(tbl)

In [368]:
%%R
tbl.m = melt(tbl, id.var=c('taxon_name'))
colnames(tbl.m) = c('taxon_name', 'variable', 'abundance')

tbl.m$lib = gsub('X|\\..+', '', tbl.m$variable)
tbl.m$BD_min = gsub('X[0-9]+\\.([0-9]+\\.[0-9]+).+', '\\1', tbl.m$variable)
tbl.m$BD_min = as.numeric(tbl.m$BD_min)
tbl.m$BD_max = gsub('X[0-9]+\\.[0-9]+\\.[0-9]+\\.(.+)', '\\1', tbl.m$variable)
tbl.m$BD_max = as.numeric(tbl.m$BD_max)

In [369]:
%%R -w 800

ggplot(tbl.m, aes(BD_min, abundance, color=taxon_name, group=taxon_name)) +
    geom_point(size=1.5) +
    geom_line(alpha=0.5) +
    facet_grid(lib ~ .) +
    theme( text = element_text(size=16) )



In [370]:
%%R -w 800

ggplot(tbl.m, aes(BD_min, abundance, fill=taxon_name, group=taxon_name)) +
    geom_area(stat='identity', alpha=0.5, position='dodge') +
    facet_grid(lib ~ .) +
    labs(x='Buoyant density') +
    theme( text = element_text(size=16) )



In [373]:
%%R -w 800 -h 800

tbl.m2 = tbl.m
tbl.m2$taxon_name = gsub("_","\n", tbl.m2$taxon_name)

ggplot(tbl.m2, aes(BD_min, abundance, fill=taxon_name, group=taxon_name)) +
    geom_area(stat='identity', alpha=0.5, position='dodge') +
    facet_grid(taxon_name ~ lib)