Description

Recreating the data in Figure 1 of Lueders et al., 2004

Parameters:

Taxa
- Methylobacterium extorquens AM1 DSM 1338
  - genome G+C = 68.7
  - accession = NC_012808
- M. barkeri DSM 800
  - aka Methanosarcina barkeri MS
  - genome G+C = 39.2
  - accession = NZ_CP009528
isotope(s)
- ¹³C
- assumed 100% incorporation
gradient
- rotor
  - TV865 vertical rotor (Sorvall)
    - vertical rotor
- CsCl density
  - 1.725 g/ml
- centrifugation
  - 20^oC
  - >36h
  - 45,000 r.p.m. (177,000 gav)
- fractionation volume
  - 400 ul
DNA quantification
- both qPCR & fluorometrically
- qPCR primers
  - Ar109f/Ar915r (Lueders and Friedrich, 2003)
    - Ar109f (ACKGCTCAGTAACACGT)
    - Ar915r (GTGCTCCCCCGCCAATTCCT)
  - Ba519f/Ba907 (Stubner, 2002)
    - Ba519f (CAGCMGCCGCGGTAANWC)
    - Ba907r (CCGTCAATTCMTTTRAGTT)

Setting variables



In [7]:

    
workDir = '/home/nick/t/Lueders2004/'

# params
bandwidth = 0.8

Init



In [8]:

    
import os
#import numpy as np
%load_ext rpy2.ipython









    



The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython



In [10]:

    
%%R
library(ggplot2)
library(dplyr)
library(tidyr)
library(gridExtra)
#library(genomes)



In [11]:

    
if not os.path.isdir(workDir):
    os.makedirs(workDir)
%cd $workDir









    



/home/nick/t/Lueders2004

Getting genomes



In [15]:

    
taxa="""Methylobacterium_extorquens_AM1 NC_012808
Methanosarcina_barkeri_MS NZ_CP009528
"""
taxa = taxa.replace(' ', '\t')

genome_file = os.path.join(workDir, 'genome_list.txt')
with open(genome_file, 'w') as oFH:
    oFH.write(taxa)
    
print('File written: {}'.format(genome_file))









    



File written: /home/nick/t/Lueders2004/genome_list.txt



In [16]:

    
!head $genome_file









    



Methylobacterium_extorquens_AM1	NC_012808
Methanosarcina_barkeri_MS	NZ_CP009528



In [17]:

    
%%bash -s $genome_file
source activate py27

# downloading genomes
SIPSim genome_download -d genomes -n 2 $1









    



File written: genomes/Methanosarcina_barkeri_MS.fna
File written: genomes/Methylobacterium_extorquens_AM1.fna



In [18]:

    
!ls -thlc genomes/*.fna









    



-rwxrwxr-x 1 nick nick 5.4M Jul  9 14:25 genomes/Methylobacterium_extorquens_AM1.fna
-rwxrwxr-x 1 nick nick 4.4M Jul  9 14:25 genomes/Methanosarcina_barkeri_MS.fna



In [19]:

    
# %%R
# data(proks)
# summary(proks)



In [20]:

    
# %%R
# update(proks)
# summary(proks)



In [21]:

    
# %%R
# M.ext = subset(proks, name %like% 'Methylobacterium extorquens AM1*')
# M.bark = subset(proks, name %like% 'Methanosarcina barkeri MS*')
# orgs = rbind(M.ext, M.bark)


# orgFile = 'genome_info.txt'
# write.table(orgs, orgFile, sep='\t', quote=FALSE, row.names=FALSE)
# cat('File written:', orgFile, '\n')



In [22]:

    
# !seqDB_tools accession-GI2fasta \
#     -a 11 -n 2 -f 30 -header \
#     -o genomes \
#     genome_info.txt



In [23]:

    
#!ls -thlc genomes/*.fna

Renaming genome sequences



In [24]:

    
%%bash 
source activate py27

find ./genomes/ -name "*.fna" |\
    xargs -P 2 -I % SIPSim genome_rename --prefix genomes_rn %









    



File written: /home/nick/t/Lueders2004/genomes_rn/Methylobacterium_extorquens_AM1.fna
File written: /home/nick/t/Lueders2004/genomes_rn/Methanosarcina_barkeri_MS.fna



In [25]:

    
!grep ">" genomes_rn/*fna | perl -pe 's/.+:>/>/'









    



>NZ_CP009528_1_Methanosarcina_barkeri_MS
>NC_012808_1_Methylobacterium_extorquens_AM1

Indexing genomes



In [26]:

    
# making a genome index file
## taxonName<tab>genomeSeqFileName
!ls -1 genomes_rn/*fna | perl -pe 's/.+\/|\.fna//g; s/(.+)/$1\t$1.fna/' > genome_index.txt
!cat genome_index.txt









    



Methanosarcina_barkeri_MS	Methanosarcina_barkeri_MS.fna
Methylobacterium_extorquens_AM1	Methylobacterium_extorquens_AM1.fna



In [31]:

    
%%bash 
source activate py27

SIPSim genome_index \
    --fp ./genomes_rn/ \
    --np 2 \
    genome_index.txt \
    > genome_index.log









    



Indexing: "Methanosarcina_barkeri_MS"
Indexing: "Methylobacterium_extorquens_AM1"
#-- All genomes indexed --#



In [32]:

    
!tail genome_index.log









    



0: 87.09%, 0:00:38.713404
0: 88.91%, 0:00:39.542779
0: 90.72%, 0:00:40.380094
0: 92.54%, 0:00:41.204959
0: 94.35%, 0:00:42.029624
0: 96.17%, 0:00:42.840094
0: 97.98%, 0:00:43.659560
0: 99.79%, 0:00:44.539434
Time used: 0:00:47.124080
Done.

Simulating pre-fractionation communities

Manually specifying taxon abundances in each community



In [33]:

    
comm = """library	taxon_name	rel_abund_perc	rank
1	Methanosarcina_barkeri_MS	100.0	1
1	Methylobacterium_extorquens_AM1	0.0	2
2	Methylobacterium_extorquens_AM1	100.0	1
2	Methanosarcina_barkeri_MS	0.0	2
"""

comm_file = 'comm-n2-unif.txt'
with open(comm_file, 'w') as oFH:
    oFH.write(comm)
    
!cat $comm_file









    



library	taxon_name	rel_abund_perc	rank
1	Methanosarcina_barkeri_MS	100.0	1
1	Methylobacterium_extorquens_AM1	0.0	2
2	Methylobacterium_extorquens_AM1	100.0	1
2	Methanosarcina_barkeri_MS	0.0	2

Simulating gradient fractions



In [34]:

    
%%bash 
source activate py27

SIPSim gradient_fractions \
    --BD_min 1.66 \
    --BD_max 1.8 \
    --params mu:0.0075,sigma:0.001 \
    comm-n2-unif.txt \
    > fracs-n2-unif.txt



In [35]:

    
# checking output
!wc -l fracs-n2-unif.txt
!head -n 5 fracs-n2-unif.txt
!tail -n 5 fracs-n2-unif.txt









    



40 fracs-n2-unif.txt
library	fraction	BD_min	BD_max	fraction_size
1	1	1.66	1.662	0.002
1	2	1.662	1.672	0.01
1	3	1.672	1.68	0.008
1	4	1.68	1.687	0.007
2	16	1.761	1.769	0.008
2	17	1.769	1.778	0.009
2	18	1.778	1.785	0.007
2	19	1.785	1.792	0.007
2	20	1.792	1.8	0.008



In [36]:

    
%%R -w 800 -h 300
# plotting
df = read.delim('fracs-n2-unif.txt', sep='\t')

df %>%
    group_by(library) %>%
    summarize(n=n()) %>%
    print

p = ggplot(df, aes(BD_min, fraction_size, group=library, color=library %>% as.character)) +
    geom_point() +
    geom_line() +
    scale_x_continuous(breaks=seq(1.66, 1.8, 0.01)) +
    scale_color_discrete('gradient') +
    labs(x='Buoyant density', y='Fraction size (buoyant density)') +
    theme_bw() +
    theme(
        text = element_text(size=16)
    )
plot(p)









    





# A tibble: 2 × 2
  library     n
    <int> <int>
1       1    19
2       2    20

SIPSim of shotgun fragments

for comparing to fluormetric quanitification



In [37]:

    
%%bash 
source activate py27

SIPSim fragments \
    genome_index.txt \
    --fp genomes \
    --fld skewed-normal,9000,2500,-5 \
    --flr None,None \
    --nf 100X \
    --np 2 \
    2> shotFrag_skewN90-25-n5-nS.log \
    > shotFrag_skewN90-25-n5-nS.pkl



In [38]:

    
!tail shotFrag_skewN90-25-n5-nS.log









    



Processing: "Methanosarcina_barkeri_MS"
Processing: "Methylobacterium_extorquens_AM1"
  Genome name: Methanosarcina_barkeri_MS
  Genome length (bp): 4533209
  Number of amplicons: None
  Number of fragments simulated: 64438
  Genome name: Methylobacterium_extorquens_AM1
  Genome length (bp): 5511322
  Number of amplicons: None
  Number of fragments simulated: 78234

Converting to 2d-kde objects

Also converts fragment G+C to buoyant density



In [41]:

    
%%bash -s $bandwidth
source activate py27

SIPSim fragment_KDE \
    shotFrag_skewN90-25-n5-nS.pkl \
    --bw $1 \
    > shotFrag_skewN90-25-n5-nS_kde.pkl



In [42]:

    
!ls -thlc shotFrag_skewN90-25-n5-nS_kde.pkl









    



-rwxrwxr-x 1 nick nick 2.2M Jul  9 14:30 shotFrag_skewN90-25-n5-nS_kde.pkl



In [43]:

    
%%bash 
source activate py27
# getting info on the KDEs 
SIPSim KDE_info -s shotFrag_skewN90-25-n5-nS_kde.pkl









    



lib_ID	taxon_ID	KDE_ID	min	percentile_5	percentile_25	mean	median	percentile_75	percentile_95	max	stdev
1	Methylobacterium_extorquens_AM1	1	1.7037358643	1.72326609418	1.72627765	1.72733014122	1.72760620837	1.72874113247	1.73058104238	1.73763597299	0.0024262170702
1	Methylobacterium_extorquens_AM1	2	304.0	4091.0	6120.0	7044.75819976	7317.0	8216.0	9081.0	10859.0	1550.84283561
1	Methanosarcina_barkeri_MS	1	1.68427680448	1.69393247247	1.6967008849	1.69843864574	1.69847512287	1.70015473819	1.70286392182	1.71221489699	0.00273258640582
1	Methanosarcina_barkeri_MS	2	298.0	4074.85	6109.0	7035.07079673	7309.0	8208.0	9077.0	10859.0	1558.41254957






    



Loading KDEs...

Plotting distributions



In [44]:

    
# number of samples to draw from KDEs
n=100000



In [45]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample \
    -n $1 shotFrag_skewN90-25-n5-nS_kde.pkl \
    > shotFrag_skewN90-25-n5-nS_kde.txt



In [49]:

    
%%R -w 600 -h 400
df = read.delim('shotFrag_skewN90-25-n5-nS_kde.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Adding diffusion



In [47]:

    
%%bash -s $bandwidth
source activate py27

SIPSim diffusion \
    shotFrag_skewN90-25-n5-nS_kde.pkl \
    --np 2 \
    --bw $1 \
    > shotFrag_skewN90-25-n5-nS_dif_kde.pkl









    



Index size: 90508
Processing: Methanosarcina_barkeri_MS
Processing: Methylobacterium_extorquens_AM1



In [48]:

    
!ls -thlc shotFrag_skewN90-25-n5-nS_dif_kde.pkl









    



-rwxrwxr-x 1 nick nick 7.7M Jul  9 14:32 shotFrag_skewN90-25-n5-nS_dif_kde.pkl

Plotting distributions



In [50]:

    
# number of samples to draw from KDEs
n=100000



In [52]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample \
    -n $1 shotFrag_skewN90-25-n5-nS_dif_kde.pkl \
    > shotFrag_skewN90-25-n5-nS_dif_kde.txt



In [53]:

    
%%R -w 600 -h 400
df = read.delim('shotFrag_skewN90-25-n5-nS_dif_kde.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Adding DBL effects



In [57]:

    
%%bash -s $n
source activate py27

SIPSim DBL \
    shotFrag_skewN90-25-n5-nS_dif_kde.pkl \
    --comm comm-n2-unif.txt \
    --commx 0 \
    -D 1.725 \
    -w 19807714 \
    --tube_height 5.1 \
    --r_min 7.47 \
    --r_max 8.79 \
    --vertical \
    --np 2 \
    > shotFrag_skewN90-25-n5-nS_dif_kde_DBL.pkl









    



/home/nick/bin/anaconda3/envs/py27/lib/python2.7/site-packages/numpy/lib/polynomial.py:680: RuntimeWarning: invalid value encountered in multiply
  y = y * x + p[i]
DBL_index file written: "DBL_index.txt"
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS

Plotting smearing index



In [58]:

    
%%R -h 300 -w 400
df = read.delim('DBL_index.txt', sep='\t')

p = ggplot(df, aes(DBL_BD, ymin=vert_gradient_BD_low, ymax=vert_gradient_BD_high)) +
    geom_linerange() +
    scale_y_reverse() +
    labs(x='Diffusive boundary layer buoyant density', 
         y='CsCl gradient buoyant density',
         title='Extent of DBL smearing') +
    theme_bw() +
    theme(
        text = element_text(size=16)
    )

plot(p)

Plotting distributions



In [59]:

    
n=100000



In [61]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample -n $1 \
    shotFrag_skewN90-25-n5-nS_dif_kde_DBL.pkl \
    > shotFrag_skewN90-25-n5-nS_dif_kde_DBL.txt



In [62]:

    
%%R -w 600 -h 550
df = read.delim('shotFrag_skewN90-25-n5-nS_dif_kde_DBL.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    facet_grid(libID ~ .) +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Simulating isotope incorporation



In [64]:

    
# making config file
config = """
[1]
  # baseline: no incorp
  
  [[intraPopDist 1]]
  distribution = uniform

    [[[start]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0

    [[[end]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0
        
[2]
  # half of taxa incorporate
  max_perc_taxa_incorp = 100
  
  [[intraPopDist 1]]
  distribution = uniform

    [[[start]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 100
        end = 100

    [[[end]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 100
        end = 100 
"""

outfile = os.path.join(workDir, 'incorp.config')

outf = open(outfile, 'w')
outf.write(config)
outf.close()



In [65]:

    
%%bash 
source activate py27

SIPSim isotope_incorp \
    shotFrag_skewN90-25-n5-nS_dif_kde_DBL.pkl \
    incorp.config \
    --comm comm-n2-unif.txt \
    --np 2 \
    > shotFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.pkl









    



Loading KDE object...
Processing library: 1
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS
Processing library: 2
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS
File written: BD-shift_stats.txt

Plotting distributions



In [ ]:

    
n=100000



In [67]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample -n $1 \
    shotFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.pkl \
    > shotFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.txt



In [68]:

    
%%R -w 600 -h 550
df = read.delim('shotFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    facet_grid(libID ~ .) +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Making OTU table



In [91]:

    
%%bash -s 
source activate py27

SIPSim OTU_table \
    shotFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.pkl \
    comm-n2-unif.txt \
    fracs-n2-unif.txt \
    --abs 1e9 \
    --np 2 \
    > shotFrag_OTU_n2_abs1e9.txt









    



Loading files...
Simulating OTUs...
Processing library: "1"
  Processing taxon: "Methanosarcina_barkeri_MS"
   taxon abs-abundance:  1000000000
  Processing taxon: "Methylobacterium_extorquens_AM1"
   taxon abs-abundance:  0
Processing library: "2"
  Processing taxon: "Methanosarcina_barkeri_MS"
   taxon abs-abundance:  0
  Processing taxon: "Methylobacterium_extorquens_AM1"
   taxon abs-abundance:  1000000000



In [93]:

    
!head shotFrag_OTU_n2_abs1e9.txt
!tail shotFrag_OTU_n2_abs1e9.txt









    



library	taxon	fraction	BD_min	BD_mid	BD_max	count	rel_abund
1	Methanosarcina_barkeri_MS	-inf-1.660	-inf	1.659	1.659	237166	1
1	Methanosarcina_barkeri_MS	1.660-1.662	1.66	1.661	1.662	16069	1
1	Methanosarcina_barkeri_MS	1.662-1.672	1.662	1.667	1.672	146734	1
1	Methanosarcina_barkeri_MS	1.672-1.680	1.672	1.676	1.68	1079638	1
1	Methanosarcina_barkeri_MS	1.680-1.687	1.68	1.683	1.687	21937006	1
1	Methanosarcina_barkeri_MS	1.687-1.694	1.687	1.691	1.694	192442586	1
1	Methanosarcina_barkeri_MS	1.694-1.700	1.694	1.697	1.7	392739924	1
1	Methanosarcina_barkeri_MS	1.700-1.706	1.7	1.703	1.706	299470596	1
1	Methanosarcina_barkeri_MS	1.706-1.714	1.706	1.71	1.714	87640965	1
2	Methylobacterium_extorquens_AM1	1.735-1.741	1.735	1.738	1.741	568099	1
2	Methylobacterium_extorquens_AM1	1.741-1.747	1.741	1.744	1.747	2874344	1
2	Methylobacterium_extorquens_AM1	1.747-1.755	1.747	1.751	1.755	56526078	1
2	Methylobacterium_extorquens_AM1	1.755-1.761	1.755	1.758	1.761	262203102	1
2	Methylobacterium_extorquens_AM1	1.761-1.769	1.761	1.765	1.769	537056304	1
2	Methylobacterium_extorquens_AM1	1.769-1.778	1.769	1.773	1.778	136942812	1
2	Methylobacterium_extorquens_AM1	1.778-1.785	1.778	1.781	1.785	2823009	1
2	Methylobacterium_extorquens_AM1	1.785-1.792	1.785	1.788	1.792	138083	1
2	Methylobacterium_extorquens_AM1	1.792-1.800	1.792	1.796	1.8	92908	1
2	Methylobacterium_extorquens_AM1	1.800-inf	1.801	1.801	inf	278394	1

Plotting OTU abundances



In [129]:

    
%%R -i workDir

# loading file
F = file.path(workDir, 'shotFrag_OTU_n2_abs1e9.txt')
df =  read.delim(F)

df1 = df %>%
    filter(library == 1,
           taxon == 'Methanosarcina_barkeri_MS')
df2 = df %>%
    filter(library == 2,
           taxon == 'Methylobacterium_extorquens_AM1')
df = rbind(df1, df2)

df_all = df %>% 
    group_by(library) %>%
    mutate(rel_abund = count / max(count)) %>%
    ungroup()
df_all %>% head(n=3) %>% as.data.frame









    





  library                     taxon    fraction BD_min BD_mid BD_max  count
1       1 Methanosarcina_barkeri_MS  -inf-1.660   -Inf  1.659  1.659 237166
2       1 Methanosarcina_barkeri_MS 1.660-1.662  1.660  1.661  1.662  16069
3       1 Methanosarcina_barkeri_MS 1.662-1.672  1.662  1.667  1.672 146734
     rel_abund
1 6.038755e-04
2 4.091512e-05
3 3.736162e-04



In [130]:

    
%%R -w 500 -h 300

# max possible shift for M barkeri (using mean G+C)
max_shift = 1.727326 + 0.036

# plotting
p_all = ggplot(df_all, aes(BD_min, rel_abund, color=taxon)) +
    geom_point(size=3.5) +
    geom_line() +
    scale_x_continuous(limits=c(1.68, 1.78), breaks=seq(1.68, 1.78, 0.02)) +
    scale_color_discrete('') +
    labs(x='CsCl buoyant density [g ml^-1]', y='Ratio of maximum quantities', 
         title='All DNA fragments') +
    theme_bw() +
    theme( 
        text = element_text(size=16),
        legend.position = 'bottom'
    )
plot(p_all)

SIPSim of amplicon fragments

for comparing to qPCR results

Simulating fragments



In [99]:

    
# primers
primers = """>Ar109f
ACKGCTCAGTAACACGT
>Ar915r
GTGCTCCCCCGCCAATTCCT
>Ba519f
CAGCMGCCGCGGTAANWC
>Ba907r
CCGTCAATTCMTTTRAGTT
"""
primer_file = 'Ba-Ar_primers.fna'
with open(primer_file, 'w') as outFH:
    outFH.write(primers)
    
!cat $primer_file









    



>Ar109f
ACKGCTCAGTAACACGT
>Ar915r
GTGCTCCCCCGCCAATTCCT
>Ba519f
CAGCMGCCGCGGTAANWC
>Ba907r
CCGTCAATTCMTTTRAGTT



In [100]:

    
%%bash 
source activate py27

SIPSim fragments \
    genome_index.txt \
    --fp genomes_rn \
    --fr Ba-Ar_primers.fna \
    --fld skewed-normal,9000,2500,-5 \
    --flr None,None \
    --nf 10000 \
    --np 2 \
    2> ampFrag_skewN90-25-n5-nS.log \
    > ampFrag_skewN90-25-n5-nS.pkl



In [101]:

    
!tail ampFrag_skewN90-25-n5-nS.log









    



Processing: "Methanosarcina_barkeri_MS"
Processing: "Methylobacterium_extorquens_AM1"
  Genome name: Methylobacterium_extorquens_AM1
  Genome name: Methanosarcina_barkeri_MS
  Genome length (bp): 5511322
  Number of amplicons: 5
  Number of fragments simulated: 10000
  Genome length (bp): 4533209
  Number of amplicons: 3
  Number of fragments simulated: 10000

Converting to 2d-kde objects

Also converts fragment G+C to buoyant density



In [102]:

    
%%bash -s $bandwidth
source activate py27

SIPSim fragment_KDE \
    --bw $1 \
    ampFrag_skewN90-25-n5-nS.pkl \
    > ampFrag_skewN90-25-n5-nS_kde.pkl



In [103]:

    
!ls -thlc ampFrag_skewN90-25-n5-nS_kde.pkl









    



-rwxrwxr-x 1 nick nick 315K Jul  9 15:51 ampFrag_skewN90-25-n5-nS_kde.pkl

Plotting distributions



In [104]:

    
n=100000



In [105]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample -n $1 \
    ampFrag_skewN90-25-n5-nS_kde.pkl \
    > ampFrag_skewN90-25-n5-nS_kde.txt



In [106]:

    
%%R -w 600 -h 400
df = read.delim('ampFrag_skewN90-25-n5-nS_kde.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Adding diffusion



In [107]:

    
%%bash -s $bandwidth
source activate py27

SIPSim diffusion \
    --np 2 \
    --bw $1 \
    ampFrag_skewN90-25-n5-nS_kde.pkl \
    > ampFrag_skewN90-25-n5-nS_dif_kde.pkl









    



Index size: 90508
Processing: Methanosarcina_barkeri_MS
Processing: Methylobacterium_extorquens_AM1



In [108]:

    
!ls -thlc ampFrag_skewN90-25-n5-nS_dif_kde.pkl









    



-rwxrwxr-x 1 nick nick 7.7M Jul  9 15:52 ampFrag_skewN90-25-n5-nS_dif_kde.pkl

Plotting distributions



In [109]:

    
n=100000



In [110]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample -n $1 \
    ampFrag_skewN90-25-n5-nS_kde.pkl \
    > ampFrag_skewN90-25-n5-nS_kde.txt



In [111]:

    
%%R -w 600 -h 400
df = read.delim('ampFrag_skewN90-25-n5-nS_kde.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Adding DBL effects



In [112]:

    
%%bash 
source activate py27

SIPSim DBL \
    ampFrag_skewN90-25-n5-nS_dif_kde.pkl \
    --comm comm-n2-unif.txt \
    --commx 0 \
    -D 1.725 \
    -w 19807714 \
    --tube_height 5.1 \
    --r_min 7.47 \
    --r_max 8.79 \
    --vertical \
    --np 2 \
    > ampFrag_skewN90-25-n5-nS_dif_kde_DBL.pkl









    



/home/nick/bin/anaconda3/envs/py27/lib/python2.7/site-packages/numpy/lib/polynomial.py:680: RuntimeWarning: invalid value encountered in multiply
  y = y * x + p[i]
DBL_index file written: "DBL_index.txt"
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS

Plotting DBL index



In [113]:

    
%%R -h 300 -w 400
df = read.delim('DBL_index.txt', sep='\t')

p = ggplot(df, aes(DBL_BD, ymin=vert_gradient_BD_low, ymax=vert_gradient_BD_high)) +
    geom_linerange() +
    scale_y_reverse() +
    labs(x='Diffusive boundary layer buoyant density', 
         y='CsCl gradient buoyant density',
         title='Extent of DBL smearing') +
    theme_bw() +
    theme(
        text = element_text(size=16)
    )
plot(p)

Plotting distributions



In [114]:

    
n=100000



In [115]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample -n $1 \
    ampFrag_skewN90-25-n5-nS_kde.pkl \
    > ampFrag_skewN90-25-n5-nS_kde.txt



In [116]:

    
%%R -w 600 -h 400
df = read.delim('ampFrag_skewN90-25-n5-nS_kde.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Simulating isotope incorporation



In [117]:

    
%%bash 
source activate py27

SIPSim isotope_incorp \
    ampFrag_skewN90-25-n5-nS_dif_kde_DBL.pkl \
    incorp.config \
    --comm comm-n2-unif.txt \
    --np 2 \
    > ampFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.pkl









    



Loading KDE object...
Processing library: 1
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS
Processing library: 2
Processing: Methylobacterium_extorquens_AM1
Processing: Methanosarcina_barkeri_MS
File written: BD-shift_stats.txt

Plotting distributions



In [118]:

    
n=100000



In [119]:

    
%%bash -s $n
source activate py27

SIPSim KDE_sample -n $1 \
    ampFrag_skewN90-25-n5-nS_kde.pkl \
    > ampFrag_skewN90-25-n5-nS_kde.txt



In [120]:

    
%%R -w 600 -h 400
df = read.delim('ampFrag_skewN90-25-n5-nS_kde.txt', sep='\t') %>%
    gather(taxon, Buoyant_density, Methanosarcina_barkeri_MS, Methylobacterium_extorquens_AM1) 


p = ggplot(df, aes(Buoyant_density, fill=taxon)) +
    geom_density(alpha=0.5) +
    scale_x_continuous(limits=c(1.68, 1.78)) +
    scale_fill_discrete('') +
    theme_bw() +
    theme(
        text = element_text(size=16),
        legend.position = 'bottom'
    )
pl = p + scale_y_log10()

grid.arrange(p, pl, ncol=1)

Making OTU table



In [121]:

    
%%bash 
source activate py27

SIPSim OTU_table \
    ampFrag_skewN90-25-n5-nS_dif_kde_DBL_incorp.pkl \
    comm-n2-unif.txt \
    fracs-n2-unif.txt \
    --abs 1e9 \
    --np 2 \
    > ampFrag_OTU_n2_abs1e9.txt









    



Loading files...
Simulating OTUs...
Processing library: "1"
  Processing taxon: "Methanosarcina_barkeri_MS"
   taxon abs-abundance:  1000000000
  Processing taxon: "Methylobacterium_extorquens_AM1"
   taxon abs-abundance:  0
Processing library: "2"
  Processing taxon: "Methanosarcina_barkeri_MS"
   taxon abs-abundance:  0
  Processing taxon: "Methylobacterium_extorquens_AM1"
   taxon abs-abundance:  1000000000



In [128]:

    
!head ampFrag_OTU_n2_abs1e9.txt









    



library	taxon	fraction	BD_min	BD_mid	BD_max	count	rel_abund
1	Methanosarcina_barkeri_MS	-inf-1.660	-inf	1.659	1.659	255854	1
1	Methanosarcina_barkeri_MS	1.660-1.662	1.66	1.661	1.662	9019	1
1	Methanosarcina_barkeri_MS	1.662-1.672	1.662	1.667	1.672	49983	1
1	Methanosarcina_barkeri_MS	1.672-1.680	1.672	1.676	1.68	46049	1
1	Methanosarcina_barkeri_MS	1.680-1.687	1.68	1.683	1.687	261133	1
1	Methanosarcina_barkeri_MS	1.687-1.694	1.687	1.691	1.694	13270288	1
1	Methanosarcina_barkeri_MS	1.694-1.700	1.694	1.697	1.7	135725442	1
1	Methanosarcina_barkeri_MS	1.700-1.706	1.7	1.703	1.706	404583857	1
1	Methanosarcina_barkeri_MS	1.706-1.714	1.706	1.71	1.714	393225323	1

Plotting OTU abundances



In [132]:

    
%%R -i workDir

# loading file
F = file.path(workDir, 'ampFrag_OTU_n2_abs1e9.txt')
df =  read.delim(F)

df1 = df %>%
    filter(library == 1,
           taxon == 'Methanosarcina_barkeri_MS')
df2 = df %>%
    filter(library == 2,
           taxon == 'Methylobacterium_extorquens_AM1')
df = rbind(df1, df2)

df_16S = df %>% 
    group_by(library) %>%
    mutate(rel_abund = count / max(count)) %>%
    ungroup()
df_16S %>% head(n=3) %>% as.data.frame









    





  library                     taxon    fraction BD_min BD_mid BD_max  count
1       1 Methanosarcina_barkeri_MS  -inf-1.660   -Inf  1.659  1.659 255854
2       1 Methanosarcina_barkeri_MS 1.660-1.662  1.660  1.661  1.662   9019
3       1 Methanosarcina_barkeri_MS 1.662-1.672  1.662  1.667  1.672  49983
     rel_abund
1 6.323881e-04
2 2.229204e-05
3 1.235418e-04



In [133]:

    
%%R -w 500 -h 300

# max possible shift for M barkeri (using mean G+C)
max_shift = 1.727326 + 0.036

# plotting
p_16S = ggplot(df.16S, aes(BD_min, rel_abund, color=taxon)) +
    geom_point(size=3.5) +
    geom_line() +
    scale_x_continuous(limits=c(1.68, 1.78), breaks=seq(1.68, 1.78, 0.02)) +
    scale_color_discrete('') +
    labs(x='CsCl buoyant density [g ml^-1]', y='Ratio of maximum quantities', 
         title='DNA fragments containing 16S rRNA genes') +
    theme_bw() +
    theme( 
        text = element_text(size=16),
        legend.position = 'bottom'
    )
plot(p_16S)

Plotting as in Leuders et al., 2004

Combining data



In [135]:

    
%%R
# simulation data
tmp1 = df_all %>% 
    dplyr::select(taxon, BD_min, count) %>%
    mutate(DNA = 'Total DNA')

tmp2 = df_16S %>% 
    dplyr::select(taxon, BD_min, count) %>%
    mutate(DNA = '16S rRNA genes')

df_j = rbind(tmp1, tmp2) %>%
    group_by(taxon, DNA) %>%
    mutate(rel_abund = count / max(count)) %>%
    ungroup() %>%
    mutate(taxon = gsub('Methanosarcina_barkeri_MS', 'M. barkeri MS', taxon),
           taxon = gsub('Methylobacterium_extorquens_AM1', 'M. extorquens AM1', taxon)) %>%
    dplyr::select(-count) %>%
    mutate(dataset='simulation')
colnames(df_j) = c('Taxon', 'BD', 'DNA', 'conc', 'dataset')
tmp1 = tmp2 = NULL

# status
df_j %>% tail(n=3)









    





# A tibble: 3 × 5
              Taxon    BD            DNA         conc    dataset
              <chr> <dbl>          <chr>        <dbl>      <chr>
1 M. extorquens AM1 1.785 16S rRNA genes 0.0001453175 simulation
2 M. extorquens AM1 1.792 16S rRNA genes 0.0001134474 simulation
3 M. extorquens AM1 1.801 16S rRNA genes 0.0006202007 simulation



In [137]:

    
%%R -w 550 -h 400

df_j$DNA = factor(df_j$DNA, levels=c('Total DNA', '16S rRNA genes'))
x.lab = expression(paste('Buoyant density (g ml' ^ '-1', ')'))

# plotting
p = ggplot(df_j, aes(BD, conc, shape=Taxon)) +
    geom_point(size=3) +
    geom_line() +
    scale_color_discrete('Dataset') +
    scale_x_continuous(limits=c(1.68, 1.78), 
                       breaks=seq(1.68, 1.78, 0.02), 
                       expand=c(0.001,0.001)) +
    labs(x=x.lab, y='Ratio of maximum quantities') +
    facet_grid(DNA ~ .) +
    theme_bw() +
    theme(
        text = element_text(size=16)
    )
p



In [ ]: