Description

Simulating CsCl gradient to reproduce the results in:

Lueders T, Manefield M, Friedrich MW. (2004). Enhanced sensitivity of DNA- and rRNA-based stable isotope probing by fractionation and quantitative analysis of isopycnic centrifugation gradients. Environmental Microbiology 6:73–78.
rotor:
- TV865
rotor speed:
- 45000 rpm
spin time:
- >36 hr
gradient average density:
- 1.725 g/ml
Fraction sizes:
- 400 ul
Used gDNA
- 13C-labeled
- 13C-methanol
- 5 ug
strains
- Methylobacterium extorquens AM1 DSM 1338
- M. barkeri DSM 800

Setting variables



In [30]:

    
workDir = "/home/nick/notebook/SIPSim/t/M.bark_M.ext/"

Init



In [1]:

    
import os
import sys
%load_ext rpy2.ipython



In [2]:

    
%%R
library(ggplot2)
library(dplyr)
library(tidyr)









    





Attaching package: ‘dplyr’

The following object is masked from ‘package:stats’:

    filter

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Downloading genomes

M. barkeri DSM 800
- RefSeq = NC_007355.1, NC_007349.1
Methylobacterium extorquens AM1 DSM 1338
- RefSeq = NC_012808.1, NC_012811.1, NC_012807.1, NC_012809.1, NC_012810.1



In [4]:

    
!cd $workDir; \
    seqDB_tools accession-GI2fasta < M.barkeri_refseq.txt > M.barkeri.fna









    



Starting batch: 1
Starting trial: 1

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_007355|Methanosarcina barkeri str. Fusaro, complete genome.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_007355|Methanosarcina barkeri str. Fusaro, complete genome.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_007349|Methanosarcina barkeri str. fusaro plasmid 1, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_007349|Methanosarcina barkeri str. fusaro plasmid 1, complete sequence.]
---------------------------------------------------



In [5]:

    
!cd $workDir; \
    seqDB_tools accession-GI2fasta < M.extorquens_AM1_refseq.txt > M.extorquens_AM1.fna









    



Starting batch: 1
Starting trial: 1

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012808|Methylobacterium extorquens AM1, complete genome.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012808|Methylobacterium extorquens AM1, complete genome.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012811|Methylobacterium extorquens AM1 megaplasmid, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012811|Methylobacterium extorquens AM1 megaplasmid, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012807|Methylobacterium extorquens AM1 plasmid p1META1, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012807|Methylobacterium extorquens AM1 plasmid p1META1, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012809|Methylobacterium extorquens AM1 plasmid p2META1, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012809|Methylobacterium extorquens AM1 plasmid p2META1, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012810|Methylobacterium extorquens AM1 plasmid p3META1, complete sequence.]
---------------------------------------------------

--------------------- WARNING ---------------------
MSG: No whitespace allowed in FASTA ID [NC_012810|Methylobacterium extorquens AM1 plasmid p3META1, complete sequence.]
---------------------------------------------------

Indexing genomes



In [12]:

    
# renaming genome sequences
!cd $workDir; \
    find . -name "*_rn.fna" |\
    xargs -I % rm -f %
    
!cd $workDir; \
    find . -name "*.fna" |\
    perl -pe 's/\.fna$//' | \
    xargs -P 2 -I % bash -c \
    "SIPSim renameGenomes %.fna > %_rn.fna"



In [13]:

    
# list of all genomes files and their associated names
!cd $workDir; \
    find . -name "*_rn.fna" | \
    perl -pe 's/.+\///' | \
    perl -pe 's/(.+)(\.[^.]+)/\$1\t\$1\$2/' > genomes_all_list.txt
    
!cd $workDir; head genomes_all_list.txt









    



M.extorquens_AM1_rn	M.extorquens_AM1_rn.fna
M.barkeri_rn	M.barkeri_rn.fna



In [15]:

    
!cd $workDir; \
    SIPSim indexGenomes genomes_all_list.txt \
    --np 2 > index_log.txt









    



Indexing: "M.extorquens_AM1_rn"
Indexing: "M.barkeri_rn"

Simulating a gradient community



In [26]:

    
!cd $workDir; \
    SIPSim gradientComms \
    --n_comm 1 \
    --abund_dist uniform \
    --abund_dist_p low:1,high:1 \
    genomes_all_list.txt > comm-n1-unif.txt
    
!cd $workDir; tail comm-n1-unif.txt









    



library	taxon_name	rel_abund_perc	rank
1	M.extorquens_AM1_rn	50.000000000	1
1	M.barkeri_rn	50.000000000	2

Simulating isotope incorporation



In [27]:

    
# making config file
config = """
[library 1]
  # baseline: no incorp
  
  [[intraPopDist 1]]
  distribution = uniform

    [[[start]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0

    [[[end]]]

      [[[[interPopDist 1]]]]
        distribution = uniform
        start = 0
        end = 0
"""

outfile = os.path.join(workDir, 'incorp.config')

outf = open(outfile, 'wb')
outf.write(config)
outf.close()



In [28]:

    
!cd $workDir; \
    SIPSim isoIncorp \
    comm-n1-unif.txt incorp.config \
    > incorp-n1-unif.txt
    
!cd $workDir; head incorp-n1-unif.txt









    



library	taxon_name	distribution_index	distribution	weight	param	param_value
1	M.barkeri_rn	1	uniform	1.0	start	0.0
1	M.barkeri_rn	1	uniform	1.0	end	0.0
1	M.extorquens_AM1_rn	1	uniform	1.0	start	0.0
1	M.extorquens_AM1_rn	1	uniform	1.0	end	0.0



In [87]:

    
%%bash -s "$workDir"
# adding incorp 100% library for N.extorquens
cd $1

cat incorp-n1-unif.txt \
    <(tail -n +2 incorp-n1-unif.txt | \
    perl -pe 's/^1/2/' | \
    perl -pe 's/0\.0/100.0/ if /M.extorquens/') \
    > incorp-n2-unif.txt
    
    
head incorp-n2-unif.txt









    



library	taxon_name	distribution_index	distribution	weight	param	param_value
1	M.barkeri_rn	1	uniform	1.0	start	0.0
1	M.barkeri_rn	1	uniform	1.0	end	0.0
1	M.extorquens_AM1_rn	1	uniform	1.0	start	0.0
1	M.extorquens_AM1_rn	1	uniform	1.0	end	0.0
2	M.barkeri_rn	1	uniform	1.0	start	0.0
2	M.barkeri_rn	1	uniform	1.0	end	0.0
2	M.extorquens_AM1_rn	1	uniform	1.0	start	100.0
2	M.extorquens_AM1_rn	1	uniform	1.0	end	100.0

Resimulating community (n=2)



In [90]:

    
!cd $workDir; \
    SIPSim gradientComms \
    --n_comm 2 \
    --abund_dist uniform \
    --abund_dist_p low:1,high:1 \
    genomes_all_list.txt > comm-n2-unif.txt
    
!cd $workDir; tail comm-n2-unif.txt









    



library	taxon_name	rel_abund_perc	rank
1	M.extorquens_AM1_rn	50.000000000	1
1	M.barkeri_rn	50.000000000	2
2	M.extorquens_AM1_rn	50.000000000	1
2	M.barkeri_rn	50.000000000	2

Simulating gradient fractions



In [97]:

    
!cd $workDir; \
    SIPSim fractions \
    comm-n2-unif.txt \
    > fracs-n2-unif.txt
    
!cd $workDir; head fracs-n2-unif.txt









    



library	fraction	BD_min	BD_max	fraction_size
1	1	1.66	1.666	0.006
1	2	1.666	1.67	0.004
1	3	1.67	1.671	0.001
1	4	1.671	1.676	0.005
1	5	1.676	1.678	0.002
1	6	1.678	1.681	0.003
1	7	1.681	1.686	0.005
1	8	1.686	1.69	0.004
1	9	1.69	1.694	0.004

Simulating shotgun fragments & calculating GC



In [38]:

    
!cd $workDir;\
    SIPSim fragGC \
    genomes_all_list.txt \
    --flr 500,None \
    --fld skewed-normal,9000,2500,-5 \
    --nf 50x \
    --np 24 \
    2> shotFragGC_skewN90-25-n5-nS.log \
    > shotFragGC_skewN90-25-n5-nS.pkl
    
!cd $workDir; head shotFragGC_skewN90-25-n5-nS.log









    



Processing: "M.extorquens_AM1_rn"
Processing: "M.barkeri_rn"
  Genome name: M.barkeri_rn
  Genome length (bp): 4873766
  Number of amplicons: None
  Number of fragments simulated: 34679
  Genome name: M.extorquens_AM1_rn
  Genome length (bp): 6879778
  Number of amplicons: None
  Number of fragments simulated: 48864

Creating OTU table



In [ ]:

    
!cd $workDir; \
    SIPSim OTU_sim \
    shotFragGC_skewN90-25-n5-nS.pkl \
    comm-n2-unif.txt \
    incorp-n2-unif.txt \
    fracs-n2-unif.txt \
    --abs_abund 2e8 \
    2> OTU-n2-unif_skewN90-25-n5-nS_A2e9.log \
    > OTU-n2-unif_skewN90-25-n5-nS_A2e9.txt
    
!cd $workDir; head OTU-n2-unif_skewN90-25-n5-nS_A2e9.log

Plotting OTU abundances



In [ ]:

    
%%R -i workDir

# loading file
inFiles = c('OTU-n2-unif_skewN90-25-n5-nS_A2e9.txt')

inFiles = sapply(inFiles, function(x){
            x = as.character(x)
            paste(c(workDir, x), collapse='/')
    })


tbls = list()
for (fileName in inFiles){
    tbls[[fileName]] = read.csv(fileName, sep='\t')
    }
tbl = do.call(rbind, tbls)
tbl$abs_abund = as.numeric(gsub('.+-nS_A|\\.txt\\.[0-9]+', '', rownames(tbl)))

tbl = tbl %>% 
    filter(!grepl('inf', fractions, ignore.case=T)) %>%
    separate(fractions, into = c('BD_min','BD_max'), sep='-', convert=TRUE) %>%
    filter(BD_min != 1.795)



In [ ]:

    
%%R
## BD for G+C of 0 or 100
BD.GCp0 = 0 * 0.098 + 1.66
BD.GCp100 = 1 * 0.098 + 1.66



In [ ]:

    
%%R -w 800 -h 400
# plotting absolute abundances

tbl.s = tbl %>%
    mutate(BD_mean = (BD_min + BD_max) / 2) %>%
    group_by(abs_abund, BD_mean, library, taxon) %>%
    summarize(total_count = sum(count))


## plot
p = ggplot(tbl.s, aes(BD_mean, total_count, shape=taxon, color=taxon)) +
    geom_point() +
    geom_line() +
    scale_x_continuous(limits=c(1.68,1.78), breaks=seq(1.68,1.78,0.02)) +
    labs(x='Buoyant density') +
    facet_grid(library ~ .) +
    theme( 
        text = element_text(size=16) 
    )
p



In [ ]:



In [ ]: