Description

  • simulating amplicons & shotgun reads for each genome
  • for each amplicon/read:
    • simulating a template fragment
    • calculating GC content & buoyant density

workflow

  • Grinder simuation of 1000 amplicons per genome
    • using 515F-927R
    • simulating 454-pyro reads of 215 bp
  • deltaGC
    • using left-skewed normal distribution with mean of 12.5 kb or 6 kb
  • plotting fragment buoyant density distributions

Bacteria

Grinder simulations

@ system76-server:~/notebook/deltaGC_notes/data/amplicon/140411_allBac/amp454-L215-N1k

Amplicions

  • 1k per genome

Grinder run

@ system76-server:~/notebook/deltaGC_notes/data/amplicon/140409_allBac/amp454-L215-N1k

mkdir read_files

for file in `find ~/notebook/deltaGC_notes/data/genome_data/prok-bac-genomes_Ngaps/ -name "*fasta" | perl -pe 's/.+\///'`
do
    grinder -reference_file ~/notebook/deltaGC_notes/data/genome_data/prok-bac-genomes_Ngaps/$file  \
    -abundance_model uniform -forward_reverse 515Fm-927Rm.fna -rd 215 normal 0 \
    -tr 1000 -copy_bias 0 -length_bias 0 -unidirectional 1 -bn read_files/$file
done

    * had to use for loop because some genomes did not produce any amplicons
    * 1092 of the 1163 genomes had amplicons
    * all have 1000 reads

Combining read files

  • leaving out 'HE577327 Azospirillum brasilense Sp245 main chromosome complete genome.'

@ system76-server:~/notebook/deltaGC_notes/data/amplicon/140409_allBac/amp454-L215-N1k

cat $(find read_files -name "*fa" | grep -v "Azospirillum_brasilense_Sp245.fasta-reads.fa") | seqIO.pl -i fasta -o fasta > amp454-L215-N1k_reads.fa
    * 1091000 sequences extracted
    * seqIO.pl used to make line wrapping consistent


Shotgun reads

  • 10k per genome

Grinder run

@ system76-server:~/notebook/deltaGC_notes/data/mg-reads/140409_allBac/mg454-L215-N1k

mkdir read_files

screen -S grinder_batch -L bash -c \
    "find /home/nick/notebook/deltaGC_notes/data/genome_data/prok-bac-genomes_Ngaps/ -name '*fasta' | \
    perl -pe 's/.+\///' | xargs -n 1 -I % -P 28 \
    grinder -reference_file /home/nick/notebook/deltaGC_notes/data/genome_data/prok-bac-genomes_Ngaps/% \
    -total_reads 1000 -bn read_files/% -read_dist 215 -length_bias 0 -unidirectional 1"

    * 1162 genomes produced shotgun reads
    * all have 1000 reads


Combining read files

@ system76-server:~/notebook/deltaGC_notes/data/mg-reads/140409_allBac/mg454-L215-N10k

  • just using genomes that amplified with 515F and 927R primers (consistent with 16S amplicon simulations)
  • leaving out: "Azospirillum_brasilense_Sp245"

@ system76-server:~/notebook/deltaGC_notes/data/mg-reads/140411_allBac/mg454-L215-N10k

grinder_descByFileName.pl ../../140409_allBac/mg454-L215-N10k/read_files/ > mg454-L215-N10k_reads.fa
    * combining read files
    * correcting multiple chromosome problem; description is now the same for each taxon

grep -f <(find ~/notebook/deltaGC_notes/data/amplicon/140409_allBac/amp454-L215-N1k/read_files/ -name "*fa" | perl -pe 's/.+\/|\.fasta-read.+//g; s/\*/\\*/') mg454-L215-N10k_reads.fa | fasta_selectSeqs.pl -f mg454-L215-N10k_reads.fa -l - | seqIO.pl -i fasta -o fasta > mg454-L215-N10k_reads_e.fa

    * 10910000 sequences
    * 10000 sequences per genome

deltaGC using a left-skewed distribution

@ system76-server:~/notebook/deltaGC_notes/data/amplicon/140411_allBac/amp454-L215-N1k

deltaGC.pl -a -g /home/nick/notebook/deltaGC_notes/data/genome_data/prokBac-CompRep.fna -r amp454-L215-N1k_reads.fa -range 4000-20000 -mean 15000 -sd f -t 12 > skewnorm_m15k_r4k-20k.txt

deltaGC.pl -a -g /home/nick/notebook/deltaGC_notes/data/genome_data/prokBac-CompRep.fna -r amp454-L215-N1k_reads.fa -range 4000-11500 -mean 10000 -sd f -df 50 50 -t 12 -i > skewnorm_m10k_r4k-11.5k.txt

binning by buoyant density

deltaGCBin.pl skewnorm_m15k_r4k-20k.txt > skewnorm_m15k_r4k-20k_bin.txt

deltaGCBin.pl skewnorm_m10k_r4k-11.5k.txt > skewnorm_m10k_r4k-11.5k_bin.txt

Analyzing & plotting in R


In [31]:
import PIL
import re
from IPython.display import Image

In [2]:
%load_ext rmagic
%load_ext rpy2.ipython


/opt/anaconda/lib/python2.7/site-packages/pytz/__init__.py:29: UserWarning: Module argparse was already imported from /opt/anaconda/lib/python2.7/argparse.pyc, but /opt/anaconda/lib/python2.7/site-packages is being added to sys.path
  from pkg_resources import resource_stream

In [3]:
%%R
require(ggplot2)
require(reshape)
require(data.table)
require(gridExtra)
require(colorspace)


Loading required package: ggplot2
Loading required package: reshape
Loading required package: data.table
data.table 1.9.2  For help type: help("data.table")
Loading required package: gridExtra
Loading required package: grid
Loading required package: colorspace

In [4]:
%%R
# loading amplicon data 
base.dir = "~/notebook/deltaGC_notes/data/amplicon/140411_allBac/amp454-L215-N1k";
files = c(
    'skewnorm_m15k_r4k-20k',
    'skewnorm_m10k_r4k-11.5k'
    )

tbl.l = list()
for(i in files){
    in.file = paste(c(i, "txt"), collapse=".")
    in.file = paste(c(base.dir, in.file), collapse="/")
    tbl.l[[i]] = fread(in.file, header=T, sep="\t")
    tbl.l[[i]]$file = rep(i, nrow(tbl.l[[i]]))
    }

tbl.bin.l = list()
for(i in files){
    in.file = paste(c(i, "bin.txt"), collapse="_")
    in.file = paste(c(base.dir, in.file), collapse="/")
    tbl.bin.l[[i]] = fread(in.file, header=T, sep="\t")
    tbl.bin.l[[i]]$file = rep(i, nrow(tbl.bin.l[[i]]))
    }


# loading in genome GC table
tbl.genGC = fread("~/notebook/deltaGC_notes/data/genome_data/prok-bac-genomes_Ngaps_GC/prok-bac-genomes_Ngaps_GC.txt",
                    header=F)
## editing table
colnames(tbl.genGC) = c('genome', 'genome_GC')
tbl.genGC[,genome:=gsub(".+\\/|\\.fasta", "", genome)]

## selecting just genomes in other tables
tbl.genGC = subset(tbl.genGC, tbl.genGC$genome %in% unique(tbl.bin.l[[i]]$genome) )

## adding rank
tbl.genGC[,genome_GC_rank:=rank(genome_GC, ties.method='first')]

## merging tables
for(i in names(tbl.bin.l)){
    tbl.bin.l[[i]] = merge(tbl.bin.l[[i]], tbl.genGC, by='genome')
}

## merging lists
tbl.amp = do.call(rbind, tbl.l)
tbl.amp$data = rep('amplicon\nfragments', nrow(tbl.amp))
tbl.l = list()

tbl.bin.amp = do.call(rbind, tbl.bin.l)
tbl.bin.amp$data = rep('amplicon\nfragments', nrow(tbl.bin.amp))
tbl.bin.l = list()


Read 52.2% of 1091000 rows
Read 79.7% of 1091000 rows
Read 1091000 rows and 11 (of 11) columns from 0.151 GB file in 00:00:04

Read 98.1% of 1091000 rows
Read 1091000 rows and 11 (of 11) columns from 0.150 GB file in 00:00:03

In [5]:
%%R
# loading shotgun read data
base.dir = "~/notebook/deltaGC_notes/data/mg-reads/140411_allBac/mg454-L215-N10k";
files = list(
    'skewnorm_m15k_r4k-20k',
    'skewnorm_m10k_r4k-11.5k'
    )

tbl.l = list()
for(i in files){
    in.file = paste(c(i, "txt"), collapse=".")
    in.file = paste(c(base.dir, in.file), collapse="/")
    tbl.l[[i]] = fread(in.file, header=T, sep="\t")
    tbl.l[[i]]$file = rep(i, nrow(tbl.l[[i]]))
    }

tbl.bin.l = list()
for(i in files){
    in.file = paste(c(i, "bin.txt"), collapse="_")
    in.file = paste(c(base.dir, in.file), collapse="/")
    tbl.bin.l[[i]] = fread(in.file, header=T, sep="\t")
    tbl.bin.l[[i]]$file = rep(i, nrow(tbl.bin.l[[i]]))
    }


# loading in genome GC table
tbl.genGC = fread("~/notebook/deltaGC_notes/data/genome_data/prok-bac-genomes_Ngaps_GC/prok-bac-genomes_Ngaps_GC.txt",
                    header=F)
## editing table
colnames(tbl.genGC) = c('genome', 'genome_GC')
tbl.genGC[,genome:=gsub(".+\\/|\\.fasta", "", genome)]

## selecting just genomes in other tables
tbl.genGC = subset(tbl.genGC, tbl.genGC$genome %in% unique(tbl.bin.l[[i]]$genome) )

## adding rank
tbl.genGC[,genome_GC_rank:=rank(genome_GC, ties.method='first')]

## merging tables
for(i in names(tbl.bin.l)){
    tbl.bin.l[[i]] = merge(tbl.bin.l[[i]], tbl.genGC, by='genome')
}

## merging lists
tbl.mg = do.call(rbind, tbl.l)
tbl.mg$data = rep('shotgun\nfragments', nrow(tbl.mg))
tbl.l = list()

tbl.bin.mg = do.call(rbind, tbl.bin.l)
tbl.bin.mg$data = rep('shotgun\nfragments', nrow(tbl.bin.mg))
tbl.bin.l = list()


Read 0.0% of 10910000 rows
Read 6.0% of 10910000 rows
Read 9.0% of 10910000 rows
Read 11.5% of 10910000 rows
Read 14.1% of 10910000 rows
Read 16.8% of 10910000 rows
Read 19.2% of 10910000 rows
Read 21.0% of 10910000 rows
Read 23.0% of 10910000 rows
Read 24.5% of 10910000 rows
Read 26.9% of 10910000 rows
Read 29.4% of 10910000 rows
Read 32.1% of 10910000 rows
Read 34.8% of 10910000 rows
Read 37.8% of 10910000 rows
Read 40.9% of 10910000 rows
Read 44.1% of 10910000 rows
Read 47.5% of 10910000 rows
Read 51.0% of 10910000 rows
Read 54.7% of 10910000 rows
Read 58.7% of 10910000 rows
Read 62.7% of 10910000 rows
Read 67.0% of 10910000 rows
Read 71.6% of 10910000 rows
Read 76.4% of 10910000 rows
Read 80.0% of 10910000 rows
Read 81.3% of 10910000 rows
Read 86.5% of 10910000 rows
Read 92.1% of 10910000 rows
Read 97.9% of 10910000 rows
Read 10910000 rows and 11 (of 11) columns from 1.523 GB file in 00:00:51

Read 0.0% of 10910000 rows
Read 6.8% of 10910000 rows
Read 13.7% of 10910000 rows
Read 20.7% of 10910000 rows
Read 27.8% of 10910000 rows
Read 34.6% of 10910000 rows
Read 41.4% of 10910000 rows
Read 48.2% of 10910000 rows
Read 54.8% of 10910000 rows
Read 61.6% of 10910000 rows
Read 68.2% of 10910000 rows
Read 74.8% of 10910000 rows
Read 81.7% of 10910000 rows
Read 88.8% of 10910000 rows
Read 96.1% of 10910000 rows
Read 10910000 rows and 11 (of 11) columns from 1.512 GB file in 00:00:20

In [6]:
%%R
## merging amp & mg tables
tbl = rbind(tbl.amp, tbl.mg)
tbl.amp = vector()  # clearing memory
tbl.mg = vector()

tbl.bin = rbind(tbl.bin.amp, tbl.bin.mg)
tbl.bin.amp = vector()
tbl.bin.mg = vector()

In [7]:
%%R
# editing $file names
tbl[, file:=gsub("skewnorm_m15k_r4k-20k", "Fragment size\ndistribution: 12.5kb", file)]
tbl[, file:=gsub("skewnorm_m10k_r4k-11.5k", "Fragment size\ndistribution: 6kb", file)]

tbl.bin[, file:=gsub("skewnorm_m15k_r4k-20k", "Fragment size\ndistribution: 12.5kb", file)]
tbl.bin[, file:=gsub("skewnorm_m10k_r4k-11.5k", "Fragment size\ndistribution: 6kb", file)]


                                               genome scaffold           read
     1:                Acaryochloris_marina_MBIC11017       NA             NA
     2:                Acaryochloris_marina_MBIC11017       NA             NA
     3:                Acaryochloris_marina_MBIC11017       NA             NA
     4:                Acaryochloris_marina_MBIC11017       NA             NA
     5:                Acaryochloris_marina_MBIC11017       NA             NA
    ---                                                                      
484400: secondary_endosymbiont_of_Heteropsylla_cubana CP003547 CP003547__9612
484401: secondary_endosymbiont_of_Heteropsylla_cubana       NA             NA
484402: secondary_endosymbiont_of_Heteropsylla_cubana CP003547 CP003547__3381
484403: secondary_endosymbiont_of_Heteropsylla_cubana       NA             NA
484404: secondary_endosymbiont_of_Heteropsylla_cubana CP003547 CP003547__8890
         read_GC read_buoyant_density read_start read_length fragment_GC
     1:       NA                   NA         NA          NA          NA
     2:       NA                   NA         NA          NA          NA
     3:       NA                   NA         NA          NA          NA
     4:       NA                   NA         NA          NA          NA
     5:       NA                   NA         NA          NA          NA
    ---                                                                 
484400: 41.39535             1.700567     526604         215    49.07049
484401:       NA                   NA         NA          NA          NA
484402: 12.55814             1.672307    1698730         215    30.88353
484403:       NA                   NA         NA          NA          NA
484404: 55.81395             1.714698     523505         215    31.53460
        fragment_buoyant_density fragment_start fragment_length bin_min bin_max
     1:                       NA             NA              NA   1.701   1.702
     2:                       NA             NA              NA   1.715   1.716
     3:                       NA             NA              NA   1.725   1.726
     4:                       NA             NA              NA   1.759   1.760
     5:                       NA             NA              NA   1.768   1.769
    ---                                                                        
484400:                 1.708089         523345            5164   1.708   1.709
484401:                       NA             NA              NA   1.766   1.767
484402:                 1.690266        1697065            6259   1.672   1.673
484403:                       NA             NA              NA   1.748   1.749
484404:                 1.690904         518752            6171   1.714   1.715
        read_count frag_count   median median_rank
     1:          0          0 1.708869         511
     2:          0          0 1.708869         511
     3:          0          0 1.708869         511
     4:          0          0 1.708869         511
     5:          0          0 1.708869         511
    ---                                           
484400:         16          2 1.687726          47
484401:          0          0 1.687726          47
484402:         90          0 1.687726          47
484403:          0          0 1.687726          47
484404:          3          0 1.687726          47
                                       file genome_GC genome_GC_rank
     1: Fragment size\ndistribution: 12.5kb 0.4647920            519
     2: Fragment size\ndistribution: 12.5kb 0.4647920            519
     3: Fragment size\ndistribution: 12.5kb 0.4647920            519
     4: Fragment size\ndistribution: 12.5kb 0.4647920            519
     5: Fragment size\ndistribution: 12.5kb 0.4647920            519
    ---                                                             
484400:    Fragment size\ndistribution: 6kb 0.2842883             59
484401:    Fragment size\ndistribution: 6kb 0.2842883             59
484402:    Fragment size\ndistribution: 6kb 0.2842883             59
484403:    Fragment size\ndistribution: 6kb 0.2842883             59
484404:    Fragment size\ndistribution: 6kb 0.2842883             59
                       data
     1: amplicon\nfragments
     2: amplicon\nfragments
     3: amplicon\nfragments
     4: amplicon\nfragments
     5: amplicon\nfragments
    ---                    
484400:  shotgun\nfragments
484401:  shotgun\nfragments
484402:  shotgun\nfragments
484403:  shotgun\nfragments
484404:  shotgun\nfragments

Simulating label incorporation


In [8]:
%%R
# simulating label incorporation by moving fragment buoyant density
## 100% 15N
tbl[, label_inc_15N := fragment_buoyant_density + 0.016]

## 100% 13C
tbl[, label_inc_13C := fragment_buoyant_density + 0.036]


                                 genome scaffold           read  read_GC
       1:  Agrobacterium_fabrum_str_C58 AE007869  AE007869__499 52.55814
       2:  Agrobacterium_fabrum_str_C58 AE007869  AE007869__498 52.55814
       3:  Agrobacterium_fabrum_str_C58 AE007869  AE007869__495 52.55814
       4:  Agrobacterium_fabrum_str_C58 AE007869  AE007869__497 52.55814
       5:  Agrobacterium_fabrum_str_C58 AE007869  AE007869__496 52.55814
      ---                                                               
24001996: Salmonella_bongori_NCTC_12419 FR877557 FR877557__2790 57.20930
24001997: Salmonella_bongori_NCTC_12419 FR877557 FR877557__2797 29.76744
24001998: Salmonella_bongori_NCTC_12419 FR877557 FR877557__2796 44.18605
24001999: Salmonella_bongori_NCTC_12419 FR877557 FR877557__2795 50.23256
24002000: Salmonella_bongori_NCTC_12419 FR877557 FR877557__2794 53.95349
          read_buoyant_density read_start read_length fragment_GC
       1:             1.711507     114437         215    53.02019
       2:             1.711507     114437         215    55.41002
       3:             1.711507    5035543         215    52.92066
       4:             1.711507    5035543         215    53.45668
       5:             1.711507     114437         215    56.14006
      ---                                                        
24001996:             1.716065    1863392         215    53.96161
24001997:             1.689172    8744011         215    44.52086
24001998:             1.703302    8668917         215    46.68961
24001999:             1.709228    6849897         215    54.76029
24002000:             1.712874    3975166         215    45.70962
          fragment_buoyant_density fragment_start fragment_length
       1:                 1.711960         114273           12284
       2:                 1.714302         107900           14048
       3:                 1.711862        5024735           14911
       4:                 1.712388        5030946           10053
       5:                 1.715017         107683           12109
      ---                                                        
24001996:                 1.712882        1861101            5730
24001997:                 1.703630        8742874            6543
24001998:                 1.705756        8667460            6102
24001999:                 1.713665        6849320            5882
24002000:                 1.704795        3972279            6130
                                         file                data label_inc_15N
       1: Fragment size\ndistribution: 12.5kb amplicon\nfragments      1.727960
       2: Fragment size\ndistribution: 12.5kb amplicon\nfragments      1.730302
       3: Fragment size\ndistribution: 12.5kb amplicon\nfragments      1.727862
       4: Fragment size\ndistribution: 12.5kb amplicon\nfragments      1.728388
       5: Fragment size\ndistribution: 12.5kb amplicon\nfragments      1.731017
      ---                                                                      
24001996:    Fragment size\ndistribution: 6kb  shotgun\nfragments      1.728882
24001997:    Fragment size\ndistribution: 6kb  shotgun\nfragments      1.719630
24001998:    Fragment size\ndistribution: 6kb  shotgun\nfragments      1.721756
24001999:    Fragment size\ndistribution: 6kb  shotgun\nfragments      1.729665
24002000:    Fragment size\ndistribution: 6kb  shotgun\nfragments      1.720795
          label_inc_13C
       1:      1.747960
       2:      1.750302
       3:      1.747862
       4:      1.748388
       5:      1.751017
      ---              
24001996:      1.748882
24001997:      1.739630
24001998:      1.741756
24001999:      1.749665
24002000:      1.740795

Making heatmap of buoyant density: 12.5 kb fragment size


In [9]:
%%R
# parsing out rows & columns needed for fragment density plot
tbl.p = subset(tbl, file=="Fragment size\ndistribution: 12.5kb")
tbl.p = tbl.p[, c('data','fragment_buoyant_density', 'label_inc_15N', 'label_inc_13C'), with=F]

## editing data column
tbl.p[data == 'amplicon\nfragments', data:='amplicon-fragments']
tbl.p[data == 'shotgun\nfragments', data:='shotgun-fragments']

## melting
tbl.m = melt(tbl.p, id.vars=c('data'))

In [10]:
%%R -w 170 -h 300
## making the density plots for amplicons & shotgun sequences
make_density_plot <- function(tbl, x.axis=TRUE){
    p = ggplot(tbl, aes(value, group=variable, fill=variable, 
                                 alpha=variable, linetype=variable)) +   
            geom_density() +
            scale_linetype_manual(values=c('solid', 'dashed', 'dashed')) +
            scale_fill_manual(values=c('blue', 'red', 'green')) +
            scale_alpha_manual(values=c(0.7, 0.15, 0.15)) +
            scale_x_reverse(limits=c(1.77,1.66)) +
            coord_flip() + 
            facet_grid(data ~ .) +
            labs(y='Kernel density') +
            theme( 
                text = element_text(size=22),
                axis.title.y = element_blank(),
                axis.title.x = element_text( vjust=1 ),
                axis.text.x = element_blank(),
                axis.ticks.x = element_blank(),
                axis.text.y = element_blank(),
                axis.ticks.y = element_blank(),
                strip.text.y = element_text(size=22),
                legend.position = 'none'
                )
    # removing x axis title if needed
    if(x.axis==FALSE){   # no x-axis, top plot
        p = p + theme( axis.title.x = element_blank() )
        p = p + theme( plot.margin = unit(c(1,1,0,0), "lines") ) # top, right, bottom, left
    } else {
        p = p + theme( plot.margin = unit(c(0,1,1,0), "lines") ) # top, right, bottom, left
    }
    return(p)
}

### amplicon
p.dense.amp = make_density_plot( tbl.m[tbl.m$data=='amplicon-fragments', ], x.axis=FALSE)
#print(p.dense.amp)
### shotgun
p.dense.shotgun = make_density_plot( tbl.m[tbl.m$data=='shotgun-fragments', ] )
#print(p.dense.shotgun)
xxx=1

In [11]:
%%R
# parsing out columns & rows needed for heatmap plotting
tbl.bin.p = subset(tbl.bin, file=="Fragment size\ndistribution: 12.5kb")
tbl.bin.p = tbl.bin.p[, c('data', 'genome_GC_rank', 'bin_min', 'frag_count'), with=F]

## editing data column
tbl.bin.p[data == 'amplicon\nfragments', data:='amplicon-fragments']
tbl.bin.p[data == 'shotgun\nfragments', data:='shotgun-fragments']


                      data genome_GC_rank bin_min frag_count
     1: amplicon-fragments            519   1.701          0
     2: amplicon-fragments            519   1.715          0
     3: amplicon-fragments            519   1.725          0
     4: amplicon-fragments            519   1.759          0
     5: amplicon-fragments            519   1.768          0
    ---                                                     
242198:  shotgun-fragments             59   1.708          0
242199:  shotgun-fragments             59   1.766          0
242200:  shotgun-fragments             59   1.672          0
242201:  shotgun-fragments             59   1.748          0
242202:  shotgun-fragments             59   1.714          0

In [12]:
%%R
#--- OLD heatmap color schemes ---#
    ## heatmap colors: black only for lowest value
    cols = c(
    rep("#000000", 1),
    rep("darkblue", 99),
    rep("#1F1FFF", 100),
    rep("#3F3FFF", 100),
    rep("#5F5FFF", 100),
    rep("#7F7FFF", 100),
    rep("#9F9FFF", 100),
    rep("#BFBFFF", 100),
    rep("#DFDFFF", 100),
    rep("#FFFFFF", 100)
    )
    
    ## black and white
    cols = c(
    rep("#FFFFFF", 1),
    rep("#E6E6E6", 99),
    rep("#D9D9D9", 100),
    rep("#CCCCCC", 100),    
    rep("#BEBEBE", 100),    
    rep("#AEAEAE", 100),
    rep("#9D9D9D", 100),    
    rep("#888888", 100),    
    rep("#6F6F6F", 100),
    rep("#4D4D4D", 100)
    )

    # rev(heat_hcl)
    cols=c(
    rep("#FFFFFF", 1),
    rep("#E5D961", 99),
    rep("#E8C842", 100),
    rep("#E9B62D", 100),
    rep("#EAA428", 100),
    rep("#E89132", 100),
    rep("#E57E41", 100),
    rep("#E06B50", 100),
    rep("#DA565E", 100),
    rep("#D33F6A", 100)
    )

In [13]:
%%R -h 250 -w 900
# making heatmaps of buoyant density distributions
make_density_heatmap <- function(tbl, x.axis=TRUE){

    ## heatmap color scheme
    cols = c('white', rev(heat_hcl(99)) )
    
    ## heatmap plotting
    p.heat = ggplot(tbl, aes(genome_GC_rank, bin_min, fill=frag_count)) +
        geom_tile() +
        scale_alpha_continuous( guide=FALSE ) +
        scale_fill_gradientn( colours=cols, name='Count' ) +
        scale_x_continuous( expand = c(0,0) ) +
        scale_y_reverse( expand = c(0,0), limits=c(1.77,1.66) ) +
        labs( x='Genome (ordered by genome G+C content)', y='Fragment buoyant density') +
        #facet_grid(data ~ .) +
        theme(
            text = element_text(size=22),
            axis.text.x = element_blank(),
            axis.text.y = element_text(color='black'),            
            axis.ticks.x = element_blank(),
            axis.title.y = element_blank(),
            legend.justification=c(0,0),
            legend.position=c(0,0),
            legend.text = element_text(size=14),
            plot.margin = unit(c(0,0,0,1), "lines") # top, right, bottom, left
            )
    # removing x axis title if needed
    if(x.axis==FALSE){  # no x-axis; top plot
        p.heat = p.heat + theme( axis.title.x = element_blank() )
        p.heat = p.heat + theme( plot.margin = unit(c(1,0,0,1), "lines") ) # top, right, bottom, left
    } else {
        p.heat = p.heat + theme( plot.margin = unit(c(0,0,1,1), "lines") ) # top, right, bottom, left
    }
    
    return(p.heat)
}


### amplicon
p.heat.amp = make_density_heatmap( tbl.bin.p[tbl.bin.p$data=='amplicon-fragments', ], x.axis=FALSE )
#print(p.heat.amp)
### shotgun
p.heat.shotgun = make_density_heatmap( tbl.bin.p[tbl.bin.p$data=='shotgun-fragments', ] )
#print(p.heat.shotgun)
xxx=1

In [16]:
%%R -h 8 -w 16 -u in
# combined plots
## combined y axis
y.axis.t = expression(paste('Fragment buoyant density (g ', ml^-1, ')'))
y.axis.t = textGrob(y.axis.t, vjust = 1, rot=90, gp=gpar(cex = 1.9))

## plotting 
tiff( "/home/nick/notebook/deltaGC_notes/data/amp-mg/figures/dense-htmp_12.5kb/dense-htmp_12.5kb.tiff", 
        width=17, height=8, units='in', res=300 )
grid.arrange(p.heat.amp, p.dense.amp, p.heat.shotgun, p.dense.shotgun, 
                 nrow=2, ncol=2, widths=c(7,1), left = y.axis.t)
dev.off()


png 
  2 

In [33]:
# diplaying figure
tiff_file = "/home/nick/notebook/deltaGC_notes/data/amp-mg/figures/dense-htmp_12.5kb/dense-htmp_12.5kb.tiff"
img = PIL.Image.open(tiff_file)
png_file = re.sub(r'\.tiff$', r'.png', tiff_file)
img.save(png_file)

Image(filename=png_file)


Out[33]:

Making heatmap of buoyant density: 6kb fragment size


In [34]:
%%R
# parsing out rows & columns needed for fragment density plot
tbl.p = subset(tbl, file=="Fragment size\ndistribution: 6kb")
tbl.p = tbl.p[, c('data','fragment_buoyant_density', 'label_inc_15N', 'label_inc_13C'), with=F]

## editing data column
tbl.p[data == 'amplicon\nfragments', data:='amplicon-fragments']
tbl.p[data == 'shotgun\nfragments', data:='shotgun-fragments']

## melting
tbl.m = melt(tbl.p, id.vars=c('data'))

In [35]:
%%R -w 170 -h 300
# making density plots
### amplicon
p.dense.amp = make_density_plot( tbl.m[tbl.m$data=='amplicon-fragments', ], x.axis=FALSE)
print(p.dense.amp)
### shotgun
p.dense.shotgun = make_density_plot( tbl.m[tbl.m$data=='shotgun-fragments', ] )
print(p.dense.shotgun)



In [36]:
%%R
# parsing out columns & rows needed for heatmap plotting
tbl.bin.p = subset(tbl.bin, file=="Fragment size\ndistribution: 6kb")
tbl.bin.p = tbl.bin.p[, c('data', 'genome_GC_rank', 'bin_min', 'frag_count'), with=F]

## editing data column
tbl.bin.p[data == 'amplicon\nfragments', data:='amplicon-fragments']
tbl.bin.p[data == 'shotgun\nfragments', data:='shotgun-fragments']


                      data genome_GC_rank bin_min frag_count
     1: amplicon-fragments            519   1.701          0
     2: amplicon-fragments            519   1.715          0
     3: amplicon-fragments            519   1.725          0
     4: amplicon-fragments            519   1.759          0
     5: amplicon-fragments            519   1.768          0
    ---                                                     
242198:  shotgun-fragments             59   1.708          2
242199:  shotgun-fragments             59   1.766          0
242200:  shotgun-fragments             59   1.672          0
242201:  shotgun-fragments             59   1.748          0
242202:  shotgun-fragments             59   1.714          0

In [37]:
%%R -h 4 -w 16 -u in
# making density heatmaps 
### amplicon
p.heat.amp = make_density_heatmap( tbl.bin.p[tbl.bin.p$data=='amplicon-fragments', ], x.axis=FALSE )
print(p.heat.amp)
### shotgun
p.heat.shotgun = make_density_heatmap( tbl.bin.p[tbl.bin.p$data=='shotgun-fragments', ] )
print(p.heat.shotgun)



In [38]:
%%R -h 8 -w 16 -u in
# combined plots
## combined y axis
y.axis.t = expression(paste('Fragment buoyant density (g ', ml^-1, ')'))
y.axis.t = textGrob(y.axis.t, vjust = 1, rot=90, gp=gpar(cex = 1.75))

## plotting (save to file)
tiff_file = "/home/nick/notebook/deltaGC_notes/data/amp-mg/figures/dense-htmp_6kb/dense-htmp_6kb.tiff"
tiff( tiff_file, width=17, height=8, units='in', res=300 )
grid.arrange(p.heat.amp, p.dense.amp, p.heat.shotgun, p.dense.shotgun, 
                 nrow=2, ncol=2, widths=c(7,1), left = y.axis.t)
dev.off()


png 
  2 

In [39]:
# diplaying figure
tiff_file = "/home/nick/notebook/deltaGC_notes/data/amp-mg/figures/dense-htmp_6kb/dense-htmp_6kb.tiff"
img = PIL.Image.open(tiff_file)
png_file = re.sub(r'\.tiff$', r'.png', tiff_file)
img.save(png_file)

Image(filename=png_file)


Out[39]: