Goal
- model validation: compare the priming experiment HR-SIP sequencing data to simulations of that same data
Method
- Find taxa that are closely related to sequenced genomes
- relatedness based on 16S rRNA
- CD-HIT: cluster OTU representative sequences & complete genome 16S sequences
- Select a particular 13C gradient (& matching 12C control):
- Infer total abundance of each target taxon
- Infer total richness of starting community
- Get distribution of total OTU abundances per fraction
- HR-SIP simulation with target taxa abundances set by Priming Exp
- Fill in rest of community with pseudo-genomes
- pseudo-genomes are just fragment-length/GC distributions
- GC distribution as a normal
- GC_dist_mean is drawn from normal distribution (mean=50, sd=7)
- number of pseudo-genomes: set from the gradient community (inferred richness)
- abundance distribution: lognormal
- total abundance: 1e9
- subsampled: based on Ashley's dataset
- Use the genomes of target taxa to simulate the subset of the HR-SIP dataset
- Fill in rest of community with pseudo-genomes
- pseudo-genomes are just fragment-length/GC distributions
- GC distribution as a normal
- GC_dist_mean is drawn from normal distribution (mean=50, sd=7)
- number of pseudo-genomes: 5000?
- Simulate HR-SIP data
- abundance distribution: lognormal
- total abundance: 1e9
- subsampled: 20k?
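
The pseudo-genome fill-in steps above could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the actual simulation code. The function name `make_pseudo_community`, the lognormal parameters (`mu=0`, `sigma=1`), and the clipping of GC means to 0-100% are assumptions not given in the notes. The defaults for the number of pseudo-genomes (5000) and the subsample size (20k) are the tentative values listed above.

```python
import random
from collections import Counter


def make_pseudo_community(n_taxa=5000, gc_mean=50.0, gc_sd=7.0,
                          total_abundance=1e9, subsample=20000,
                          lognorm_mu=0.0, lognorm_sigma=1.0, seed=0):
    """Sketch of a pseudo-genome community (hypothetical helper).

    Returns (gc_means, abundances, counts):
      gc_means   - per-taxon mean of the GC distribution, in percent
      abundances - per-taxon absolute abundance, summing to total_abundance
      counts     - Counter of taxon index -> count after subsampling
    """
    rng = random.Random(seed)

    # GC_dist_mean for each pseudo-genome: normal(mean=50, sd=7).
    # Clipping to [0, 100] is an assumption to keep values valid percentages.
    gc_means = [min(100.0, max(0.0, rng.gauss(gc_mean, gc_sd)))
                for _ in range(n_taxa)]

    # Lognormal abundance distribution; mu and sigma are assumed values.
    raw = [rng.lognormvariate(lognorm_mu, lognorm_sigma)
           for _ in range(n_taxa)]

    # Rescale so the community sums to the stated total abundance (1e9).
    scale = total_abundance / sum(raw)
    abundances = [r * scale for r in raw]

    # Subsample (with replacement), weighting each taxon by its abundance.
    draws = rng.choices(range(n_taxa), weights=abundances, k=subsample)
    counts = Counter(draws)

    return gc_means, abundances, counts


if __name__ == "__main__":
    gc_means, abundances, counts = make_pseudo_community()
    print("taxa observed after subsampling:", len(counts))
```

Sampling with replacement is used here because the subsample (20k) is tiny relative to the total abundance (1e9), so the difference from sampling without replacement should be negligible. The target-taxon abundances inferred from the priming experiment would need to be merged into `abundances` before subsampling.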