
Goal

Model validation: compare the priming experiment HR-SIP sequencing data to simulations of that same dataset.

Method

Find taxa that are closely related to sequenced genomes
- relatedness based on 16S rRNA
- CD-HIT: OTU rep sequences & complete genome 16S sequences
  - cutoff = 97% seqID
- Select a particular 13C gradient (& matching 12C control):
  - Infer total abundance of each target taxon
  - Infer total richness of starting community
  - Get distribution of total OTU abundances per fraction
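
As a rough illustration of the matching step, the sketch below parses a CD-HIT `.clstr` file and keeps only the clusters that contain both an OTU representative and a genome 16S sequence. The sequence names and the `OTU.` prefix used to tell the two apart are hypothetical; the real naming scheme in this dataset may differ, so treat this as a sketch under those assumptions rather than the pipeline actually used.

```python
import re

# Pulls the sequence name out of a .clstr member line, e.g.
# "1\t1498nt, >OTU.12... at +/98.50%"  or  "0\t1521nt, >genome_A... *"
MEMBER_RE = re.compile(r">(?P<name>.+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each a list of sequence names."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])  # start a new cluster
        else:
            match = MEMBER_RE.search(line)
            if match and clusters:
                clusters[-1].append(match.group("name"))
    return clusters


def target_taxa(clusters, otu_prefix="OTU."):
    """Map each OTU to the genomes that share its cluster.

    Assumption: OTU rep sequences start with `otu_prefix` and
    everything else is a genome 16S sequence.
    """
    hits = {}
    for members in clusters:
        otus = [m for m in members if m.startswith(otu_prefix)]
        genomes = [m for m in members if not m.startswith(otu_prefix)]
        if otus and genomes:
            for otu in otus:
                hits[otu] = genomes
    return hits


# Made-up example in the .clstr layout (clustered at 97% identity)
example = [
    ">Cluster 0",
    "0\t1521nt, >genome_A... *",
    "1\t1498nt, >OTU.12... at +/98.50%",
    ">Cluster 1",
    "0\t1502nt, >OTU.7... *",
    ">Cluster 2",
    "0\t1530nt, >genome_B... *",
]

hits = target_taxa(parse_clstr(example))
print(hits)
```

Only `OTU.12` is retained here, because it is the only OTU that clusters with a genome sequence; singleton clusters of either kind are ignored.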
HR-SIP simulation with target taxa abundances set by Priming Exp
- Fill in rest of community with pseudo-genomes
  - pseudo-genomes are just fragment-length/GC distributions
    - GC distribution as a normal
      - GC_dist_mean is drawn from normal distribution (mean=50, sd=7)
  - number of pseudo-genomes: gradient community
- abundance distribution: lognormal
- total abundance: 1e9
- subsampled: based on Ashley's dataset
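
The community composition described above could be sketched roughly as follows: target taxa keep their inferred abundances, and pseudo-genomes fill the remainder of the 1e9 total with lognormally distributed abundances, each with a `GC_dist_mean` drawn from a normal (mean=50, sd=7). The function name, the lognormal parameters (`ln_mu`, `ln_sigma`), the clamping of GC to 0-100, and the example abundances are assumptions made for illustration; subsampling is not covered.

```python
import random


def simulate_community(target_abunds, n_pseudo, total=1e9,
                       gc_mean=50.0, gc_sd=7.0,
                       ln_mu=0.0, ln_sigma=2.0, seed=0):
    """Return (abundances, gc_means) for target taxa plus pseudo-genomes.

    target_abunds: dict of taxon name -> abundance (set from the real data).
    ln_mu / ln_sigma: lognormal shape parameters (assumed values).
    """
    rng = random.Random(seed)

    # Abundance left over for the pseudo-genomes
    remaining = total - sum(target_abunds.values())
    if remaining < 0:
        raise ValueError("target taxa exceed the total abundance")

    # Raw lognormal draws, rescaled so the whole community sums to `total`
    raw = [rng.lognormvariate(ln_mu, ln_sigma) for _ in range(n_pseudo)]
    scale = remaining / sum(raw) if raw else 0.0

    abundances = dict(target_abunds)
    gc_means = {}
    for i, value in enumerate(raw):
        name = f"pseudo_{i}"
        abundances[name] = value * scale
        # GC_dist_mean ~ Normal(50, 7), clamped to a valid percentage
        gc_means[name] = min(100.0, max(0.0, rng.gauss(gc_mean, gc_sd)))
    return abundances, gc_means


# Hypothetical target abundances; real values would come from the gradient
abundances, gc_means = simulate_community(
    {"OTU.12": 1e8, "OTU.7": 5e7}, n_pseudo=100
)
print(len(abundances), round(sum(abundances.values())))
```

In practice `n_pseudo` would be the inferred richness of the gradient community minus the number of target taxa.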




OLD

Use the genomes of target taxa to simulate the subset of the HR-SIP dataset
- Use inferred
Fill in rest of community with pseudo-genomes
- pseudo-genomes are just fragment-length/GC distributions
  - GC distribution as a normal
    - GC_dist_mean is drawn from normal distribution (mean=50, sd=7)
- number of pseudo-genomes: 5000?
Simulate HR-SIP data
- abundance distribution: lognormal
- total abundance: 1e9
- subsampled: 20k?