Goal

  • model validation: compare the priming experiment HR-SIP sequencing data to simulations of that same data

Method

  • Find taxa that are closely related to sequenced genomes
    • relatedness based on 16S rRNA
    • CD-HIT: cluster OTU rep sequences with complete-genome 16S sequences
      • cutoff = 97% seqID
    • Select a particular 13C gradient (& matching 12C control):
      • Infer total abundance of each target taxon
      • Infer total richness of starting community
      • Get distribution of total OTU abundances per fraction
  • HR-SIP simulation with target taxa abundances set by the Priming Experiment data
    • Fill in rest of community with pseudo-genomes
      • pseudo-genomes are just fragment-length/GC distributions
        • GC distribution as a normal
          • GC_dist_mean is drawn from normal distribution (mean=50, sd=7)
      • number of pseudo-genomes: set from the inferred richness of the gradient community
    • abundance distribution: lognormal
    • total abundance: 1e9
    • subsampled: based on Ashley's dataset
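
The taxon-matching step above relies on CD-HIT clustering at a 97% sequence-identity cutoff. As a rough illustration of what that cutoff selects (not a replacement for CD-HIT, which does greedy incremental clustering rather than all-vs-all comparison), the sketch below assumes the OTU rep sequences and genome 16S sequences are already aligned to equal length, and keeps each OTU whose best genome match is at least 97% identical. `percent_identity`, `match_otus_to_genomes`, and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Alignment columns where both sequences have a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def match_otus_to_genomes(otus: dict, genomes: dict, cutoff: float = 97.0) -> dict:
    """Keep OTUs whose best-matching genome 16S is at least `cutoff` % identical.

    otus, genomes: {name: aligned 16S sequence} (hypothetical inputs).
    Returns {otu_name: (genome_name, identity)} for OTUs that pass.
    """
    hits = {}
    if not genomes:
        return hits
    for otu_name, otu_seq in otus.items():
        best_genome, best_id = max(
            ((g_name, percent_identity(otu_seq, g_seq)) for g_name, g_seq in genomes.items()),
            key=lambda pair: pair[1],
        )
        if best_id >= cutoff:
            hits[otu_name] = (best_genome, best_id)
    return hits


# Toy example: one genome and two OTUs aligned over 100 positions.
genome_seq = "ACGT" * 25
otu_close = "T" + genome_seq[1:99] + "A"   # 2 mismatches -> 98% identity
otu_far = "CATGC" + genome_seq[5:]          # 5 mismatches -> 95% identity
print(match_otus_to_genomes({"OTU1": otu_close, "OTU2": otu_far}, {"G1": genome_seq}))
# -> {'OTU1': ('G1', 98.0)}
```

Only OTU1 clears the 97% cutoff, so only it would be treated as a target taxon with a closely related sequenced genome.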

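The community settings in the Method list (pseudo-genome GC_dist_mean drawn from a normal with mean=50 and sd=7, a lognormal abundance distribution, a total abundance of 1e9, then subsampling) can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions, not the simulation pipeline used here: `simulate_community`, `subsample`, the target-taxon abundances, `n_pseudo`, the per-genome GC sd (`frag_gc_sd`), `lognorm_sigma`, and the subsampling depth are all hypothetical placeholders. Fragment-length distributions are not modeled, and the lognormal draw covers only the pseudo-genomes while target taxa keep their fixed abundances.

```python
import numpy as np


def simulate_community(target_abund, n_pseudo, total_abund=1e9,
                       gc_mean=50.0, gc_sd=7.0, frag_gc_sd=1.0,
                       lognorm_sigma=1.0, seed=None):
    """Build one simulated gradient community (hypothetical helper).

    target_abund: {taxon: absolute abundance}, fixed from the experimental data.
    n_pseudo: number of pseudo-genomes that fill in the rest of the community.
    frag_gc_sd and lognorm_sigma are assumed placeholder values.
    """
    rng = np.random.default_rng(seed)
    remaining = total_abund - sum(target_abund.values())
    if remaining < 0:
        raise ValueError("target-taxon abundances exceed total_abund")

    # GC_dist_mean for each pseudo-genome ~ N(50, 7), clipped to valid %GC.
    gc_means = np.clip(rng.normal(gc_mean, gc_sd, size=n_pseudo), 0.0, 100.0)

    # Lognormal weights, rescaled so pseudo-genomes fill the remaining abundance.
    weights = rng.lognormal(mean=0.0, sigma=lognorm_sigma, size=n_pseudo)
    pseudo_abund = weights / weights.sum() * remaining

    # Target taxa come from real genomes, so they carry no pseudo-GC parameters.
    community = [
        {"taxon": name, "gc_mean": None, "gc_sd": None, "abundance": float(ab)}
        for name, ab in target_abund.items()
    ]
    community += [
        {"taxon": f"pseudo_{i}", "gc_mean": float(m), "gc_sd": frag_gc_sd,
         "abundance": float(ab)}
        for i, (m, ab) in enumerate(zip(gc_means, pseudo_abund))
    ]
    return community


def subsample(abundances, depth, seed=None):
    """Draw `depth` reads multinomially, in proportion to abundance."""
    rng = np.random.default_rng(seed)
    p = np.asarray(abundances, dtype=float)
    return rng.multinomial(depth, p / p.sum())


community = simulate_community({"taxonA": 1e7, "taxonB": 5e6}, n_pseudo=1000, seed=1)
counts = subsample([t["abundance"] for t in community], depth=20000, seed=2)
print(len(community), counts.sum())  # -> 1002 20000
```

The depth of 20000 only echoes the "20k?" placeholder in the old plan; in practice it would be replaced by a depth derived from Ashley's dataset.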

OLD

  • Use the genomes of target taxa to simulate the subset of the HR-SIP dataset
    • Use inferred
  • Fill in rest of community with pseudo-genomes
    • pseudo-genomes are just fragment-length/GC distributions
      • GC distribution as a normal
        • GC_dist_mean is drawn from normal distribution (mean=50, sd=7)
    • number of pseudo-genomes: 5000?
  • Simulate HR-SIP data
    • abundance distribution: lognormal
    • total abundance: 1e9
    • subsampled: 20k?