Mock community evaluations

This notebook describes how to apply the mock community evaluations presented in Bokulich et al. (in preparation), either to reproduce the analyses in that paper or to extend them to other data sets.

Structuring new results for comparison to precomputed results

To prepare results from another classifier for analysis, you need BIOM files in which the taxonomy assignments are stored as an observation metadata category called taxonomy. An example of how to generate these files is presented in the data generation notebook in this directory, which was used to generate the precomputed data in the tax-credit repository.
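Before adding a table to the results directory, it can help to confirm that every observation actually carries the taxonomy metadata described above. The following is a minimal sketch, not part of the tax-credit framework: the function names missing_taxonomy and check_biom_file are hypothetical, and the sketch assumes the biom-format Python package is installed for loading files.

```python
def missing_taxonomy(table):
    """Return the IDs of observations that lack a 'taxonomy' metadata entry.

    `table` is assumed to behave like a biom.Table: `ids(axis=...)` gives the
    observation IDs and `metadata(axis=...)` returns one mapping per
    observation, or None when the table has no observation metadata at all.
    """
    obs_ids = list(table.ids(axis='observation'))
    obs_md = table.metadata(axis='observation')
    if obs_md is None:
        # No observation metadata at all: every observation is missing it.
        return obs_ids
    return [obs_id for obs_id, md in zip(obs_ids, obs_md)
            if not md or 'taxonomy' not in md]


def check_biom_file(path):
    """Load a BIOM file and list observations without a taxonomy assignment."""
    # Imported lazily so that missing_taxonomy can be used without biom-format.
    from biom import load_table
    return missing_taxonomy(load_table(path))
```

An empty list returned by check_biom_file('table.biom') would indicate that every observation has a taxonomy entry.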

Your BIOM tables should be called table.biom, and nested in the following directory structure:

results_dir/
 mock-community/
  dataset-id/ 
   reference-db-id/
    method-id/
     parameter-combination-id/
      table.biom

results_dir is the name of the top-level directory. You set this value in the first code cell of the analysis notebooks, and you can name the directory whatever you like. mock-community identifies the specific analysis being run, and this directory must be named mock-community for the framework to find your results.

This directory structure is identical to that of the precomputed results, which you can review as an example of how it should look.
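The layout above can be checked programmatically before running the analysis notebooks. The sketch below is an illustration only and is not how the framework itself locates results: the function find_tables and the key names in LEVELS are hypothetical. It walks results_dir/mock-community and reports each table.biom found exactly four directory levels down.

```python
from pathlib import Path

# Hypothetical labels for the four directory levels below mock-community/.
LEVELS = ('dataset_id', 'reference_db_id', 'method_id', 'parameter_combination_id')


def find_tables(results_dir):
    """Yield one dict per table.biom found at the expected depth."""
    root = Path(results_dir) / 'mock-community'
    # Four wildcards: dataset / reference database / method / parameters.
    for table_fp in sorted(root.glob('*/*/*/*/table.biom')):
        ids = table_fp.relative_to(root).parts[:-1]
        record = dict(zip(LEVELS, ids))
        record['path'] = table_fp
        yield record
```

Calling list(find_tables(results_dir)) and comparing the result with the tables you expect is a quick way to spot a file nested at the wrong depth, since misplaced tables are simply not matched.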

Contents

  • Data Generation: Retrieves and processes mock community data sets for analysis. (The processed data are stored in tax-credit, so these steps do not need to be performed again unless you are preparing new mock community data sets for analysis.)
  • Taxonomy Assignment: Creates and executes commands for generating taxonomic assignments for the mock community contained in this package. The results of running this notebook are included in the repository, so it's not necessary to re-run this. Start here for testing new methods. Examples are provided for the following methods:
  • Analysis: Template for mock community analysis at multiple taxonomic levels.
  • Taxonomic assignments made with different reference databases can also be compared by modifying the notebooks above to format reference databases to different specifications, assign taxonomy with each database, and compare the resulting classifications of the mock communities.
