Taxonomic assignment method evaluation framework

This IPython Notebook illustrates how to apply the evaluation framework described in Bokulich, Rideout, et al. (in preparation). Given a set of precomputed taxonomic assignment method evaluation results (these will typically be the ones included in the short-read-tax-assignment GitHub repository) and a set of results generated by the user for a new method, the results from the new method are analyzed in the context of the precomputed results. This allows users to rapidly determine how a new method performs relative to existing ones.

The following sections present different evaluations with an example usearch-based taxonomic assigner. A parameter sweep was performed using this new taxonomic assigner (as described in the usearch-tax-assigner notebook), and the evaluations run here determine how the new assigner performs relative to the precomputed results. Those results are summarized in a report: this notebook itself, once all cells have been executed. This provides a convenient interactive framework for analyzing new methods.

This notebook could easily be applied to the results of a different taxonomic assigner, and it is therefore intended to serve as a template for report generation. Only one change would need to be made to support that, and that change is discussed inline below.

This code and other components of this framework can be found in the short-read-tax-assignment GitHub repository.

Requirements

To run this notebook, you'll need to have the following installed (these correspond to the imports used in the Prepare the environment section below):

IPython Notebook
matplotlib
PyCogent (which provides cogent.draw.distribution_plots)
biom-format (the evaluation framework reads BIOM tables)
the taxcompare package, included in the short-read-tax-assignment repository

Terminology

Throughout the notebook and associated code, we refer to the precomputed results obtained from GitHub as the subject results, and the results generated by the user as the query results.

Structuring query results for comparison to subject results

To prepare results from another classifier for analysis, you'll need to have BIOM files with taxonomy assignments stored as an observation metadata category called taxonomy. These should be generated for all analyses (mock community, natural community), datasets, reference databases, methods, and parameter combinations of interest. An example of how to generate these is presented in the usearch-tax-assigner notebook.
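
If you'd like to confirm that a table is structured this way before running the evaluations, a quick check along the following lines can help. This is a minimal sketch that assumes the biom-format 2.x Python API (load_table and Table.metadata); adjust it for the biom-format version you have installed, and substitute the path to one of your own tables.

from biom import load_table

# Hypothetical path; point this at one of your query tables.
table = load_table('table.biom')

# Every observation (OTU) should carry a 'taxonomy' entry in its metadata.
observation_md = table.metadata(axis='observation')
assert observation_md is not None, "Table has no observation metadata"
assert all('taxonomy' in md for md in observation_md), "Some observations lack a 'taxonomy' entry"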

Your BIOM tables should be called table.biom, and nested in the following directory structure:

query_results_dir/
    analysis/
        dataset-id/
            reference-db-id/
                method-id/
                    parameter-combination-id/
                        table.biom

query_results_dir is the name of the top-level directory, and you will set this value in the first code cell of this notebook. You can name this directory whatever you like.

This directory structure is identical to that for the subject results. You can review that directory structure for an example of how this should look.
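
As a quick sanity check, you can list the table.biom files that match this layout before generating the report. The following sketch uses only the standard library and assumes the five-level nesting shown above; the query_results_dir value is a placeholder.

from glob import glob
from os.path import join

query_results_dir = "/path/to/query_results_dir"  # placeholder; use your own path

# analysis / dataset-id / reference-db-id / method-id / parameter-combination-id / table.biom
pattern = join(query_results_dir, '*', '*', '*', '*', '*', 'table.biom')
for table_fp in sorted(glob(pattern)):
    print(table_fp)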

Configure local environment-specific values

This is the only cell that you will need to edit to generate reports locally.


In [1]:
## short_read_tax_dir should be the directory where you've downloaded (or cloned) the 
## short-read-tax-assignment repository. 
short_read_tax_dir = "/home/jrideout/dev/short-read-tax-assignment/"

## query_results_dir should be the directory where the results of the method(s) to be
## compared to the pre-computed results can be found. 
query_results_dir = "/home/jrideout/dev/short-read-tax-assignment/usearch_parameter_sweep/"

Prepare the environment

First we'll configure IPython to add matplotlib plots inline. Then we'll import various functions that we'll need for generating the report.


In [2]:
%matplotlib inline

from os.path import join, exists
from taxcompare.eval_framework import (get_expected_tables_lookup, 
                                       find_and_process_result_tables,
                                       compute_prfs,
                                       compute_pearson_spearman,
                                       compute_procrustes,
                                       generate_pr_scatter_plots,
                                       generate_prf_box_plots,
                                       generate_precision_box_plots,
                                       generate_recall_box_plots,
                                       generate_fmeasure_box_plots,
                                       generate_prf_table,
                                       generate_correlation_box_plots,
                                       generate_pearson_box_plots,
                                       generate_spearman_box_plots,
                                       generate_pearson_spearman_table,
                                       generate_procrustes_table)
from cogent.draw.distribution_plots import generate_box_plots

In [3]:
# Define the subdirectories where the query mock and natural community data should be, and confirm that they exist.
mock_query_results_dir = join(query_results_dir,"mock")
natural_query_results_dir = join(query_results_dir,"natural")

assert exists(mock_query_results_dir), "Mock community query directory doesn't exist: %s" % mock_query_results_dir
assert exists(natural_query_results_dir), "Natural community query directory doesn't exist: %s" % natural_query_results_dir

# Define the subdirectories where the subject mock and natural community data should be, and confirm that they exist.
mock_subject_results_dir = join(short_read_tax_dir,'data/eval-subject-results/mock/')
natural_subject_results_dir = join(short_read_tax_dir,'data/eval-subject-results/natural/')

assert exists(mock_subject_results_dir), "Mock community subject directory doesn't exist: %s" % mock_subject_results_dir
assert exists(natural_subject_results_dir), "Natural community subject directory doesn't exist: %s" % natural_subject_results_dir

Find mock community pre-computed tables, expected tables, and "query" tables

Next we'll use the paths defined above to find all of the tables that will be compared. These include the pre-computed result tables (i.e., the ones that the new methods will be compared to), the expected result tables (i.e., the tables containing the known composition of the mock microbial communities), and the query result tables (i.e., the tables generated with the new method(s) that we want to compare to the pre-computed result tables).


In [4]:
query_results = find_and_process_result_tables(mock_query_results_dir)
subject_results = find_and_process_result_tables(mock_subject_results_dir)
expected_L6_tables = get_expected_tables_lookup(mock_subject_results_dir)
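
If you'd like to confirm that tables were actually found in each location, an optional check like the one below can help. It only assumes that the objects returned above are standard Python collections that support len().

# Optional sanity check: confirm that something was found in each location.
print("Query result tables:   %d" % len(query_results))
print("Subject result tables: %d" % len(subject_results))
print("Expected (L6) tables:  %d" % len(expected_L6_tables))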

Evaluation 1: Compute and summarize precision, recall, and F-measure for mock communities

In this evaluation, we compute and summarize the precision, recall, and F-measure of each result (pre-computed and query) based on the known composition of the mock communities. We then summarize the results in two ways: first with a scatter plot, where pre-computed results are represented by blue points and query results are represented by red points; and second with a table of the fifty top-scoring results ranked by F-measure.

This is a qualitative evaluation, effectively telling us about the ability of the different methods to report the taxa that are present in each sample. These metrics are not concerned with the abundance of the different taxa.
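
For reference, these metrics are straightforward to compute from presence/absence data. The following is a minimal, self-contained sketch for a single sample (it is not the framework's compute_prfs implementation, and the taxon names are made up):

def presence_absence_prf(observed_taxa, expected_taxa):
    """Precision, recall, and F-measure on presence/absence of taxa."""
    observed_taxa = set(observed_taxa)
    expected_taxa = set(expected_taxa)
    tp = len(observed_taxa & expected_taxa)  # taxa correctly reported
    fp = len(observed_taxa - expected_taxa)  # taxa reported but not expected
    fn = len(expected_taxa - observed_taxa)  # taxa expected but not reported
    precision = tp / float(tp + fp) if (tp + fp) else 0.0
    recall = tp / float(tp + fn) if (tp + fn) else 0.0
    fmeasure = (2 * precision * recall / (precision + recall)
                if (precision + recall) else 0.0)
    return precision, recall, fmeasure

# Two of three expected taxa recovered, plus one spurious taxon:
# precision = recall = F-measure = 2/3.
print(presence_absence_prf(['g__A', 'g__B', 'g__X'], ['g__A', 'g__B', 'g__C']))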


In [5]:
subject_prf = list(compute_prfs(subject_results,expected_L6_tables))

In [6]:
query_prf = list(compute_prfs(query_results,expected_L6_tables))

In [7]:
generate_pr_scatter_plots(query_prf,subject_prf)



In [8]:
generate_precision_box_plots(query_prf,subject_prf)



In [9]:
generate_recall_box_plots(query_prf,subject_prf)



In [10]:
generate_fmeasure_box_plots(query_prf,subject_prf)



In [11]:
generate_prf_table(query_prf,subject_prf)


   Data set     | Precision   |   Recall    | F-measure   |    Method      |          Parameters          
ITS2-SAG1       |       0.615 |       0.889 |       0.727 |rtax            |single                        
ITS2-SAG1       |       0.571 |       0.889 |       0.696 |rdp             |0.0                           
ITS2-SAG1       |       0.571 |       0.889 |       0.696 |mothur          |0.0                           
ITS2-SAG1       |       0.533 |       0.889 |       0.667 |mothur          |0.2                           
ITS2-SAG1       |       0.533 |       0.889 |       0.667 |mothur          |0.1                           
ITS2-SAG1       |       0.500 |       0.889 |       0.640 |rdp             |0.2                           
ITS2-SAG1       |       0.500 |       0.889 |       0.640 |rdp             |0.1                           
ITS1            |       0.600 |       0.667 |       0.632 |rdp             |0.0                           
ITS1            |       0.600 |       0.667 |       0.632 |mothur          |0.0                           
S16S-2          |       0.500 |       0.826 |       0.623 |blast           |1.0                           
S16S-2          |       0.500 |       0.826 |       0.623 |blast           |100.0                         
S16S-2          |       0.500 |       0.826 |       0.623 |blast           |0.0001                        
S16S-2          |       0.500 |       0.826 |       0.623 |blast           |1e-06                         
S16S-2          |       0.500 |       0.826 |       0.623 |blast           |1e-10                         
S16S-2          |       0.487 |       0.826 |       0.613 |blast           |1e-30                         
S16S-2          |       0.475 |       0.826 |       0.603 |rdp             |0.0                           
S16S-2          |       0.475 |       0.826 |       0.603 |mothur          |0.0                           
ITS1            |       0.545 |       0.667 |       0.600 |rdp             |0.1                           
ITS1            |       0.545 |       0.667 |       0.600 |mothur          |0.1                           
S16S-2          |       0.452 |       0.826 |       0.585 |usearch         |e1.000000_qf0.500000_ma1_c1.000000
S16S-2          |       0.452 |       0.826 |       0.585 |usearch         |e0.001000_qf0.500000_ma1_c0.750000
S16S-2          |       0.452 |       0.826 |       0.585 |usearch         |e1.000000_qf0.500000_ma1_c0.750000
S16S-2          |       0.452 |       0.826 |       0.585 |usearch         |e1.000000_qf0.500000_ma1_c0.510000
S16S-2          |       0.452 |       0.826 |       0.585 |usearch         |e0.001000_qf0.500000_ma1_c0.510000
S16S-2          |       0.452 |       0.826 |       0.585 |usearch         |e0.001000_qf0.500000_ma1_c1.000000
S16S-2          |       0.462 |       0.783 |       0.581 |usearch         |e0.001000_qf0.500000_ma3_c0.510000
S16S-2          |       0.462 |       0.783 |       0.581 |usearch         |e1.000000_qf0.500000_ma3_c0.510000
S16S-2          |       0.442 |       0.826 |       0.576 |usearch         |e1.000000_qf0.750000_ma1_c0.510000
S16S-2          |       0.442 |       0.826 |       0.576 |usearch         |e1.000000_qf0.750000_ma1_c1.000000
S16S-2          |       0.442 |       0.826 |       0.576 |usearch         |e0.001000_qf0.750000_ma1_c1.000000
S16S-2          |       0.442 |       0.826 |       0.576 |usearch         |e0.001000_qf0.750000_ma1_c0.510000
S16S-2          |       0.442 |       0.826 |       0.576 |usearch         |e0.001000_qf0.750000_ma1_c0.750000
S16S-2          |       0.442 |       0.826 |       0.576 |usearch         |e1.000000_qf0.750000_ma1_c0.750000
S16S-2          |       0.442 |       0.826 |       0.576 |rdp             |0.1                           
S16S-2          |       0.442 |       0.826 |       0.576 |mothur          |0.1                           
ITS2-SAG1       |       0.500 |       0.667 |       0.571 |blast           |1.0                           
ITS2-SAG1       |       0.500 |       0.667 |       0.571 |blast           |100.0                         
ITS2-SAG1       |       0.500 |       0.667 |       0.571 |blast           |0.0001                        
ITS2-SAG1       |       0.500 |       0.667 |       0.571 |blast           |1e-06                         
ITS2-SAG1       |       0.500 |       0.667 |       0.571 |blast           |1e-10                         
ITS2-SAG1       |       0.421 |       0.889 |       0.571 |mothur          |0.3                           
S16S-2          |       0.432 |       0.826 |       0.567 |mothur          |0.2                           
S16S-2          |       0.459 |       0.739 |       0.567 |usearch         |e1.000000_qf0.500000_ma5_c0.510000
S16S-2          |       0.459 |       0.739 |       0.567 |usearch         |e0.001000_qf0.500000_ma5_c0.510000
S16S-2          |       0.439 |       0.783 |       0.562 |usearch         |e1.000000_qf0.750000_ma3_c0.510000
S16S-2          |       0.439 |       0.783 |       0.562 |usearch         |e0.001000_qf0.750000_ma3_c0.510000
S16S-2          |       0.422 |       0.826 |       0.559 |rdp             |0.3                           
S16S-2          |       0.422 |       0.826 |       0.559 |rdp             |0.2                           
S16S-2          |       0.447 |       0.739 |       0.557 |usearch         |e0.001000_qf0.750000_ma5_c0.510000
S16S-2          |       0.447 |       0.739 |       0.557 |usearch         |e1.000000_qf0.750000_ma5_c0.510000

Evaluation 2: Compute and summarize correlations between observed and known mock community structure

In this evaluation, we compute and summarize the correlation between each result (pre-computed and query) and the known composition of the mock communities. We then summarize the results in two ways: first with a series of box plots of correlation coefficients by method; and second with a table of the fifty top-scoring results ranked by Pearson correlation coefficient.

This is a quantitative evaluation, which tells us about the ability of the different methods to report the taxa that are present in each sample and to accurately assess their abundances. Because many factors beyond the accuracy of the taxonomic assigner can affect the observed abundance of taxa (e.g., primer bias), the correlation coefficients are frequently low, but their relative values should still be informative in identifying which taxonomic assigners are more accurate than others.
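
For reference, here is a minimal sketch of how such correlations can be computed for a single sample with scipy (this is not the framework's compute_pearson_spearman implementation, and the abundances are made up):

from scipy.stats import pearsonr, spearmanr

# Hypothetical taxon abundances for a single sample.
observed = {'g__A': 120, 'g__B': 30, 'g__C': 0, 'g__X': 15}
expected = {'g__A': 100, 'g__B': 50, 'g__C': 10, 'g__X': 0}

# Align the two profiles on the union of taxa so the vectors are comparable.
taxa = sorted(set(observed) | set(expected))
obs_vector = [observed.get(t, 0) for t in taxa]
exp_vector = [expected.get(t, 0) for t in taxa]

pearson_r, _ = pearsonr(obs_vector, exp_vector)
spearman_rho, _ = spearmanr(obs_vector, exp_vector)
print(pearson_r, spearman_rho)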


In [12]:
subject_pearson_spearman = list(compute_pearson_spearman(subject_results,expected_L6_tables))

In [13]:
query_pearson_spearman = list(compute_pearson_spearman(query_results,expected_L6_tables))

In [14]:
generate_pearson_box_plots(query_pearson_spearman,subject_pearson_spearman)



In [15]:
generate_spearman_box_plots(query_pearson_spearman,subject_pearson_spearman)



In [16]:
generate_pearson_spearman_table(query_pearson_spearman,subject_pearson_spearman)


   Data set     |     r       |    rho      |    Method      |          Parameters          
ITS2-SAG1       |       0.679 |       0.219 |rdp             |0.5                           
ITS2-SAG1       |       0.678 |       0.181 |rdp             |0.6                           
ITS2-SAG1       |       0.632 |       0.096 |mothur          |0.7                           
ITS2-SAG1       |       0.588 |       0.383 |mothur          |0.5                           
ITS2-SAG1       |       0.586 |       0.178 |rdp             |0.4                           
ITS2-SAG1       |       0.574 |       0.089 |mothur          |0.6                           
Turnbaugh-2     |       0.562 |      -0.223 |rdp             |1.0                           
ITS2-SAG1       |       0.557 |       0.357 |mothur          |0.3                           
Turnbaugh-2     |       0.532 |      -0.422 |mothur          |1.0                           
Turnbaugh-3     |       0.524 |       0.053 |mothur          |0.9                           
ITS1            |       0.513 |       0.025 |blast           |1.0                           
ITS1            |       0.513 |       0.025 |blast           |100.0                         
ITS1            |       0.513 |       0.025 |blast           |0.0001                        
ITS1            |       0.513 |       0.025 |blast           |1e-06                         
ITS1            |       0.513 |       0.025 |blast           |1e-10                         
ITS2-SAG1       |       0.493 |      -0.531 |mothur          |0.7                           
ITS2-SAG1       |       0.483 |      -0.532 |rdp             |0.7                           
ITS2-SAG1       |       0.477 |       0.373 |rdp             |0.3                           
ITS1            |       0.473 |      -0.402 |rdp             |0.5                           
ITS1            |       0.472 |      -0.019 |blast           |1e-30                         
Broad-2         |       0.465 |       0.427 |mothur          |0.9                           
ITS1            |       0.464 |      -0.391 |rdp             |0.6                           
Broad-2         |       0.464 |       0.336 |rdp             |0.9                           
Turnbaugh-1     |       0.463 |       0.227 |blast           |1e-10                         
Turnbaugh-1     |       0.462 |       0.227 |blast           |1.0                           
Turnbaugh-1     |       0.462 |       0.227 |blast           |100.0                         
Turnbaugh-1     |       0.462 |       0.227 |blast           |0.0001                        
Turnbaugh-1     |       0.462 |       0.227 |blast           |1e-06                         
Turnbaugh-2     |       0.459 |       0.158 |mothur          |0.8                           
ITS2-SAG1       |       0.458 |       0.285 |mothur          |0.2                           
Turnbaugh-1     |       0.458 |       0.232 |blast           |1e-30                         
ITS1            |       0.455 |       0.187 |mothur          |0.8                           
Turnbaugh-2     |       0.453 |       0.154 |blast           |1.0                           
Turnbaugh-2     |       0.453 |       0.154 |blast           |100.0                         
Turnbaugh-2     |       0.453 |       0.154 |blast           |0.0001                        
Turnbaugh-2     |       0.453 |       0.154 |blast           |1e-06                         
Turnbaugh-2     |       0.453 |       0.154 |blast           |1e-10                         
Broad-3         |       0.448 |       0.392 |rdp             |0.9                           
ITS2-SAG1       |       0.447 |       0.137 |rdp             |0.8                           
ITS1            |       0.447 |      -0.405 |rdp             |0.7                           
ITS2-SAG1       |       0.445 |       0.058 |rdp             |0.7                           
ITS1            |       0.443 |      -0.416 |mothur          |0.7                           
ITS2-SAG1       |       0.442 |      -0.071 |blast           |1e-30                         
ITS1            |       0.441 |       0.087 |rdp             |0.3                           
Turnbaugh-2     |       0.440 |       0.107 |rdp             |0.9                           
Broad-3         |       0.439 |       0.373 |mothur          |0.9                           
ITS2-SAG1       |       0.436 |       0.029 |blast           |1.0                           
ITS2-SAG1       |       0.436 |       0.029 |blast           |100.0                         
ITS2-SAG1       |       0.436 |       0.029 |blast           |0.0001                        
ITS2-SAG1       |       0.436 |       0.029 |blast           |1e-06                         

Find natural community pre-computed tables, expected tables, and "query" tables

Next we'll use the paths defined above to find all of the natural community tables that will be compared. These include the pre-computed result tables (i.e., the ones that the new methods will be compared to), the expected results (here, precomputed principal coordinates derived from Bray-Curtis distances between samples in the reference OTU table, which serve as the reference for the Procrustes analysis), and the query result tables (i.e., the tables generated with the new method(s) that we want to compare to the pre-computed result tables).


In [17]:
query_results = find_and_process_result_tables(natural_query_results_dir,filename_pattern="table.biom")
subject_results = find_and_process_result_tables(natural_subject_results_dir)
expected_pcs = get_expected_tables_lookup(natural_subject_results_dir,
                                          filename_pattern='bray_curtis_pc.txt')

Evaluation 3: Compute and summarize Procrustes analysis results based on natural community data

In this evaluation, we use Procrustes analysis to compare the ordination (principal coordinates of Bray-Curtis distances) derived from each result (pre-computed and query) to the expected principal coordinates for the natural community data. We then summarize the results in a table sorted by the Procrustes M^2 value, where lower values indicate better agreement with the expected ordination.
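
For reference, here is a minimal sketch of a Procrustes fit between two small, made-up 2-D ordinations using scipy.spatial.procrustes (available in scipy 0.17 and later; this is not the framework's compute_procrustes implementation). The disparity it returns corresponds to the M^2 values reported below, with lower values indicating better agreement:

import numpy as np
from scipy.spatial import procrustes

# Two hypothetical 2-D ordinations of the same four samples.
expected_pc = np.array([[0.10, 0.20], [0.30, -0.10], [-0.20, 0.40], [-0.20, -0.50]])
observed_pc = np.array([[0.12, 0.18], [0.28, -0.15], [-0.25, 0.42], [-0.15, -0.45]])

# procrustes() standardizes both configurations, then rotates/scales the second
# to best fit the first; the disparity is the remaining sum of squared differences (M^2).
mtx1, mtx2, m_squared = procrustes(expected_pc, observed_pc)
print(m_squared)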


In [18]:
query_procrustes = list(compute_procrustes(query_results,expected_pcs))

In [19]:
subject_procrustes = list(compute_procrustes(subject_results,expected_pcs))

In [20]:
generate_procrustes_table(query_procrustes,subject_procrustes)


   Data set     |    M^2      |    Method      |          Parameters          
study_449       |       0.047 |rtax            |single                        
study_449       |       0.050 |usearch         |e0.001000_qf0.500000_ma3_c0.510000
study_449       |       0.050 |usearch         |e1.000000_qf0.500000_ma3_c0.510000
study_449       |       0.054 |usearch         |e1.000000_qf0.750000_ma3_c0.510000
study_449       |       0.054 |usearch         |e0.001000_qf0.750000_ma3_c0.510000
study_449       |       0.104 |blast           |1.0                           
study_449       |       0.104 |blast           |100.0                         
study_449       |       0.104 |blast           |1e-30                         
study_449       |       0.104 |blast           |0.0001                        
study_449       |       0.104 |blast           |1e-06                         
study_449       |       0.104 |blast           |1e-10                         
study_449       |       0.116 |usearch         |e1.000000_qf0.750000_ma1_c0.510000
study_449       |       0.116 |usearch         |e1.000000_qf0.750000_ma1_c1.000000
study_449       |       0.116 |usearch         |e0.001000_qf0.750000_ma1_c1.000000
study_449       |       0.116 |usearch         |e0.001000_qf0.750000_ma1_c0.510000
study_449       |       0.116 |usearch         |e0.001000_qf0.750000_ma1_c0.750000
study_449       |       0.116 |usearch         |e1.000000_qf0.750000_ma1_c0.750000
study_449       |       0.120 |rdp             |1.0                           
study_449       |       0.121 |usearch         |e1.000000_qf0.750000_ma3_c1.000000
study_449       |       0.121 |usearch         |e0.001000_qf0.750000_ma3_c1.000000
study_449       |       0.122 |usearch         |e0.001000_qf0.750000_ma3_c0.750000
study_449       |       0.122 |usearch         |e1.000000_qf0.750000_ma3_c0.750000
study_449       |       0.126 |usearch         |e1.000000_qf0.500000_ma1_c1.000000
study_449       |       0.126 |usearch         |e0.001000_qf0.500000_ma1_c0.750000
study_449       |       0.126 |usearch         |e1.000000_qf0.500000_ma1_c0.750000
study_449       |       0.126 |usearch         |e1.000000_qf0.500000_ma1_c0.510000
study_449       |       0.126 |usearch         |e0.001000_qf0.500000_ma1_c0.510000
study_449       |       0.126 |usearch         |e0.001000_qf0.500000_ma1_c1.000000
study_449       |       0.126 |usearch         |e0.001000_qf0.750000_ma5_c0.510000
study_449       |       0.126 |usearch         |e1.000000_qf0.750000_ma5_c0.510000
study_449       |       0.127 |usearch         |e1.000000_qf0.500000_ma3_c1.000000
study_449       |       0.127 |usearch         |e0.001000_qf0.500000_ma3_c1.000000
study_449       |       0.128 |usearch         |e1.000000_qf0.500000_ma3_c0.750000
study_449       |       0.128 |usearch         |e0.001000_qf0.500000_ma3_c0.750000
study_449       |       0.134 |usearch         |e1.000000_qf0.500000_ma5_c0.510000
study_449       |       0.134 |usearch         |e0.001000_qf0.500000_ma5_c0.510000
study_449       |       0.134 |mothur          |1.0                           
study_449       |       0.156 |usearch         |e0.001000_qf0.750000_ma5_c0.750000
study_449       |       0.156 |usearch         |e1.000000_qf0.750000_ma5_c0.750000
study_449       |       0.158 |usearch         |e0.001000_qf0.500000_ma5_c1.000000
study_449       |       0.158 |usearch         |e1.000000_qf0.500000_ma5_c1.000000
study_449       |       0.160 |usearch         |e1.000000_qf0.750000_ma5_c1.000000
study_449       |       0.160 |usearch         |e0.001000_qf0.750000_ma5_c1.000000
study_449       |       0.163 |usearch         |e1.000000_qf0.500000_ma5_c0.750000
study_449       |       0.163 |usearch         |e0.001000_qf0.500000_ma5_c0.750000
study_449       |       0.195 |rdp             |0.8                           
study_449       |       0.196 |mothur          |0.8                           
study_449       |       0.234 |rdp             |0.1                           
study_449       |       0.235 |mothur          |0.1                           
study_449       |       0.237 |rdp             |0.5                           

In [ ]: