evaluate-classification-database-comparison


Database comparisons

The purpose of this notebook is to evaluate classification accuracy across different reference databases. Selected mock community sequences are taxonomically classified using two or more reference databases, e.g., Greengenes 13_8 trimmed to 250 nt and the same database trimmed to 150 nt. This notebook can also be modified to perform taxonomic classification with any number of desired reference databases/versions. Limit the analysis to only a few mock communities and method/parameter combinations; the goal here is to compare the databases, not the methods.

In the example below, we compare RDP classifications of a single 16S rRNA dataset (mock-3) and a single fungal ITS dataset (mock-9) using three different types of reference database: reference sequences that are full length (e.g., the full 16S rRNA gene), trimmed to 250 nt (approximately the full V4 region), or trimmed to 150 nt or 100 nt (the sequence read length).
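The trimmed reference databases are simply truncated copies of the full-length references. A minimal sketch of that trimming step in plain Python (illustrative only, not part of tax-credit; the function name and records are hypothetical):

```python
# Illustrative sketch: trim reference sequences to a fixed length,
# as done to build the trimmed databases compared below.

def trim_references(records, length):
    """Yield (header, seq) pairs with each sequence truncated to `length` nt.

    Sequences shorter than `length` are dropped, so every retained
    reference has a uniform trimmed length.
    """
    for header, seq in records:
        if len(seq) >= length:
            yield header, seq[:length]

refs = [(">ref1", "ACGT" * 100),   # 400 nt
        (">ref2", "ACGT" * 30)]    # 120 nt, too short to keep
trimmed = list(trim_references(refs, 150))
```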

Prepare the environment

First we'll import various functions that we'll need for generating the report.


In [1]:
%matplotlib inline
from os.path import join, expandvars
import pandas as pd
from tax_credit.plotting_functions import (pointplot_from_data_frame,
                                           boxplot_from_data_frame,
                                           heatmap_from_data_frame,
                                           per_level_kruskal_wallis)
from tax_credit.eval_framework import (evaluate_results,
                                       method_by_reference_comparison)

Configure local environment-specific values

This is the only cell that you will need to edit to generate basic reports locally. After editing this cell, you can run all cells in this notebook to generate your analysis report. This will take a few minutes to run, as results are computed at multiple taxonomic levels.

To generate the default results contained within tax-credit, no values in this cell need to be changed except project_dir. To analyze results separately from the tax-credit precomputed results, the other variables in this cell will need to be set.


In [2]:
## project_dir should be the directory where you've downloaded (or cloned) the 
## tax-credit repository. 
project_dir = expandvars("$HOME/Desktop/projects/tax-credit")

## expected_results_dir contains expected composition data in the structure
## expected_results_dir/<dataset name>/<reference name>/expected/
expected_results_dir = join(project_dir, "data/precomputed-results/", "mock-community")

## mock_results_fp designates the files to which summary results are written.
## If this file exists, it can be read in to generate results plots, instead
## of computing new scores.
mock_results_fp = join(expected_results_dir, 'mock_results.tsv')

## results_dirs should contain the directory or directories where
## results can be found. By default, this is the same location as expected 
## results included with the project. If other results should be included, 
## absolute paths to those directories should be added to this list.
results_dirs = [expected_results_dir]

## Minimum number of times an OTU must be observed for it to be included in
## analyses. Edit this to analyze the effect of the minimum count on
## taxonomic results.
min_count = 1

## Define the range of taxonomic levels over which to compute accuracy scores.
## The default given below will compute class (level 2) through species (level 6).
taxonomy_level_range = range(2,7)
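For reference, a Greengenes-style taxonomy string maps onto those level indices as follows (a sketch; the taxonomy string below is only an example):

```python
# Greengenes-style taxonomy: semicolon-delimited, kingdom (level 0)
# through species (level 6).
taxonomy = ("k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; "
            "o__Enterobacteriales; f__Enterobacteriaceae; "
            "g__Escherichia; s__coli")
levels = [rank.strip() for rank in taxonomy.split(';')]

# range(2, 7) therefore covers class through species:
selected = levels[2:7]
```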

Find mock community pre-computed tables, expected tables, and "query" tables

Next we'll use the paths defined above to find all of the tables that will be compared. These include the pre-computed result tables (i.e., the ones that the new methods will be compared to), the expected result tables (i.e., the tables containing the known composition of the mock microbial communities), and the query result tables (i.e., the tables generated with the new method(s) that we want to compare to the pre-computed result tables).


In [3]:
mock_results = evaluate_results(results_dirs, 
                                expected_results_dir, 
                                mock_results_fp, 
                                taxonomy_level_range=taxonomy_level_range, 
                                min_count=min_count,
                                taxa_to_keep=None, 
                                md_key='taxonomy', 
                                subsample=False,
                                force=False)


/Users/nbokulich/Desktop/projects/tax-credit/data/precomputed-results/mock-community/mock_results.tsv already exists.
Reading in pre-computed evaluation results.
To overwrite, set force=True

Restrict analyses to the set of datasets and methods used for the reference database comparisons.


In [4]:
mock_results = mock_results[mock_results['Method'].isin(['q2-nb', 'rdp', 'blast+', 'uclust', 'vsearch'])]
mock_results = mock_results[mock_results['Dataset'].isin(['mock-3', 'mock-9'])]

Evaluation 1: Compute and summarize precision, recall, and F-measure for mock communities

In this evaluation, we compute and summarize precision, recall, and F-measure of each result (pre-computed and query) based on the known composition of the mock communities. We then summarize the results in two ways: first with boxplots, and second with a table of the top methods based on their F-measures.

This is a qualitative evaluation, effectively telling us about the ability of the different methods to report the taxa that are present in each sample. These metrics are not concerned with the abundance of the different taxa.
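These qualitative metrics reduce to set operations on the expected and observed taxa at a given level. A minimal sketch (not tax-credit's implementation; the example taxa are hypothetical):

```python
def precision_recall_f(expected, observed):
    """Compute qualitative precision, recall, and F-measure from
    sets of taxonomy strings at a given taxonomic level."""
    tp = len(expected & observed)   # taxa correctly reported
    fp = len(observed - expected)   # taxa reported but not present
    fn = len(expected - observed)   # taxa present but not reported
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

expected = {'g__Escherichia', 'g__Staphylococcus', 'g__Bacillus'}
observed = {'g__Escherichia', 'g__Staphylococcus', 'g__Listeria'}
p, r, f = precision_recall_f(expected, observed)
```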

As a first step, we will evaluate how the accuracy metrics perform at each taxonomic level for each method within each reference dataset type. We can evaluate this for each method individually, as is currently set up below for the q2-feature-classifier naive Bayes classifier (q2-nb).


In [5]:
color_pallette = {# 16S refs
                  'gg_13_8_otus': 'orange', 
                  'gg_13_8_otus_clean': 'seagreen', # 515f-806r
                  'gg_13_8_otus_clean_trim150': 'blue', # 515f-806r
                  'gg_13_8_otus_clean_full16S': 'purple',
                  'silva_123_v4_trim250': 'pink',
                  'silva_123_clean_v4_trim250': 'red',
                  'silva_123_clean_full16S': 'gold',
                  # ITS Refs
                  'unite_20.11.2016': 'orange', 
                  'unite_20.11.2016_clean': 'seagreen', # ITS1Ff-ITS2r
                  'unite_20.11.2016_clean_trim100': 'blue', # ITS1Ff-ITS2r
                  'unite_20.11.2016_clean_fullITS': 'purple', 
                  'unite_20.11.2016_clean_trim150': 'cyan'} # ITS1Ff-ITS2r

y_vars = ["Precision", "Recall", "F-measure"]

In [6]:
pointplot_from_data_frame(mock_results[mock_results['Method'] == 'q2-nb'], "Level", y_vars, 
                          group_by="Dataset", color_by="Reference",
                          color_pallette=color_pallette)



In [7]:
pointplot_from_data_frame(mock_results[mock_results['Method'] == 'rdp'], "Level", y_vars, 
                          group_by="Dataset", color_by="Reference",
                          color_pallette=color_pallette)



In [19]:
pointplot_from_data_frame(mock_results[mock_results['Method'] == 'blast+'], "Level", y_vars, 
                          group_by="Dataset", color_by="Reference",
                          color_pallette=color_pallette)



In [20]:
pointplot_from_data_frame(mock_results[mock_results['Method'] == 'uclust'], "Level", y_vars, 
                          group_by="Dataset", color_by="Reference",
                          color_pallette=color_pallette)


Kruskal-Wallis FDR-corrected p-values comparing reference database / method combinations at each level of taxonomic assignment


In [21]:
result = per_level_kruskal_wallis(mock_results, y_vars, group_by='Reference', 
                                  dataset_col='Dataset', level_name='Level',
                                  levelrange=taxonomy_level_range, alpha=0.05, 
                                  pval_correction='fdr_bh')
result


Out[21]:
   Dataset   Variable              2              3              4              5              6
0   mock-3  Precision  5.113689e-116  1.552753e-115  2.312131e-100  2.498765e-207  6.371697e-167
1   mock-3     Recall   3.100730e-64   2.671537e-55   3.769743e-64   1.776997e-92  6.718745e-171
2   mock-3  F-measure   2.958127e-69   2.785821e-58   2.033205e-74  8.220949e-133  4.999565e-175
3   mock-9  Precision   7.316974e-86   1.845363e-82   1.860603e-94  4.041060e-105  1.040389e-128
4   mock-9     Recall  3.764580e-101  1.307634e-109  5.804092e-103  6.494176e-103  7.778683e-128
5   mock-9  F-measure  6.920112e-100  8.424047e-100  3.355415e-108  9.091524e-114  1.854305e-132
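The per-level test above can be sketched as follows (a simplified illustration, not tax-credit's implementation, assuming scipy is available; the score values are hypothetical): a Kruskal-Wallis H-test compares score distributions across reference databases at one taxonomic level, and the resulting p-values are then FDR-corrected with a Benjamini-Hochberg step-up procedure.

```python
from scipy.stats import kruskal

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (a minimal sketch)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for offset, i in enumerate(reversed(order)):
        rank = n - offset
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical F-measure scores for three reference databases at one level:
full16S = [0.91, 0.93, 0.90, 0.92]
trim250 = [0.88, 0.87, 0.89, 0.86]
trim150 = [0.70, 0.72, 0.71, 0.69]
h, p = kruskal(full16S, trim250, trim150)
```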

In [22]:
heatmap_from_data_frame(mock_results, metric="Precision", rows=["Method", "Parameters"], cols=["Reference", "Level"])



In [21]:
heatmap_from_data_frame(mock_results, metric="Recall", rows=["Method", "Parameters"], cols=["Reference", "Level"])