Evaluate mock community classification accuracy

The purpose of this notebook is to evaluate taxonomic classification accuracy of mock communities using different classification methods.

Prepare the environment

First we'll import various functions that we'll need for generating the report.



In [1]:

    
%matplotlib inline
from os.path import join, exists, expandvars
import pandas as pd
from IPython.display import display, Markdown
import seaborn.xkcd_rgb as colors
from tax_credit.plotting_functions import (pointplot_from_data_frame,
                                           boxplot_from_data_frame,
                                           heatmap_from_data_frame,
                                           per_level_kruskal_wallis,
                                           beta_diversity_pcoa,
                                           average_distance_boxplots,
                                           rank_optimized_method_performance_by_dataset)
from tax_credit.eval_framework import (evaluate_results,
                                       method_by_dataset_a1,
                                       parameter_comparisons,
                                       merge_expected_and_observed_tables,
                                       filter_df)

Configure local environment-specific values

This is the only cell that you will need to edit to generate basic reports locally. After editing this cell, you can run all cells in this notebook to generate your analysis report. This will take a few minutes to run, as results are computed at multiple taxonomic levels.

Values in this cell will not need to be changed, with the exception of project_dir, to generate the default results contained within tax-credit. To analyze results separately from the tax-credit precomputed results, other variables in this cell will need to be set.



In [2]:

    
## project_dir should be the directory where you've downloaded (or cloned) the 
## tax-credit repository. 
project_dir = expandvars("../../")

## expected_results_dir contains expected composition data in the structure
## expected_results_dir/<dataset name>/<reference name>/expected/
expected_results_dir = join(project_dir, "data/precomputed-results/", "mock-community")

## mock_results_fp designates the file to which summary results are written.
## If this file exists, it can be read in to generate results plots, instead
## of computing new scores.
mock_results_fp = join(expected_results_dir, 'mock_results.tsv')

## results_dirs should contain the directory or directories where
## results can be found. By default, this is the same location as expected 
## results included with the project. If other results should be included, 
## absolute paths to those directories should be added to this list.
results_dirs = [expected_results_dir]

## directory containing mock community data, e.g., feature table without taxonomy
mock_dir = join(project_dir, "data", "mock-community")

## Minimum number of times an OTU must be observed for it to be included in analyses. Edit this
## to analyze the effect of the minimum count on taxonomic results.
min_count = 1

## Define the range of taxonomic levels over which to compute accuracy scores.
## The default given below will compute class (level 2) through species (level 6).
taxonomy_level_range = range(2,7)


# we can save plots in this directory
outdir = join(expandvars("../../../"), 'plots')
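
To make the level numbering concrete, here is a minimal sketch that maps the integer levels in taxonomy_level_range to rank names. The RANKS list and the describe_levels helper are illustrative assumptions, not part of tax-credit; the 0-indexed seven-rank ordering is inferred from this notebook's use of level 5 for genus and level 6 for species.

```python
# Assumed 0-indexed rank names (hypothetical helper, not tax-credit code).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def describe_levels(level_range):
    """Return (level, rank name) pairs for each level in level_range."""
    return [(level, RANKS[level]) for level in level_range]


# The default range used in this notebook: levels 2 through 6.
for level, rank in describe_levels(range(2, 7)):
    print(level, rank)
```

Under this assumption, range(2,7) covers class, order, family, genus, and species.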



In [3]:

    
dataset_ids = ['mock-1', 'mock-2', 'mock-3', 'mock-4', 'mock-5', 'mock-7', 'mock-8', 'mock-9',
               'mock-10', 'mock-12', 'mock-16', 'mock-18', 'mock-19', 'mock-20', 'mock-21', 
               'mock-22', 'mock-23', 'mock-24', 'mock-26-ITS1', 'mock-26-ITS9']
method_ids = ['rdp', 'sortmerna', 'uclust', 'blast', 'blast+', 'naive-bayes', 'naive-bayes-bespoke', 'vsearch']
ref_ids = ['gg_13_8_otus', 'unite_20.11.2016_clean_fullITS']

Find mock community pre-computed tables, expected tables, and "query" tables

Next we'll use the paths defined above to find all of the tables that will be compared. These include the pre-computed result tables (i.e., the ones that the new methods will be compared to), the expected result tables (i.e., the tables containing the known composition of the mock microbial communities), and the query result tables (i.e., the tables generated with the new method(s) that we want to compare to the pre-computed result tables).

Note: if you have added new methods, set append=True. If you are attempting to recompute pre-computed results, set force=True.

This cell will take a few minutes to run if new results are being added, so hold onto your hat. If you are attempting to re-compute everything, it may take an hour or so, so go take a nap.



In [4]:

    
mock_results = evaluate_results(results_dirs, 
                                expected_results_dir, 
                                mock_results_fp, 
                                mock_dir,
                                taxonomy_level_range=taxonomy_level_range,
                                min_count=min_count,
                                taxa_to_keep=None, 
                                md_key='taxonomy', 
                                subsample=False,
                                per_seq_precision=True,
                                exclude=['other'],
                                dataset_ids=dataset_ids,
                                reference_ids=ref_ids,
                                method_ids=method_ids,
                                append=False,
                                force=False,
                                backup=False)

../../data/precomputed-results/mock-community/mock_results.tsv already exists.
Reading in pre-computed evaluation results.
To overwrite, set force=True
Results have been filtered to only include datasets or reference databases or methods or parameters that are explicitly set by results params. To disable this function and load all results, set dataset_ids and reference_ids and method_ids and parameter_ids to None.

To restrict analyses to a subset of datasets or references (e.g., to exclude taxonomy assignments made for the purpose of reference database comparisons), filter the results as shown below. Alternatively, choose specific reference databases, datasets, methods, or parameters by setting dataset_ids, reference_ids, method_ids, and parameter_ids in the evaluate_results command above.



In [5]:

    
# mock_results = filter_df(mock_results, column_name='Method', values=['naive-bayes'], exclude=False)
mock_results = mock_results.reset_index(drop=True)

Compute and summarize precision, recall, and F-measure for mock communities

In this evaluation, we compute and summarize precision, recall, and F-measure of each result (pre-computed and query) based on the known composition of the mock communities. We then summarize the results in two ways: first with boxplots, and second with a table of the top methods based on their F-measures. Higher scores indicate better accuracy.
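
For readers unfamiliar with these scores, the following is a minimal sketch of the standard definitions. It is not tax-credit's implementation: how evaluate_results derives true positives, false positives, and false negatives from a mock community is not shown here, and the two taxon-level rates are assumed to be the fraction of observed taxa that were expected (Taxon Accuracy Rate) and the fraction of expected taxa that were observed (Taxon Detection Rate). Both function names are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def taxon_rates(expected_taxa, observed_taxa):
    """Assumed definitions of Taxon Accuracy Rate and Taxon Detection Rate."""
    expected_taxa, observed_taxa = set(expected_taxa), set(observed_taxa)
    shared = expected_taxa & observed_taxa
    # Fraction of observed taxa that were expected.
    tar = len(shared) / len(observed_taxa) if observed_taxa else 0.0
    # Fraction of expected taxa that were observed.
    tdr = len(shared) / len(expected_taxa) if expected_taxa else 0.0
    return tar, tdr


# Made-up counts and taxa, for illustration only.
print(precision_recall_f(tp=90, fp=10, fn=30))
print(taxon_rates({"a", "b", "c", "d"}, {"a", "b", "x"}))
```

All scores range from 0 to 1, so values from different methods and datasets can be compared on the same axis.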

As a first step, we will evaluate the average performance of each method at each taxonomic level, within each reference database.

Note that, as parameter configurations can cause results to vary widely, average results are not a good representation of the "best" results. See here for results using optimized parameters for each method.

First we will define our color palette and the variables we want to plot. Via seaborn, we can apply the xkcd crowdsourced color names. If that still doesn't match your hue, use hex codes.



In [6]:

    
color_palette={
    'expected': 'black', 'rdp': colors['baby shit green'], 'sortmerna': colors['macaroni and cheese'],
    'uclust': 'coral', 'blast': 'indigo', 'blast+': colors['electric purple'], 'naive-bayes': 'dodgerblue',
    'naive-bayes-bespoke': 'blue', 'vsearch': 'firebrick'
}

y_vars = ["Precision", "Recall", "F-measure", "Taxon Accuracy Rate", "Taxon Detection Rate"]



In [8]:

    
point = pointplot_from_data_frame(mock_results, "Level", y_vars, 
                                  group_by="Reference", color_by="Method",
                                  color_palette=color_palette)



In [10]:

    
for k, v in point.items():
    v.savefig(join(outdir, 'mock-{0}-lineplots.pdf'.format(k)))

Kruskal-Wallis between-method accuracy comparisons

Kruskal-Wallis FDR-corrected p-values comparing classification methods at each level of taxonomic assignment



In [11]:

    
result = per_level_kruskal_wallis(mock_results, y_vars, group_by='Method', 
                                  dataset_col='Reference', level_name='Level',
                                  levelrange=taxonomy_level_range, alpha=0.05,
                                  pval_correction='fdr_bh')
result

    Out[11]:

   Reference                       Variable              2              3              4              5              6
0  gg_13_8_otus                    Precision             1.507747e-47   1.854806e-77   9.910988e-51   3.609426e-65   2.406841e-310
1  gg_13_8_otus                    Recall                3.113942e-51   6.712403e-242  1.789303e-273  0.000000e+00   4.778745e-205
2  gg_13_8_otus                    F-measure             2.755966e-50   1.397661e-242  3.850779e-272  0.000000e+00   5.978456e-211
3  gg_13_8_otus                    Taxon Accuracy Rate   5.037725e-12   4.870274e-76   8.147172e-60   2.475215e-302  0.000000e+00
4  gg_13_8_otus                    Taxon Detection Rate  1.326490e-30   4.209898e-31   3.360070e-19   3.921844e-241  0.000000e+00
5  unite_20.11.2016_clean_fullITS  Precision             3.547503e-36   2.160250e-58   2.998770e-96   3.664661e-100  2.377685e-268
6  unite_20.11.2016_clean_fullITS  Recall                0.000000e+00   0.000000e+00   0.000000e+00   0.000000e+00   0.000000e+00
7  unite_20.11.2016_clean_fullITS  F-measure             0.000000e+00   0.000000e+00   0.000000e+00   0.000000e+00   0.000000e+00
8  unite_20.11.2016_clean_fullITS  Taxon Accuracy Rate   1.173900e-161  8.258546e-169  1.677852e-191  1.505928e-226  0.000000e+00
9  unite_20.11.2016_clean_fullITS  Taxon Detection Rate  1.489817e-91   3.993707e-132  2.571375e-147  8.253999e-176  0.000000e+00

Heatmaps of per-level accuracy

Heatmaps show the performance of individual method/parameter combinations at each taxonomic level, in each reference database (i.e., for bacterial and fungal mock communities individually).



In [12]:

    
heatmap_from_data_frame(mock_results, metric="Precision", rows=["Method", "Parameters"], cols=["Reference", "Level"])

    Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x105546630>



In [13]:

    
heatmap_from_data_frame(mock_results, metric="Recall", rows=["Method", "Parameters"], cols=["Reference", "Level"])

    Out[13]:

<matplotlib.axes._subplots.AxesSubplot at 0x11341f6d8>



In [14]:

    
heatmap_from_data_frame(mock_results, metric="F-measure", rows=["Method", "Parameters"], cols=["Reference", "Level"])

    Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x114ab8e10>



In [15]:

    
heatmap_from_data_frame(mock_results, metric="Taxon Accuracy Rate", rows=["Method", "Parameters"], cols=["Reference", "Level"])

    Out[15]:

<matplotlib.axes._subplots.AxesSubplot at 0x1166674e0>



In [16]:

    
heatmap_from_data_frame(mock_results, metric="Taxon Detection Rate", rows=["Method", "Parameters"], cols=["Reference", "Level"])

    Out[16]:

<matplotlib.axes._subplots.AxesSubplot at 0x1149c4c88>

Now we will focus on results at the species level (for the genus level, change the level to 5).



In [7]:

    
mock_results_6 = mock_results[mock_results['Level'] == 6]



In [8]:

    
boxplot_from_data_frame(mock_results_6, group_by="Method", metric="Precision", color_palette=color_palette)

    Out[8]:

<matplotlib.axes._subplots.AxesSubplot at 0x10e569400>



In [9]:

    
boxplot_from_data_frame(mock_results_6, group_by="Method", metric="Recall", color_palette=color_palette)

    Out[9]:

<matplotlib.axes._subplots.AxesSubplot at 0x11086f588>



In [10]:

    
boxplot_from_data_frame(mock_results_6, group_by="Method", metric="F-measure", color_palette=color_palette)

    Out[10]:

<matplotlib.axes._subplots.AxesSubplot at 0x10e8aa588>



In [11]:

    
boxplot_from_data_frame(mock_results_6, group_by="Method", metric="Taxon Accuracy Rate", color_palette=color_palette)

    Out[11]:

<matplotlib.axes._subplots.AxesSubplot at 0x10ead5a58>



In [12]:

    
boxplot_from_data_frame(mock_results_6, group_by="Method", metric="Taxon Detection Rate", color_palette=color_palette)

    Out[12]:

<matplotlib.axes._subplots.AxesSubplot at 0x10e6d6b00>

Look at F-measure at genus level



In [13]:

    
mock_results_5 = mock_results[mock_results['Level'] == 5]



In [14]:

    
boxplot_from_data_frame(mock_results_5, group_by="Method", metric="F-measure", color_palette=color_palette)

    Out[14]:

<matplotlib.axes._subplots.AxesSubplot at 0x10e6b0550>



In [20]:

    
mock_results_5.groupby("Method").median()

    Out[20]:

Method               Level  Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
blast                5      1.0        0.968912  0.984211   0.750000             0.727273
blast+               5      1.0        0.849341  0.897696   0.583333             0.538462
naive-bayes          5      1.0        0.947026  0.951923   0.666667             0.636364
naive-bayes-bespoke  5      1.0        1.000000  1.000000   0.771429             0.636364
rdp                  5      1.0        0.999990  0.999990   0.750000             0.727273
sortmerna            5      1.0        0.956708  0.972085   0.714286             0.636364
uclust               5      1.0        0.713621  0.810810   0.636364             0.542857
vsearch              5      1.0        0.894203  0.932479   0.628571             0.545455



In [23]:

    
mock_results_5.groupby("Method").std()

    Out[23]:

Method               Level  Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
blast                0.0    0.091380   0.113460  0.099558   0.144590             0.189215
blast+               0.0    0.321829   0.357214  0.349524   0.250316             0.263426
naive-bayes          0.0    0.350885   0.394216  0.378837   0.294774             0.299700
naive-bayes-bespoke  0.0    0.209343   0.274806  0.258039   0.213479             0.272208
rdp                  0.0    0.097528   0.186111  0.152516   0.166270             0.207367
sortmerna            0.0    0.124508   0.195748  0.164674   0.169572             0.187481
uclust               0.0    0.068539   0.313915  0.309204   0.174172             0.212127
vsearch              0.0    0.095198   0.301211  0.266225   0.181140             0.215745



In [21]:

    
mock_results_6.groupby("Method").median()

    Out[21]:

Method               Level  Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
blast                6      0.911808   0.898308  0.905669   0.578947             0.545455
blast+               6      0.885315   0.406352  0.542705   0.410256             0.363636
naive-bayes          6      0.963514   0.766478  0.811590   0.520000             0.454545
naive-bayes-bespoke  6      1.000000   0.942217  0.948357   0.692308             0.636364
rdp                  6      0.992066   0.947026  0.948357   0.600000             0.578947
sortmerna            6      0.995403   0.927770  0.932479   0.571429             0.460000
uclust               6      0.910492   0.289107  0.415764   0.440000             0.363636
vsearch              6      0.992308   0.458496  0.580192   0.428571             0.363636



In [22]:

    
mock_results_6.groupby("Method").std()

    Out[22]:

Method               Level  Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
blast                0.0    0.203380   0.226256  0.215324   0.127848             0.148844
blast+               0.0    0.405526   0.391825  0.388711   0.241750             0.230588
naive-bayes          0.0    0.361718   0.404387  0.390941   0.255675             0.236117
naive-bayes-bespoke  0.0    0.247967   0.330187  0.299690   0.207314             0.251360
rdp                  0.0    0.177257   0.305923  0.267272   0.178491             0.168614
sortmerna            0.0    0.249246   0.313537  0.284816   0.168591             0.164231
uclust               0.0    0.185041   0.293794  0.291339   0.170592             0.168700
vsearch              0.0    0.202364   0.369768  0.341499   0.202229             0.197371

In the following heatmaps, we assess accuracy rates for each combination of dataset and method configuration. This allows us to assess how evenly configurations affect performance, whether specific mock communities outperform or underperform relative to others, and, more generally, how increasing or decreasing specific parameters affects accuracy.



In [25]:

    
heatmap_from_data_frame(mock_results_6, "Precision")

    Out[25]:

<matplotlib.axes._subplots.AxesSubplot at 0x11fd49630>



In [26]:

    
heatmap_from_data_frame(mock_results_6, "Recall")

    Out[26]:

<matplotlib.axes._subplots.AxesSubplot at 0x1201fdc18>



In [27]:

    
heatmap_from_data_frame(mock_results_6, "F-measure")

    Out[27]:

<matplotlib.axes._subplots.AxesSubplot at 0x120085cc0>



In [28]:

    
heatmap_from_data_frame(mock_results_6, "Taxon Accuracy Rate")

    Out[28]:

<matplotlib.axes._subplots.AxesSubplot at 0x1202b2828>



In [29]:

    
heatmap_from_data_frame(mock_results_6, "Taxon Detection Rate")

    Out[29]:

<matplotlib.axes._subplots.AxesSubplot at 0x121fcecc0>

Method Optimization

Which method/parameter configuration performed "best" for a given score? We can rank the top-performing configuration by dataset, method, and taxonomic level.
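
The ranking idea can be sketched without tax-credit as follows. This is a minimal illustration under stated assumptions, not the logic of method_by_dataset_a1: it assumes that "best" means the highest F-measure per method and that methods are then listed in descending order of that score. The best_per_method function and the example rows are hypothetical.

```python
def best_per_method(rows, metric="F-measure"):
    """Keep the top-scoring configuration of each method, best method first.

    rows: list of dicts with at least the keys 'Method', 'Parameters'
    and the chosen metric.
    """
    best = {}
    for row in rows:
        current = best.get(row["Method"])
        # Keep this row if the method is new or the score improves.
        if current is None or row[metric] > current[metric]:
            best[row["Method"]] = row
    return sorted(best.values(), key=lambda r: r[metric], reverse=True)


# Made-up scores, for illustration only.
example_rows = [
    {"Method": "rdp", "Parameters": "0.4", "F-measure": 0.80},
    {"Method": "rdp", "Parameters": "0.7", "F-measure": 0.75},
    {"Method": "uclust", "Parameters": "0.51:0.97:3", "F-measure": 0.58},
]
for row in best_per_method(example_rows):
    print(row["Method"], row["Parameters"], row["F-measure"])
```

Ties between configurations are resolved in favor of the first row encountered, which is an arbitrary choice in this sketch.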

First, the top-performing method/configuration combination by dataset.



In [30]:

    
for dataset in mock_results_6['Dataset'].unique():
    display(Markdown('## {0}'.format(dataset)))
    best = method_by_dataset_a1(mock_results_6, dataset)
    display(best)

mock-8

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
4422   naive-bayes-bespoke  0.001:prior:char:8192:[16,16]:0.5   0.985273   0.817671  0.893681   0.860000             0.796296
1662   naive-bayes          0.001:char:8192:[18,18]:0.5         0.978409   0.800016  0.880266   0.679245             0.666667
2654   rdp                  0.4                                 0.937163   0.705506  0.805000   0.603774             0.592593
3284   uclust               0.51:0.97:3                         0.662203   0.509471  0.575883   0.551020             0.500000
2054   vsearch              1:0.99:0.99                         0.669105   0.494585  0.568759   0.565217             0.481481
2789   sortmerna            0.51:0.99:1:0.9:1.0                 0.659963   0.413790  0.508657   0.511111             0.425926
299    blast+               0.001:1:0.51:0.8                    0.624731   0.422299  0.503946   0.510638             0.444444
2879   blast                1e-10                               0.624731   0.422299  0.503946   0.510638             0.444444

mock-26-ITS1

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
18993  naive-bayes-bespoke  0.001:prior:char:8192:[8,8]:0.0     1.000000   1.000000  1.000000   0.777778             0.636364
13233  vsearch              1:0.75:0.97                         1.000000   1.000000  1.000000   0.777778             0.636364
6033   blast+               0.001:1:0.99:0.99                   1.000000   1.000000  1.000000   0.777778             0.636364
11313  naive-bayes          0.001:char:8192:[14,14]:0.0         1.000000   1.000000  1.000000   0.777778             0.636364
16173  sortmerna            0.51:0.99:5:0.9:1.0                 1.000000   1.000000  1.000000   0.777778             0.636364
15753  rdp                  0.2                                 1.000000   1.000000  1.000000   0.777778             0.636364
16413  blast                1e-10                               0.984163   0.984163  0.984163   0.666667             0.545455
17973  uclust               0.76:0.9:1                          0.984163   0.984163  0.984163   0.666667             0.545455

mock-7

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
28647  naive-bayes-bespoke  0.001:prior:char:8192:[16,16]:0.0   0.979330   0.823178  0.894490   0.840000             0.777778
26187  naive-bayes          0.001:char:8192:[18,18]:0.5         0.936550   0.756941  0.837221   0.705882             0.666667
27194  rdp                  0.3                                 0.862963   0.668879  0.753626   0.627451             0.592593
27779  uclust               0.51:0.9:3                          0.752492   0.466084  0.575630   0.571429             0.444444
27359  sortmerna            0.51:0.9:1:0.9:1.0                  0.670887   0.496033  0.570360   0.545455             0.444444
24554  blast+               0.001:1:0.51:0.97                   0.640563   0.500886  0.562179   0.543478             0.462963
27404  blast                1e-10                               0.640563   0.500886  0.562179   0.543478             0.462963
26684  vsearch              1:0.99:0.8                          0.625602   0.456603  0.527907   0.577778             0.481481

mock-2

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
30704  naive-bayes-bespoke  0.001:prior:char:8192:[10,10]:0.0   1.000000   0.905037  0.950152   0.725000             0.783784
29774  naive-bayes          0.001:char:8192:[9,9]:0.5           1.000000   0.862307  0.926063   0.555556             0.675676
30304  rdp                  0.7                                 1.000000   0.779486  0.876080   0.511628             0.594595
30129  vsearch              10:0.51:0.99                        0.549971   0.476456  0.510581   0.487805             0.540541
30459  uclust               1.0:0.9:1                           0.523875   0.431704  0.473344   0.525000             0.567568
30344  sortmerna            1.0:0.9:1:0.9:1.0                   0.494340   0.391467  0.436930   0.485714             0.459459
29504  blast+               0.001:100:0.51:0.99                 0.508952   0.273960  0.356189   0.463415             0.513514
30384  blast                1000                                0.350540   0.288866  0.316729   0.488372             0.567568

mock-3

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
36102  naive-bayes-bespoke  0.001:prior:char:8192:[12,12]:0.5   1.000000   1.000000  1.000000   0.950000             0.95
35244  uclust               0.51:0.9:1                          0.988422   0.953779  0.970792   0.769231             0.50
31564  blast+               0.001:1:0.75:0.97                   0.945743   0.912596  0.928874   0.692308             0.45
34884  blast                1000                                0.945743   0.912596  0.928874   0.692308             0.45
34001  vsearch              1:0.51:0.9                          0.949057   0.896317  0.921933   0.705882             0.60
34821  sortmerna            0.51:0.99:5:0.9:1.0                 0.914761   0.328205  0.483085   0.555556             0.50
34663  rdp                  0.8                                 0.915142   0.304494  0.456948   0.538462             0.35
32683  naive-bayes          0.001:char:8192:[8,8]:0.0           0.907022   0.304494  0.455929   0.538462             0.35

mock-10

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
41518  naive-bayes-bespoke  0.001:prior:char:8192:[7,7]:0.94    0.997564   0.522668  0.685941   0.461538             0.500000
40378  sortmerna            1.0:0.99:5:0.9:1.0                  1.000000   0.435936  0.607180   0.416667             0.416667
40003  vsearch              100:0.75:0.97                       0.642512   0.435936  0.519439   0.428571             0.500000
40198  rdp                  0.6                                 0.576611   0.441873  0.500330   0.352941             0.500000
37678  blast+               0.001:1:0.51:0.99                   0.476931   0.435936  0.455513   0.428571             0.500000
40543  uclust               0.76:0.97:3                         1.000000   0.289107  0.448539   0.333333             0.166667
38653  naive-bayes          0.001:char:8192:[11,11]:0.96        0.574485   0.289107  0.384644   0.250000             0.250000
40483  blast                1e-10                               0.346096   0.346096  0.346096   0.333333             0.416667

mock-23

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
43644  naive-bayes-bespoke  0.001:prior:char:8192:[11,11]:0.92  0.999958   0.999906  0.999932   0.629630             0.894737
43469  blast                1e-10                               0.777240   0.736385  0.756261   0.444444             0.631579
42544  blast+               0.001:1:0.99:0.97                   0.777240   0.736385  0.756261   0.444444             0.631579
42764  naive-bayes          0.001:char:8192:[6,6]:0.0           0.945478   0.628949  0.755396   0.288889             0.684211
43374  rdp                  0.6                                 0.945470   0.626301  0.753480   0.315789             0.631579
43239  vsearch              1:0.75:0.8                          0.745446   0.706217  0.725301   0.480000             0.631579
43549  uclust               0.51:0.99:3                         0.744739   0.703471  0.723517   0.458333             0.578947
43424  sortmerna            1.0:0.9:1:0.9:1.0                   0.742659   0.703551  0.722577   0.392857             0.578947

mock-18

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
45609  naive-bayes-bespoke  0.001:prior:char:8192:[6,6]:0.5     1.000000   0.942504  0.970401   0.800000             0.800000
44589  naive-bayes          0.001:char:8192:[12,12]:0.5         0.942771   0.888565  0.914866   0.578947             0.733333
44174  blast+               0.001:1:0.99:0.8                    0.866472   0.816653  0.840825   0.500000             0.666667
45099  blast                1000                                0.866472   0.816653  0.840825   0.526316             0.666667
45049  rdp                  0.2                                 0.820735   0.773546  0.796442   0.523810             0.733333
45234  uclust               0.76:0.9:1                          0.819872   0.772732  0.795605   0.647059             0.733333
44989  vsearch              1:0.51:0.99                         0.743117   0.700391  0.721122   0.600000             0.600000
45059  sortmerna            1.0:0.9:1:0.9:1.0                   0.743117   0.700391  0.721122   0.450000             0.600000

mock-4

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
50417  naive-bayes-bespoke  0.001:prior:char:8192:[32,32]:0.5   0.999752   0.999752  0.999752   0.863636             0.95
49898  uclust               1.0:0.9:1                           0.987503   0.955909  0.971449   0.538462             0.70
46178  blast+               0.001:1:0.51:0.8                    0.946067   0.918123  0.931886   0.560000             0.70
49618  blast                1e-10                               0.946067   0.918123  0.931886   0.538462             0.70
49036  vsearch              1:0.51:0.97                         0.955963   0.906693  0.930676   0.619048             0.65
49516  sortmerna            0.51:0.9:5:0.9:1.0                  0.937503   0.406352  0.566960   0.521739             0.60
47596  naive-bayes          0.001:char:8192:[32,32]:0.98        0.895793   0.378540  0.532190   0.434783             0.50
49299  rdp                  0.5                                 0.908339   0.285428  0.434365   0.464286             0.65

mock-24

       Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
53139  vsearch              100:0.51:0.9                        1.000000   0.889646  0.941601   0.095238             0.250
52329  blast+               0.001:1:0.51:0.97                   1.000000   0.889646  0.941601   0.210526             0.500
53184  rdp                  0.6                                 1.000000   0.889646  0.941601   0.285714             0.750
53239  sortmerna            1.0:0.99:1:0.9:1.0                  1.000000   0.889646  0.941601   0.400000             0.500
53284  uclust               0.76:0.99:1                         1.000000   0.889646  0.941601   0.333333             0.375
52814  naive-bayes          0.001:char:8192:[8,8]:0.96          1.000000   0.889646  0.941601   0.227273             0.625
53644  naive-bayes-bespoke  0.001:prior:char:8192:[11,11]:0.96  1.000000   0.889646  0.941601   0.315789             0.750
53279  blast                1e-10                               0.889646   0.889646  0.889646   0.227273             0.625

mock-22






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
55199   naive-bayes-bespoke  0.001:prior:char:8192:[14,14]:0.0   0.999990   0.999990  0.999990   0.692308             0.947368
54909   blast                1000                                0.826671   0.735173  0.778242   0.413793             0.631579
54079   blast+               0.001:1:0.75:0.97                   0.826671   0.735173  0.778242   0.428571             0.631579
54689   vsearch              1:0.51:0.9                          0.825662   0.734231  0.777267   0.428571             0.631579
54994   uclust               0.51:0.99:3                         0.813507   0.676203  0.738528   0.423077             0.578947
54884   sortmerna            0.51:0.99:1:0.9:1.0                 0.760641   0.676426  0.716066   0.379310             0.578947
54209   naive-bayes          0.001:char:8192:[6,6]:0.0           0.774044   0.650819  0.707103   0.279070             0.631579
54834   rdp                  0.5                                 0.757617   0.593107  0.665344   0.268293             0.578947
    
  








    




mock-5






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
60227   naive-bayes-bespoke  0.001:prior:char:8192:[32,32]:0.5   0.998426   0.998426  0.998426   0.826087             0.95
55628   blast+               0.001:1:0.51:0.97                   0.956439   0.936893  0.946565   0.518519             0.70
59428   blast                1e-10                               0.956405   0.936133  0.946160   0.518519             0.70
58526   vsearch              1:0.51:0.9                          0.958658   0.925608  0.941843   0.560000             0.70
59368   sortmerna            0.51:0.9:1:0.9:1.0                  0.947126   0.927770  0.937348   0.464286             0.65
59808   uclust               1.0:0.99:1                          0.950255   0.921753  0.935787   0.481481             0.65
56946   naive-bayes          0.001:char:8192:[4,4]:0.0           0.840326   0.427025  0.566284   0.520000             0.65
59049   rdp                  0.6                                 0.903279   0.265198  0.410017   0.481481             0.65
    
  








    




mock-26-ITS9






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
127529  naive-bayes-bespoke  0.001:prior:char:8192:[8,8]:0.98    1.0        1.0       1.0        0.700000             0.636364
89107   naive-bayes          0.001:char:8192:[6,6]:0.5           1.0        1.0       1.0        0.600000             0.272727
89529   vsearch              1:0.99:0.99                         1.0        1.0       1.0        0.750000             0.545455
99729   sortmerna            0.51:0.99:5:0.9:1.0                 1.0        1.0       1.0        0.777778             0.636364
100506  blast                1e-10                               1.0        1.0       1.0        0.500000             0.363636
98329   rdp                  0.2                                 1.0        1.0       1.0        0.700000             0.636364
105704  uclust               0.76:0.9:1                          1.0        1.0       1.0        0.400000             0.181818
63520   blast+               0.001:1:0.99:0.97                   1.0        1.0       1.0        0.666667             0.363636
    
  








    




mock-12






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
128139  naive-bayes          0.001:char:8192:[11,11]:0.7         0.996677   0.954140  0.974945   0.309524             0.65
128404  rdp                  0.7                                 0.996674   0.953750  0.974739   0.292683             0.60
128799  naive-bayes-bespoke  0.001:prior:char:8192:[7,7]:0.5     0.957738   0.957738  0.957738   0.393939             0.65
128569  uclust               0.51:0.99:3                         0.954174   0.953804  0.953989   0.324324             0.60
128289  vsearch              1:0.75:0.99                         0.953804   0.953804  0.953804   0.315789             0.60
127599  blast+               0.001:100:0.51:0.8                  0.991209   0.856243  0.918796   0.277778             0.50
128469  sortmerna            0.51:0.99:5:0.9:1.0                 0.949030   0.859894  0.902266   0.315789             0.60
128484  blast                1000                                0.911808   0.477596  0.626853   0.272727             0.45
    
  








    




mock-19






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
130669  naive-bayes-bespoke  0.001:prior:char:8192:[6,6]:0.98    0.758964   0.643289  0.696355   0.722222             0.866667
130194  uclust               1.0:0.9:1                           0.829256   0.534118  0.649742   0.666667             0.800000
129719  naive-bayes          0.001:char:8192:[18,18]:0.5         0.686593   0.599645  0.640180   0.571429             0.800000
129924  vsearch              1:0.75:0.99                         0.755874   0.486854  0.592246   0.625000             0.666667
129189  blast+               0.001:1:0.51:0.99                   0.617034   0.569362  0.592240   0.500000             0.800000
130119  blast                1000                                0.597664   0.569362  0.583170   0.500000             0.800000
130074  rdp                  1.0                                 0.905908   0.428101  0.581436   0.500000             0.600000
130084  sortmerna            1.0:0.99:1:0.9:1.0                  0.606123   0.503654  0.550158   0.476190             0.666667
    
  








    




mock-9






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
134968  naive-bayes-bespoke  0.001:prior:char:8192:[11,11]:0.98  0.658864   0.384047  0.485247   0.500000             0.416667
133573  sortmerna            1.0:0.99:5:0.9:1.0                  0.658864   0.384047  0.485247   0.416667             0.416667
133393  rdp                  0.6                                 0.505002   0.388435  0.439114   0.375000             0.500000
132913  vsearch              100:0.51:0.97                       0.489466   0.384047  0.430395   0.428571             0.500000
133739  uclust               0.76:0.97:3                         0.736017   0.277288  0.402818   0.333333             0.166667
130873  blast+               0.001:1:0.51:0.99                   0.418402   0.384047  0.400489   0.428571             0.500000
131504  naive-bayes          0.001:char:8192:[16,16]:0.9         0.440021   0.277288  0.340196   0.272727             0.250000
133679  blast                1e-10                               0.277288   0.277288  0.277288   0.312500             0.416667
    
  








    




mock-1






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
135934  naive-bayes          0.001:char:8192:[11,11]:0.0         0.801097   0.659208  0.723259   0.469388             0.621622
136964  naive-bayes-bespoke  0.001:prior:char:8192:[12,12]:0.5   0.911128   0.555923  0.690524   0.608696             0.756757
136584  rdp                  0.5                                 0.748552   0.468953  0.576648   0.428571             0.567568
136514  vsearch              1:0.99:0.9                          0.483693   0.311953  0.379288   0.444444             0.540541
136624  sortmerna            1.0:0.99:1:0.9:1.0                  0.482285   0.302594  0.371870   0.466667             0.567568
136764  uclust               0.51:0.9:5                          0.414754   0.272723  0.329067   0.326087             0.405405
135779  blast+               0.001:100:0.51:0.99                 0.347856   0.131147  0.190480   0.361702             0.459459
136659  blast                1000                                0.212792   0.131297  0.162394   0.446809             0.567568
    
  








    




mock-16






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
138924  naive-bayes-bespoke  0.001:prior:char:8192:[6,6]:0.7     0.895613   0.877584  0.886507   0.547945             0.80
137759  naive-bayes          0.001:char:8192:[32,32]:0.5         0.809916   0.755805  0.781925   0.392405             0.62
137359  blast+               0.001:1:0.75:0.8                    0.777234   0.777234  0.777234   0.421053             0.64
138294  blast                1000                                0.777234   0.777234  0.777234   0.421053             0.64
138394  uclust               1.0:0.99:1                          0.790336   0.734302  0.761289   0.416667             0.60
138264  sortmerna            1.0:0.99:5:0.9:1.0                  0.891800   0.626707  0.736114   0.323944             0.46
138134  vsearch              100:0.75:0.99                       0.855187   0.632913  0.727449   0.338028             0.48
138204  rdp                  0.6                                 0.742193   0.680423  0.709967   0.367089             0.58
    
  








    




mock-20






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
140249  naive-bayes-bespoke  0.001:prior:char:8192:[10,10]:0.0   1.000000   1.000000  1.000000   0.772727             0.894737
139099  blast+               0.001:1:0.75:0.97                   0.838606   0.784018  0.810394   0.545455             0.631579
139934  blast                1e-10                               0.838606   0.784018  0.810394   0.545455             0.631579
139789  vsearch              1:0.51:0.97                         0.828898   0.774943  0.801013   0.545455             0.631579
140014  uclust               0.51:0.99:3                         0.936591   0.679946  0.787896   0.545455             0.631579
139889  sortmerna            1.0:0.9:1:0.9:1.0                   0.755563   0.706381  0.730144   0.500000             0.578947
139229  naive-bayes          0.001:char:8192:[6,6]:0.0           0.729441   0.621587  0.671209   0.521739             0.631579
139859  rdp                  0.4                                 0.705767   0.553025  0.620129   0.458333             0.578947
    
  








    




mock-21






    







  
    
      
        Method               Parameters                          Precision  Recall    F-measure  Taxon Accuracy Rate  Taxon Detection Rate
141864  naive-bayes-bespoke  0.001:prior:char:8192:[12,12]:0.94  1.000000   1.000000  1.000000   0.761905             0.842105
140714  blast+               0.001:1:0.75:0.99                   0.822001   0.798174  0.809912   0.550000             0.578947
141564  blast                1000                                0.822001   0.798174  0.809912   0.578947             0.578947
141454  vsearch              1:0.51:0.99                         0.788340   0.765489  0.776746   0.550000             0.578947
141539  sortmerna            0.51:0.99:1:0.9:1.0                 0.785973   0.763191  0.774414   0.500000             0.526316
141639  uclust               1.0:0.9:1                           0.780039   0.760019  0.769899   0.473684             0.473684
140864  naive-bayes          0.001:char:8192:[6,6]:0.0           0.955869   0.552149  0.699968   0.523810             0.578947
141494  rdp                  0.4                                 0.955693   0.549850  0.698071   0.476190             0.526316

Now we can determine which parameter configuration performed best for each method. The "count best" values in each column indicate the number of samples in which a given parameter configuration scored within one mean absolute deviation of the best result for that metric. Several configurations can meet this criterion in the same sample, which is why the counts may sum to more than the total number of samples.
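The counting rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the `parameter_comparisons` implementation from tax-credit: the `count_best` helper, the `SampleID` key, and the toy scores are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def count_best(rows, metric):
    """Count, per parameter configuration, the samples in which it scores
    within one mean absolute deviation (MAD) of the best value.

    Hypothetical helper for illustration; `rows` is a list of dicts with
    assumed keys "SampleID", "Parameters", and the metric name.
    """
    by_sample = defaultdict(list)
    for row in rows:
        by_sample[row["SampleID"]].append(row)

    counts = Counter()
    for sample_rows in by_sample.values():
        values = [r[metric] for r in sample_rows]
        center = mean(values)
        mad = mean([abs(v - center) for v in values])
        cutoff = max(values) - mad
        # Every configuration at or above the cutoff counts for this sample.
        winners = {r["Parameters"] for r in sample_rows if r[metric] >= cutoff}
        for params in winners:
            counts[params] += 1
    return counts


# Toy data: two samples, three configurations each (made-up values).
toy_rows = [
    {"SampleID": "s1", "Parameters": "a", "F-measure": 0.90},
    {"SampleID": "s1", "Parameters": "b", "F-measure": 0.89},
    {"SampleID": "s1", "Parameters": "c", "F-measure": 0.50},
    {"SampleID": "s2", "Parameters": "a", "F-measure": 0.40},
    {"SampleID": "s2", "Parameters": "b", "F-measure": 0.80},
    {"SampleID": "s2", "Parameters": "c", "F-measure": 0.79},
]
best_counts = count_best(toy_rows, "F-measure")
```

In this toy case two configurations fall within the cutoff in each sample, so the counts add up to four even though there are only two samples.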



In [31]:

    
for method in mock_results_6['Method'].unique():
    top_params = parameter_comparisons(
        mock_results_6, method, 
        metrics=['Taxon Accuracy Rate', 'Taxon Detection Rate', 'Precision', 'Recall', 'F-measure'])
    display(Markdown('## {0}'.format(method)))
    display(top_params[:5])









    




blast+






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
0.001:1:0.99:0.99                   85.0       61         86.0    85.0                 85.0
0.001:1:0.99:0.97                   85.0       60         86.0    83.0                 86.0
0.001:1:0.99:0.8                    85.0       60         86.0    83.0                 86.0
0.001:1:0.75:0.99                   85.0       61         86.0    85.0                 85.0
0.001:1:0.75:0.97                   85.0       60         86.0    83.0                 86.0
    
  








    




naive-bayes






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
0.001:char:8192:[6,6]:0.5           80.0       66         75.0    73.0                 71.0
0.001:char:8192:[6,6]:0.0           80.0       63         80.0    80.0                 83.0
0.001:char:8192:[7,7]:0.5           80.0       66         78.0    77.0                 76.0
0.001:char:8192:[8,8]:0.0           79.0       62         79.0    81.0                 87.0
0.001:char:8192:[9,9]:0.5           78.0       65         77.0    74.0                 76.0
    
  








    




vsearch






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
1:0.75:0.9                          83         58         87      86.0                 87.0
1:0.51:0.9                          83         58         87      86.0                 87.0
1:0.99:0.9                          83         58         87      86.0                 87.0
1:0.75:0.99                         80         58         87      84.0                 77.0
1:0.51:0.97                         80         58         87      83.0                 77.0
    
  








    




rdp






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
0.6                                 85         63         85      83                   85
0.5                                 82         57         86      86                   87
0.4                                 80         56         87      87                   87
0.7                                 80         62         80      72                   81
0.0                                 77         52         87      82                   87
    
  








    




sortmerna






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
0.51:0.99:1:0.9:1.0                 73         59         87      82                   81
0.51:0.9:1:0.9:1.0                  73         58         87      76                   86
1.0:0.99:1:0.9:1.0                  73         59         87      82                   81
1.0:0.9:1:0.9:1.0                   73         58         87      76                   86
0.51:0.99:5:0.9:1.0                 71         64         73      61                   68
    
  








    




blast






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
1000                                87         87         87      87                   87
1e-10                               87         87         87      87                   87
    
  








    




uclust






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
0.76:0.9:1                          71         39         81      66.0                 85
1.0:0.9:1                           71         39         81      66.0                 85
0.51:0.9:1                          71         39         81      66.0                 85
1.0:0.97:3                          40         35         23      40.0                 31
0.76:0.97:3                         40         35         23      40.0                 31
    
  








    




naive-bayes-bespoke






    







  
    
      
Parameters                          F-measure  Precision  Recall  Taxon Accuracy Rate  Taxon Detection Rate
0.001:prior:char:8192:[32,32]:0.5   81.0       77         82.0    27.0                 78.0
0.001:prior:char:8192:[32,32]:0.0   81.0       77         82.0    29.0                 78.0
0.001:prior:char:8192:[18,18]:0.0   76.0       69         79.0    26.0                 79.0
0.001:prior:char:8192:[12,12]:0.0   76.0       69         79.0    27.0                 79.0
0.001:prior:char:8192:[18,18]:0.5   76.0       69         79.0    26.0                 76.0

Optimized method performance

And, finally, which method performed best at each individual taxonomic level for each reference dataset (i.e., across all fungal and across all bacterial mock communities combined)?

For this analysis, we rank the top-performing method/parameter combination for each method at family through species levels. Methods are ranked by top F-measure, and the average value for each metric is shown (rather than count best as above). F-measure distributions are plotted for each method, and compared using paired t-tests with FDR-corrected P-values. This cell does not need to be altered, unless you wish to change the metric used for sorting best methods and for plotting.
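As a rough illustration of the ranking step described above, the sketch below averages a metric per method/parameter combination, keeps the best combination for each method, and then sorts methods by that average. It is a simplified stand-in, not the `rank_optimized_method_performance_by_dataset` implementation; the `rank_best_configurations` helper and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_best_configurations(rows, metric="F-measure"):
    """Return (method, parameters, mean score) tuples, one per method,
    sorted from highest to lowest mean score.

    Hypothetical helper for illustration; `rows` is a list of dicts with
    assumed keys "Method", "Parameters", and the metric name.
    """
    scores = defaultdict(list)
    for row in rows:
        scores[(row["Method"], row["Parameters"])].append(row[metric])

    best = {}
    for (method, params), values in scores.items():
        avg = mean(values)
        # Keep only the highest-scoring configuration for each method.
        if method not in best or avg > best[method][1]:
            best[method] = (params, avg)

    ranked = [(m, p, s) for m, (p, s) in best.items()]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Toy data (made-up values): two rdp configurations, one blast configuration.
toy_rows = [
    {"Method": "rdp", "Parameters": "0.5", "F-measure": 0.60},
    {"Method": "rdp", "Parameters": "0.5", "F-measure": 0.70},
    {"Method": "rdp", "Parameters": "0.7", "F-measure": 0.50},
    {"Method": "rdp", "Parameters": "0.7", "F-measure": 0.60},
    {"Method": "blast", "Parameters": "1000", "F-measure": 0.80},
    {"Method": "blast", "Parameters": "1000", "F-measure": 0.90},
]
ranking = rank_best_configurations(toy_rows)
```

Here `ranking` lists blast first, followed by rdp with its better-scoring `0.5` configuration.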



In [15]:

    
boxes = rank_optimized_method_performance_by_dataset(mock_results,
                                                     dataset="Reference",
                                                     metric="F-measure",
                                                     level_range=range(4,7),
                                                     display_fields=["Method",
                                                                     "Parameters",
                                                                     "Taxon Accuracy Rate",
                                                                     "Taxon Detection Rate",
                                                                     "Precision",
                                                                     "Recall",
                                                                     "F-measure"],
                                                     paired=True,
                                                     parametric=True,
                                                     color=None,
                                                     color_palette=color_palette)









    




gg_13_8_otus level 4






    







  
    
      
   Method               Parameters                          Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
7  vsearch              10:0.51:0.97                        0.756907             0.905563              0.999280   0.988253  0.992761
6  uclust               0.51:0.9:5                          0.765879             0.905563              0.999276   0.988235  0.992750
1  blast+               0.001:10:0.51:0.97                  0.746911             0.905563              0.991090   0.991087  0.991089
3  naive-bayes-bespoke  0.001:prior:char:8192:[14,14]:0.98  0.777150             0.906435              0.991992   0.989532  0.990705
0  blast                1000                                0.760281             0.905563              0.990108   0.990093  0.990101
2  naive-bayes          0.001:char:8192:[8,8]:0.5           0.757606             0.905563              0.990679   0.989468  0.990060
5  sortmerna            0.51:0.99:1:0.9:1.0                 0.757977             0.905563              0.990406   0.986630  0.988457
4  rdp                  0.5                                 0.752147             0.892958              0.986117   0.981087  0.983548
    
  








    







  
    
      
      
Method A             Method B                  stat  P         FDR P
blast+               naive-bayes            1.007887  0.322457  0.401566
                     vsearch               -0.998911  0.326706  0.401566
                     rdp                    2.631308  0.013885  0.055542
                     sortmerna              2.895350  0.007414  0.055542
                     blast                  1.002986  0.324772  0.401566
                     uclust                -0.992307  0.329858  0.401566
                     naive-bayes-bespoke    0.286863  0.776407  0.776407
naive-bayes          vsearch               -1.002315  0.325090  0.401566
                     rdp                    2.768654  0.010048  0.055542
                     sortmerna              1.005882  0.323402  0.401566
                     blast                 -1.131486  0.267800  0.401566
                     uclust                -0.998217  0.327037  0.401566
                     naive-bayes-bespoke   -0.888411  0.382166  0.445861
vsearch              rdp                    2.226266  0.034534  0.108149
                     sortmerna              2.703855  0.011715  0.055542
                     blast                  1.000424  0.325988  0.401566
                     uclust                 1.618347  0.117212  0.298358
                     naive-bayes-bespoke    0.710590  0.483432  0.503712
rdp                  sortmerna             -1.807567  0.081829  0.229120
                     blast                 -2.771857  0.009972  0.055542
                     uclust                -2.223222  0.034762  0.108149
                     naive-bayes-bespoke   -3.030049  0.005337  0.055542
sortmerna            blast                 -1.051756  0.302234  0.401566
                     uclust                -2.695341  0.011952  0.055542
                     naive-bayes-bespoke   -1.205398  0.238508  0.401566
blast                uclust                -0.996268  0.327965  0.401566
                     naive-bayes-bespoke   -0.824819  0.416702  0.466706
uclust               naive-bayes-bespoke    0.706839  0.485723  0.503712
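The `stat` and `FDR P` columns in the pairwise table above can be illustrated with a short, dependency-free sketch. It assumes the FDR correction is the Benjamini-Hochberg procedure (the notebook text only says "FDR-corrected") and is not the tax-credit code; `paired_t_statistic`, `benjamini_hochberg`, and the toy numbers are hypothetical. Converting the t statistic to a P-value needs the t distribution and is left out here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic: mean of the per-sample differences divided by
    their standard error. Hypothetical helper for illustration."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in input order.
    Assumed here to be the FDR method; hypothetical helper."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy F-measures for two methods on the same three samples (made up).
t_stat = paired_t_statistic([0.9, 0.8, 0.7], [0.8, 0.6, 0.6])

# Toy raw P-values from four pairwise comparisons (made up).
fdr_p = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

A positive statistic means the first method scored higher on average across the paired samples, which matches the sign convention one would expect for the Method A versus Method B rows.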
    
  








    












    




gg_13_8_otus level 5






    







  
    
      
   Method               Parameters                          Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[18,18]:0.92  0.747524             0.854652              0.983594   0.972504  0.977910
2  naive-bayes          0.001:char:8192:[8,8]:0.0           0.692003             0.855205              0.961400   0.957798  0.959554
4  rdp                  0.4                                 0.695065             0.833314              0.960625   0.941653  0.950748
1  blast+               0.001:1:0.51:0.8                    0.642848             0.762265              0.961073   0.864836  0.908411
0  blast                1000                                0.642922             0.762265              0.961071   0.864836  0.908410
5  sortmerna            0.51:0.99:1:0.9:1.0                 0.624157             0.734949              0.968661   0.857607  0.907877
7  vsearch              1:0.51:0.97                         0.633192             0.732345              0.979106   0.847672  0.904776
6  uclust               0.51:0.9:3                          0.639338             0.728730              0.981724   0.838934  0.898139
    
  








    







  
    
      
      
Method A             Method B                  stat  P         FDR P
blast+               naive-bayes           -4.442545  0.000136  0.000293
                     vsearch                0.642398  0.526032  0.566580
                     rdp                   -3.920839  0.000545  0.001018
                     sortmerna              0.118816  0.906301  0.906430
                     blast                  1.000000  0.326189  0.397100
                     uclust                 1.283153  0.210344  0.280503
                     naive-bayes-bespoke   -5.166133  0.000019  0.000061
naive-bayes          vsearch                5.716772  0.000004  0.000042
                     rdp                    1.448078  0.159109  0.247503
                     sortmerna              5.359695  0.000012  0.000061
                     blast                  4.442552  0.000136  0.000293
                     uclust                 4.790140  0.000054  0.000136
                     naive-bayes-bespoke   -2.128674  0.042555  0.070090
vsearch              rdp                   -5.532257  0.000007  0.000051
                     sortmerna             -0.675267  0.505248  0.566580
                     blast                 -0.642277  0.526110  0.566580
                     uclust                 1.051171  0.302498  0.384997
                     naive-bayes-bespoke   -6.200592  0.000001  0.000035
rdp                  sortmerna              5.081310  0.000024  0.000069
                     blast                  3.920841  0.000545  0.001018
                     uclust                 5.203513  0.000018  0.000061
                     naive-bayes-bespoke   -3.396978  0.002127  0.003722
sortmerna            blast                 -0.118651  0.906430  0.906430
                     uclust                 1.360447  0.184939  0.272541
                     naive-bayes-bespoke   -5.915944  0.000003  0.000037
blast                uclust                 1.283055  0.210377  0.280503
                     naive-bayes-bespoke   -5.166153  0.000019  0.000061
uclust               naive-bayes-bespoke   -5.295295  0.000014  0.000061
    
  








    












    




gg_13_8_otus level 6






    







  
    
      
   Method               Parameters                          Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[32,32]:0.5   0.774533             0.829707              0.966365   0.913330  0.936868
7  vsearch              1:0.99:0.97                         0.558085             0.587260              0.800121   0.722006  0.756215
1  blast+               0.001:1:0.51:0.8                    0.543011             0.590632              0.781600   0.699559  0.734336
0  blast                1000                                0.541281             0.588847              0.781597   0.699503  0.734306
6  uclust               0.76:0.9:1                          0.551996             0.584026              0.773162   0.622255  0.675783
2  naive-bayes          0.001:char:8192:[6,6]:0.0           0.539866             0.645513              0.775514   0.494970  0.582100
4  rdp                  0.4                                 0.508823             0.597903              0.781453   0.487433  0.579487
5  sortmerna            0.51:0.9:1:0.9:1.0                  0.514926             0.555999              0.609748   0.537393  0.567987
    
  








    







  
    
      
      
Method A             Method B                  stat  P             FDR P
blast+               naive-bayes            2.088036  4.635904e-02  7.639518e-02
                     vsearch               -1.298109  2.052297e-01  2.298573e-01
                     rdp                    2.088857  4.627924e-02  7.639518e-02
                     sortmerna              2.954274  6.425307e-03  1.802145e-02
                     blast                  1.440004  1.613617e-01  1.964403e-01
                     uclust                 1.502020  1.446982e-01  1.964403e-01
                     naive-bayes-bespoke   -6.356296  8.320207e-07  3.328083e-06
naive-bayes          vsearch               -2.609719  1.460058e-02  3.442351e-02
                     rdp                    0.742658  4.641017e-01  4.998018e-01
                     sortmerna              0.274238  7.859889e-01  8.150996e-01
                     blast                 -2.087791  4.638279e-02  7.639518e-02
                     uclust                -1.451109  1.582698e-01  1.964403e-01
                     naive-bayes-bespoke   -7.573601  3.800203e-08  4.912339e-07
vsearch              rdp                    2.605247  1.475293e-02  3.442351e-02
                     sortmerna              3.612115  1.222973e-03  4.280406e-03
                     blast                  1.300049  2.045736e-01  2.298573e-01
                     uclust                 2.317250  2.831299e-02  6.098182e-02
                     naive-bayes-bespoke   -6.998645  1.597891e-07  1.118524e-06
rdp                  sortmerna              0.216540  8.301950e-01  8.301950e-01
                     blast                 -2.088614  4.630287e-02  7.639518e-02
                     uclust                -1.479014  1.507099e-01  1.964403e-01
                     naive-bayes-bespoke   -7.441775  5.263220e-08  4.912339e-07
sortmerna            blast                 -2.953578  6.436231e-03  1.802145e-02
                     uclust                -1.482821  1.497014e-01  1.964403e-01
                     naive-bayes-bespoke   -8.154994  9.279879e-09  2.598366e-07
blast                uclust                 1.501824  1.447486e-01  1.964403e-01
                     naive-bayes-bespoke   -6.358289  8.277176e-07  3.328083e-06
uclust               naive-bayes-bespoke   -6.821250  2.508812e-07  1.404934e-06
    
  








    












    




unite_20.11.2016_clean_fullITS level 4






    







  
    
      
   Method               Parameters                          Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[8,8]:0.0     0.812025             0.645609              0.980435   0.980435  0.980435
5  sortmerna            0.51:0.99:5:0.9:1.0                 0.766152             0.591239              0.977034   0.950119  0.961174
4  rdp                  0.0                                 0.794482             0.645609              0.959546   0.959546  0.959546
7  vsearch              1:0.99:0.99                         0.776349             0.578472              0.959308   0.958882  0.959094
0  blast                1000                                0.804539             0.643187              0.957139   0.957139  0.957139
1  blast+               0.001:1:0.99:0.8                    0.803329             0.643187              0.957139   0.957139  0.957139
2  naive-bayes          0.001:char:8192:[7,7]:0.0           0.777382             0.645609              0.956028   0.956028  0.956028
6  uclust               1.0:0.9:1                           0.716396             0.573850              0.955623   0.945129  0.949945
    
  








    



/Users/nbokulich/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/statsmodels/stats/multitest.py:320: RuntimeWarning: invalid value encountered in less_equal
  reject = pvals_sorted <= ecdffactor*alpha
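
The warning above, together with the all-NaN FDR P column in the level 4 comparison table that follows, appears consistent with a single undefined p-value (the blast+ vs blast row, whose statistic is NaN) entering the multiple-testing correction. A minimal sketch, assuming a Benjamini-Hochberg correction, shows one way to adjust only the finite p-values and leave the undefined one as NaN. The helper `bh_fdr_ignore_nan` is hypothetical and is not part of tax_credit or statsmodels.

```python
import math


def bh_fdr_ignore_nan(pvals):
    """Benjamini-Hochberg adjusted p-values; NaN inputs stay NaN.

    Hypothetical helper for illustration only.
    """
    # Keep (p-value, original position) pairs for the finite entries.
    finite = sorted((p, i) for i, p in enumerate(pvals) if not math.isnan(p))
    m = len(finite)
    adjusted = [float('nan')] * len(pvals)
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        p, i = finite[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values; the NaN stands in for an undefined comparison.
example = [0.01, 0.04, 0.03, float('nan')]
print(bh_fdr_ignore_nan(example))
```

With this approach the number of tests `m` counts only the defined comparisons, which is a design choice rather than the only reasonable convention.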






    







  
    
      
      
Method A     Method B                   stat  P         FDR P
blast+       naive-bayes            0.434012  0.665890  NaN
             vsearch               -1.044782  0.300459  NaN
             rdp                   -1.285267  0.203808  NaN
             sortmerna             -1.874409  0.065913  NaN
             blast                       NaN  NaN       NaN
             uclust                 2.436316  0.017929  NaN
             naive-bayes-bespoke   -2.732427  0.008319  NaN
naive-bayes  vsearch               -1.811657  0.075218  NaN
             rdp                   -1.887903  0.064045  NaN
             sortmerna             -1.848973  0.069560  NaN
             blast                 -0.434012  0.665890  NaN
             uclust                 2.440440  0.017745  NaN
             naive-bayes-bespoke   -2.509888  0.014890  NaN
vsearch      rdp                   -2.458122  0.016975  NaN
             sortmerna             -1.837460  0.071266  NaN
             blast                  1.044782  0.300459  NaN
             uclust                 2.640136  0.010629  NaN
             naive-bayes-bespoke   -2.556342  0.013219  NaN
rdp          sortmerna             -1.689226  0.096546  NaN
             blast                  1.285267  0.203808  NaN
             uclust                 2.663527  0.009994  NaN
             naive-bayes-bespoke   -2.550530  0.013418  NaN
sortmerna    blast                  1.874409  0.065913  NaN
             uclust                 2.606787  0.011599  NaN
             naive-bayes-bespoke   -2.561454  0.013046  NaN
blast        uclust                 2.436316  0.017929  NaN
             naive-bayes-bespoke   -2.732427  0.008319  NaN
uclust       naive-bayes-bespoke   -2.687164  0.009387  NaN
    
  








    












    




unite_20.11.2016_clean_fullITS level 5






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[8,8]:0.5    0.805223             0.645609              0.980435   0.980435  0.980435
5  sortmerna            0.51:0.99:5:0.9:1.0                0.779259             0.595724              0.977034   0.950119  0.961174
7  vsearch              1:0.99:0.99                        0.779533             0.569530              0.959308   0.958882  0.959094
4  rdp                  0.5                                0.777914             0.627119              0.958893   0.957333  0.958104
1  blast+               0.001:1:0.99:0.97                  0.781244             0.625000              0.958344   0.954926  0.956571
0  blast                1000                               0.781527             0.625000              0.954926   0.954926  0.954926
2  naive-bayes          0.001:char:8192:[6,6]:0.0          0.766466             0.617874              0.953815   0.953815  0.953815
6  uclust               0.76:0.9:1                         0.697358             0.542758              0.953015   0.942916  0.947551
    
  








    







  
    
      
      
Method A     Method B                   stat  P         FDR P
blast+       naive-bayes            1.247486  0.217233  0.225279
             vsearch               -2.187984  0.032713  0.049045
             rdp                   -1.527398  0.132098  0.142259
             sortmerna             -2.502258  0.015182  0.035424
             blast                  1.790685  0.078563  0.091657
             uclust                 2.652495  0.010289  0.035424
             naive-bayes-bespoke   -2.659315  0.010105  0.035424
naive-bayes  vsearch               -2.180630  0.033280  0.049045
             rdp                   -2.016902  0.048343  0.061527
             sortmerna             -2.112928  0.038921  0.054490
             blast                 -0.434012  0.665890  0.665890
             uclust                 2.467433  0.016581  0.035484
             naive-bayes-bespoke   -2.516014  0.014659  0.035424
vsearch      rdp                    2.291140  0.025607  0.044812
             sortmerna             -1.837460  0.071266  0.086759
             blast                  2.030208  0.046930  0.061527
             uclust                 2.703904  0.008978  0.035424
             naive-bayes-bespoke   -2.556342  0.013219  0.035424
rdp          sortmerna             -2.206099  0.031352  0.049045
             blast                  1.656405  0.103041  0.115405
             uclust                 2.686972  0.009392  0.035424
             naive-bayes-bespoke   -2.552293  0.013357  0.035424
sortmerna    blast                  2.407623  0.019258  0.035949
             uclust                 2.658218  0.010135  0.035424
             naive-bayes-bespoke   -2.561454  0.013046  0.035424
blast        uclust                 2.440505  0.017742  0.035484
             naive-bayes-bespoke   -2.722625  0.008540  0.035424
uclust       naive-bayes-bespoke   -2.680432  0.009556  0.035424
    
  








    












    




unite_20.11.2016_clean_fullITS level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[7,7]:0.94   0.669836             0.546032              0.972887   0.929592  0.944469
5  sortmerna            1.0:0.99:5:0.9:1.0                 0.666131             0.534669              0.971207   0.925385  0.939629
4  rdp                  0.6                                0.658445             0.556626              0.941823   0.925922  0.932765
7  vsearch              1:0.75:0.9                         0.661119             0.554507              0.934220   0.925922  0.929675
1  blast+               0.001:1:0.99:0.99                  0.675403             0.542180              0.931071   0.925258  0.928022
2  naive-bayes          0.001:char:8192:[11,11]:0.98       0.618735             0.501348              0.935384   0.878948  0.902295
0  blast                1000                               0.575196             0.481317              0.864703   0.864703  0.864703
6  uclust               0.51:0.9:1                         0.475223             0.385208              0.861520   0.856609  0.858862
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes            4.870672  8.955166e-06  2.089539e-05
             vsearch               -1.226348  2.250235e-01  2.250235e-01
             rdp                   -2.585395  1.226227e-02  1.907465e-02
             sortmerna             -2.266427  2.717148e-02  3.170006e-02
             blast                  7.416245  5.853776e-10  2.731762e-09
             uclust                 8.305885  1.884853e-11  3.122788e-10
             naive-bayes-bespoke   -2.345272  2.245560e-02  2.733725e-02
naive-bayes  vsearch               -4.758296  1.338066e-05  2.881987e-05
             rdp                   -5.038219  4.891520e-06  1.245114e-05
             sortmerna             -4.462116  3.790479e-05  7.580959e-05
             blast                  3.342155  1.459689e-03  2.404193e-03
             uclust                 4.163979  1.051109e-04  1.839441e-04
             naive-bayes-bespoke   -4.261095  7.565123e-05  1.412156e-04
vsearch      rdp                   -2.554263  1.328990e-02  1.958512e-02
             sortmerna             -2.438366  1.783744e-02  2.378326e-02
             blast                  7.325742  8.310027e-10  3.324011e-09
             uclust                 8.156795  3.345844e-11  3.122788e-10
             naive-bayes-bespoke   -2.393299  1.995446e-02  2.539659e-02
rdp          sortmerna             -2.022239  4.777186e-02  4.962695e-02
             blast                  7.462607  4.892038e-10  2.731762e-09
             uclust                 8.261010  2.239977e-11  3.122788e-10
             naive-bayes-bespoke   -2.182273  3.315302e-02  3.713139e-02
sortmerna    blast                  6.915088  4.069873e-09  1.266183e-08
             uclust                 7.443678  5.263979e-10  2.731762e-09
             naive-bayes-bespoke   -2.021463  4.785456e-02  4.962695e-02
blast        uclust                 2.474556  1.628602e-02  2.280042e-02
             naive-bayes-bespoke   -6.618927  1.275875e-08  3.572450e-08
uclust       naive-bayes-bespoke   -6.996678  2.969028e-09  1.039160e-08
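
The F-measure values reported in these tables are not simply the harmonic mean of the Precision and Recall values shown beside them. For sortmerna at level 6, for example, 2PR/(P+R) of the tabulated Precision (0.971207) and Recall (0.925385) is about 0.9477, while the tabulated F-measure is 0.939629. One plausible explanation, which is an assumption and not something confirmed by this notebook, is that each row is an average over several mock communities and that F-measure is computed per community before averaging. A minimal sketch with made-up per-community values illustrates why the two quantities differ:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up (precision, recall) pairs for two communities, illustration only.
samples = [(1.0, 1.0), (0.9, 0.5)]

mean_precision = mean([p for p, _ in samples])
mean_recall = mean([r for _, r in samples])

# Average of per-community F-measures versus F-measure of the averages.
mean_of_f = mean([f_measure(p, r) for p, r in samples])
f_of_means = f_measure(mean_precision, mean_recall)
print(mean_of_f, f_of_means)
```

Because the harmonic mean is not linear, the mean of per-community F-measures generally differs from the F-measure of the mean precision and recall.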



In [16]:

    
# Save each F-measure boxplot held in `boxes` as a PDF in outdir.
for k, v in boxes.items():
    v.get_figure().savefig(join(outdir, 'mock-fmeasure-{0}-boxplots.pdf'.format(k)))



In [17]:

    
# Rank the optimized methods by each metric in turn, at level 6 only
# (range(6, 7)), with paired parametric tests for the pairwise comparisons.
for metric in ["Taxon Accuracy Rate", "Taxon Detection Rate", "Precision", "Recall", "F-measure"]:
    display(Markdown('## {0}'.format(metric)))
    boxes = rank_optimized_method_performance_by_dataset(mock_results,
                                                         dataset="Reference",
                                                         metric=metric,
                                                         level_range=range(6,7),
                                                         display_fields=["Method",
                                                                         "Parameters",
                                                                         "Taxon Accuracy Rate",
                                                                         "Taxon Detection Rate",
                                                                         "Precision",
                                                                         "Recall",
                                                                         "F-measure"],
                                                         paired=True,
                                                         parametric=True,
                                                         color=None,
                                                         color_palette=color_palette)
    # Save the boxplots for this metric as PDFs, one per key in `boxes`.
    for k, v in boxes.items():
        v.get_figure().savefig(join(outdir, 'mock-{0}-{1}-boxplots.pdf'.format(metric, k)))
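
Because the metric names used in this loop contain spaces, the saved file names do too (for example, a name beginning with `mock-Taxon Accuracy Rate-`). If space-free names are preferred, a minimal sketch of one option follows. The `slugify` helper is hypothetical and not part of tax_credit, and `'example'` stands in for whatever keys `boxes` contains.

```python
import re


def slugify(label):
    """Lowercase a label and collapse runs of non-alphanumerics to '-'."""
    return re.sub(r'[^a-z0-9]+', '-', label.lower()).strip('-')


for metric in ["Taxon Accuracy Rate", "F-measure"]:
    # 'example' is a placeholder for a key of `boxes`.
    print('mock-{0}-{1}-boxplots.pdf'.format(slugify(metric), 'example'))
```

Note that this would name the F-measure files `mock-f-measure-...`, which differs from the `mock-fmeasure-...` pattern used in the previous cell.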









    




Taxon Accuracy Rate






    




gg_13_8_otus level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[4,4]:0.0    0.927823             0.597525              0.516160   0.425484  0.458826
7  vsearch              1:0.99:0.99                        0.562169             0.587260              0.799432   0.720764  0.755187
6  uclust               0.76:0.9:1                         0.551996             0.584026              0.773162   0.622255  0.675783
1  blast+               0.001:1:0.99:0.99                  0.544272             0.586863              0.781846   0.698317  0.733760
0  blast                1000                               0.541281             0.588847              0.781597   0.699503  0.734306
2  naive-bayes          0.001:char:8192:[6,6]:0.0          0.539866             0.645513              0.775514   0.494970  0.582100
5  sortmerna            0.51:0.99:1:0.9:1.0                0.525377             0.554015              0.611236   0.536151  0.567907
4  rdp                  0.4                                0.508823             0.597903              0.781453   0.487433  0.579487
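
For readers unfamiliar with the first two metric columns, the sketch below illustrates them under an assumed definition: taxon accuracy rate as the fraction of observed taxa that were expected, and taxon detection rate as the fraction of expected taxa that were observed. This notebook does not state these definitions, so treat them as an assumption about tax_credit. The function name and the taxon names are made up for illustration.

```python
def taxon_rates(expected, observed):
    """Return (taxon accuracy rate, taxon detection rate) for two taxon sets.

    Assumed definitions, for illustration only:
    accuracy  = shared taxa / observed taxa
    detection = shared taxa / expected taxa
    """
    expected, observed = set(expected), set(observed)
    shared = expected & observed
    accuracy = len(shared) / len(observed) if observed else 0.0
    detection = len(shared) / len(expected) if expected else 0.0
    return accuracy, detection


# Made-up mock community: four expected genera, three observed.
expected = {"Bacillus", "Escherichia", "Listeria", "Salmonella"}
observed = {"Bacillus", "Escherichia", "Pseudomonas"}
print(taxon_rates(expected, observed))
```

Under these definitions, a classifier that reports few but correct taxa can score a high accuracy rate with a low detection rate, so the two columns are best read together.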
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes            0.212505  8.333096e-01  8.641729e-01
             vsearch               -2.425184  2.226441e-02  4.156023e-02
             rdp                    2.015700  5.388359e-02  8.381892e-02
             sortmerna              3.848723  6.594780e-04  1.678671e-03
             blast                  0.614763  5.438580e-01  6.091210e-01
             uclust                -0.555331  5.832409e-01  6.281055e-01
             naive-bayes-bespoke  -27.588393  2.562691e-21  3.587768e-20
naive-bayes  vsearch               -1.264896  2.167174e-01  3.193730e-01
             rdp                    4.735549  6.199705e-05  1.928797e-04
             sortmerna              0.750834  4.592468e-01  5.590830e-01
             blast                 -0.076405  9.396601e-01  9.396601e-01
             uclust                -0.627457  5.356306e-01  6.091210e-01
             naive-bayes-bespoke  -16.486575  1.279261e-15  5.117046e-15
vsearch      rdp                    3.993466  4.502844e-04  1.260796e-03
             sortmerna              5.067429  2.540923e-05  8.893231e-05
             blast                  2.773121  9.942136e-03  2.141383e-02
             uclust                 1.132319  2.674567e-01  3.744393e-01
             naive-bayes-bespoke  -25.700375  1.623596e-20  1.136517e-19
rdp          sortmerna             -1.049403  3.032955e-01  4.043940e-01
             blast                 -2.119006  4.343360e-02  7.153770e-02
             uclust                -3.123757  4.232996e-03  9.876990e-03
             naive-bayes-bespoke  -20.249627  7.436687e-18  4.164545e-17
sortmerna    blast                 -2.528026  1.762822e-02  3.525644e-02
             uclust                -2.241176  3.343682e-02  5.851443e-02
             naive-bayes-bespoke  -25.840557  1.409610e-20  1.136517e-19
blast        uclust                -0.822193  4.181689e-01  5.322149e-01
             naive-bayes-bespoke  -28.023011  1.703655e-21  3.587768e-20
uclust       naive-bayes-bespoke  -19.500750  1.931703e-17  9.014613e-17
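
The pairwise comparisons in this section were requested with `paired=True` and `parametric=True`. A minimal sketch of the statistic this suggests, assuming it corresponds to a paired t-test (an assumption about tax_credit's internals, not confirmed by this notebook), is shown below in pure Python. The function name and the scores are made up for illustration.

```python
import math


def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (same length, n >= 2)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        # No spread in the differences: 0/0 is undefined, otherwise infinite.
        return float('nan') if mean_d == 0 else math.copysign(math.inf, mean_d)
    return mean_d / math.sqrt(var_d / n)


# Made-up per-community scores for two methods, illustration only.
method_a = [0.70, 0.65, 0.80, 0.75]
method_b = [0.60, 0.60, 0.70, 0.70]
print(paired_t_statistic(method_a, method_b))

# Two methods with identical scores give an undefined statistic.
print(paired_t_statistic(method_a, method_a))
```

Under this convention a positive statistic means Method A scored higher than Method B on average, and two methods with identical per-community scores yield NaN.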
    
  








    












    




unite_20.11.2016_clean_fullITS level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[4,4]:0.0    0.864407             0.240370              0.592765   0.592765  0.592765
1  blast+               0.001:1:0.99:0.99                  0.675403             0.542180              0.931071   0.925258  0.928022
5  sortmerna            0.51:0.99:1:0.9:1.0                0.670776             0.526194              0.925307   0.916357  0.920407
4  rdp                  0.3                                0.662121             0.556626              0.925922   0.925922  0.925922
7  vsearch              1:0.75:0.9                         0.661119             0.554507              0.934220   0.925922  0.929675
2  naive-bayes          0.001:char:8192:[16,16]:0.7        0.631612             0.514831              0.905683   0.899683  0.902162
0  blast                1000                               0.575196             0.481317              0.864703   0.864703  0.864703
6  uclust               0.76:0.99:3                        0.542615             0.289869              0.940503   0.433466  0.525933
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes            2.960666  4.441401e-03  6.908846e-03
             vsearch                1.893375  6.330007e-02  7.706096e-02
             rdp                    1.802789  7.661724e-02  8.938678e-02
             sortmerna              0.965708  3.381996e-01  3.507255e-01
             blast                 10.935829  1.015963e-15  5.689390e-15
             uclust                12.309256  8.128480e-18  7.586582e-17
             naive-bayes-bespoke   -5.694165  4.327017e-07  8.077098e-07
naive-bayes  vsearch               -2.430404  1.819612e-02  2.426150e-02
             rdp                   -2.395286  1.985656e-02  2.527199e-02
             sortmerna             -2.866801  5.769954e-03  8.503090e-03
             blast                  4.033016  1.628866e-04  2.682838e-04
             uclust                 4.796721  1.166833e-05  2.041958e-05
             naive-bayes-bespoke   -6.545632  1.691660e-08  4.306044e-08
vsearch      rdp                   -0.686158  4.953473e-01  4.953473e-01
             sortmerna             -1.378781  1.732569e-01  1.940477e-01
             blast                 14.430259  7.525972e-21  1.053636e-19
             uclust                 9.883011  4.792097e-14  1.677234e-13
             naive-bayes-bespoke   -5.944326  1.683483e-07  3.855858e-07
rdp          sortmerna             -1.263380  2.115076e-01  2.277774e-01
             blast                 14.848080  2.033422e-21  5.693581e-20
             uclust                10.117131  2.012525e-14  8.050099e-14
             naive-bayes-bespoke   -5.928103  1.790220e-07  3.855858e-07
sortmerna    blast                 10.424247  6.506941e-15  3.036572e-14
             uclust                11.900729  3.334295e-17  2.334007e-16
             naive-bayes-bespoke   -5.750872  3.496136e-07  6.992271e-07
blast        uclust                 2.744934  8.043711e-03  1.126119e-02
             naive-bayes-bespoke   -8.396074  1.332759e-11  3.731726e-11
uclust       naive-bayes-bespoke   -8.697020  4.207145e-12  1.308889e-11
    
  








    












    




Taxon Detection Rate






    




gg_13_8_otus level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[18,18]:0.5  0.779585             0.843331              0.802810   0.747975  0.772246
2  naive-bayes          0.001:char:8192:[6,6]:0.0          0.539866             0.645513              0.775514   0.494970  0.582100
4  rdp                  0.0                                0.494357             0.609701              0.765818   0.488003  0.574161
7  vsearch              1:0.75:0.9                         0.560528             0.590832              0.799200   0.722006  0.755926
1  blast+               0.001:1:0.51:0.8                   0.543011             0.590632              0.781600   0.699559  0.734336
0  blast                1000                               0.541281             0.588847              0.781597   0.699503  0.734306
6  uclust               0.76:0.9:1                         0.551996             0.584026              0.773162   0.622255  0.675783
5  sortmerna            0.51:0.9:1:0.9:1.0                 0.514926             0.555999              0.609748   0.537393  0.567987
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes           -3.146365  4.001401e-03  7.002452e-03
             vsearch               -0.021954  9.826459e-01  9.826459e-01
             rdp                   -1.065996  2.958653e-01  3.765559e-01
             sortmerna              5.085631  2.419690e-05  6.159212e-05
             blast                  1.000000  3.261889e-01  3.970995e-01
             uclust                 0.586579  5.623593e-01  6.298424e-01
             naive-bayes-bespoke  -16.172081  2.054684e-15  1.438279e-14
naive-bayes  vsearch                3.474414  1.744457e-03  3.488914e-03
             rdp                    6.071805  1.751311e-06  6.129590e-06
             sortmerna              5.158391  1.990316e-05  5.572885e-05
             blast                  3.307334  2.670682e-03  4.985273e-03
             uclust                 3.538778  1.478264e-03  3.183954e-03
             naive-bayes-bespoke  -11.253051  1.064159e-11  4.256636e-11
vsearch      rdp                   -1.124412  2.707355e-01  3.609807e-01
             sortmerna              5.970468  2.286950e-06  7.114955e-06
             blast                  0.214412  8.318365e-01  8.626453e-01
             uclust                 0.589668  5.603157e-01  6.298424e-01
             naive-bayes-bespoke  -23.087499  2.598622e-19  7.276142e-18
rdp          sortmerna              3.099045  4.500807e-03  7.413094e-03
             blast                  1.189406  2.446349e-01  3.424888e-01
             uclust                 1.548325  1.331869e-01  1.962754e-01
             naive-bayes-bespoke  -11.845335  3.337428e-12  1.557467e-11
sortmerna    blast                 -4.764394  5.737240e-05  1.338689e-04
             uclust                -2.500941  1.875395e-02  2.917281e-02
             naive-bayes-bespoke  -21.467696  1.679305e-18  2.351026e-17
blast        uclust                 0.411067  6.842682e-01  7.369042e-01
             naive-bayes-bespoke  -16.413983  1.426167e-15  1.331089e-14
uclust       naive-bayes-bespoke  -13.861357  8.559562e-14  4.793355e-13
    
  








    












    




unite_20.11.2016_clean_fullITS level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[10,10]:0.0  0.679121             0.569337              0.931322   0.931322  0.931322
4  rdp                  0.0                                0.660617             0.556626              0.925922   0.925922  0.925922
7  vsearch              1:0.75:0.9                         0.661119             0.554507              0.934220   0.925922  0.929675
5  sortmerna            0.51:0.9:1:0.9:1.0                 0.653569             0.543914              0.919525   0.916357  0.917861
1  blast+               0.001:1:0.99:0.99                  0.675403             0.542180              0.931071   0.925258  0.928022
2  naive-bayes          0.001:char:8192:[9,9]:0.0          0.625942             0.521186              0.880926   0.880926  0.880926
0  blast                1000                               0.575196             0.481317              0.864703   0.864703  0.864703
6  uclust               0.76:0.9:1                         0.475223             0.385208              0.861520   0.856609  0.858862
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes            2.112055  3.899930e-02  4.199925e-02
             vsearch               -2.656508  1.018057e-02  1.246009e-02
             rdp                   -2.882191  5.529512e-03  8.148755e-03
             sortmerna             -0.271243  7.871675e-01  7.871675e-01
             blast                  9.613323  1.310932e-13  3.058842e-13
             uclust                16.525722  1.316026e-23  7.369743e-23
             naive-bayes-bespoke   -3.985024  1.909369e-04  3.144843e-04
naive-bayes  vsearch               -4.194349  9.487211e-05  1.770946e-04
             rdp                   -4.734417  1.456617e-05  3.102620e-05
             sortmerna             -3.197671  2.244873e-03  3.492025e-03
             blast                  4.716673  1.551310e-05  3.102620e-05
             uclust                16.360482  2.130221e-23  9.941031e-23
             naive-bayes-bespoke   -4.052056  1.528982e-04  2.675718e-04
vsearch      rdp                   -1.000000  3.214644e-01  3.333705e-01
             sortmerna              2.752889  7.873227e-03  1.094725e-02
             blast                 15.875842  8.910020e-23  3.118507e-22
             uclust                20.353985  4.298311e-28  6.017636e-27
             naive-bayes-bespoke   -2.654483  1.023507e-02  1.246009e-02
rdp          sortmerna              2.417288  1.880108e-02  2.105720e-02
             blast                 16.675607  8.525563e-24  5.967894e-23
             uclust                20.424674  3.601502e-28  6.017636e-27
             naive-bayes-bespoke   -2.417288  1.880108e-02  2.105720e-02
sortmerna    blast                 10.095747  2.177971e-14  5.543925e-14
             uclust                19.893413  1.376530e-27  1.284761e-26
             naive-bayes-bespoke   -2.737302  8.210440e-03  1.094725e-02
blast        uclust                14.530769  5.482526e-21  1.705675e-20
             naive-bayes-bespoke  -12.257340  9.713986e-18  2.719916e-17
uclust       naive-bayes-bespoke  -15.993910  6.271512e-23  2.508605e-22
    
  








    












    




Precision






    




gg_13_8_otus level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
7  vsearch              100:0.99:0.97                      0.191464             0.186578              0.988600   0.157820  0.260185
3  naive-bayes-bespoke  0.001:prior:char:8192:[6,6]:0.98   0.709284             0.723084              0.986527   0.536697  0.675678
1  blast+               0.001:100:0.99:0.97                0.182957             0.183682              0.976018   0.158007  0.257228
4  rdp                  1.0                                0.172868             0.194529              0.941480   0.160176  0.238734
5  sortmerna            1.0:0.99:5:0.9:1.0                 0.378407             0.392316              0.911405   0.292386  0.428211
2  naive-bayes          0.001:char:8192:[32,32]:0.94       0.444776             0.513583              0.906138   0.429301  0.552925
6  uclust               1.0:0.9:5                          0.309652             0.326077              0.862718   0.255452  0.381379
0  blast                1000                               0.541281             0.588847              0.781597   0.699503  0.734306
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes            4.069772  3.678877e-04  0.001288
             vsearch               -1.000000  3.261889e-01  0.374225
             rdp                    1.282581  2.105411e-01  0.280721
             sortmerna              2.518376  1.802194e-02  0.031538
             blast                  4.875675  4.253952e-05  0.000199
             uclust                 2.757891  1.030878e-02  0.019243
             naive-bayes-bespoke   -0.983424  3.341298e-01  0.374225
naive-bayes  vsearch               -5.672167  5.039983e-06  0.000047
             rdp                   -1.241121  2.252367e-01  0.286665
             sortmerna             -0.182603  8.564735e-01  0.856474
             blast                  2.772028  9.968046e-03  0.019243
             uclust                 1.003192  3.246748e-01  0.374225
             naive-bayes-bespoke   -5.872902  2.959248e-06  0.000041
vsearch      rdp                    2.166629  3.925591e-02  0.064657
             sortmerna              3.091001  4.591423e-03  0.009920
             blast                  5.387723  1.076473e-05  0.000061
             uclust                 3.381684  2.211221e-03  0.006191
             naive-bayes-bespoke    0.323798  7.485860e-01  0.776311
rdp          sortmerna              0.955995  3.475567e-01  0.374292
             blast                  4.122572  3.197686e-04  0.001279
             uclust                 1.797141  8.350843e-02  0.116912
             naive-bayes-bespoke   -1.999219  5.574173e-02  0.086709
sortmerna    blast                  6.476723  6.085954e-07  0.000017
             uclust                 1.955395  6.095940e-02  0.089835
             naive-bayes-bespoke   -3.089727  4.605942e-03  0.009920
blast        uclust                -3.527097  1.523446e-03  0.004740
             naive-bayes-bespoke   -5.381372  1.094920e-05  0.000061
uclust       naive-bayes-bespoke   -3.269367  2.939558e-03  0.007483
    
  








    












    




unite_20.11.2016_clean_fullITS level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[9,9]:0.94   0.665330             0.548151              0.973077   0.924192  0.939285
5  sortmerna            1.0:0.99:5:0.9:1.0                 0.666131             0.534669              0.971207   0.925385  0.939629
1  blast+               0.001:10:0.99:0.99                 0.511648             0.404468              0.966035   0.801723  0.864648
4  rdp                  1.0                                0.473145             0.366140              0.942963   0.742323  0.821206
6  uclust               0.76:0.99:3                        0.542615             0.289869              0.940503   0.433466  0.525933
7  vsearch              100:0.51:0.97                      0.446615             0.358243              0.938101   0.754332  0.829143
2  naive-bayes          0.001:char:8192:[11,11]:0.98       0.618735             0.501348              0.935384   0.878948  0.902295
0  blast                1000                               0.575196             0.481317              0.864703   0.864703  0.864703
    
  








    







  
    
      
      
Method A     Method B                   stat  P             FDR P
blast+       naive-bayes            2.213613  3.080216e-02  4.934976e-02
             vsearch                2.441905  1.768009e-02  3.300283e-02
             rdp                    1.804243  7.638628e-02  1.018484e-01
             sortmerna             -1.778564  8.055237e-02  1.025212e-01
             blast                  4.799663  1.154644e-05  4.618575e-05
             uclust                 2.199155  3.186780e-02  4.934976e-02
             naive-bayes-bespoke   -3.318700  1.566458e-03  5.482603e-03
naive-bayes  vsearch               -0.804590  4.243424e-01  4.400588e-01
             rdp                   -1.409506  1.640252e-01  1.996829e-01
             sortmerna             -2.443592  1.760549e-02  3.300283e-02
             blast                  6.961497  3.401614e-09  9.524518e-08
             uclust                -2.030143  4.693685e-02  6.571158e-02
             naive-bayes-bespoke   -2.607397  1.158017e-02  2.702041e-02
vsearch      rdp                   -1.068476  2.897326e-01  3.245005e-01
             sortmerna             -2.783531  7.247130e-03  2.029196e-02
             blast                  6.110295  8.959916e-08  7.133179e-07
             uclust                -1.171414  2.462224e-01  2.872594e-01
             naive-bayes-bespoke   -2.926724  4.885017e-03  1.519783e-02
rdp          sortmerna             -2.177976  3.348734e-02  4.934976e-02
             blast                  6.076501  1.019026e-07  7.133179e-07
             uclust                 0.548422  5.855073e-01  5.855073e-01
             naive-bayes-bespoke   -2.360708  2.162280e-02  3.783990e-02
sortmerna    blast                  4.925350  7.356967e-06  3.433251e-05
             uclust                 2.481147  1.601693e-02  3.300283e-02
             naive-bayes-bespoke   -1.000000  3.214644e-01  3.461924e-01
blast        uclust                -6.491661  2.081736e-08  2.914430e-07
             naive-bayes-bespoke   -5.030011  5.039397e-06  2.822062e-05
uclust       naive-bayes-bespoke   -2.681775  9.522331e-03  2.423866e-02
    
  








    












    




Recall






    




gg_13_8_otus level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[32,32]:0.0  0.779569             0.829707              0.964806   0.913330  0.936223
7  vsearch              1:0.75:0.9                         0.560528             0.590832              0.799200   0.722006  0.755926
1  blast+               0.001:1:0.51:0.8                   0.543011             0.590632              0.781600   0.699559  0.734336
0  blast                1000                               0.541281             0.588847              0.781597   0.699503  0.734306
6  uclust               1.0:0.9:1                          0.551996             0.584026              0.773162   0.622255  0.675783
5  sortmerna            0.51:0.9:1:0.9:1.0                 0.514926             0.555999              0.609748   0.537393  0.567987
2  naive-bayes          0.001:char:8192:[32,32]:0.0        0.487359             0.602108              0.563041   0.512789  0.535682
4  rdp                  0.0                                0.494357             0.609701              0.765818   0.488003  0.574161
    
  








    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               naive-bayes            2.219574  3.503735e-02   5.775258e-02
blast+               vsearch               -1.077105  2.909632e-01   3.133450e-01
blast+               rdp                    2.593803  1.514960e-02   3.032139e-02
blast+               sortmerna              2.928088  6.848186e-03   1.748781e-02
blast+               blast                  1.440160  1.613178e-01   2.053135e-01
blast+               uclust                 1.599400  1.213698e-01   1.700039e-01
blast+               naive-bayes-bespoke   -6.993639  1.618284e-07   9.062388e-07
naive-bayes          vsearch               -2.689181  1.212630e-02   2.829469e-02
naive-bayes          rdp                    1.391082  1.755587e-01   2.137237e-01
naive-bayes          sortmerna             -0.431088  6.698265e-01   6.698265e-01
naive-bayes          blast                 -2.219221  3.506407e-02   5.775258e-02
naive-bayes          uclust                -1.449248  1.587845e-01   2.053135e-01
naive-bayes          naive-bayes-bespoke   -6.550125  5.033497e-07   2.013399e-06
vsearch              rdp                    3.140113  4.064197e-03   1.264417e-02
vsearch              sortmerna              3.648799  1.111936e-03   3.891775e-03
vsearch              blast                  1.079968  2.897097e-01   3.133450e-01
vsearch              uclust                 2.337065  2.710111e-02   5.058874e-02
vsearch              naive-bayes-bespoke   -8.091499  1.080140e-08   1.512197e-07
rdp                  sortmerna             -0.858511  3.981678e-01   4.129147e-01
rdp                  blast                 -2.593487  1.516069e-02   3.032139e-02
rdp                  uclust                -1.922180  6.519395e-02   1.014128e-01
rdp                  naive-bayes-bespoke   -7.466465  4.950939e-08   4.620876e-07
sortmerna            blast                 -2.926767  6.870211e-03   1.748781e-02
sortmerna            uclust                -1.103587  2.795113e-01   3.133450e-01
sortmerna            naive-bayes-bespoke   -8.440399  4.720679e-09   1.321790e-07
blast                uclust                 1.599123  1.214314e-01   1.700039e-01
blast                naive-bayes-bespoke   -6.997849  1.601118e-07   9.062388e-07
uclust               naive-bayes-bespoke   -6.850799  2.326614e-07   1.085753e-06
    
  








    












    




unite_20.11.2016_clean_fullITS level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[10,10]:0.0  0.679121             0.569337              0.931322   0.931322  0.931322
4  rdp                  0.0                                0.660617             0.556626              0.925922   0.925922  0.925922
7  vsearch              1:0.75:0.9                         0.661119             0.554507              0.934220   0.925922  0.929675
5  sortmerna            0.51:0.99:5:0.9:1.0                0.662872             0.534669              0.947962   0.925385  0.934021
1  blast+               0.001:1:0.99:0.99                  0.675403             0.542180              0.931071   0.925258  0.928022
2  naive-bayes          0.001:char:8192:[16,16]:0.5        0.631337             0.516371              0.903318   0.899748  0.901327
0  blast                1000                               0.575196             0.481317              0.864703   0.864703  0.864703
6  uclust               0.76:0.9:1                         0.475223             0.385208              0.861520   0.856609  0.858862
    
  








    



/Users/nbokulich/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/statsmodels/stats/multitest.py:320: RuntimeWarning: invalid value encountered in less_equal
  reject = pvals_sorted <= ecdffactor*alpha






    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               naive-bayes            2.126819  3.770042e-02   NaN
blast+               vsearch               -2.952960  4.538704e-03   NaN
blast+               rdp                   -2.952960  4.538704e-03   NaN
blast+               sortmerna             -1.426218  1.591661e-01   NaN
blast+               blast                  7.154296  1.613669e-09   NaN
blast+               uclust                 8.141534  3.548490e-11   NaN
blast+               naive-bayes-bespoke   -1.877182  6.552553e-02   NaN
naive-bayes          vsearch               -2.148669  3.584739e-02   NaN
naive-bayes          rdp                   -2.148669  3.584739e-02   NaN
naive-bayes          sortmerna             -2.138312  3.671557e-02   NaN
naive-bayes          blast                  2.518036  1.458360e-02   NaN
naive-bayes          uclust                 3.429051  1.121125e-03   NaN
naive-bayes          naive-bayes-bespoke   -2.080895  4.186940e-02   NaN
vsearch              rdp                         NaN  NaN            NaN
vsearch              sortmerna              2.531773  1.408018e-02   NaN
vsearch              blast                  7.195561  1.375488e-09   NaN
vsearch              uclust                 8.153072  3.394187e-11   NaN
vsearch              naive-bayes-bespoke   -1.762370  8.327568e-02   NaN
rdp                  sortmerna              2.531773  1.408018e-02   NaN
rdp                  blast                  7.195561  1.375488e-09   NaN
rdp                  uclust                 8.153072  3.394187e-11   NaN
rdp                  naive-bayes-bespoke   -1.762370  8.327568e-02   NaN
sortmerna            blast                  7.159719  1.580157e-09   NaN
sortmerna            uclust                 8.147637  3.466015e-11   NaN
sortmerna            naive-bayes-bespoke   -1.836242  7.144875e-02   NaN
blast                uclust                 3.213745  2.141039e-03   NaN
blast                naive-bayes-bespoke   -7.149526  1.643733e-09   NaN
uclust               naive-bayes-bespoke   -7.620513  2.655004e-10   NaN
    
  








    












    




F-measure






    




gg_13_8_otus level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[32,32]:0.5  0.774533             0.829707              0.966365   0.913330  0.936868
7  vsearch              1:0.99:0.97                        0.558085             0.587260              0.800121   0.722006  0.756215
1  blast+               0.001:1:0.51:0.8                   0.543011             0.590632              0.781600   0.699559  0.734336
0  blast                1000                               0.541281             0.588847              0.781597   0.699503  0.734306
6  uclust               0.76:0.9:1                         0.551996             0.584026              0.773162   0.622255  0.675783
2  naive-bayes          0.001:char:8192:[6,6]:0.0          0.539866             0.645513              0.775514   0.494970  0.582100
4  rdp                  0.4                                0.508823             0.597903              0.781453   0.487433  0.579487
5  sortmerna            0.51:0.9:1:0.9:1.0                 0.514926             0.555999              0.609748   0.537393  0.567987
    
  








    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               naive-bayes            2.088036  4.635904e-02   7.639518e-02
blast+               vsearch               -1.298109  2.052297e-01   2.298573e-01
blast+               rdp                    2.088857  4.627924e-02   7.639518e-02
blast+               sortmerna              2.954274  6.425307e-03   1.802145e-02
blast+               blast                  1.440004  1.613617e-01   1.964403e-01
blast+               uclust                 1.502020  1.446982e-01   1.964403e-01
blast+               naive-bayes-bespoke   -6.356296  8.320207e-07   3.328083e-06
naive-bayes          vsearch               -2.609719  1.460058e-02   3.442351e-02
naive-bayes          rdp                    0.742658  4.641017e-01   4.998018e-01
naive-bayes          sortmerna              0.274238  7.859889e-01   8.150996e-01
naive-bayes          blast                 -2.087791  4.638279e-02   7.639518e-02
naive-bayes          uclust                -1.451109  1.582698e-01   1.964403e-01
naive-bayes          naive-bayes-bespoke   -7.573601  3.800203e-08   4.912339e-07
vsearch              rdp                    2.605247  1.475293e-02   3.442351e-02
vsearch              sortmerna              3.612115  1.222973e-03   4.280406e-03
vsearch              blast                  1.300049  2.045736e-01   2.298573e-01
vsearch              uclust                 2.317250  2.831299e-02   6.098182e-02
vsearch              naive-bayes-bespoke   -6.998645  1.597891e-07   1.118524e-06
rdp                  sortmerna              0.216540  8.301950e-01   8.301950e-01
rdp                  blast                 -2.088614  4.630287e-02   7.639518e-02
rdp                  uclust                -1.479014  1.507099e-01   1.964403e-01
rdp                  naive-bayes-bespoke   -7.441775  5.263220e-08   4.912339e-07
sortmerna            blast                 -2.953578  6.436231e-03   1.802145e-02
sortmerna            uclust                -1.482821  1.497014e-01   1.964403e-01
sortmerna            naive-bayes-bespoke   -8.154994  9.279879e-09   2.598366e-07
blast                uclust                 1.501824  1.447486e-01   1.964403e-01
blast                naive-bayes-bespoke   -6.358289  8.277176e-07   3.328083e-06
uclust               naive-bayes-bespoke   -6.821250  2.508812e-07   1.404934e-06
    
  








    












    




unite_20.11.2016_clean_fullITS level 6






    







  
    
      
   Method               Parameters                         Taxon Accuracy Rate  Taxon Detection Rate  Precision  Recall    F-measure
3  naive-bayes-bespoke  0.001:prior:char:8192:[7,7]:0.94   0.669836             0.546032              0.972887   0.929592  0.944469
5  sortmerna            1.0:0.99:5:0.9:1.0                 0.666131             0.534669              0.971207   0.925385  0.939629
4  rdp                  0.6                                0.658445             0.556626              0.941823   0.925922  0.932765
7  vsearch              1:0.75:0.9                         0.661119             0.554507              0.934220   0.925922  0.929675
1  blast+               0.001:1:0.99:0.99                  0.675403             0.542180              0.931071   0.925258  0.928022
2  naive-bayes          0.001:char:8192:[11,11]:0.98       0.618735             0.501348              0.935384   0.878948  0.902295
0  blast                1000                               0.575196             0.481317              0.864703   0.864703  0.864703
6  uclust               0.51:0.9:1                         0.475223             0.385208              0.861520   0.856609  0.858862
    
  








    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               naive-bayes            4.870672  8.955166e-06   2.089539e-05
blast+               vsearch               -1.226348  2.250235e-01   2.250235e-01
blast+               rdp                   -2.585395  1.226227e-02   1.907465e-02
blast+               sortmerna             -2.266427  2.717148e-02   3.170006e-02
blast+               blast                  7.416245  5.853776e-10   2.731762e-09
blast+               uclust                 8.305885  1.884853e-11   3.122788e-10
blast+               naive-bayes-bespoke   -2.345272  2.245560e-02   2.733725e-02
naive-bayes          vsearch               -4.758296  1.338066e-05   2.881987e-05
naive-bayes          rdp                   -5.038219  4.891520e-06   1.245114e-05
naive-bayes          sortmerna             -4.462116  3.790479e-05   7.580959e-05
naive-bayes          blast                  3.342155  1.459689e-03   2.404193e-03
naive-bayes          uclust                 4.163979  1.051109e-04   1.839441e-04
naive-bayes          naive-bayes-bespoke   -4.261095  7.565123e-05   1.412156e-04
vsearch              rdp                   -2.554263  1.328990e-02   1.958512e-02
vsearch              sortmerna             -2.438366  1.783744e-02   2.378326e-02
vsearch              blast                  7.325742  8.310027e-10   3.324011e-09
vsearch              uclust                 8.156795  3.345844e-11   3.122788e-10
vsearch              naive-bayes-bespoke   -2.393299  1.995446e-02   2.539659e-02
rdp                  sortmerna             -2.022239  4.777186e-02   4.962695e-02
rdp                  blast                  7.462607  4.892038e-10   2.731762e-09
rdp                  uclust                 8.261010  2.239977e-11   3.122788e-10
rdp                  naive-bayes-bespoke   -2.182273  3.315302e-02   3.713139e-02
sortmerna            blast                  6.915088  4.069873e-09   1.266183e-08
sortmerna            uclust                 7.443678  5.263979e-10   2.731762e-09
sortmerna            naive-bayes-bespoke   -2.021463  4.785456e-02   4.962695e-02
blast                uclust                 2.474556  1.628602e-02   2.280042e-02
blast                naive-bayes-bespoke   -6.618927  1.275875e-08   3.572450e-08
uclust               naive-bayes-bespoke   -6.996678  2.969028e-09   1.039160e-08

Beta diversity method/parameter comparisons

Principal coordinate analysis (PCoA) offers a convenient way to assess how well multiple methods reconstruct expected compositions. Methods that cluster with the "expected" composition probably outperform those that appear more distant on a PCoA plot. First, we need to merge the biom tables from each method/parameter configuration for each dataset/reference/level combination, so that we can compare each method/parameter configuration as a separate "sample".

Note: if you have added additional methods and are attempting to recompute results, set force=True.



In [33]:

    
merge_expected_and_observed_tables(expected_results_dir, results_dirs, taxonomy_level=6, force=True, 
                                   dataset_ids=list(mock_results.Dataset.unique()), 
                                   reference_ids=list(mock_results.Reference.unique()), 
                                   method_ids=list(mock_results.Method.unique()))

Now we can manually select which table we want to view. This will output a Bray-Curtis PCoA plot, in addition to ANOSIM test results, which indicate whether at least two methods are significantly different from each other.

These plots are useful for visualizing the performance of different methods and their configurations relative to each other and to the expected compositions, but they are primarily qualitative and do not tell us whether method X actually performs better than method Y.
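To make the distance underlying these plots concrete, here is a minimal sketch of the Bray-Curtis dissimilarity between an expected composition and an observed one. It is written in plain Python for illustration only and is not the tax-credit implementation; the function name `bray_curtis` and the abundance vectors are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    # Sum of absolute differences divided by the sum of all abundances
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


# Hypothetical relative abundances over the same four taxa
expected = [0.25, 0.25, 0.25, 0.25]
method_x = [0.30, 0.20, 0.25, 0.25]
method_y = [0.70, 0.10, 0.10, 0.10]

for name, observed in [("method_x", method_x), ("method_y", method_y)]:
    print(name, round(bray_curtis(expected, observed), 3))
```

In this toy example, method_x lies closer to the expected composition than method_y, so it would sit nearer the "expected" point in an ordination built from these distances.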

Note that 2D plots will only appear if you are running notebooks locally. If viewing static notebooks online, make sure you are viewing this notebook in nbviewer. (If viewing on GitHub, just copy the URL and paste it into the search bar in nbviewer.)



In [34]:

    
table = join(expected_results_dir, 'mock-18', 'gg_13_8_otus', 'merged_table.biom')
sample_md, results, pc, dm = beta_diversity_pcoa(table, method="braycurtis", dim=2,
                                                 permutations=99, col='method', 
                                                 colormap=color_palette)









    



R =  0.286742732174 ; P =  0.01






    



/home/ben/miniconda3/envs/qiime2-2017.6/lib/python3.5/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:111: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -1.2081736462109671 and the largest is 13.524061791272043.
  RuntimeWarning






    





    
        

You can also view all beta diversity plots with a single command, batch_beta_diversity(), but we will only show single dataset examples in these example notebooks.

Average dissimilarity between expected results and observed results for each method

As we already discussed, PCoA plots are good for a qualitative overview, but don't offer much in the way of quantitative comparison. Instead, we can directly compare the Bray-Curtis dissimilarity between methods, and use pairwise Mann-Whitney U tests to determine precisely which methods perform better (lower dissimilarity = more accurate classification). In the cell below, we will use distance comparisons to determine:

1) Whether the dissimilarity between taxonomic assignments made with different parameters of the same method is greater or less than the dissimilarity between taxonomic assignments made with different methods, including the expected composition.
2) Which method (averaged across all configurations) most closely reproduces the expected composition.

You can generate boxplots for individual datasets one-by-one with per_method_boxplots(), or for all datasets individually with fastlane_boxplots(). However, here we are most interested in the average performance of methods across each dataset.

The command below generates violin plots of the distribution of distances between the expected composition and the predicted compositions for each method (all parameter configurations) across all samples/datasets, along with pairwise Mann-Whitney U tests between these distributions.
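The pairwise tables in this notebook report both a raw P and an FDR P column. As a rough illustration of how a false discovery rate correction adjusts a family of pairwise p-values, the sketch below implements the Benjamini-Hochberg procedure in plain Python. That this is the correction used here is an assumption (the reported values appear consistent with it), and the function name `benjamini_hochberg` and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-values decrease
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Because the adjustment depends on the rank of each p-value within the whole family, several comparisons can share the same adjusted value, which is why repeated FDR P values appear in the tables.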



In [12]:

    
boxes, best = average_distance_boxplots(expected_results_dir, paired=False,
                                        use_best=False, color_palette=color_palette)









    




gg_13_8_otus






    












    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               blast                  5.123550  3.750830e-07   5.834625e-07
blast+               naive-bayes-bespoke   21.057707  3.604650e-93   3.364340e-92
blast+               naive-bayes           -5.447897  5.422066e-08   1.002903e-07
blast+               rdp                    5.876506  5.603218e-09   1.206847e-08
blast+               sortmerna              4.623602  4.276776e-06   5.702368e-06
blast+               uclust                 0.460602  6.451502e-01   6.451502e-01
blast+               vsearch               -1.097503  2.725715e-01   2.935385e-01
blast                naive-bayes-bespoke    2.376851  1.752196e-02   2.133108e-02
blast                naive-bayes           -6.096060  1.222337e-09   3.111402e-09
blast                rdp                   -2.224597  2.672431e-02   3.117836e-02
blast                sortmerna             -2.829360  5.003881e-03   6.368576e-03
blast                uclust                -5.754990  1.228829e-08   2.457658e-08
blast                vsearch               -5.466148  5.730876e-08   1.002903e-07
naive-bayes-bespoke  naive-bayes          -42.058853  0.000000e+00   0.000000e+00
naive-bayes-bespoke  rdp                   -9.228368  4.752963e-20   2.218049e-19
naive-bayes-bespoke  sortmerna             -8.506349  2.700078e-17   1.080031e-16
naive-bayes-bespoke  uclust               -20.972279  1.805146e-92   1.263602e-91
naive-bayes-bespoke  vsearch              -24.920575  1.297107e-127  1.815949e-126
naive-bayes          rdp                    9.348021  1.589160e-20   8.899297e-20
naive-bayes          sortmerna              7.413175  1.564983e-13   5.477441e-13
naive-bayes          uclust                 6.050850  1.581107e-09   3.689249e-09
naive-bayes          vsearch                4.824764  1.453539e-06   2.034954e-06
rdp                  sortmerna             -0.674307  5.004101e-01   5.189439e-01
rdp                  uclust                -6.197996  8.170584e-10   2.287763e-09
rdp                  vsearch               -6.822651  1.360274e-11   4.231963e-11
sortmerna            uclust                -4.920069  1.014636e-06   1.495254e-06
sortmerna            vsearch               -5.386507  8.600462e-08   1.416547e-07
uclust               vsearch               -1.648283  9.947297e-02   1.114097e-01
    
  








    




unite_20.11.2016_clean_fullITS






    












    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               blast                  9.640919  1.830050e-21   3.202588e-21
blast+               naive-bayes-bespoke   38.653451  8.482023e-300  1.187483e-298
blast+               naive-bayes           18.640706  6.034712e-76   1.689719e-75
blast+               rdp                   24.251627  9.078087e-116  5.083729e-115
blast+               sortmerna             18.640578  7.000215e-72   1.781873e-71
blast+               uclust                -1.433858  1.517095e-01   1.633794e-01
blast+               vsearch                9.129660  1.092766e-19   1.699857e-19
blast                naive-bayes-bespoke   -0.962448  3.358603e-01   3.482995e-01
blast                naive-bayes           -4.421985  9.939306e-06   1.265003e-05
blast                rdp                    2.141749  3.252363e-02   3.642646e-02
blast                sortmerna              0.162795  8.707347e-01   8.707347e-01
blast                uclust               -11.672483  2.316028e-30   4.632056e-30
blast                vsearch               -7.488710  9.886499e-14   1.384110e-13
naive-bayes-bespoke  naive-bayes          -21.953107  7.022254e-105  3.277052e-104
naive-bayes-bespoke  rdp                    6.359202  2.155121e-10   2.873495e-10
naive-bayes-bespoke  sortmerna              2.174683  2.968796e-02   3.463596e-02
naive-bayes-bespoke  uclust               -42.716667  0.000000e+00   0.000000e+00
naive-bayes-bespoke  vsearch              -29.307596  5.548754e-180  5.178838e-179
naive-bayes          rdp                   13.291442  7.827892e-40   1.686008e-39
naive-bayes          sortmerna              8.948023  4.591237e-19   6.766033e-19
naive-bayes          uclust               -20.969526  4.381587e-95   1.533556e-94
naive-bayes          vsearch               -9.434689  4.964277e-21   8.176456e-21
rdp                  sortmerna             -3.054186  2.309178e-03   2.811173e-03
rdp                  uclust               -28.723443  4.156486e-155  2.909540e-154
rdp                  vsearch              -19.882069  2.013703e-82   6.264853e-82
sortmerna            uclust               -22.252051  1.168845e-98   4.675380e-98
sortmerna            vsearch              -14.571551  2.558118e-46   5.968942e-46
uclust               vsearch               11.379295  1.595565e-29   2.978388e-29
    
  








    








In [35]:

    
for k, v in boxes.items():
    v.get_figure().savefig(join(outdir, 'mock-nonopt-distance-{0}-boxplots.pdf'.format(k)))

Average distance between expected results and observed results for each method with optimized parameters

This section reports the top-performing parameter configuration for each method, violin plots of the distribution of distances between the expected composition and the predicted compositions for the top parameter configuration of each method across all samples/datasets, and pairwise paired Wilcoxon signed rank tests between these distributions.
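As a rough illustration of what "top-performing parameter configuration" means here, the sketch below picks, for each method, the configuration with the lowest mean distance to the expected composition. It is an assumption about the general idea, not the logic inside average_distance_boxplots(); the function name `best_configuration_per_method` and the records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def best_configuration_per_method(records):
    """Map each method to (params, mean distance) for its lowest-distance configuration.

    records is an iterable of (method, params, distance) tuples, one per sample.
    """
    # Collect the per-sample distances for every method/parameter pair
    grouped = defaultdict(list)
    for method, params, distance in records:
        grouped[(method, params)].append(distance)

    best = {}
    for (method, params), distances in grouped.items():
        avg = mean(distances)
        # Keep the configuration with the smallest mean distance per method
        if method not in best or avg < best[method][1]:
            best[method] = (params, avg)
    return best


# Hypothetical distances for two methods across two samples
records = [
    ("method_a", "p1", 0.6), ("method_a", "p1", 0.4),
    ("method_a", "p2", 0.3), ("method_a", "p2", 0.5),
    ("method_b", "q1", 0.7), ("method_b", "q1", 0.7),
]
for method, (params, distance) in sorted(best_configuration_per_method(records).items()):
    print(method, params, round(distance, 3))
```

Lower mean distance means the configuration reproduces the expected composition more closely, which matches the "lower dissimilarity = more accurate classification" convention used above.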



In [7]:

    
boxes, best = average_distance_boxplots(expected_results_dir, paired=False,
                                        color_palette=color_palette)









    




gg_13_8_otus






    







  
    
      
   method               params                           distance
6  uclust               0.76:0.9:1                       0.697222
7  vsearch              1:0.99:0.99                      0.675437
0  blast                1000                             0.674382
4  rdp                  0.3                              0.671381
5  sortmerna            0.51:0.99:1:0.9:1.0              0.671113
1  blast+               0.001:1:0.99:0.8                 0.666258
2  naive-bayes          0.001:char:8192:[6,6]:0.0        0.655735
3  naive-bayes-bespoke  0.001:prior:char:8192:[9,9]:0.0  0.467257
    
  








    












    







  
    
      
      
Method A             Method B                   stat  P              FDR P
blast+               blast                 -0.226415  8.217339e-01   9.923969e-01
blast+               naive-bayes-bespoke    6.532120  2.365461e-08   9.461843e-08
blast+               naive-bayes            0.314748  7.541657e-01   9.923969e-01
blast+               rdp                   -0.156043  8.765806e-01   9.923969e-01
blast+               sortmerna             -0.155769  8.767957e-01   9.923969e-01
blast+               uclust                -0.919897  3.617176e-01   9.923969e-01
blast+               vsearch               -0.258380  7.970958e-01   9.923969e-01
blast                naive-bayes-bespoke    6.708680  1.224007e-08   5.712032e-08
blast                naive-bayes            0.551557  5.835275e-01   9.923969e-01
blast                rdp                    0.090361  9.283350e-01   9.923969e-01
blast                sortmerna              0.103580  9.178865e-01   9.923969e-01
blast                uclust                -0.671160  5.049795e-01   9.923969e-01
blast                vsearch               -0.029409  9.766465e-01   9.923969e-01
naive-bayes-bespoke  naive-bayes           -6.732156  1.121256e-08   5.712032e-08
naive-bayes-bespoke  rdp                   -7.485316  6.713150e-10   6.265607e-09
naive-bayes-bespoke  sortmerna             -8.076811  7.417049e-11   1.038387e-09
naive-bayes-bespoke  uclust                -8.136249  5.949654e-11   1.038387e-09
naive-bayes-bespoke  vsearch               -6.834442  7.651112e-09   5.355778e-08
naive-bayes          rdp                   -0.512120  6.106552e-01   9.923969e-01
naive-bayes          sortmerna             -0.534798  5.949852e-01   9.923969e-01
naive-bayes          uclust                -1.319438  1.925898e-01   6.740644e-01
naive-bayes          vsearch               -0.589343  5.580890e-01   9.923969e-01
rdp                  sortmerna              0.009573  9.923969e-01   9.923969e-01
rdp                  uclust                -0.839043  4.051442e-01   9.923969e-01
rdp                  vsearch               -0.123561  9.021217e-01   9.923969e-01
sortmerna            uclust                -0.899840  3.722010e-01   9.923969e-01
sortmerna            vsearch               -0.138786  8.901355e-01   9.923969e-01
uclust               vsearch                0.647288  5.201881e-01   9.923969e-01
    
  








    




unite_20.11.2016_clean_fullITS






    







  
    
      
   method               params                           distance
6  uclust               1.0:0.9:1                        0.599239
0  blast                1000                             0.586838
2  naive-bayes          0.001:char:8192:[16,16]:0.7      0.574774
7  vsearch              1:0.99:0.99                      0.549004
5  sortmerna            1.0:0.99:5:0.9:1.0               0.548545
4  rdp                  0.3                              0.548401
3  naive-bayes-bespoke  0.001:prior:char:8192:[8,8]:0.0  0.546291
1  blast+               0.001:1:0.99:0.99                0.543117
    
  








    












    







  
    
      
      
Method A	Method B	stat	P	FDR P
blast+	blast	-2.437728	0.016271	0.068501
blast+	naive-bayes-bespoke	-0.173448	0.862599	0.994093
blast+	naive-bayes	-1.378974	0.170533	0.434085
blast+	rdp	-0.277334	0.782009	0.994093
blast+	sortmerna	-0.277090	0.782196	0.994093
blast+	uclust	-2.998939	0.003305	0.046270
blast+	vsearch	-0.306423	0.759823	0.994093
blast	naive-bayes-bespoke	2.418521	0.017125	0.068501
blast	naive-bayes	0.554515	0.580285	0.902665
blast	rdp	2.182101	0.031084	0.104714
blast	sortmerna	2.105168	0.037398	0.104714
blast	uclust	-0.719140	0.473477	0.779844
blast	vsearch	2.127227	0.035483	0.104714
naive-bayes-bespoke	naive-bayes	-1.287859	0.200358	0.467503
naive-bayes-bespoke	rdp	-0.117415	0.906732	0.994093
naive-bayes-bespoke	sortmerna	-0.121500	0.903504	0.994093
naive-bayes-bespoke	uclust	-3.006743	0.003233	0.046270
naive-bayes-bespoke	vsearch	-0.149526	0.881396	0.994093
naive-bayes	rdp	1.161629	0.247750	0.485216
naive-bayes	sortmerna	1.132521	0.259731	0.485216
naive-bayes	uclust	-1.091567	0.277266	0.485216
naive-bayes	vsearch	1.128359	0.261477	0.485216
rdp	sortmerna	-0.007419	0.994093	0.994093
rdp	uclust	-2.761886	0.006667	0.047930
rdp	vsearch	-0.031868	0.974631	0.994093
sortmerna	uclust	-2.673968	0.008559	0.047930
sortmerna	vsearch	-0.023629	0.981189	0.994093
uclust	vsearch	2.705077	0.007840	0.047930
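The FDR P column above appears consistent with a Benjamini-Hochberg adjustment applied across all 28 pairwise P values (for example, the second-smallest P, 0.003305, times 28/2 gives 0.046270). The sketch below shows that adjustment in plain Python. It is an illustration under that assumption, not the tax_credit implementation, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    n = len(p_values)
    # Indices of the P values, ordered from smallest to largest P.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, so each adjusted value is the
    # minimum of p * n / rank over its own rank and all higher ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Small made-up example (not values from the table above).
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Passing the full P column of a table to this function should give values close to its FDR P column, up to rounding of the printed inputs.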
    
  








    








In [38]:

    
# Save each optimized-distance boxplot as a PDF, one file per key in `boxes`.
for k, v in boxes.items():
    v.get_figure().savefig(join(outdir, 'mock-opt-distance-{0}-boxplots.pdf'.format(k)))



In [24]:

    
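# Mean distance for each method/parameter combination on the Greengenes 13_8 reference.
a = best['gg_13_8_otus'].groupby(['method', 'params']).mean()
# Sort ascending so the smallest mean distances come first.
a = a.sort_values('distance')
# Show only the combinations with a mean distance below 0.51.
a[a['distance'] < 0.51]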









    Out[24]:







  
    
      
      
method	params	distance
naive-bayes-bespoke	0.001:prior:char:8192:[9,9]:0.0	0.467257
naive-bayes-bespoke	0.001:prior:char:8192:[6,6]:0.0	0.467649
naive-bayes-bespoke	0.001:prior:char:8192:[10,10]:0.0	0.468700
naive-bayes-bespoke	0.001:prior:char:8192:[7,7]:0.0	0.469980
naive-bayes-bespoke	0.001:prior:char:8192:[9,9]:0.5	0.470198
naive-bayes-bespoke	0.001:prior:char:8192:[11,11]:0.0	0.470591
naive-bayes-bespoke	0.001:prior:char:8192:[7,7]:0.5	0.470635
naive-bayes-bespoke	0.001:prior:char:8192:[8,8]:0.0	0.471006
naive-bayes-bespoke	0.001:prior:char:8192:[6,6]:0.5	0.471651
naive-bayes-bespoke	0.001:prior:char:8192:[8,8]:0.5	0.472379
naive-bayes-bespoke	0.001:prior:char:8192:[11,11]:0.5	0.476747
naive-bayes-bespoke	0.001:prior:char:8192:[10,10]:0.5	0.477313
naive-bayes-bespoke	0.001:prior:char:8192:[16,16]:0.0	0.488763
naive-bayes-bespoke	0.001:prior:char:8192:[16,16]:0.5	0.490282
naive-bayes-bespoke	0.001:prior:char:8192:[14,14]:0.0	0.498809
naive-bayes-bespoke	0.001:prior:char:8192:[14,14]:0.5	0.502069
naive-bayes-bespoke	0.001:prior:char:8192:[12,12]:0.0	0.503721
naive-bayes-bespoke	0.001:prior:char:8192:[12,12]:0.5	0.505855



In [ ]:

	Reference	Variable	2	3	4	5	6
0	gg_13_8_otus	Precision	1.507747e-47	1.854806e-77	9.910988e-51	3.609426e-65	2.406841e-310
1	gg_13_8_otus	Recall	3.113942e-51	6.712403e-242	1.789303e-273	0.000000e+00	4.778745e-205
2	gg_13_8_otus	F-measure	2.755966e-50	1.397661e-242	3.850779e-272	0.000000e+00	5.978456e-211
3	gg_13_8_otus	Taxon Accuracy Rate	5.037725e-12	4.870274e-76	8.147172e-60	2.475215e-302	0.000000e+00
4	gg_13_8_otus	Taxon Detection Rate	1.326490e-30	4.209898e-31	3.360070e-19	3.921844e-241	0.000000e+00
5	unite_20.11.2016_clean_fullITS	Precision	3.547503e-36	2.160250e-58	2.998770e-96	3.664661e-100	2.377685e-268
6	unite_20.11.2016_clean_fullITS	Recall	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
7	unite_20.11.2016_clean_fullITS	F-measure	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00
8	unite_20.11.2016_clean_fullITS	Taxon Accuracy Rate	1.173900e-161	8.258546e-169	1.677852e-191	1.505928e-226	0.000000e+00
9	unite_20.11.2016_clean_fullITS	Taxon Detection Rate	1.489817e-91	3.993707e-132	2.571375e-147	8.253999e-176	0.000000e+00

	Level	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
Method
blast	5	1.0	0.968912	0.984211	0.750000	0.727273
blast+	5	1.0	0.849341	0.897696	0.583333	0.538462
naive-bayes	5	1.0	0.947026	0.951923	0.666667	0.636364
naive-bayes-bespoke	5	1.0	1.000000	1.000000	0.771429	0.636364
rdp	5	1.0	0.999990	0.999990	0.750000	0.727273
sortmerna	5	1.0	0.956708	0.972085	0.714286	0.636364
uclust	5	1.0	0.713621	0.810810	0.636364	0.542857
vsearch	5	1.0	0.894203	0.932479	0.628571	0.545455

	Level	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
Method
blast	0.0	0.091380	0.113460	0.099558	0.144590	0.189215
blast+	0.0	0.321829	0.357214	0.349524	0.250316	0.263426
naive-bayes	0.0	0.350885	0.394216	0.378837	0.294774	0.299700
naive-bayes-bespoke	0.0	0.209343	0.274806	0.258039	0.213479	0.272208
rdp	0.0	0.097528	0.186111	0.152516	0.166270	0.207367
sortmerna	0.0	0.124508	0.195748	0.164674	0.169572	0.187481
uclust	0.0	0.068539	0.313915	0.309204	0.174172	0.212127
vsearch	0.0	0.095198	0.301211	0.266225	0.181140	0.215745

	Level	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
Method
blast	6	0.911808	0.898308	0.905669	0.578947	0.545455
blast+	6	0.885315	0.406352	0.542705	0.410256	0.363636
naive-bayes	6	0.963514	0.766478	0.811590	0.520000	0.454545
naive-bayes-bespoke	6	1.000000	0.942217	0.948357	0.692308	0.636364
rdp	6	0.992066	0.947026	0.948357	0.600000	0.578947
sortmerna	6	0.995403	0.927770	0.932479	0.571429	0.460000
uclust	6	0.910492	0.289107	0.415764	0.440000	0.363636
vsearch	6	0.992308	0.458496	0.580192	0.428571	0.363636

	Level	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
Method
blast	0.0	0.203380	0.226256	0.215324	0.127848	0.148844
blast+	0.0	0.405526	0.391825	0.388711	0.241750	0.230588
naive-bayes	0.0	0.361718	0.404387	0.390941	0.255675	0.236117
naive-bayes-bespoke	0.0	0.247967	0.330187	0.299690	0.207314	0.251360
rdp	0.0	0.177257	0.305923	0.267272	0.178491	0.168614
sortmerna	0.0	0.249246	0.313537	0.284816	0.168591	0.164231
uclust	0.0	0.185041	0.293794	0.291339	0.170592	0.168700
vsearch	0.0	0.202364	0.369768	0.341499	0.202229	0.197371

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
4422	naive-bayes-bespoke	0.001:prior:char:8192:[16,16]:0.5	0.985273	0.817671	0.893681	0.860000	0.796296
1662	naive-bayes	0.001:char:8192:[18,18]:0.5	0.978409	0.800016	0.880266	0.679245	0.666667
2654	rdp	0.4	0.937163	0.705506	0.805000	0.603774	0.592593
3284	uclust	0.51:0.97:3	0.662203	0.509471	0.575883	0.551020	0.500000
2054	vsearch	1:0.99:0.99	0.669105	0.494585	0.568759	0.565217	0.481481
2789	sortmerna	0.51:0.99:1:0.9:1.0	0.659963	0.413790	0.508657	0.511111	0.425926
299	blast+	0.001:1:0.51:0.8	0.624731	0.422299	0.503946	0.510638	0.444444
2879	blast	1e-10	0.624731	0.422299	0.503946	0.510638	0.444444
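
The F-measure column in these per-method tables appears to be the harmonic mean of the Precision and Recall columns. That reading is inferred from the tabulated values, not taken from the tax_credit source. The sketch below recomputes it for one row; the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the naive-bayes-bespoke row above
# (tabulated F-measure: 0.893681).
print(f_measure(0.985273, 0.817671))
```

The recomputed value agrees with the tabulated 0.893681 to within the rounding of the printed inputs.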

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
18993	naive-bayes-bespoke	0.001:prior:char:8192:[8,8]:0.0	1.000000	1.000000	1.000000	0.777778	0.636364
13233	vsearch	1:0.75:0.97	1.000000	1.000000	1.000000	0.777778	0.636364
6033	blast+	0.001:1:0.99:0.99	1.000000	1.000000	1.000000	0.777778	0.636364
11313	naive-bayes	0.001:char:8192:[14,14]:0.0	1.000000	1.000000	1.000000	0.777778	0.636364
16173	sortmerna	0.51:0.99:5:0.9:1.0	1.000000	1.000000	1.000000	0.777778	0.636364
15753	rdp	0.2	1.000000	1.000000	1.000000	0.777778	0.636364
16413	blast	1e-10	0.984163	0.984163	0.984163	0.666667	0.545455
17973	uclust	0.76:0.9:1	0.984163	0.984163	0.984163	0.666667	0.545455

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
28647	naive-bayes-bespoke	0.001:prior:char:8192:[16,16]:0.0	0.979330	0.823178	0.894490	0.840000	0.777778
26187	naive-bayes	0.001:char:8192:[18,18]:0.5	0.936550	0.756941	0.837221	0.705882	0.666667
27194	rdp	0.3	0.862963	0.668879	0.753626	0.627451	0.592593
27779	uclust	0.51:0.9:3	0.752492	0.466084	0.575630	0.571429	0.444444
27359	sortmerna	0.51:0.9:1:0.9:1.0	0.670887	0.496033	0.570360	0.545455	0.444444
24554	blast+	0.001:1:0.51:0.97	0.640563	0.500886	0.562179	0.543478	0.462963
27404	blast	1e-10	0.640563	0.500886	0.562179	0.543478	0.462963
26684	vsearch	1:0.99:0.8	0.625602	0.456603	0.527907	0.577778	0.481481

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
30704	naive-bayes-bespoke	0.001:prior:char:8192:[10,10]:0.0	1.000000	0.905037	0.950152	0.725000	0.783784
29774	naive-bayes	0.001:char:8192:[9,9]:0.5	1.000000	0.862307	0.926063	0.555556	0.675676
30304	rdp	0.7	1.000000	0.779486	0.876080	0.511628	0.594595
30129	vsearch	10:0.51:0.99	0.549971	0.476456	0.510581	0.487805	0.540541
30459	uclust	1.0:0.9:1	0.523875	0.431704	0.473344	0.525000	0.567568
30344	sortmerna	1.0:0.9:1:0.9:1.0	0.494340	0.391467	0.436930	0.485714	0.459459
29504	blast+	0.001:100:0.51:0.99	0.508952	0.273960	0.356189	0.463415	0.513514
30384	blast	1000	0.350540	0.288866	0.316729	0.488372	0.567568

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
36102	naive-bayes-bespoke	0.001:prior:char:8192:[12,12]:0.5	1.000000	1.000000	1.000000	0.950000	0.95
35244	uclust	0.51:0.9:1	0.988422	0.953779	0.970792	0.769231	0.50
31564	blast+	0.001:1:0.75:0.97	0.945743	0.912596	0.928874	0.692308	0.45
34884	blast	1000	0.945743	0.912596	0.928874	0.692308	0.45
34001	vsearch	1:0.51:0.9	0.949057	0.896317	0.921933	0.705882	0.60
34821	sortmerna	0.51:0.99:5:0.9:1.0	0.914761	0.328205	0.483085	0.555556	0.50
34663	rdp	0.8	0.915142	0.304494	0.456948	0.538462	0.35
32683	naive-bayes	0.001:char:8192:[8,8]:0.0	0.907022	0.304494	0.455929	0.538462	0.35

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
41518	naive-bayes-bespoke	0.001:prior:char:8192:[7,7]:0.94	0.997564	0.522668	0.685941	0.461538	0.500000
40378	sortmerna	1.0:0.99:5:0.9:1.0	1.000000	0.435936	0.607180	0.416667	0.416667
40003	vsearch	100:0.75:0.97	0.642512	0.435936	0.519439	0.428571	0.500000
40198	rdp	0.6	0.576611	0.441873	0.500330	0.352941	0.500000
37678	blast+	0.001:1:0.51:0.99	0.476931	0.435936	0.455513	0.428571	0.500000
40543	uclust	0.76:0.97:3	1.000000	0.289107	0.448539	0.333333	0.166667
38653	naive-bayes	0.001:char:8192:[11,11]:0.96	0.574485	0.289107	0.384644	0.250000	0.250000
40483	blast	1e-10	0.346096	0.346096	0.346096	0.333333	0.416667

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
43644	naive-bayes-bespoke	0.001:prior:char:8192:[11,11]:0.92	0.999958	0.999906	0.999932	0.629630	0.894737
43469	blast	1e-10	0.777240	0.736385	0.756261	0.444444	0.631579
42544	blast+	0.001:1:0.99:0.97	0.777240	0.736385	0.756261	0.444444	0.631579
42764	naive-bayes	0.001:char:8192:[6,6]:0.0	0.945478	0.628949	0.755396	0.288889	0.684211
43374	rdp	0.6	0.945470	0.626301	0.753480	0.315789	0.631579
43239	vsearch	1:0.75:0.8	0.745446	0.706217	0.725301	0.480000	0.631579
43549	uclust	0.51:0.99:3	0.744739	0.703471	0.723517	0.458333	0.578947
43424	sortmerna	1.0:0.9:1:0.9:1.0	0.742659	0.703551	0.722577	0.392857	0.578947

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
45609	naive-bayes-bespoke	0.001:prior:char:8192:[6,6]:0.5	1.000000	0.942504	0.970401	0.800000	0.800000
44589	naive-bayes	0.001:char:8192:[12,12]:0.5	0.942771	0.888565	0.914866	0.578947	0.733333
44174	blast+	0.001:1:0.99:0.8	0.866472	0.816653	0.840825	0.500000	0.666667
45099	blast	1000	0.866472	0.816653	0.840825	0.526316	0.666667
45049	rdp	0.2	0.820735	0.773546	0.796442	0.523810	0.733333
45234	uclust	0.76:0.9:1	0.819872	0.772732	0.795605	0.647059	0.733333
44989	vsearch	1:0.51:0.99	0.743117	0.700391	0.721122	0.600000	0.600000
45059	sortmerna	1.0:0.9:1:0.9:1.0	0.743117	0.700391	0.721122	0.450000	0.600000

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
50417	naive-bayes-bespoke	0.001:prior:char:8192:[32,32]:0.5	0.999752	0.999752	0.999752	0.863636	0.95
49898	uclust	1.0:0.9:1	0.987503	0.955909	0.971449	0.538462	0.70
46178	blast+	0.001:1:0.51:0.8	0.946067	0.918123	0.931886	0.560000	0.70
49618	blast	1e-10	0.946067	0.918123	0.931886	0.538462	0.70
49036	vsearch	1:0.51:0.97	0.955963	0.906693	0.930676	0.619048	0.65
49516	sortmerna	0.51:0.9:5:0.9:1.0	0.937503	0.406352	0.566960	0.521739	0.60
47596	naive-bayes	0.001:char:8192:[32,32]:0.98	0.895793	0.378540	0.532190	0.434783	0.50
49299	rdp	0.5	0.908339	0.285428	0.434365	0.464286	0.65

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
53139	vsearch	100:0.51:0.9	1.000000	0.889646	0.941601	0.095238	0.250
52329	blast+	0.001:1:0.51:0.97	1.000000	0.889646	0.941601	0.210526	0.500
53184	rdp	0.6	1.000000	0.889646	0.941601	0.285714	0.750
53239	sortmerna	1.0:0.99:1:0.9:1.0	1.000000	0.889646	0.941601	0.400000	0.500
53284	uclust	0.76:0.99:1	1.000000	0.889646	0.941601	0.333333	0.375
52814	naive-bayes	0.001:char:8192:[8,8]:0.96	1.000000	0.889646	0.941601	0.227273	0.625
53644	naive-bayes-bespoke	0.001:prior:char:8192:[11,11]:0.96	1.000000	0.889646	0.941601	0.315789	0.750
53279	blast	1e-10	0.889646	0.889646	0.889646	0.227273	0.625

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
55199	naive-bayes-bespoke	0.001:prior:char:8192:[14,14]:0.0	0.999990	0.999990	0.999990	0.692308	0.947368
54909	blast	1000	0.826671	0.735173	0.778242	0.413793	0.631579
54079	blast+	0.001:1:0.75:0.97	0.826671	0.735173	0.778242	0.428571	0.631579
54689	vsearch	1:0.51:0.9	0.825662	0.734231	0.777267	0.428571	0.631579
54994	uclust	0.51:0.99:3	0.813507	0.676203	0.738528	0.423077	0.578947
54884	sortmerna	0.51:0.99:1:0.9:1.0	0.760641	0.676426	0.716066	0.379310	0.578947
54209	naive-bayes	0.001:char:8192:[6,6]:0.0	0.774044	0.650819	0.707103	0.279070	0.631579
54834	rdp	0.5	0.757617	0.593107	0.665344	0.268293	0.578947

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
60227	naive-bayes-bespoke	0.001:prior:char:8192:[32,32]:0.5	0.998426	0.998426	0.998426	0.826087	0.95
55628	blast+	0.001:1:0.51:0.97	0.956439	0.936893	0.946565	0.518519	0.70
59428	blast	1e-10	0.956405	0.936133	0.946160	0.518519	0.70
58526	vsearch	1:0.51:0.9	0.958658	0.925608	0.941843	0.560000	0.70
59368	sortmerna	0.51:0.9:1:0.9:1.0	0.947126	0.927770	0.937348	0.464286	0.65
59808	uclust	1.0:0.99:1	0.950255	0.921753	0.935787	0.481481	0.65
56946	naive-bayes	0.001:char:8192:[4,4]:0.0	0.840326	0.427025	0.566284	0.520000	0.65
59049	rdp	0.6	0.903279	0.265198	0.410017	0.481481	0.65

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
127529	naive-bayes-bespoke	0.001:prior:char:8192:[8,8]:0.98	1.0	1.0	1.0	0.700000	0.636364
89107	naive-bayes	0.001:char:8192:[6,6]:0.5	1.0	1.0	1.0	0.600000	0.272727
89529	vsearch	1:0.99:0.99	1.0	1.0	1.0	0.750000	0.545455
99729	sortmerna	0.51:0.99:5:0.9:1.0	1.0	1.0	1.0	0.777778	0.636364
100506	blast	1e-10	1.0	1.0	1.0	0.500000	0.363636
98329	rdp	0.2	1.0	1.0	1.0	0.700000	0.636364
105704	uclust	0.76:0.9:1	1.0	1.0	1.0	0.400000	0.181818
63520	blast+	0.001:1:0.99:0.97	1.0	1.0	1.0	0.666667	0.363636

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
128139	naive-bayes	0.001:char:8192:[11,11]:0.7	0.996677	0.954140	0.974945	0.309524	0.65
128404	rdp	0.7	0.996674	0.953750	0.974739	0.292683	0.60
128799	naive-bayes-bespoke	0.001:prior:char:8192:[7,7]:0.5	0.957738	0.957738	0.957738	0.393939	0.65
128569	uclust	0.51:0.99:3	0.954174	0.953804	0.953989	0.324324	0.60
128289	vsearch	1:0.75:0.99	0.953804	0.953804	0.953804	0.315789	0.60
127599	blast+	0.001:100:0.51:0.8	0.991209	0.856243	0.918796	0.277778	0.50
128469	sortmerna	0.51:0.99:5:0.9:1.0	0.949030	0.859894	0.902266	0.315789	0.60
128484	blast	1000	0.911808	0.477596	0.626853	0.272727	0.45

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
130669	naive-bayes-bespoke	0.001:prior:char:8192:[6,6]:0.98	0.758964	0.643289	0.696355	0.722222	0.866667
130194	uclust	1.0:0.9:1	0.829256	0.534118	0.649742	0.666667	0.800000
129719	naive-bayes	0.001:char:8192:[18,18]:0.5	0.686593	0.599645	0.640180	0.571429	0.800000
129924	vsearch	1:0.75:0.99	0.755874	0.486854	0.592246	0.625000	0.666667
129189	blast+	0.001:1:0.51:0.99	0.617034	0.569362	0.592240	0.500000	0.800000
130119	blast	1000	0.597664	0.569362	0.583170	0.500000	0.800000
130074	rdp	1.0	0.905908	0.428101	0.581436	0.500000	0.600000
130084	sortmerna	1.0:0.99:1:0.9:1.0	0.606123	0.503654	0.550158	0.476190	0.666667

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
134968	naive-bayes-bespoke	0.001:prior:char:8192:[11,11]:0.98	0.658864	0.384047	0.485247	0.500000	0.416667
133573	sortmerna	1.0:0.99:5:0.9:1.0	0.658864	0.384047	0.485247	0.416667	0.416667
133393	rdp	0.6	0.505002	0.388435	0.439114	0.375000	0.500000
132913	vsearch	100:0.51:0.97	0.489466	0.384047	0.430395	0.428571	0.500000
133739	uclust	0.76:0.97:3	0.736017	0.277288	0.402818	0.333333	0.166667
130873	blast+	0.001:1:0.51:0.99	0.418402	0.384047	0.400489	0.428571	0.500000
131504	naive-bayes	0.001:char:8192:[16,16]:0.9	0.440021	0.277288	0.340196	0.272727	0.250000
133679	blast	1e-10	0.277288	0.277288	0.277288	0.312500	0.416667

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
135934	naive-bayes	0.001:char:8192:[11,11]:0.0	0.801097	0.659208	0.723259	0.469388	0.621622
136964	naive-bayes-bespoke	0.001:prior:char:8192:[12,12]:0.5	0.911128	0.555923	0.690524	0.608696	0.756757
136584	rdp	0.5	0.748552	0.468953	0.576648	0.428571	0.567568
136514	vsearch	1:0.99:0.9	0.483693	0.311953	0.379288	0.444444	0.540541
136624	sortmerna	1.0:0.99:1:0.9:1.0	0.482285	0.302594	0.371870	0.466667	0.567568
136764	uclust	0.51:0.9:5	0.414754	0.272723	0.329067	0.326087	0.405405
135779	blast+	0.001:100:0.51:0.99	0.347856	0.131147	0.190480	0.361702	0.459459
136659	blast	1000	0.212792	0.131297	0.162394	0.446809	0.567568

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
138924	naive-bayes-bespoke	0.001:prior:char:8192:[6,6]:0.7	0.895613	0.877584	0.886507	0.547945	0.80
137759	naive-bayes	0.001:char:8192:[32,32]:0.5	0.809916	0.755805	0.781925	0.392405	0.62
137359	blast+	0.001:1:0.75:0.8	0.777234	0.777234	0.777234	0.421053	0.64
138294	blast	1000	0.777234	0.777234	0.777234	0.421053	0.64
138394	uclust	1.0:0.99:1	0.790336	0.734302	0.761289	0.416667	0.60
138264	sortmerna	1.0:0.99:5:0.9:1.0	0.891800	0.626707	0.736114	0.323944	0.46
138134	vsearch	100:0.75:0.99	0.855187	0.632913	0.727449	0.338028	0.48
138204	rdp	0.6	0.742193	0.680423	0.709967	0.367089	0.58

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
140249	naive-bayes-bespoke	0.001:prior:char:8192:[10,10]:0.0	1.000000	1.000000	1.000000	0.772727	0.894737
139099	blast+	0.001:1:0.75:0.97	0.838606	0.784018	0.810394	0.545455	0.631579
139934	blast	1e-10	0.838606	0.784018	0.810394	0.545455	0.631579
139789	vsearch	1:0.51:0.97	0.828898	0.774943	0.801013	0.545455	0.631579
140014	uclust	0.51:0.99:3	0.936591	0.679946	0.787896	0.545455	0.631579
139889	sortmerna	1.0:0.9:1:0.9:1.0	0.755563	0.706381	0.730144	0.500000	0.578947
139229	naive-bayes	0.001:char:8192:[6,6]:0.0	0.729441	0.621587	0.671209	0.521739	0.631579
139859	rdp	0.4	0.705767	0.553025	0.620129	0.458333	0.578947

	Method	Parameters	Precision	Recall	F-measure	Taxon Accuracy Rate	Taxon Detection Rate
141864	naive-bayes-bespoke	0.001:prior:char:8192:[12,12]:0.94	1.000000	1.000000	1.000000	0.761905	0.842105
140714	blast+	0.001:1:0.75:0.99	0.822001	0.798174	0.809912	0.550000	0.578947
141564	blast	1000	0.822001	0.798174	0.809912	0.578947	0.578947
141454	vsearch	1:0.51:0.99	0.788340	0.765489	0.776746	0.550000	0.578947
141539	sortmerna	0.51:0.99:1:0.9:1.0	0.785973	0.763191	0.774414	0.500000	0.526316
141639	uclust	1.0:0.9:1	0.780039	0.760019	0.769899	0.473684	0.473684
140864	naive-bayes	0.001:char:8192:[6,6]:0.0	0.955869	0.552149	0.699968	0.523810	0.578947
141494	rdp	0.4	0.955693	0.549850	0.698071	0.476190	0.526316

	F-measure	Precision	Recall	Taxon Accuracy Rate	Taxon Detection Rate
0.001:1:0.99:0.99	85.0	61	86.0	85.0	85.0
0.001:1:0.99:0.97	85.0	60	86.0	83.0	86.0
0.001:1:0.99:0.8	85.0	60	86.0	83.0	86.0
0.001:1:0.75:0.99	85.0	61	86.0	85.0	85.0
0.001:1:0.75:0.97	85.0	60	86.0	83.0	86.0

	F-measure	Precision	Recall	Taxon Accuracy Rate	Taxon Detection Rate
0.001:char:8192:[6,6]:0.5	80.0	66	75.0	73.0	71.0
0.001:char:8192:[6,6]:0.0	80.0	63	80.0	80.0	83.0
0.001:char:8192:[7,7]:0.5	80.0	66	78.0	77.0	76.0
0.001:char:8192:[8,8]:0.0	79.0	62	79.0	81.0	87.0
0.001:char:8192:[9,9]:0.5	78.0	65	77.0	74.0	76.0

	F-measure	Precision	Recall	Taxon Accuracy Rate	Taxon Detection Rate
1:0.75:0.9	83	58	87	86.0	87.0
1:0.51:0.9	83	58	87	86.0	87.0
1:0.99:0.9	83	58	87	86.0	87.0
1:0.75:0.99	80	58	87	84.0	77.0
1:0.51:0.97	80	58	87	83.0	77.0

	F-measure	Precision	Recall	Taxon Accuracy Rate	Taxon Detection Rate
0.6	85	63	85	83	85
0.5	82	57	86	86	87
0.4	80	56	87	87	87
0.7	80	62	80	72	81
0.0	77	52	87	82	87

	F-measure	Precision	Recall	Taxon Accuracy Rate	Taxon Detection Rate
0.76:0.9:1	71	39	81	66.0	85
1.0:0.9:1	71	39	81	66.0	85
0.51:0.9:1	71	39	81	66.0	85
1.0:0.97:3	40	35	23	40.0	31
0.76:0.97:3	40	35	23	40.0	31