This notebook demonstrates how to evaluate the classification accuracy of "cross-validated" simulated communities. Due to the unique nature of this analysis, the metrics that we use to evaluate classification accuracy differ from those used for mock communities.

The key measure here is the rate of exact matches vs. under- and misclassification. We define and measure the following as percentages, where L = the taxonomic level being tested:

match_ratio = proportion of assignments that match the expected taxonomy at level L.
underclassification_ratio = proportion of assignments to the correct lineage, but to a level lower than L.
misclassification_ratio = proportion of assignments to an incorrect lineage.

The evaluation below also reports per-level precision, recall, and F-measure, which are the scores plotted and tabulated in the cells that follow.
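To make the categories concrete, here is a minimal sketch of how the three ratios could be tallied for paired expected/observed taxonomy strings. This is an illustration only, not the tax-credit implementation; the function name and the assumption of semicolon-delimited taxonomies are ours.

# Illustration only: tax-credit computes these ratios internally in
# novel_taxa_classification_evaluation; this hypothetical helper just
# shows the logic for semicolon-delimited taxonomy strings.
def classification_ratios(expected, observed, level):
    match = underclass = misclass = 0
    for exp, obs in zip(expected, observed):
        exp_levels = [t.strip() for t in exp.split(';')][:level]
        obs_levels = [t.strip() for t in obs.split(';')][:level]
        if obs_levels == exp_levels:
            match += 1       # exact match down to level L
        elif obs_levels == exp_levels[:len(obs_levels)]:
            underclass += 1  # correct lineage, but shallower than level L
        else:
            misclass += 1    # wrong lineage
    n = len(expected)
    return match / n, underclass / n, misclass / n

# Example: at level 2, observing only 'k__Bacteria' for an expected
# 'k__Bacteria;p__Firmicutes' is an underclassification:
classification_ratios(['k__Bacteria;p__Firmicutes'], ['k__Bacteria'], 2)
# -> (0.0, 1.0, 0.0)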
In [1]:
from tax_credit.framework_functions import (novel_taxa_classification_evaluation,
                                            extract_per_level_accuracy)
from tax_credit.eval_framework import parameter_comparisons
from tax_credit.plotting_functions import (pointplot_from_data_frame,
                                           heatmap_from_data_frame,
                                           per_level_kruskal_wallis,
                                           rank_optimized_method_performance_by_dataset)
# xkcd_rgb is a dict of named colors, not a module, so it must be imported
# from seaborn rather than used as the target of a plain `import` statement
from seaborn import xkcd_rgb as colors
import pandas as pd
from os.path import expandvars, join, exists
from glob import glob
from IPython.display import display, Markdown
In [2]:
project_dir = expandvars("../../")
analysis_name = "cross-validated"
precomputed_results_dir = join(project_dir, "data", "precomputed-results", analysis_name)
expected_results_dir = join(project_dir, "data", analysis_name)
summary_fp = join(precomputed_results_dir, 'evaluate_classification_summary.csv')
results_dirs = glob(join(precomputed_results_dir, '*', '*', '*', '*'))
# we can save plots in this directory
outdir = expandvars("../../plots/")
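If you are running the notebook from scratch, the plots directory may not exist yet; creating it up front avoids a FileNotFoundError when the figures are saved below. This line is a small optional addition, not part of the original notebook.

from os import makedirs
makedirs(outdir, exist_ok=True)  # no-op if the directory already exists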
This cell performs the classification evaluation and should not be modified. With force = True it always recomputes the summary; set force = False to reload precomputed results from summary_fp when that file exists.
In [4]:
force = True
if force or not exists(summary_fp):
    accuracy_results = novel_taxa_classification_evaluation(
        results_dirs, expected_results_dir, summary_fp, test_type='cross-validated')
else:
    # pd.DataFrame.from_csv was removed from pandas; read_csv is the
    # equivalent modern call
    accuracy_results = pd.read_csv(summary_fp, index_col=0)
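Before plotting, it can help to eyeball the evaluation output with standard pandas inspection; this step is optional, and the exact column set depends on the tax-credit version in use.

print(accuracy_results.shape)  # dimensions of the result table
accuracy_results.head()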
Finally, we plot our results. Line plots show the mean +/- 95% confidence interval for each classification result at each taxonomic level (1 = phylum, 6 = species) in each dataset tested. Do not modify the cell below, except to adjust the color_palette used for plotting. This palette can be a dictionary of colors for each group, as shown below, or a seaborn color palette.
As a recap of the metrics defined above:

match_ratio = proportion of correct matches.
underclassification_ratio = proportion of assignments to the correct lineage, but to a lower level than expected.
misclassification_ratio = proportion of assignments to an incorrect lineage.
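For example, if 100 species-level assignments yield 85 exact matches, 10 genus-level (or shallower) assignments within the correct lineage, and 5 assignments to the wrong lineage, then match_ratio = 0.85, underclassification_ratio = 0.10, and misclassification_ratio = 0.05. Every assignment falls into exactly one category, so the three ratios always sum to 1.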
In [5]:
color_palette = {
    'expected': 'black', 'rdp': colors['baby shit green'], 'sortmerna': colors['macaroni and cheese'],
    'uclust': 'coral', 'blast': 'indigo', 'blast+': colors['electric purple'], 'naive-bayes': 'dodgerblue',
    'naive-bayes-bespoke': 'blue', 'vsearch': 'firebrick'
}
level_results = extract_per_level_accuracy(accuracy_results)
y_vars = ['Precision', 'Recall', 'F-measure']
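As noted above, a seaborn palette can stand in for the hand-picked dictionary. One possible sketch, assuming one color per method key (this overrides the manual palette if run):

import seaborn as sns

methods = ['expected', 'rdp', 'sortmerna', 'uclust', 'blast', 'blast+',
           'naive-bayes', 'naive-bayes-bespoke', 'vsearch']
color_palette = dict(zip(methods, sns.color_palette('colorblind', len(methods))))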
In [6]:
point = pointplot_from_data_frame(level_results, "level", y_vars,
                                  group_by="Dataset", color_by="Method",
                                  color_palette=color_palette)
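pointplot_from_data_frame returns a dictionary keyed by y-variable, with one matplotlib figure per metric; the next cell relies on this to save each line plot as a separate PDF.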
In [7]:
for k, v in point.items():
    v.savefig(join(outdir, 'cross-val-{0}-lineplots.pdf'.format(k)))
In [8]:
result = per_level_kruskal_wallis(level_results, y_vars, group_by='Method',
                                  dataset_col='Dataset', alpha=0.05,
                                  pval_correction='fdr_bh')
result
Out[8]:
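For readers unfamiliar with the test: a Kruskal-Wallis H-test is a non-parametric one-way ANOVA on ranks, here comparing methods for each metric at each taxonomic level, with Benjamini-Hochberg FDR correction across the family of tests. A rough standalone sketch of that logic follows; the actual per_level_kruskal_wallis implementation may differ in detail, and the 'level'/'Method' column names are taken from the cells above.

from scipy.stats import kruskal
from statsmodels.stats.multitest import multipletests

pvals, tests = [], []
for level in sorted(level_results['level'].unique()):
    level_df = level_results[level_results['level'] == level]
    for metric in y_vars:
        # one group of scores per method at this taxonomic level
        groups = [g[metric].dropna().values for _, g in level_df.groupby('Method')]
        h_stat, p = kruskal(*groups)
        tests.append((level, metric))
        pvals.append(p)

# Benjamini-Hochberg correction, matching pval_correction='fdr_bh' above
reject, pvals_fdr, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')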
In [9]:
heatmap_from_data_frame(level_results, metric="Precision",
                        rows=["Method", "Parameters"], cols=["Dataset", "level"])
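The same call can be repeated for the other metrics in y_vars, e.g.:

for metric in ['Recall', 'F-measure']:
    heatmap_from_data_frame(level_results, metric=metric,
                            rows=["Method", "Parameters"], cols=["Dataset", "level"])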