Quality control (QC) pipeline results

Introduction

Once your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. After a pipeline has been requested and run, you can use the pf scripts to retrieve its results.

First, we'll look at how to get the output from the QC pipeline. The QC pipeline generates a series of QC statistics about your data and runs Kraken, which assigns each read to a taxon and gives a broad picture of what has been sequenced. To get the QC results, we use pf qc, which returns the location of the Kraken report for a given study, sample or lane.

In this section of the tutorial we will cover:

  • using pf qc to get Kraken reports
  • using pf qc to get a summary of the Kraken report at different taxonomic levels

Exercise 5

First, let's tell the system the location of our tutorial configuration file.


In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf

Let's take a look at the pf qc usage.


In [ ]:
pf qc -h

Now, let's get the QC pipeline results for lane 5477_6#1.


In [ ]:
pf qc -t lane -i 5477_6#1

This returned the location of the Kraken report on disk.
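A side note on the lane ID: it contains a #, which can look like the start of a shell comment. In bash, # only begins a comment at the start of a word, so an ID such as 5477_6#1 is passed through intact even when unquoted. The sketch below checks this with echo standing in for pf qc; quoting the ID is optional but harmless.

```shell
# '#' starts a comment only at the beginning of a word,
# so the lane ID below reaches the command unchanged.
unquoted=$(echo 5477_6#1)

# Single quotes make the intent explicit and give the same result.
quoted=$(echo '5477_6#1')

echo "$unquoted"
echo "$quoted"
```

If you build lane IDs in scripts or read them from a file, quoting the variable (for example "$lane") is the safer habit.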

Let's take a quick look at the Kraken report.


In [ ]:
pf qc -t lane -i 5477_6#1 | xargs head

Notice that we used xargs to give the filename that was returned to another command, in this case head.
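To see the xargs pattern in isolation, here is a minimal sketch that does not need pf. A temporary file with placeholder contents stands in for the Kraken report, and echo stands in for pf qc printing the file's path.

```shell
# Create a stand-in file; in the tutorial this path would come from pf qc.
report=$(mktemp)
printf 'line 1\nline 2\nline 3\n' > "$report"

# echo prints the path on stdout; xargs appends it as an argument to head,
# so this is equivalent to running: head -n 2 "$report"
first_two=$(echo "$report" | xargs head -n 2)
echo "$first_two"

# Clean up the stand-in file.
rm -f "$report"
```

The same pattern works with any command that takes a filename, such as wc -l or less.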

We can get a summary of this Kraken report using the --summary or -s option, which generates a new file called "qc_summary.csv" containing the taxon-level Kraken results.

Let's get our taxon (strain) level QC summary for lane 5477_6#1.


In [ ]:
pf qc -t lane -i 5477_6#1 -s

In [ ]:
head qc_summary.csv

Here you can see the taxon-level Kraken results; for example, 1.08% of the reads were assigned to the Streptococcus pneumoniae strain Hungary19A-6.

We can look at the results for different taxonomic levels using the --level or -L option.

Let's try looking at the species level QC results for lane 5477_6#1.


In [ ]:
pf qc -t lane -i 5477_6#1 -L S -s qc_species_summary.csv -F

In [ ]:
head qc_species_summary.csv

Here we can see that 87.61% of the reads were classified as Streptococcus pneumoniae. This is promising, as the sample is expected to be Streptococcus pneumoniae.
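To illustrate what a taxonomic level filter does, the sketch below applies one by hand to a tiny stand-in report. It assumes the report follows the standard tab-separated Kraken report layout: percentage of reads in the clade, reads in the clade, reads assigned directly, rank code (such as G for genus or S for species), NCBI taxonomy ID, and an indented name. The percentages and read counts here are made up, and pf qc does this for you in practice.

```shell
# Hypothetical four-line report; percentages and read counts are made up.
# printf reuses the six-field format for each group of six arguments.
report=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  ' 10.00' 100 100 U 0    'unclassified' \
  ' 90.00' 900 50  - 1    'root' \
  ' 85.00' 850 50  G 1301 '    Streptococcus' \
  ' 80.00' 800 800 S 1313 '      Streptococcus pneumoniae' > "$report"

# Keep rows whose rank code (column 4) is S, strip the padding,
# and print the percentage and name.
species_rows=$(awk -F'\t' '$4 == "S" {
  gsub(/^ +/, "", $1); gsub(/^ +/, "", $6)
  print $1 "\t" $6
}' "$report")
echo "$species_rows"

rm -f "$report"
```

Changing "S" to "G" in the awk condition would select the genus row instead.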

QC Grind

The QC pipeline also generates a series of QC statistics for a given study, sample or lane, which can be found on QC Grind.

Questions

Q1: What percentage of the reads from lane 10018_1#1 were "unclassified" by Kraken?
Hint: you can use xargs and head to look at the start of the Kraken report returned by pf qc


In [ ]:
# Enter your answer here

Q2: What percentage of the reads from lane 10018_1#1 were classified to the genus Actinobacillus by Kraken?
Hint: look at the level options in the pf qc usage


In [ ]:
# Enter your answer here

In [ ]:
# Enter your answer here

What's next?

For a quick recap of how to check the status of your analysis pipelines, head back to analysis pipeline status.

Otherwise, let's move on to how to get your mapping pipeline results.