Once your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. After the analysis pipelines have been requested and run, you can use the pf scripts to return the results of each pipeline.
First, we'll look at how to get the output from the QC pipeline. The QC pipeline generates a series of QC statistics about your data and runs Kraken, which assigns each read to a taxon and so gives a broad picture of what has been sequenced. To get the QC results, we use pf qc, which returns the location of the Kraken report for a given study, sample or lane.
In this section of the tutorial we will cover:
- using pf qc to get Kraken reports
- using pf qc to get a summary of the Kraken report at different taxonomic levels

First, let's tell the system the location of our tutorial configuration file.
In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
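If later commands fail unexpectedly, it can help to confirm that the variable points at a readable file. The following is a minimal sketch; check_pf_config is a hypothetical helper written for this tutorial and is not part of the pf scripts.

```shell
# Hypothetical helper (not part of the pf scripts): report whether the
# file named by PF_CONFIG_FILE exists and is readable.
check_pf_config() {
  if [ -r "${PF_CONFIG_FILE:-}" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Prints "ok" if the configuration file can be read, "missing" otherwise.
check_pf_config
```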
Let's take a look at the pf qc usage.
In [ ]:
pf qc -h
Now, let's get the QC pipeline results for lane 5477_6#1.
In [ ]:
pf qc -t lane -i 5477_6#1
This returned the location of the Kraken report on disk.
Let's take a quick look at the Kraken report.
In [ ]:
pf qc -t lane -i 5477_6#1 | xargs head
Notice that we used xargs to pass the filename that was returned to another command, in this case head.
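To see what xargs is doing without needing access to the databases, here is a small sketch that uses a temporary file as a stand-in for the path printed by pf qc. The file and its contents are made up for illustration only.

```shell
# Stand-in for the path that pf qc prints: a temporary file we create ourselves.
report=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 > "$report"

# xargs reads the path from standard input and appends it to the command,
# so this is equivalent to running: head -n 2 <path>
first_two=$(echo "$report" | xargs head -n 2)
echo "$first_two"

# The same idea with a different command: count the lines in the file.
line_count=$(echo "$report" | xargs wc -l | awk '{print $1}')
echo "$line_count"

# Remove the temporary file.
rm -f "$report"
```

Any command that takes a filename as its last argument can be used in place of head in the same way.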
We can get a summary of this Kraken report using the --summary or -s option, which generates a new file called "qc_summary.csv" containing the taxon-level Kraken results.
Let's get our taxon (strain) level QC summary for lane 5477_6#1.
In [ ]:
pf qc -t lane -i 5477_6#1 -s
In [ ]:
head qc_summary.csv
Here you can see the taxon-level Kraken results; for example, 1.08% of the reads were assigned to the Streptococcus pneumoniae strain Hungary19A-6.
We can look at the results for different taxonomic levels using the --level or -L option.
Let's try looking at the species level QC results for lane 5477_6#1.
In [ ]:
pf qc -t lane -i 5477_6#1 -L S -s qc_species_summary.csv -F
In [ ]:
head qc_species_summary.csv
Here we can see that 87.61% of the reads were classified as Streptococcus pneumoniae. This is reassuring, as the sample is from Streptococcus pneumoniae.
The QC pipeline also generates a series of QC statistics for a given study, sample or lane, which can be found on QC Grind.
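A similar per-level view can be pulled directly from a raw Kraken report with awk. The sketch below assumes the report uses the standard six-column, tab-separated Kraken layout (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name). The three rows are a toy report with made-up numbers, not output from lane 5477_6#1.

```shell
# Toy report with made-up numbers in the assumed standard Kraken layout:
# percent, clade reads, direct reads, rank code, NCBI taxid, indented name.
report=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  '10.00' 100 100 U 0 'unclassified' \
  '90.00' 900 5 G 1301 '    Streptococcus' \
  '85.00' 850 850 S 1313 '      Streptococcus pneumoniae' > "$report"

# Keep only rows whose rank code (column 4) is S (species), strip the
# indentation from the name, and print "percent,name".
species_rows=$(awk -F '\t' '$4 == "S" { gsub(/^ +/, "", $6); print $1 "," $6 }' "$report")
echo "$species_rows"

# Remove the temporary file.
rm -f "$report"
```

Changing "S" to "G" in the awk condition would select the genus-level rows instead.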
In [ ]:
# Enter your answer here
Q2: What percentage of the reads from the lane 10018_1#1 were classified to the genus Actinobacillus by Kraken?
Hint: look at the level options in the pf qc usage.
In [ ]:
# Enter your answer here
In [ ]:
# Enter your answer here
For a quick recap of how to get metadata and accessions, head back to analysis pipeline status.
Otherwise, let's move on to how to get your mapping pipeline results.