Mapping pipeline results

Introduction

When your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. Once a pipeline has been requested and run, you can use the pf scripts to retrieve its results.

The mapping pipeline maps your raw sequence reads to a reference that you select. We can use pf map to return the locations of the BAM files that were produced by the mapping pipeline.

In this section of the tutorial we will cover:

  • using pf map to get BAM files generated by the mapping pipeline
  • filtering pf map results by mapper and reference
  • using pf map to symlink BAM files generated by the mapping pipeline
  • using pf map to get mapping statistics

Exercise 6

First, let's tell the system the location of our tutorial configuration file.


In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
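
If the pf scripts can't find the configuration file, later commands may not behave as expected, so it can be worth checking that the path is readable. The snippet below is a minimal sketch: check_config is a hypothetical helper written for this tutorial, not part of the pf scripts, and it falls back to the tutorial path if PF_CONFIG_FILE is not set.

```shell
# check_config is a hypothetical helper (not part of pf).
# It reports whether the given path is a readable file.
check_config() {
    if [ -r "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Use PF_CONFIG_FILE if it is set, otherwise the tutorial default.
check_config "${PF_CONFIG_FILE:-$PWD/data/pathfind.conf}"
```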

Let's take a look at the pf map usage.


In [ ]:
pf map -h

Now, let's get the mapping pipeline results for lane 5477_6#1.


In [ ]:
pf map -t lane -i 5477_6#1

This returns the locations of the BAM files which were produced by the mapping pipeline.

A quick way to get information about which mapper and reference were used by the mapping pipeline is to use the --details or -d option.

Let's get the mapping details for lane 5477_6#1.


In [ ]:
pf map -t lane -i 5477_6#1 -d

Here we can see that "smalt" was used as the mapper and "Streptococcus_pneumoniae_Taiwan19F-14_v1" was used as the reference.

You can request that the mapping pipeline be run more than once using different mappers or references. To filter the output by mapper, we can use the --mapper or -M option. To filter by reference, we can use the --reference or -R option.

Let's look for mapping pipeline results for lane 5477_6#1 which used the mapper "smalt".


In [ ]:
pf map -t lane -i 5477_6#1 -M smalt

Here we got the same results as before. But what if we try looking for results produced by a different mapper?

Let's look for mapping pipeline results for lane 5477_6#1 which used the mapper "bwa".


In [ ]:
pf map -t lane -i 5477_6#1 -M bwa

This gave us "No data found" as BWA hasn't been run on this lane.
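
If you want to check several mappers in one go, the two commands above can be wrapped in a loop. This is a sketch under a few assumptions: check_mapper is a hypothetical helper, the "No data found" message may go to standard output or standard error (so both are searched), and any other pf error would be reported as "results found".

```shell
# Lane ID from the tutorial; quoted so the '#' is always taken literally.
lane='5477_6#1'

# check_mapper is a hypothetical helper (not part of pf).
# $1 = mapper name. Prints "<mapper>: no results" when pf map reports
# "No data found", otherwise "<mapper>: results found".
check_mapper() {
    if pf map -t lane -i "$lane" -M "$1" 2>&1 | grep -q "No data found"; then
        echo "$1: no results"
    else
        echo "$1: results found"
    fi
}

# Only run the loop when the pf command is available.
if command -v pf >/dev/null 2>&1; then
    for mapper in smalt bwa; do
        check_mapper "$mapper"
    done
fi
```

For this lane we would expect "smalt" to have results and "bwa" to have none, matching what we saw above.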

Let's look for mapping pipeline results for lane 5477_6#1 which used the reference "Streptococcus_pneumoniae_Taiwan19F-14_v1".


In [ ]:
pf map -t lane -i 5477_6#1 \
    -R "Streptococcus_pneumoniae_Taiwan19F-14_v1"

Notice that we only get one BAM file (.bam) and its index (.bai) returned. This is because the mapping pipeline has been run twice on this lane, once using the reference "Streptococcus_pneumoniae_Taiwan19F-14_v1" and once using "Streptococcus_pneumoniae_ATCC_700669_v1", and the -R filter keeps only the results for the reference we asked for.

We could then symlink the BAM files into a directory using the --symlink or -l option.

Let's symlink our BAM files for lane 5477_6#1 to "my_bam_files".


In [ ]:
pf map -t lane -i 5477_6#1 \
    -R "Streptococcus_pneumoniae_Taiwan19F-14_v1" \
    -l my_bam_files

In [ ]:
ls my_bam_files
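
A plain ls only shows the link names. To see where each symlink points, and whether its target still exists, we can loop over the directory. This is a minimal sketch: check_links is a hypothetical helper, not part of the pf scripts.

```shell
# check_links is a hypothetical helper (not part of pf).
# $1 = directory. For each symlink in it, print the link, its target,
# and whether the target exists ("ok") or not ("broken").
check_links() {
    for f in "$1"/*; do
        [ -L "$f" ] || continue    # skip anything that is not a symlink
        if [ -e "$f" ]; then
            echo "ok $f -> $(readlink "$f")"
        else
            echo "broken $f -> $(readlink "$f")"
        fi
    done
}

# Only inspect the directory if the symlink step created it.
if [ -d my_bam_files ]; then
    check_links my_bam_files
fi
```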

We can also get some statistics from our mapping results using the --stats or -s option.

Let's get some mapping statistics for lane 5477_6#1.


In [ ]:
pf map -t lane -i 5477_6#1 -s

This generated a new file called "5477_6_1.mapping_stats.csv" which contains our mapping statistics.


In [ ]:
cat 5477_6_1.mapping_stats.csv

Notice that there are two rows for lane 5477_6#1. This is because the mapping pipeline was run twice on this lane using different references.
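
When working with the statistics file it helps to know which column number holds which value. The sketch below numbers the fields of the header line with awk. number_columns is a hypothetical helper, and it assumes the first line of the file is a header with no commas inside quoted fields.

```shell
# number_columns is a hypothetical helper (not part of pf).
# $1 = comma-delimited file. Prints "<column number>: <header name>"
# for every field on the first line.
number_columns() {
    head -n 1 "$1" | awk -F',' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Only run on the tutorial statistics file if it has been generated.
if [ -f 5477_6_1.mapping_stats.csv ]; then
    number_columns 5477_6_1.mapping_stats.csv
fi
```

The column numbers printed here are the ones awk uses as $1, $2 and so on when you set -F','.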

Questions

Q1: How many BAM files are returned by default for lane 5477_6#10?


In [ ]:
# Enter your answer here

Q2: Which mappers have been used with the mapping pipeline for lane 5477_6#10?
Hint: the mapper is in the 3rd column of the details


In [ ]:
# Enter your answer here

Q3: Which references have been used with the mapping pipeline for lane 5477_6#10?
Hint: the reference is in the 2nd column of the details


In [ ]:
# Enter your answer here

Q4: What percentage of the reads from lane 5477_6#10 were mapped to "Streptococcus_pneumoniae_OXC141_v1"?
Hint: you can use awk to filter the statistics file by column 8 (the reference). Make sure you set -F',' as the stats are comma-delimited!


In [ ]:
# Enter your answer here

In [ ]:
# Enter your answer here

What's next?

For a quick recap of how to get QC pipeline results, head back to QC pipeline results.

Otherwise, let's move on to how to get your SNP pipeline results.