Once your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. After the analysis pipelines have been requested and run, you can use the pf scripts to return the results of each pipeline.
The mapping pipeline maps your raw sequence reads to a reference that you selected. We can use pf map to return the locations of the BAM files that were produced by the mapping pipeline.
In this section of the tutorial we will cover:
using pf map to get BAM files generated by the mapping pipeline
filtering pf map results by mapper and reference
using pf map to symlink BAM files generated by the mapping pipeline
using pf map to get mapping statistics
First, let's tell the system the location of our tutorial configuration file.
In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
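If the pf scripts later behave unexpectedly, a mistyped path in this variable is an easy thing to rule out. The following is a minimal sketch of such a check; the function name check_pf_config is made up for this illustration and is not part of the pf scripts.

```shell
# Hypothetical helper (not part of pf): confirm that PF_CONFIG_FILE
# is set and points at a file we can read.
check_pf_config() {
    if [ -z "${PF_CONFIG_FILE:-}" ]; then
        echo "PF_CONFIG_FILE is not set" >&2
        return 1
    fi
    if [ ! -r "$PF_CONFIG_FILE" ]; then
        echo "Cannot read config file: $PF_CONFIG_FILE" >&2
        return 1
    fi
    echo "Using config file: $PF_CONFIG_FILE"
}
```

Running check_pf_config after the export should print the path; a non-zero exit status means the variable is unset or the file cannot be read.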
Let's take a look at the pf map usage.
In [ ]:
pf map -h
Now, let's get the mapping pipeline results for lane 5477_6#1.
In [ ]:
pf map -t lane -i 5477_6#1
This returns the locations of the BAM files which were produced by the mapping pipeline.
A quick way to get information about which mapper and reference were used by the mapping pipeline is to use the --details or -d option.
Let's get the mapping details for lane 5477_6#1.
In [ ]:
pf map -t lane -i 5477_6#1 -d
Here we can see that "smalt" was used as the mapper and "Streptococcus_pneumoniae_Taiwan19F-14_v1" was used as the reference.
You can request that the mapping pipeline be run more than once using different mappers or references. To filter the output by mapper we can use the --mapper or -M option, and to filter by reference we can use the --reference or -R option.
Let's look for mapping pipeline results for lane 5477_6#1 which used the mapper "smalt".
In [ ]:
pf map -t lane -i 5477_6#1 -M smalt
Here we get the same results as before. But what if we look for results produced by a different mapper?
Let's look for mapping pipeline results for lane 5477_6#1 which used the mapper "bwa".
In [ ]:
pf map -t lane -i 5477_6#1 -M bwa
This gave us "No data found" as BWA hasn't been run on this lane.
Let's look for mapping pipeline results for lane 5477_6#1 which used the reference "Streptococcus_pneumoniae_Taiwan19F-14_v1".
In [ ]:
pf map -t lane -i 5477_6#1 \
-R "Streptococcus_pneumoniae_Taiwan19F-14_v1"
Notice that only one BAM file (.bam) and its index (.bai) are returned. The mapping pipeline has been run twice on this lane, once using the reference "Streptococcus_pneumoniae_Taiwan19F-14_v1" and once using "Streptococcus_pneumoniae_ATCC_700669_v1", and the -R filter keeps only the results for the reference we asked for.
We could then symlink the BAM files into a directory using the --symlink or -l option.
Let's symlink our BAM files for lane 5477_6#1 to "my_bam_files".
In [ ]:
pf map -t lane -i 5477_6#1 \
-R "Streptococcus_pneumoniae_Taiwan19F-14_v1" \
-l my_bam_files
In [ ]:
ls my_bam_files
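A plain ls shows the link names but not whether their targets are still reachable. As a sketch, the helper below lists each symlink in a directory with its target and flags any that no longer resolve. The name check_links is hypothetical and only standard shell tools are used; nothing here is part of pf.

```shell
# Hypothetical helper (not part of pf): report every symlink in a
# directory and whether the file it points to still exists.
check_links() {
    dir=$1
    status=0
    for link in "$dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        if [ -e "$link" ]; then         # -e follows the link to its target
            echo "ok      $link -> $(readlink "$link")"
        else
            echo "broken  $link -> $(readlink "$link")"
            status=1
        fi
    done
    return $status
}
```

For the directory created above, this would be called as check_links my_bam_files. It returns a non-zero status if any link is broken.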
We can also get some statistics from our mapping results using the --stats or -s option.
Let's get some mapping statistics for lane 5477_6#1.
In [ ]:
pf map -t lane -i 5477_6#1 -s
This generated a new file called "5477_6_1.mapping_stats.csv" which contains our mapping statistics.
In [ ]:
cat 5477_6_1.mapping_stats.csv
Notice that there are two rows for lane 5477_6#1. This is because the mapping pipeline was run twice on this lane using different references.
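When a statistics file holds one row per mapping run, it can help to pull out only the rows for one reference. The sketch below does this with awk. It assumes that the reference name is in column 8 (as the hint in the exercises states), that the first line is a header, and that fields are plain comma-separated values with no quoting. The function name rows_for_reference is made up for this illustration.

```shell
# Hypothetical helper: print the first line (assumed to be a header)
# plus every row whose 8th comma-separated field equals the reference.
# Usage: rows_for_reference <stats.csv> <reference_name>
rows_for_reference() {
    awk -F',' -v ref="$2" 'NR == 1 || $8 == ref' "$1"
}
```

For example, rows_for_reference 5477_6_1.mapping_stats.csv "Streptococcus_pneumoniae_Taiwan19F-14_v1" would keep only the rows for that reference. If the file turns out to have no header line, drop the NR == 1 condition.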
In [ ]:
# Enter your answer here
Q2: Which mappers have been used with the mapping pipeline for lane 5477_6#10?
Hint: the mapper is in the 3rd column of the details
In [ ]:
# Enter your answer here
Q3: Which references have been used with the mapping pipeline for lane 5477_6#10?
Hint: the reference is in the 2nd column of the details
In [ ]:
# Enter your answer here
Q4: What percentage of the reads from lane 5477_6#10 were mapped to "Streptococcus_pneumoniae_OXC141_v1"?
Hint: you can use awk to filter the statistics file by column 8 (the reference). Make sure you set -F',' as the stats are comma-delimited!
In [ ]:
# Enter your answer here
In [ ]:
# Enter your answer here
For a quick recap of how to get QC pipeline results, head back to QC pipeline results.
Otherwise, let's move on to how to get your SNP pipeline results.