Once your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. After a pipeline has been requested and run, you can use the pf scripts to retrieve its results.
The mapping pipeline maps your raw sequence reads to a reference that you selected. We can use pf map to return the location of the BAM files that were produced by the mapping pipeline.
In this section of the tutorial we will cover:
- pf map to get BAM files generated by the mapping pipeline
- pf map results by mapper and reference
- pf map to symlink BAM files generated by the mapping pipeline
- pf map to get mapping statistics
First, let's tell the system the location of our tutorial configuration file.
In [ ]:
    
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
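Before running any pf commands, it can help to confirm that the variable points to a file you can read. The following is a minimal sketch only; `check_pf_config` is a hypothetical helper name used for illustration and is not part of the pf scripts.

```shell
# Hypothetical helper (not part of the pf scripts): report whether the
# path given as the first argument is a readable regular file.
check_pf_config() {
    if [ -z "$1" ]; then
        echo "PF_CONFIG_FILE is not set"
        return 1
    elif [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "Cannot read config file: $1"
        return 1
    fi
    echo "Using config file: $1"
}

# "|| true" keeps the shell going even if the check fails
check_pf_config "${PF_CONFIG_FILE:-}" || true
```

If the check reports a problem, re-run the export from the tutorial directory so that $PWD resolves correctly.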
    
Let's take a look at the pf map usage.
In [ ]:
    
pf map -h
    
Now, let's get the mapping pipeline results for lane 5477_6#1.
In [ ]:
    
pf map -t lane -i 5477_6#1
    
This returns the locations of the BAM files which were produced by the mapping pipeline.
A quick way to get information about which mapper and reference were used by the mapping pipeline is to use the --details or -d option.
Let's get the mapping details for lane 5477_6#1.
In [ ]:
    
pf map -t lane -i 5477_6#1 -d
    
Here we can see that "smalt" was used as the mapper and "Streptococcus_pneumoniae_Taiwan19F-14_v1" was used as the reference.
You can request that the mapping pipeline be run more than once using different mappers or references. To filter the output by mapper, we can use the --mapper or -M option; to filter by reference, we can use the --reference or -R option.
Let's look for mapping pipeline results for lane 5477_6#1 which used the mapper "smalt".
In [ ]:
    
pf map -t lane -i 5477_6#1 -M smalt
    
Here we got the same results as before. But, what if we try looking for results produced by a different mapper?
Let's look for mapping pipeline results for lane 5477_6#1 which used the mapper "bwa".
In [ ]:
    
pf map -t lane -i 5477_6#1 -M bwa
    
This gave us "No data found" as BWA hasn't been run on this lane.
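If you are scripting around pf map, you may want to detect this case automatically. The sketch below assumes the "No data found" message appears in the text you pipe in; whether pf map writes it to standard output or standard error is not stated in this tutorial, so treat that as an assumption. `has_mapping_results` is a hypothetical helper name.

```shell
# Hypothetical helper: read command output on stdin and succeed only if
# it does not contain the "No data found" message.
has_mapping_results() {
    ! grep -q "No data found"
}

# Example with canned text standing in for real pf map output
if printf 'No data found\n' | has_mapping_results; then
    echo "results found"
else
    echo "no results for this mapper"
fi
```

With real data, you could pipe the output of the pf map command above into the helper, adding 2>&1 if the message turns out to go to standard error.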
Let's look for mapping pipeline results for lane 5477_6#1 which used the reference "Streptococcus_pneumoniae_Taiwan19F-14_v1".
In [ ]:
    
pf map -t lane -i 5477_6#1 \
    -R "Streptococcus_pneumoniae_Taiwan19F-14_v1"
    
Notice that we only get one BAM file (.bam) and its index (.bai) returned. The mapping pipeline has been run twice on this lane, once using the reference "Streptococcus_pneumoniae_Taiwan19F-14_v1" and once with "Streptococcus_pneumoniae_ATCC_700669_v1", and the -R option filters the output down to the results for the reference we asked for.
We could then symlink the BAM files into a directory using the --symlink or -l option.
Let's symlink our BAM files for lane 5477_6#1 to "my_bam_files".
In [ ]:
    
pf map -t lane -i 5477_6#1 \
    -R "Streptococcus_pneumoniae_Taiwan19F-14_v1" \
    -l my_bam_files
    
In [ ]:
    
ls my_bam_files
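Because these are symlinks rather than copies, they stop working if the original files are moved or removed. As a rough sketch, the hypothetical helper below (`list_broken_links`, not part of the pf scripts) prints any symlinks in a directory whose targets no longer exist.

```shell
# Hypothetical helper: print any symlinks in the given directory whose
# targets no longer exist.
list_broken_links() {
    for link in "$1"/*; do
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "$link"
        fi
    done
}

# Only run the check if the directory from the tutorial exists
if [ -d my_bam_files ]; then
    list_broken_links my_bam_files
fi
```

No output means every symlink in the directory still resolves.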
    
We can also get some statistics from our mapping results using the --stats or -s option.
Let's get some mapping statistics for lane 5477_6#1.
In [ ]:
    
pf map -t lane -i 5477_6#1 -s
    
This generated a new file called "5477_6_1.mapping_stats.csv" which contains our mapping statistics.
In [ ]:
    
cat 5477_6_1.mapping_stats.csv
    
Notice that there are two rows for lane 5477_6#1. This is because the mapping pipeline was run twice on this lane using different references.
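To pull out just one of those rows, you can filter the statistics file by reference. The sketch below makes two assumptions: that the stats file name is the lane ID with "#" replaced by "_" (as in the file name above), and that the reference is an unquoted value in the 8th comma-separated column. Both helper names are hypothetical and are not part of the pf scripts.

```shell
# Hypothetical helper: derive the stats file name from a lane ID by
# replacing "#" with "_", matching the file name seen in this tutorial.
stats_file_for_lane() {
    printf '%s.mapping_stats.csv\n' "$(printf '%s' "$1" | tr '#' '_')"
}

# Hypothetical helper: print the rows whose 8th comma-separated field
# equals the given reference name. Assumes unquoted fields.
rows_for_reference() {
    awk -F',' -v ref="$2" '$8 == ref' "$1"
}

stats_file=$(stats_file_for_lane '5477_6#1')
# Only filter if the stats file has already been generated
if [ -f "$stats_file" ]; then
    rows_for_reference "$stats_file" "Streptococcus_pneumoniae_Taiwan19F-14_v1"
fi
```

If the fields in your file turn out to be quoted, the comparison would need to include the quotes or strip them first.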
In [ ]:
    
# Enter your answer here
    
Q2: Which mappers have been used with the mapping pipeline for lane 5477_6#10?
Hint: the mapper is in the 3rd column of the details
In [ ]:
    
# Enter your answer here
    
Q3: Which references have been used with the mapping pipeline for lane 5477_6#10?
Hint: the reference is in the 2nd column of the details
In [ ]:
    
# Enter your answer here
    
Q4: What percentage of the reads from lane 5477_6#10 were mapped to "Streptococcus_pneumoniae_OXC141_v1"?
Hint: you can use awk to filter the statistics file by column 8 (the reference). Make sure you set -F',' as the stats are comma-delimited!
In [ ]:
    
# Enter your answer here
    
In [ ]:
    
# Enter your answer here
    
For a quick recap of how to get QC pipeline results, head back to QC pipeline results.
Otherwise, let's move on to how to get your SNP pipeline results.