Analysis pipeline status

Introduction

You can use the pf status script to return information about the status of your samples within the automated analysis pipelines, allowing you to see which pipelines have been run on the data.

The automated analysis pipelines available include:

  • Quality control (QC)
  • Mapping
  • SNP calling
  • Bacterial, Eukaryote and PacBio assembly
  • Annotation
  • RNA-Seq expression

Running pf status will return a table with one row per lane and one column per pipeline. In that table, you will see either a '-' meaning that the pipeline hasn't been run or, if the pipelines have been requested, 'Running', 'Done' or 'Failed' for each of the lanes.

Let's take lane 5477_6#1 as an example. Here is the output from pf status.

Name      Import  QC    Mapping  Archive  Improve  SNP call  RNASeq  Assemble  Annotate
5477_6#1  Done    Done  Done     Done     -        Done      -       Done      Done

This tells us that for lane 5477_6#1, the import, QC, mapping, archive, SNP calling, assembly and annotation pipelines have been run and are finished (Done). The '-' under Improve and RNASeq means that those two pipelines have not been run on this lane.
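The same reading of the row can be sketched with awk. This is a minimal sketch, not part of pf: it assumes the row is whitespace-separated as shown above, and it types the column names out by hand because the "SNP call" header contains a space and would otherwise be split into two fields.

```shell
# Status row copied from the example above.
row='5477_6#1 Done Done Done Done - Done - Done Done'

# Column names are listed by hand (assumption: same order as the header above).
done_pipelines=$(printf '%s\n' "$row" | awk '
BEGIN { n = split("Import,QC,Mapping,Archive,Improve,SNP call,RNASeq,Assemble,Annotate", name, ",") }
{
    out = ""
    # Field 1 is the lane name, so pipeline i sits in field i + 1.
    for (i = 1; i <= n; i++)
        if ($(i + 1) == "Done")
            out = out (out == "" ? "" : ", ") name[i]
    print out
}')

echo "$done_pipelines"
```

The list printed matches the pipelines named in the sentence above; Improve and RNASeq are left out because their fields hold '-'.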

In this section of the tutorial we will cover:

  • using pf status to determine the status of samples in the various pathogen informatics pipelines

Exercise 4

First, let's tell the system the location of our tutorial configuration file.


In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf

We can get the status of all lanes associated with a study by setting the type (-t or --type) to "study" and giving the study ID or name as the identifier (-i or --id).

Let's get the status of the lanes associated with study 664.


In [ ]:
pf status -t study -i 664

Here you can see that the import, QC, mapping, archive, SNP calling, assembly and annotation pipelines have been run and are finished (Done) for all of the lanes associated with study 664.

Let's try this again using the study name, "Streptococcus pneumoniae global lineages", instead.


In [ ]:
pf status -t study -i "Streptococcus pneumoniae global lineages"

You can see that we get the same result as when we used the study ID. It's important to remember to put the study name in quotes (") because it has spaces in it.

Let's try using our study name without the quotes.


In [ ]:
pf status -t study -i Streptococcus pneumoniae global lineages

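Instead of a status table, we get errors followed by the usage message. Without the quotes, the shell passes each word of the study name to pf status as a separate argument. This is why you should get into the habit of putting the study name between double quotes.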
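To see what the shell does with the two forms of the command, here is a small sketch that does not call pf at all. count_args is a hypothetical helper defined only for this illustration; it reports how many arguments it was given.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

# Quoted: the study name stays together as a single argument.
quoted=$(count_args -t study -i "Streptococcus pneumoniae global lineages")

# Unquoted: the shell splits the study name on spaces into four arguments.
unquoted=$(count_args -t study -i Streptococcus pneumoniae global lineages)

echo "quoted:   $quoted arguments"
echo "unquoted: $unquoted arguments"
```

With the quotes there are four arguments; without them there are seven, so the identifier seen after -i is only the first word of the study name and the remaining words are left over.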

Let's get the status of the lane 5477_6#1.


In [ ]:
pf status -t lane -i 5477_6#1

Alternatively, we can get the sample name for that lane with pf info and use the sample name to get the status.

Let's get the corresponding sample name for lane 5477_6#1.


In [ ]:
pf info -t lane -i 5477_6#1

Now let's use the sample name that was returned, Tw01_0055, to get the status.


In [ ]:
pf status -t sample -i Tw01_0055

Finally, let's get the status of the lanes in "data/lanes.txt".


In [ ]:
pf status -t file -i data/lanes.txt
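The contents of data/lanes.txt are not shown here. As a rough sketch, assuming the file simply lists one lane name per line, a file of your own could be put together like this. The two lane names are the ones mentioned in this section, and the file is a throwaway temporary file rather than the tutorial's data/lanes.txt.

```shell
# Assumption: pf expects one lane name per line in the file.
lanes_file=$(mktemp)
printf '%s\n' '5477_6#1' '10018_1#1' > "$lanes_file"

# Count the lanes listed (tr strips the padding that some wc versions add).
lane_count=$(wc -l < "$lanes_file" | tr -d ' ')
echo "$lane_count lanes listed"

# With pf available, the file would then be passed as the identifier:
# pf status -t file -i "$lanes_file"
```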

Questions

Q1: Has the assembly pipeline been run on lane 10018_1#1? If so, what is the status?


In [ ]:
# Enter your answer here

Q2: Which lanes in study 607 has the assembly pipeline been run on?
Hint: you could use awk to get the assembly column (9th column)


In [ ]:
# Enter your answer here

Q3: How many lanes in study 607 has the mapping pipeline been run on?
Hint: you could use awk to get the mapping column (4th column) and wc to count the number of lines returned
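If you have not used awk and wc together before, the pattern in the hints can be sketched on a made-up table. The lane names and statuses below are invented for illustration and are not output from pf, the sketch assumes whitespace-separated columns, and it looks at the QC column (3rd) rather than the columns asked about in the questions.

```shell
# Made-up status table: the lane names and statuses are illustrative only.
status_table='Name Import QC Mapping Archive Improve SNP call RNASeq Assemble Annotate
lane_A Done Done Done Done - Done - Done Done
lane_B Done Failed - - - - - - -
lane_C Done - - - - - - - -
lane_D Done Running - - - - - - -'

# Skip the header row (NR > 1), keep rows whose QC column is not "-",
# and print the lane name from column 1.
qc_lanes=$(printf '%s\n' "$status_table" | awk 'NR > 1 && $3 != "-" { print $1 }')

# Count the matching lanes (tr strips the padding that some wc versions add).
qc_count=$(printf '%s\n' "$qc_lanes" | wc -l | tr -d ' ')

echo "$qc_lanes"
echo "QC has been run on $qc_count lanes"
```

Skipping the header matters here: without NR > 1 the header row would also pass the test and be counted as a lane.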


In [ ]:
# Enter your answer here

What's next?

For a quick recap of how to get metadata and accessions, head back to sample information and accessions.

Otherwise, let's move on to how to get your QC pipeline results.