You can use the pf status script to return information about the status of your samples within the automated analysis pipelines, allowing you to see which pipelines have been run on the data.
The automated analysis pipelines available include import, QC, mapping, archive, improve, SNP calling, RNA-Seq, assembly and annotation.
Running pf status will return a table with one row per lane and one column per pipeline. In that table, each cell shows either '-', meaning that the pipeline hasn't been run, or, if the pipeline has been requested, 'Running', 'Done' or 'Failed' for that lane.
Let's take lane 5477_6#1 as an example. Here is the output from pf status.
Name | Import | QC | Mapping | Archive | Improve | SNP call | RNASeq | Assemble | Annotate |
---|---|---|---|---|---|---|---|---|---|
5477_6#1 | Done | Done | Done | Done | - | Done | - | Done | Done |
This tells us that for lane 5477_6#1, the import, QC, mapping, archive, SNP calling, assembly and annotation pipelines have been run and are finished (Done).
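If you want to pull this information out of the table programmatically, here is a minimal sketch using awk. It assumes the pipe-delimited layout shown in this tutorial; the real pf status output may be delimited differently, in which case the field separator needs adjusting. The function name done_pipelines is hypothetical and used only for illustration.

```shell
# Example table copied from this tutorial. The pipe-delimited layout is an
# assumption based on how the table is rendered here.
status_table='Name | Import | QC | Mapping | Archive | Improve | SNP call | RNASeq | Assemble | Annotate |
---|---|---|---|---|---|---|---|---|---|
5477_6#1 | Done | Done | Done | Done | - | Done | - | Done | Done |'

# done_pipelines (hypothetical helper): read a status table on stdin and
# print "lane<TAB>pipeline" for every pipeline whose status is Done.
done_pipelines() {
  awk -F' *[|] *' '
    # First line: remember the pipeline name for each column.
    NR == 1 { for (i = 2; i <= NF; i++) header[i] = $i; next }
    # Other lines: report each column whose value is exactly "Done".
    { for (i = 2; i <= NF; i++) if ($i == "Done") printf "%s\t%s\n", $1, header[i] }
  '
}

printf '%s\n' "$status_table" | done_pipelines
```

For the example lane this lists the seven finished pipelines and leaves out Improve and RNASeq, which were not run.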
In this section of the tutorial we will cover how to use pf status to determine the status of samples in the various pathogen informatics pipelines.
In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
We can get the status of all lanes associated with a study by setting the type (-t or --type) to "study" and giving the study ID or name as the identifier (-i or --id).
Let's get the status of the lanes associated with study 664.
In [ ]:
pf status -t study -i 664
Here you can see that the import, QC, mapping, archive, SNP calling, assembly and annotation pipelines have been run and are finished (Done) for all of the lanes associated with study 664.
Let's try this again using the study name, "Streptococcus pneumoniae global lineages", instead.
In [ ]:
pf status -t study -i "Streptococcus pneumoniae global lineages"
You can see that we get the same result as if we'd used the study ID. It's important to remember to put the study name in quotes (") because it has spaces in it.
Let's try using our study name without the quotes.
In [ ]:
pf status -t study -i Streptococcus pneumoniae global lineages
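Oh, we get errors followed by the usage message. Without the quotes, the shell splits the study name at each space and passes the words as separate arguments. This is why you should get into the habit of putting the study name between double quotes.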
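To see what the shell does with the unquoted study name, here is a small sketch that does not need pf at all. The count_args function is a hypothetical helper written for this illustration; it simply reports how many arguments it receives.

```shell
# count_args (hypothetical helper): print the number of arguments received
# after the shell has split the command line into words.
count_args() {
  echo "$#"
}

# Quoted: the whole study name arrives as a single argument.
count_args "Streptococcus pneumoniae global lineages"   # prints 1

# Unquoted: the shell splits on spaces, giving four separate arguments.
count_args Streptococcus pneumoniae global lineages     # prints 4
```

In the unquoted case only the first word would be taken as the value of -i, and the remaining three words would arrive as extra arguments.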
Let's get the status of the lane 5477_6#1.
In [ ]:
pf status -t lane -i 5477_6#1
Alternatively, we can get the sample name for that lane with pf info and use the sample name to get the status.
Let's get the corresponding sample name for lane 5477_6#1.
In [ ]:
pf info -t lane -i 5477_6#1
Now let's use the sample name that was returned, Tw01_0055, to get the status.
In [ ]:
pf status -t sample -i Tw01_0055
Finally, let's get the status of the lanes in "data/lanes.txt".
In [ ]:
pf status -t file -i data/lanes.txt
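If you want to build your own list of lanes, the sketch below shows one way to do it. It assumes the file contains one lane ID per line, which this tutorial does not state explicitly, so check the layout of data/lanes.txt first. The lane IDs are the two mentioned in this section, and the list is written to a temporary file so that data/lanes.txt is left untouched.

```shell
# Assumption: the lanes file holds one lane ID per line.
lanes_file=$(mktemp)

# Single quotes keep the '#' characters literal.
printf '%s\n' '5477_6#1' '10018_1#1' > "$lanes_file"
cat "$lanes_file"

# Only call pf if it is installed on this machine.
if command -v pf >/dev/null 2>&1; then
  pf status -t file -i "$lanes_file"
fi
```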
Q1: Has the assembly pipeline been run on lane 10018_1#1? If so, what is the status?
In [ ]:
# Enter your answer here
Q2: Which lanes in study 607 has the assembly pipeline been run on?
Hint: you could use awk to get the assembly column (9th column)
In [ ]:
# Enter your answer here
Q3: How many lanes in study 607 has the mapping pipeline been run on?
Hint: you could use awk to get the mapping column (4th column) and wc to count the number of lines returned
In [ ]:
# Enter your answer here
For a quick recap of how to get metadata and accessions, head back to sample information and accessions.
Otherwise, let's move on to how to get your QC pipeline results.