When your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. After the analysis pipelines have been requested and run, you can use the pf scripts to return the results of each automated analysis pipeline.
The genome assembly pipeline that is used depends on the sequence data and the organism.
We can use pf assembly to return the location of assembly pipeline results.
In this section of the tutorial we will cover:
- using pf assembly to get assembly pipeline results
- filtering pf assembly results by program
- using pf assembly to symlink assembly pipeline results
- using pf assembly to get assembly statistics

First, let's tell the system the location of our tutorial configuration file.
In [ ]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
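If later pf commands complain about their configuration, it helps to confirm that the variable is set and that the file it names is readable. The helper below, check_pf_config, is a hypothetical convenience function written for this tutorial; it is not part of the pf tools.

```shell
# Hypothetical helper (not part of pf): confirm PF_CONFIG_FILE is set
# and points at a readable file before running any pf commands.
check_pf_config() {
  if [ -z "${PF_CONFIG_FILE:-}" ]; then
    echo "PF_CONFIG_FILE is not set" >&2
    return 1
  fi
  if [ ! -r "$PF_CONFIG_FILE" ]; then
    echo "Cannot read config file: $PF_CONFIG_FILE" >&2
    return 1
  fi
  echo "Using config: $PF_CONFIG_FILE"
}
```

Run check_pf_config after the export line; it prints the path in use, or an error message and a non-zero exit status if something is wrong.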
Let's take a look at the pf assembly usage.
In [ ]:
pf assembly -h
Now, let's get the assembly pipeline results for run 5477_6#1.
In [ ]:
pf assembly -t lane -i 5477_6#1
This returns the locations of the FASTA-formatted contig files which were produced by the assembly pipeline.
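Every FASTA record begins with a header line starting with ">", so counting those lines gives the number of contigs in a file. The sketch below wraps this in a hypothetical helper, count_contigs, that is not part of pf.

```shell
# Hypothetical helper: print "<count><TAB><file>" for each FASTA file given.
# Every FASTA record begins with a '>' header line, so counting those
# lines counts the contigs.
count_contigs() {
  for fasta in "$@"; do
    printf '%s\t%s\n' "$(grep -c '^>' "$fasta")" "$fasta"
  done
}
```

For example, count_contigs $(pf assembly -t lane -i 5477_6#1) should report a count for each contig file, assuming pf assembly prints one path per line and the paths contain no spaces.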
By default, pf assembly will return the scaffolded contigs. But what if you want to see all of the assembled contigs? To get these, we can use the --filetype or -f option.
In [ ]:
pf assembly -t lane -i 5477_6#1 -f all
This returns a third file, "unscaffolded_contigs.fa".
Notice that the results are located in directories named after the assembler that was used to generate the assembly, e.g. "spades_assembly". This tells us that SPAdes was the program used to generate the assembly. A quick way to filter assembly pipeline results by program is to use the --program or -P option.
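Because the assembler is encoded in the directory name, you can also pull it out of a result path with standard tools. This is a minimal sketch that assumes a path component of the form "<program>_assembly", as in "spades_assembly"; assembler_from_path is a hypothetical helper and the example path is made up.

```shell
# Hypothetical helper: print the assembler name encoded in a result path,
# assuming one path component looks like "<program>_assembly".
assembler_from_path() {
  # Split the path into one component per line, then keep only the
  # "<program>" part of any component ending in "_assembly".
  printf '%s\n' "$1" | tr '/' '\n' | sed -n 's/^\(.*\)_assembly$/\1/p'
}

# Example with a made-up path; real pf assembly paths will differ.
assembler_from_path /path/to/lane/spades_assembly/contigs.fa   # prints: spades
```

If no component matches the pattern, the helper prints nothing.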
Let's get all assembly pipeline results for run 5477_6 which were generated using "spades".
In [ ]:
pf assembly -t lane -i 5477_6 -P spades
Here we can see that SPAdes was used to generate assemblies for lanes 5477_6#1 and 5477_6#3. We can symlink these assemblies into a directory using the --symlink or -l option.
Let's symlink the assembly pipeline results for run 5477_6 which were generated with SPAdes to "5477_6_spades".
In [ ]:
pf assembly -t lane -i 5477_6 -P spades -l 5477_6_spades
In [ ]:
ls 5477_6_spades
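To check where each symlink points, ls -l shows the targets in its long listing. The hypothetical helper below, show_links, prints just the link and target pairs, which is easier to scan when a directory holds many files.

```shell
# Hypothetical helper: list each symlink in a directory with its target,
# to confirm the links point at the expected assembly files.
show_links() {
  for link in "$1"/*; do
    # Skip anything that is not a symbolic link.
    if [ -L "$link" ]; then
      printf '%s -> %s\n' "$link" "$(readlink "$link")"
    fi
  done
}
```

For example: show_links 5477_6_spades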
We can also get some statistics from our assembly results using the --stats or -s option.
Let's get some assembly statistics for lane 5477_6#1.
In [ ]:
pf assembly -t lane -i 5477_6#1 -s
This generated a new file called "5477_6_1.assemblyfind_stats.csv" which contains our assembly statistics.
In [ ]:
cat 5477_6_1.assemblyfind_stats.csv
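When the statistics file has many columns, it can help to pull out a single one by its header name. This is a minimal sketch using awk; csv_column is a hypothetical helper, and it assumes the first row is a header and that no field contains a quoted comma. The exact header names in the stats file are not shown here, so check them with cat first.

```shell
# Hypothetical helper: print one column of a simple CSV file, chosen by
# its header name. Assumes the first row is a header and that no field
# contains a quoted comma.
csv_column() {
  awk -F',' -v name="$2" '
    NR == 1 {
      # Find the index of the requested header.
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "no column named " name | "cat 1>&2"; exit 1 }
      next
    }
    { print $col }
  ' "$1"
}
```

For example, csv_column 5477_6_1.assemblyfind_stats.csv N50 would print the N50 values, assuming the file really has a column headed "N50"; substitute whatever header the file actually uses.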
In [ ]:
# Enter your answer here
Q2: Which program was used to generate the assembly for lane 10018_1#51?
Hint: look at the location path
In [ ]:
# Enter your answer here
Q3: Symlink the assembly/assemblies generated by "IVA" for run 10018_1 into a new directory called "iva_results".
Hint: don't forget to filter the results if more than one program has been used
In [ ]:
# Enter your answer here
Q4: How many contigs were assembled by velvet for lane 5477_6#2 and what is the N50?
Hint: you'll need to get some statistics for this lane and filter by program
In [ ]:
# Enter your answer here
In [ ]:
# Enter your answer here
For a quick recap of how to get SNP calling pipeline results, head back to SNP calling pipeline results.
Otherwise, let's move on to how to get your annotation pipeline results.