Running Roary

At this stage you should have three GFF files generated by Prokka, each in its own directory. Provided your QC looked alright, you are now ready to run Roary to generate the pan genome.

We are going to run Roary twice: first with the default settings, and then using MAFFT to generate a core gene alignment. For both runs we want all the annotation files in the same directory, so let's copy them to our current directory:


In [ ]:
cp annotated_sample*/*.gff .
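Before moving on, it can help to confirm that the copy picked up all three annotation files. The following is a minimal sketch rather than part of the original tutorial: the helper name count_gff is hypothetical, and the expected count of three simply follows from the three Prokka outputs mentioned above.

```shell
count_gff() {
    # Print how many .gff files are in the directory given as $1 (default: current directory).
    dir="${1:-.}"
    n=0
    for f in "$dir"/*.gff; do
        # If nothing matches, the pattern stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

expected=3   # assumption: one GFF file per annotated sample
found=$(count_gff .)
if [ "$found" -eq "$expected" ]; then
    echo "OK: found $found GFF files"
else
    echo "Warning: expected $expected GFF files, found $found"
fi
```

If the warning appears, check that each annotated_sample directory actually contains a GFF file before running Roary.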

Run Roary with default settings

Running Roary with the default settings is very straightforward. All you need to do is to run roary *.gff and it will create a pan genome using all GFF files in the current directory. We want to run Roary twice with different settings, so in order to keep track of our output files from each run we will also specify an output directory where Roary should put the results. Give the following command a go:


In [ ]:
roary -f output_no_alignment *.gff

This will run for a minute or two.

We will have a closer look at the results in the next section, so for now let us just check that there are some output files in the directory we asked Roary to create:


In [ ]:
ls -l output_no_alignment
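Rather than scanning the listing by eye, you could test for specific files. This is only a sketch: the helper check_outputs is hypothetical, and it assumes that Roary has written gene_presence_absence.csv and summary_statistics.txt into the output directory, which may differ between Roary versions.

```shell
check_outputs() {
    # $1 = output directory; remaining arguments = file names expected inside it.
    dir="$1"
    shift
    missing=0
    for name in "$@"; do
        # -s is true only if the file exists and is not empty.
        if [ -s "$dir/$name" ]; then
            echo "found:   $name"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed file names; adjust them if your Roary version names its outputs differently.
check_outputs output_no_alignment gene_presence_absence.csv summary_statistics.txt \
    || echo "Some expected files are missing or empty"
```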

Run Roary with MAFFT

To be able to create pretty trees and cool visualisations, we want to generate a multi-FASTA alignment of the core genes. To do this, we will now run Roary again, but this time with some more options.

Option    Description
-e        Create a multiFASTA alignment of the core genes
--mafft   Use with -e to use MAFFT instead of PRANK
-p        Number of threads to use

By default, Roary uses PRANK when the -e option is specified. PRANK is accurate but slow. MAFFT is less accurate but much faster, so we will use it instead by adding the --mafft option. To speed things up further, we will use 8 threads (the -p option). For all usage options, have a look at the Roary website.
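The value 8 assumes that your machine has at least 8 CPU cores. If you are unsure, the sketch below shows one way to cap the thread count at the number of cores available. It is not part of the original tutorial: the variable names are hypothetical, and it relies on getconf _NPROCESSORS_ONLN being available, as it normally is on Linux and macOS. The block only prints the resulting command so that you can inspect it before running it.

```shell
wanted=8   # thread count used in this tutorial

# Ask the system how many cores are online; fall back to 1 if that fails.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Use the smaller of the two values.
if [ "$wanted" -gt "$cores" ]; then
    threads=$cores
else
    threads=$wanted
fi

# Print the command instead of running it (the glob is not expanded inside quotes).
echo "roary -f output_with_alignment -e --mafft -p $threads *.gff"
```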


In [ ]:
roary -f output_with_alignment -e --mafft -p 8 *.gff

This will take a bit longer to run than the previous command, about 5 minutes. Once it has finished, you should have a directory called output_with_alignment containing the output files, this time including a core_gene_alignment.aln file. Quickly check that this is the case, and then we will head over to the next section: Exploring the results.


In [ ]:
ls -l output_with_alignment
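As an extra sanity check, you could count the sequences in the alignment. Because core_gene_alignment.aln is a multi-FASTA file, each sequence starts with a header line beginning with ">", and we would expect one sequence per sample. The helper count_sequences below is a hypothetical name used only for this sketch.

```shell
aln=output_with_alignment/core_gene_alignment.aln

count_sequences() {
    # Count FASTA header lines (lines starting with '>') in the file given as $1.
    grep -c '^>' "$1"
}

if [ -f "$aln" ]; then
    echo "Sequences in alignment: $(count_sequences "$aln")"
else
    echo "Alignment not found: $aln"
fi
```

With the three samples used here, a count other than three would suggest that something went wrong during the run.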

Check your understanding

Q7: Why do we want to run Roary with MAFFT?
a) Because it's quicker than running Roary without the -e option
b) To get more accurate results
c) To generate a core gene alignment

Q8: Why do we use the -p option?
a) We have to when we use MAFFT
b) To speed up the run
c) To get a pretty tree

The answers to these questions can be found here.

You can also revisit the previous section, or go back to the index page.