Pangenome Construction using Roary

Introduction

Given a set of genomes, the pan genome is the collection of all genes the set contains. Roary, the pan genome pipeline, takes closely related annotated genomes in GFF3 file format and calculates the pan genome.
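
As a toy illustration of the definition above, the sketch below treats each genome as a plain list of gene names and uses standard Unix tools to compute the set union (the pan genome) and the set intersection (the genes shared by every genome, often called the core). The gene lists and file names are made up for this example, and this is not how Roary itself works: Roary starts from annotated GFF3 files rather than lists of gene names.

```shell
# Hypothetical gene lists for two made-up genomes (one gene name per line).
tmpdir=$(mktemp -d)
printf '%s\n' dnaA gyrB recA > "$tmpdir/genome1.txt"
printf '%s\n' dnaA recA blaTEM > "$tmpdir/genome2.txt"

# Pan genome: every gene seen in at least one genome (set union).
pan=$(sort -u "$tmpdir/genome1.txt" "$tmpdir/genome2.txt")

# Core genes: genes present in both genomes (set intersection).
# comm needs sorted input, so sort each list first.
sort "$tmpdir/genome1.txt" > "$tmpdir/g1.sorted"
sort "$tmpdir/genome2.txt" > "$tmpdir/g2.sorted"
core=$(comm -12 "$tmpdir/g1.sorted" "$tmpdir/g2.sorted")

echo "Pan genome:"
echo "$pan"
echo "Core genes:"
echo "$core"

# Clean up the temporary files.
rm -r "$tmpdir"
```

In this toy case the pan genome contains four genes, while only two of them are shared by both genomes.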

For more in-depth information about Roary, please have a look at the paper:

Roary: Rapid large-scale prokaryote pan genome analysis
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill
Bioinformatics, 2015;31(22):3691-3693 doi:10.1093/bioinformatics/btv421

or visit the Roary manual.

Learning outcomes

By the end of this tutorial you can expect to be able to:

  • Describe what a pangenome is
  • Prepare data for input to Roary
  • Perform QC on input data and understand why QC is important
  • Run Roary to create a pangenome with and without a core alignment
  • Understand the different output files produced by Roary
  • Draw a basic tree from the core gene alignment produced by Roary
  • Query the pangenome results produced by Roary
  • Use Phandango to visualise the results produced by Roary

Tutorial sections

This tutorial comprises the following sections:

  1. What is a pan genome
  2. Preparing the input data
  3. Performing QC on your data
  4. Running Roary
  5. Exploring the results
  6. Visualising the results with Phandango

Authors

This tutorial was created by Sara Sjunnebo.

Running the commands from this tutorial

You can run the commands in this tutorial either directly from the Jupyter notebook (if using Jupyter), or by typing the commands in your terminal window.

Running commands on Jupyter

If you are using Jupyter, command cells (like the ones below) can be run by selecting the cell and clicking Cell -> Run from the menu above, or by pressing Ctrl+Enter. Let's give this a try by printing our working directory using the pwd command and listing the files within it. Run the commands in the two cells below.


In [ ]:
pwd

In [ ]:
ls -l

Running commands in the terminal

You can also follow this tutorial by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, select the cell below with the mouse and then either press Ctrl+Enter or choose Cell -> Run in the menu at the top of the page.


In [ ]:
echo cd $PWD

Now open a new terminal on your computer and type the command that was output by the previous cell followed by the enter key. The command will look similar to this:

cd /home/manager/pathogen-informatics-training/Notebooks/ROARY/

Now you can follow the instructions in the tutorial from here.

Let’s get started!

This tutorial assumes that you have Roary and Prokka installed on your computer. For download and installation instructions, please see the Roary and Prokka websites.

To check that you have installed Roary correctly, you can run the following command:


In [ ]:
roary --help

This should return the help message for Roary.

Similarly, to check that you have installed Prokka correctly, you can run:


In [ ]:
prokka --help

This should return the help message for Prokka.
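
If either command fails, it can help to check first whether the program is on your PATH at all. The following is a minimal sketch using the standard shell built-in command -v; the helper name check_tool is made up for this example and is not part of Roary or Prokka.

```shell
# check_tool is a hypothetical helper: it reports whether a program
# can be found on the PATH and returns non-zero if it cannot.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# Check both tools used in this tutorial; "|| true" keeps the loop
# going even if one of them is missing.
for tool in roary prokka; do
    check_tool "$tool" || true
done
```

A "NOT found" line suggests that the tool is either not installed or not on your PATH, in which case the installation instructions are the place to start.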

To get started with the tutorial, head to the first section: What is a pan genome
The answers to all questions in the tutorial can be found here.