The Pan Genome

The pan genome of a prokaryote population is the complete set of genes found across all of its genomes. This includes genes present in every genome, as well as genes present in only some of the genomes, or even in just one. The subset of genes present in all of the genomes is called the core genome. These are often highly conserved genes with important functions, for instance housekeeping genes. Genes present in two or more, but not all, of the genomes make up the accessory genome. The accessory genome often contains genes that have been transferred between bacterial strains, for example genes linked to virulence or drug resistance. Genes present in only one of the genomes can be referred to as strain-specific.
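The definitions above amount to a simple classification over a gene presence/absence table. The sketch below is illustrative only and is not how Roary works internally. It assumes the hard part has already been done: genes have been clustered so that the same gene carries the same identifier in every genome. The function name `partition_pan_genome` and the example genomes and gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def partition_pan_genome(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split a pan genome into core, accessory and strain-specific genes.

    `genomes` maps a genome name to the set of (already clustered) gene
    identifiers present in that genome. The categories follow the strict
    definitions in the text:
      core            = present in all genomes
      accessory       = present in two or more, but not all, genomes
      strain_specific = present in exactly one genome
    """
    n_genomes = len(genomes)
    # Count in how many genomes each gene occurs.
    counts = Counter(gene for genes in genomes.values() for gene in genes)

    pan = set(counts)
    core = {g for g, c in counts.items() if c == n_genomes}
    accessory = {g for g, c in counts.items() if 2 <= c < n_genomes}
    # Exclude the degenerate single-genome case, where every gene is core.
    strain_specific = {g for g, c in counts.items() if c == 1 and n_genomes > 1}
    return {
        "pan": pan,
        "core": core,
        "accessory": accessory,
        "strain_specific": strain_specific,
    }


if __name__ == "__main__":
    # Hypothetical toy population of three genomes.
    genomes = {
        "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
        "strain_B": {"dnaA", "gyrB", "recA", "blaTEM", "stx2"},
        "strain_C": {"dnaA", "gyrB", "recA", "phage_int"},
    }
    parts = partition_pan_genome(genomes)
    print(sorted(parts["core"]))             # ['dnaA', 'gyrB', 'recA']
    print(sorted(parts["accessory"]))        # ['blaTEM']
    print(sorted(parts["strain_specific"]))  # ['phage_int', 'stx2']
```

Note that real tools often relax the strict "present in all genomes" rule with a percentage threshold, so that a gene missed in one poor-quality assembly is not excluded from the core.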

As you can imagine, the pan, core and accessory genomes can provide important insight into the genetic structure of prokaryotic populations. By analysing the pan genome we can gain a better understanding of key processes such as evolution and selection. Roary is a software tool that calculates the pan genome from annotated bacterial genomes. It is fast and accurate, and can conveniently be run on most modern PCs. In this tutorial we will guide you through a complete pan genome analysis, starting with annotation of the genomes, then running the pan genome pipeline, and finally visualising the results.

Check your understanding

Q1: The pan genome contains:
a) Only genes present in one genome in a population
b) All genes from all genomes in a population
c) Only genes present in all genomes in a population

Q2: Core genes are:
a) Often important for basic cell functions
b) Present in only a subset of the genomes of a population
c) Often related to drug resistance

The answers to these questions can be found here.

Now that you know a bit more about pan genomes, let's head over to the next section: Preparing input data
You can also head back to the index page.