Preparation of databases before running SeroBA

To use SeroBA for serotyping, we must first download and prepare the necessary databases. Start by moving into the data directory:


In [ ]:
cd data

Now download the database from the GitHub repository:


In [ ]:
svn checkout "https://github.com/sanger-pathogens/seroba/trunk/database"

NOTE: If you are running a version of SeroBA older than v.0.1.3, the database is not packaged with the program and you will have to download it using the command below instead:

seroba getPneumocat database_dir
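
As a minimal sketch of how the version check in the note above could be scripted, the helper below compares two version strings using `sort -V` (version sort, available in GNU coreutils). The function name `version_lt` and the example value of `installed_version` are assumptions made for illustration; substitute the version reported by your own installation.

```shell
# Return success (0) if version $1 is strictly older than version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical example value; replace with your installed SeroBA version.
installed_version="0.1.2"

if version_lt "$installed_version" "0.1.3"; then
    echo "Older than 0.1.3: download the database with 'seroba getPneumocat database_dir'"
else
    echo "0.1.3 or newer: use the database from the GitHub repository"
fi
```

Version sort is used rather than a plain string comparison so that, for example, 0.1.10 is correctly treated as newer than 0.1.3.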

SeroBA uses KMC to count k-mers and ARIBA to avoid having to map reads to every reference sequence. Both tools require a database to be set up.

To create the databases for KMC and ARIBA, run createDBs:

seroba createDBs database/ kmer_size

Where the options are:

database       The database directory which you just downloaded
kmer_size      The k-mer size you want to use for KMC. Recommended = 71

SeroBA uses a default k-mer size of 71 for a read length of 250 bp. When choosing a k-mer size, keep the trade-off in mind: a smaller k-mer size keeps memory requirements low but reduces specificity, whereas a larger k-mer size requires more memory but produces more unique k-mers and therefore increases specificity. The appropriate k-mer size also depends on the read length.
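
To illustrate why the read length matters, the sketch below computes how many k-mers a single read contains, using the standard relation read_length - kmer_size + 1. The function name `kmers_per_read` is hypothetical, and the 100 bp read length is only an example value for comparison; this is not part of SeroBA itself.

```shell
# Number of k-mers contained in one read: read_length - kmer_size + 1.
# Prints 0 if the k-mer is longer than the read.
kmers_per_read() {
    if [ "$2" -gt "$1" ]; then
        echo 0
    else
        echo $(( $1 - $2 + 1 ))
    fi
}

kmers_per_read 250 71   # 250 bp reads with the recommended k-mer size of 71
kmers_per_read 100 71   # example: shorter reads with the same k-mer size
```

With 250 bp reads each read contributes 180 k-mers of size 71, whereas a 100 bp read contributes only 30, so a k-mer size that works well for long reads may leave too few k-mers per read when reads are short.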


In [ ]:
seroba createDBs database/ 71
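
If you want to script this step, a small guard such as the one sketched below can catch a missing database directory or a malformed k-mer size before the databases are built. The helper `check_createdbs_args` is hypothetical and not part of SeroBA; it only validates the two arguments described above and then calls the same `seroba createDBs` command.

```shell
# Hypothetical helper: validate the arguments before calling `seroba createDBs`.
check_createdbs_args() {
    db_dir=$1
    kmer_size=$2
    # The database directory must already exist.
    if [ ! -d "$db_dir" ]; then
        echo "database directory not found: $db_dir" >&2
        return 1
    fi
    # The k-mer size must consist of digits only and be greater than zero.
    case "$kmer_size" in
        ''|*[!0-9]*)
            echo "k-mer size must be a positive integer: $kmer_size" >&2
            return 1
            ;;
    esac
    if [ "$kmer_size" -le 0 ]; then
        echo "k-mer size must be greater than 0" >&2
        return 1
    fi
}

# Build the databases only if the arguments look sane and seroba is on the PATH.
if check_createdbs_args database/ 71 && command -v seroba >/dev/null 2>&1; then
    seroba createDBs database/ 71
fi
```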

If you are working with SeroBA on the Sanger farm, the database with k-mer size 71 is already available centrally, so you do not need to create it yourself.

However, for the sake of this tutorial, the above steps need to be completed before you can continue.

In the next section we are going to run SeroBA to determine the serotype of one sample. You can also return to the index or revisit the previous section.