Run ARIBA on all samples

This section shows how to run ARIBA on a large number of samples. To save time, the notebook does not actually run any of the commands, since each run of ARIBA takes a few minutes.

Reference database

First we need a reference database. You will already have one if you followed the instructions in the previous part of this tutorial (How to use custom reference data with ARIBA). Alternatively, you can use one of the public datasets that ARIBA supports. For example, to use CARD, run these two commands, which make an ARIBA database directory called ariba_db:

ariba getref card out.card

ariba prepareref -f out.card.fa -m out.card.tsv ariba_db

How to run on one sample

ARIBA needs the database directory, which we will call Ngo_ARIBAdb to be consistent with the previous section of the tutorial, and two paired sequencing reads files, reads.1.fastq.gz and reads.2.fastq.gz. The command to run ARIBA is:

ariba run Ngo_ARIBAdb reads.1.fastq.gz reads.2.fastq.gz outdir

The above command will make a new directory called outdir that contains the results.

Run on all samples

The N. gonorrhoeae dataset consists of 1517 samples, and we need to run ARIBA on each sample, which can be done with a "for" loop. We assume that the reads files are named like this:

ERR1067813.1.fq.gz ERR1067813.2.fq.gz
ERR1067814.1.fq.gz ERR1067814.2.fq.gz
ERR1067815.1.fq.gz ERR1067815.2.fq.gz

Then we can run ARIBA on all samples like this (you may need to edit this command depending on how your own files are named):

# Get each sample name by stripping the .1.fq.gz suffix from the first reads file
for sample in $(ls *.1.fq.gz | sed 's/\.1\.fq\.gz$//')
do
    ariba run Ngo_ARIBAdb "$sample.1.fq.gz" "$sample.2.fq.gz" "$sample.ariba"
done
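Because the notebook does not run the commands, it can help to print them first and check them before starting 1517 runs. The following is a minimal sketch, not part of ARIBA: the function name print_ariba_commands is hypothetical, it assumes the file naming shown above, and it treats an existing $sample.ariba directory as a sign that the sample has already been handled.

```shell
# Print (rather than execute) one "ariba run" command per sample.
# Assumes reads are named <sample>.1.fq.gz and <sample>.2.fq.gz.
# print_ariba_commands is a hypothetical helper, not an ARIBA command.
print_ariba_commands() {
    db=${1:-Ngo_ARIBAdb}                      # database directory, default as in this tutorial
    for reads1 in *.1.fq.gz
    do
        [ -e "$reads1" ] || continue          # the glob matched no files
        sample=${reads1%.1.fq.gz}             # strip the suffix to get the sample name
        reads2=$sample.2.fq.gz
        if [ ! -e "$reads2" ]; then
            echo "skipping $sample: missing $reads2" >&2
            continue
        fi
        if [ -d "$sample.ariba" ]; then       # assumption: directory exists => already handled
            echo "skipping $sample: $sample.ariba already exists" >&2
            continue
        fi
        echo "ariba run $db $reads1 $reads2 $sample.ariba"
    done
}
```

Calling print_ariba_commands Ngo_ARIBAdb > commands.sh writes the commands to a file that can be inspected and then run with sh commands.sh. Skipped samples are reported on standard error, so they do not end up in the file.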

For Sanger pathogens users only: use LSF to run all the jobs.

for sample in $(ls *.1.fq.gz | sed 's/\.1\.fq\.gz$//')
do
    bsub.py 1 "$sample.ariba" ariba run Ngo_ARIBAdb \
    "$sample.1.fq.gz" "$sample.2.fq.gz" "$sample.ariba"
done

The output directory of each sample is called $sample.ariba, for example ERR1067813.ariba is the output directory for sample ERR1067813.
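With this naming scheme it is easy to check which samples still need attention after the runs have finished. The sketch below is an illustration only: list_unfinished_samples is a hypothetical helper, and it assumes that a completed run leaves a file called report.tsv inside the sample's output directory.

```shell
# List samples whose ARIBA output looks incomplete.
# Assumption: a finished run leaves a report.tsv file in <sample>.ariba.
# list_unfinished_samples is a hypothetical helper, not an ARIBA command.
list_unfinished_samples() {
    for reads1 in *.1.fq.gz
    do
        [ -e "$reads1" ] || continue          # the glob matched no files
        sample=${reads1%.1.fq.gz}             # e.g. ERR1067813.1.fq.gz -> ERR1067813
        if [ ! -e "$sample.ariba/report.tsv" ]; then
            echo "$sample"                    # no report found for this sample
        fi
    done
}
```

Running list_unfinished_samples in the directory that holds the reads prints one sample name per line, which can be used to decide which runs to repeat.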

ARIBA output

The output files are described in the ARIBA documentation.

Now go to the next part of the tutorial where we use Phandango to view the results.

You can also return to the index or revisit the previous section.