Investigating MIC data

We will use the N. gonorrhoeae dataset. This tutorial includes pre-computed output from running ARIBA on all of the samples, as well as the ARIBA database that was made in the first section. Do not worry if you did not follow that part of the tutorial: we will use a pre-computed copy of the database in data/Ref/Ngo_ARIBAdb/.

ARIBA has a function called "micplot" that generates plots showing the distribution of MICs across samples with different combinations of genotypes. To use it, you need a file containing MIC data for each sample and for at least one drug. Its first few lines look like this:


In [ ]:
head data/mic_data.tsv

The first column must be named "Sample", and its entries must exactly match the sample names in the ARIBA summary file given to micplot (we will see this shortly). The remaining columns should have drug names as headers and contain MIC values. Note, however, that in this file the first two of these columns contain other data, which ARIBA will ignore. When ARIBA loads the file, it tries to convert every value in columns 2 onwards to a number, and assigns "NA" wherever this is not possible.
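The loading rules above can be illustrated with a short shell sketch. This is a simplified illustration, not ARIBA's own parser. The functions check_mic_header and mic_to_numbers are hypothetical. The sketch keeps only plain decimal numbers (such as 0.25 or 16) and turns anything else into NA. ARIBA itself may accept other value formats.

```bash
#!/usr/bin/env bash
# Simplified sketch of the MIC-file rules described above.
# NOT ARIBA's parser: only plain decimal numbers are kept, everything else
# in columns 2 onwards becomes "NA".

# check_mic_header FILE: succeed only if the first column header is "Sample".
check_mic_header() {
    local first
    first=$(head -n 1 "$1" | cut -f 1)
    [ "$first" = "Sample" ]
}

# mic_to_numbers FILE: print FILE with each value in columns 2 onwards
# replaced by "NA" unless it looks like a plain decimal number.
mic_to_numbers() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { print; next }           # keep the header row unchanged
        {
            for (i = 2; i <= NF; i++) {
                if ($i !~ /^[0-9]*\.?[0-9]+$/) $i = "NA"
            }
            print
        }' "$1"
}
```

For example, `check_mic_header data/mic_data.tsv && mic_to_numbers data/mic_data.tsv | head` shows what the first rows would look like under these simplified rules.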

To run micplot, we need an MIC file like the one above and an ARIBA summary file (as described in the previous section). The following command generates a summary of known 23S and mtrR mutations. It also includes the "assembled" cluster column so that interrupted mtrR genes can be identified:


In [ ]:
ariba summary --row_filter n --cluster_cols assembled,known_var \
    --only_clusters 23S,mtrR --v_groups --no_tree \
    --fofn data/filenames.fofn summary.AZMknowngroups
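micplot matches the "Sample" column of the MIC file against the sample names in the summary file, so it can help to check for mismatches first. Below is a minimal sketch. It assumes that the sample names are in the first column of both files, that each file has a single header row, and that names in the summary CSV are unquoted. The function name missing_from_summary is hypothetical.

```bash
#!/usr/bin/env bash
# missing_from_summary MIC_TSV SUMMARY_CSV: print sample names from the MIC
# file that do not appear in the first column of the summary CSV.
# Assumes one header row per file and unquoted names (hypothetical helper).
missing_from_summary() {
    awk -F '\t' '
        # First file read (summary CSV): remember the first comma-separated
        # field of every row after the header.
        NR == FNR {
            if (FNR > 1) { split($0, f, ","); seen[f[1]] = 1 }
            next
        }
        # Second file read (MIC TSV): report names missing from the summary.
        FNR > 1 && !($1 in seen) { print $1 }
    ' "$2" "$1"
}
```

For example, `missing_from_summary data/mic_data.tsv summary.AZMknowngroups.csv` would print nothing if every sample in the MIC file is also in the summary.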

Now we can run micplot using the new file summary.AZMknowngroups.csv and the MIC file data/mic_data.tsv. This compares azithromycin MICs across the different combinations of sequences and known mutations in 23S and mtrR:


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted Azithromycin \
    data/mic_data.tsv summary.AZMknowngroups.csv \
    micplot.AZMknowngroups

This produces a PDF file, micplot.AZMknowngroups.pdf, that looks like this:

micplot has many options. We show some of them here; run ariba micplot --help to see them all.

Horizontal lines

Horizontal lines can be added with the option --hlines to mark important MIC cutoffs, in this case 0.25 and 2.


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv \
    micplot.AZMknowngroups

Here is the result:

Plot styles

In the plots above, there is one point per sample. Although the points are jittered horizontally, it can still be hard to see how many there are. The option --point_size sets the size of the points. Setting it to zero instead groups the points together and plots circles whose sizes are proportional to the number of samples.
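Grouping the points amounts to counting how many samples share each combination of genotype column and MIC value. Each circle's size then reflects that count. The sketch below shows only the counting step. It assumes a hypothetical tab-separated input with one line per sample, giving the group and then the MIC, with group names that contain no spaces. It is not how ARIBA computes this internally, and the function name count_points is hypothetical.

```bash
#!/usr/bin/env bash
# count_points FILE: FILE has one line per sample, "group<TAB>MIC".
# Prints "group<TAB>MIC<TAB>count" for each distinct (group, MIC) pair;
# a grouped point's size would be proportional to that count.
count_points() {
    LC_ALL=C sort "$1" | uniq -c | awk -v OFS='\t' '{ print $2, $3, $1 }'
}
```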


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    --point_size 0 Azithromycin data/mic_data.tsv \
    summary.AZMknowngroups.csv micplot.AZMknowngroups

Here is the result:

You can choose not to show the violin plots or the dots in the upper plot using the option --plot_types. The default is violin,point, which shows both. To show only the dots:


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    --plot_types point --point_size 0 \
    Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv \
    micplot.AZMknowngroups

Here is the result:

Colours

There are various colour options; see the matplotlib colormaps documentation for all of the available palettes. The default palette is "Accent", which has 8 colours. If the plot has more than 8 columns, ARIBA cycles through these colours, so some are repeated. The palette can be changed using the option --colourmap.
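The cycling described above can be pictured as taking each column's index modulo the number of colours. Below is a minimal sketch. The function name assign_colours is hypothetical, and ARIBA's internal code may differ in detail.

```bash
#!/usr/bin/env bash
# assign_colours NCOLUMNS NCOLOURS: print "column<TAB>colour" for each plot
# column (both 0-based), reusing colours cyclically once the palette runs out.
assign_colours() {
    local ncols=$1 ncolours=$2 col=0
    while [ "$col" -lt "$ncols" ]; do
        printf '%d\t%d\n' "$col" $((col % ncolours))
        col=$((col + 1))
    done
}
```

For example, `assign_colours 10 8` gives columns 8 and 9 the colours 0 and 1, the same as columns 0 and 1.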


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    --colourmap PiYG --point_size 0 Azithromycin data/mic_data.tsv \
    summary.AZMknowngroups.csv micplot.AZMknowngroups

Here is the result:

The PiYG palette is continuous and almost white in the middle. This is not ideal, because near-white colours are hard to see. We can skip the middle of the range, here from 0.35 to 0.65 (35% to 65%), using the option --colour_skip:


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    --colourmap PiYG --colour_skip 0.35,0.65 --point_size 0 \
    Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv \
    micplot.AZMknowngroups
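One way to picture --colour_skip is to assume that colours are taken from evenly spaced positions along a continuous colourmap. Skipping 0.35 to 0.65 then means that no position falls inside that interval. This is an assumption made for the sketch, and ARIBA's own code may choose positions differently. The function name skip_positions is hypothetical.

```bash
#!/usr/bin/env bash
# skip_positions N START END: print N (>= 2) evenly spaced positions in [0, 1]
# that avoid the interval (START, END). Assumed scheme for illustration only.
skip_positions() {
    awk -v n="$1" -v a="$2" -v b="$3" 'BEGIN {
        usable = 1 - (b - a)              # length of [0, 1] left after the skip
        for (i = 0; i < n; i++) {
            x = usable * i / (n - 1)      # evenly spaced in [0, usable]
            if (x >= a) x += b - a        # jump over the skipped range
            printf "%.2f\n", x
        }
    }'
}
```

For example, `skip_positions 4 0.35 0.65` prints 0.00, 0.23, 0.77 and 1.00, so none of the near-white middle of the colourmap is used.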

Here is the new plot:

The option --number_of_colours sets how many colours are used. If it is smaller than the number of columns, ARIBA cycles through the colours. Here is an example using the first three colours from the "Dark2" palette:


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    --colourmap Dark2 --number_of_colours 3 --point_size 0 \
    Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv \
    micplot.AZMknowngroups

Now only three colours are used:

Setting the number of colours to one results in a black and white figure.


In [ ]:
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
    --number_of_colours 1 --point_size 0 Azithromycin \
    data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups

Here is the black and white figure:

This is the end of the tutorial. You can return to the index or revisit the previous section.