Visualising transcriptomes with IGV

Introduction

Integrative Genome Viewer (IGV) allows us to visualise genomic datasets. We have a quick start guide here which contains the information you need to complete Exercise 3. The IGV user guide contains more information on all of the IGV features and functions: http://software.broadinstitute.org/software/igv/UserGuide.

The objectives of this part of the tutorial are:

load a reference genome into IGV and navigate the genome
load an annotation file into IGV and explore gene structure
load read alignments into IGV and inspect read alignments

Exercise 3

First, we will use samtools to create an index for the P. chabaudi reference genome, which IGV will use to traverse the genome. This index file will have the extension .fai and should always be in the same directory as the reference genome.

Make sure you are in the data directory with the tutorial files.



In [ ]:

    
cd data

Index the genome fasta file (required by IGV).



In [ ]:

    
samtools faidx PccAS_v3_genome.fa

Start IGV.



In [ ]:

    
igv.sh

This will open the IGV main window. Now, we need to tell IGV which genome we want to use. IGV has many pre-loaded genomes available, but P. chabaudi is not one of them. This means we will need to load our genome from a file.

Load your reference genome into IGV. Go to "Genomes -> Load Genome from File…". Select "PccAS_v3_genome.fa" and click "Open". For more information, see Loading a reference genome in our quick start guide.

We not only want to see where our reads have mapped, but what genes they have mapped to. For this, we have an annotation file in GFF3 format. This contains a list of features, their co-ordinates and orientations which correspond to our reference genome.

Example from PccAS_v3 GFF3

Load your annotation file into IGV. Go to ""File -> Load from File…". Select "PccAS_v3.gff3" and click "Open". For more information, see Loading gene annotations in our quick start guide.

This will load a new track called "PccAS_v3.gff3". The track is currently shown as a density plot. You will need to zoom in to see individual genes.

Search for the gene PCHAS_0505200 by typing "PCHAS_0505200" in the search box to zoom in and centre the view on PCHAS_0505200.

To get a clearer view of the gene structure, right click on the annotation track and click "Expanded".

In the annotation track, genes are presented as blue boxes and lines. These boxes represent exons, while the lines represent intronic regions. Arrows indicate the direction (or strand) of transcription for each of the genes. Now we have our genome and its annotated features, we just need the read alignments for our five samples.

Load your alignment file for the MT1 sample into IGV. Go to ""File -> Load from File…". Select "MT1_sorted.bam" and click "Open". For more information, see Loading alignment files in our quick start guide.

Note: BAM files and their corresponding index files must be in the same directory for IGV to load them properly.

This will load a new track called "MT1_sorted.bam" which contains the read alignments for the MT1 sample. We can change how we visualise our data by altering the view options. By default, IGV will display reads individually so they are compactly arranged. If you were to hover over a read in the default view, you will only get the details for that read. However, if we change our view so that the reads are visualised as pairs, the read pairs will be joined together by line and when we hover over either of the reads, we will get information about both of the reads in that pair.

To view our reads as pairs, right click on the MT1_sorted.bam alignment track and click "View as pairs".

To condense the alignment, right click on the MT1_sorted.bam alignment track and click "Squished".

For more information on sorting, grouping and visualising read alignments, see the IGV user guide.

Load the remaining sorted BAM files for the MT2 sample and the three SBP samples.

Using the search box in the toolbar, go to PCHAS_1409500. For more information, see Jump to gene or locus in our quick start guide.

The first thing to look at is the coverage range for this viewing window on the left-hand side. The three SBP samples have 2-3 times more reads mapping to this gene than the two MT samples. While at first glance it may seem like this gene may be differentially expressed between the two conditions, remember that some samples may have been sequenced to a greater depth than others. So, if a sample has been sequenced to a greater depth we would expect more reads to map in general.

From the gene annotation at the bottom we can also see that there are three annotated exon/CDS features for this gene. However, the coverage plot suggests there may be a fourth unannotated exon which, given the direction of the gene, could suggest a 5' untranslated region (UTR). Note the clean drop off of the coveraged at around position 377,070.

Questions

Q1: How many CDS features are there in "PCHAS_1402500"?

Hint: Look at Jump to gene or locus in our quick start guide.

Q2: Does the RNA-seq mapping agree with the gene model in blue?

Hint: Look at the coverage track and split read alignments.

Q3: Do you think this gene is differentially expressed and is looking at the coverage plots alone a reliable way to assess differential expression?

Hint: Look at the coverage similarities/differences between the MT and SBP samples.

What's next?

You can head back to mapping RNA-Seq reads to the genome using HISAT2 or continue on to transcript quantification with Kallisto.