Visualising alignments in IGV

It is often instructive to look at your data in a genome browser. Here, we use IGV, a stand-alone browser, which has the advantage of being installed locally and providing fast access. Please check their website (http://www.broadinstitute.org/igv) for all the formats that IGV can display.

Web-based genome browsers, like Ensembl or the UCSC browser, are slower, but provide more functionality. They do not only allow for more polished and flexible visualisation, but also provide easy access to a wealth of annotations and external data sources. This makes it straightforward to relate your data with information about repeat regions, known genes, epigenetic features or areas of cross-species conservation, to name just a few. As such, they are useful tools for exploratory analysis.

Visualisation will allow you to get a "feel" for the data, as well as detecting abnormalities and problems. Also, exploring the data in such a way may give you ideas for further analyses. For our visualization purposes we will use the BAM and bigWig formats.

When uploading a BAM file into the genome browser, the browser will look for the index of the BAM file in the same folder where the BAM files is. The index file should have the same name as the BAM file and the suffix .bai. Finally, to create the index of a BAM file you need to make sure that the file is sorted according to chromosomal coordinates.

If you are not in there already, change into the data directory.


In [ ]:
cd data

Sort alignments according to chromosome position and store the result in the file with the prefix PAX5.sorted:


In [ ]:
samtools sort -T PAX5.temp.bam -o PAX5.sorted.bam PAX5.bam

Index the sorted file.


In [ ]:
samtools index PAX5.sorted.bam

The indexing will create a file called PAX5.sorted.bam.bai. Note that you don't have to specify the name of the index file when running samtools index.

Another way to visualise the alignments is to convert the BAM file into a bigWig file. The bigWig format is for display of dense, continuous data and the data will be displayed as a graph. The resulting bigWig files are in an indexed binary format.

The BAM to bigWig conversion takes place in two steps. First, we convert the BAM file into a bedgraph, called PAX5.bedgraph, using the tool genomeCoverageBed from bedtools.

To find the structure of the command and the mandatory arguments type:


In [ ]:
genomeCoverageBed

Apart from the BAM file, we also need to provide the size of the chromosomes for the organism of interest in order to generate the bedgraph file. These have to be stored in a tab-delimited file. When using the UCSC Genome Browser, Ensembl, or Galaxy, you typically indicate which species or genome build you are working with. The way you do this for bedtools is to create a "genome" file, which simply lists the names of the chromosomes (or scaffolds, etc.) and their size (in basepairs).

To obtain chromosome lengths for the human genome, type:


In [ ]:
fetchChromSizes hg19 > genome/hg19.all.chrom.sizes

We next want to remove any chromosome length information for the patched chromosomes, which are accessioned scaffold sequences that represent assembly updates. That way we will only keep the information of the current assembly.

Remove this information using awk:


In [ ]:
awk '$1 !~ /[_.]/' genome/hg19.all.chrom.sizes > genome/hg19.chrom.sizes

Now generate the bedgraph file, called PAX5.bedgraph, by typing:


In [ ]:
genomeCoverageBed -bg -ibam PAX5.sorted.bam \
-g genome/hg19.chrom.sizes > PAX5.bedgraph

We then need to convert the bedgraph into a binary graph, called PAX5.bw, using the tool bedGraphToBigWig from the UCSC tools.

To convert the bedgraph type:


In [ ]:
bedGraphToBigWig PAX5.bedgraph genome/hg19.chrom.sizes PAX5.bw

Now we will load the data into the IGV browser for visualisation.

To launch IGV :


In [ ]:
igv.sh &

On the top left of your screen choose "Human hg19" from the drop down menu. Then in order to load the desired files go to "File –> Load from File".

On the pop up window navigate to the tutorial folder and select the file PAX5.sorted.bam.

Repeat these steps in order to load PAX5.bw as well.

Select "chr1" from the drop down menu on the top left.

Right click on the name of PAX5.bw and choose "Maximum" under the "Windowing Function".

Right click again and select "Autoscale".


Questions

Q1. Look for gene NASP in the search box. Can you see a PAX5 binding site near the NASP gene?
Hint: use the "+" button on the top right zoom in more to see the details of the alignment

Q2. What is the main difference between the visualisation of BAM and bigWig files?


What's next?

You can head back to manipulating SAM output or continue on to aligning the control sample to the genome.