RNA-Seq expression analysis

Introduction

RNA sequencing (RNA-Seq) is a high-throughput method used to profile the transcriptome, quantify gene expression and discover novel RNA molecules. This tutorial uses RNA-Seq data from malaria parasites to walk you through visualising the transcriptome, performing simple quality control checks and profiling transcriptomic differences by identifying differentially expressed genes.

For an introduction to RNA-Seq principles and best practices see:

A survey of best practices for RNA-Seq data analysis
Ana Conesa, Pedro Madrigal, Sonia Tarazona, David Gomez-Cabrero, Alejandra Cervera, Andrew McPherson, Michał Wojciech Szcześniak, Daniel J. Gaffney, Laura L. Elo, Xuegong Zhang and Ali Mortazavi
Genome Biol. 2016 Jan 26;17:13. doi:10.1186/s13059-016-0881-8

Learning outcomes

By the end of this tutorial you can expect to be able to:

  • Align RNA-Seq reads to a reference genome and a transcriptome
  • Visualise transcription data using standard tools
  • Perform QC of NGS transcriptomic data
  • Quantify the expression values of your transcripts using standard tools

Authors

This tutorial was written by Victoria Offord based on materials from Adam Reid.

Running the commands from this tutorial

You can run the commands in this tutorial either directly from the Jupyter notebook (if using Jupyter), or by typing the commands in your terminal window.

Running commands on Jupyter

If you are using Jupyter, command cells (like the ones below) can be run by selecting the cell and either clicking Cell -> Run in the menu above or pressing Ctrl+Enter. Let's give this a try by printing our working directory with the pwd command and then listing the files within it with ls -l. Run the commands in the two cells below.


In [ ]:
pwd

In [ ]:
ls -l

Running commands in the terminal

You can also follow this tutorial by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, select the cell below with the mouse and then either press Ctrl+Enter or choose Cell -> Run in the menu at the top of the page.


In [ ]:
echo cd $PWD

Open a new terminal on your computer, type the command that was output by the previous cell and press Enter. The command will look similar to this:


In [ ]:
cd /home/manager/pathogen-informatics-training/Notebooks/RNA-Seq/

Now you can follow the instructions in the tutorial from here.
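
If you want to confirm that your terminal is in the same directory as the notebook, a quick check along the following lines may help. This is only a sketch: $PWD is the shell variable that holds the current working directory, which is why echo cd $PWD printed a ready-made cd command. The variable name cd_command is ours and is used purely for illustration.

```shell
# $PWD expands to the absolute path of the current working directory,
# so this rebuilds the command printed by the notebook cell.
cd_command="cd $PWD"
echo "$cd_command"

# After running the cd command in your terminal, pwd should print the same path.
pwd
```

If the path printed by pwd in your terminal matches the one shown in the notebook, both are working in the same directory.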

Prerequisites

This tutorial assumes that you have the following software or packages and their dependencies installed on your computer. The software or packages used in this tutorial may be updated from time to time, so we have also given the version that was used when writing the tutorial.
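
One way to check whether a command-line tool is installed is to ask the shell where it lives. The sketch below assumes a POSIX shell; check_tools is a hypothetical helper written for this example, not part of the tutorial materials, and the names passed to it (ls and echo) are placeholders rather than the tutorial's actual prerequisites.

```shell
# Hypothetical helper (not part of the tutorial): report whether each
# named command can be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}

# Placeholder names only; substitute the tools you need to check.
check_tools ls echo
```

This only tells you that a command exists, not which version it is. Most tools report their version through an option of their own, so consult each tool's documentation for that.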

Let's get started!

To get started with the tutorial, head to the first section: introducing the tutorial dataset.

The answers to all questions in the tutorial can be found in answers.ipynb.