Introducing the tutorial dataset

Working through this tutorial, you will investigate the effect of vector transmission on gene expression of the malaria parasite. The dataset you will be using for this tutorial and Figure 1 have been taken from the following publication:

Vector transmission regulates immune control of Plasmodium virulence
Philip J. Spence, William Jarra, Prisca Lévy, Adam J. Reid, Lia Chappell, Thibaut Brugat, Mandy Sanders, Matthew Berriman and Jean Langhorne
Nature. 2013 Jun 13; 498(7453): 228–231 doi:10.1038/nature12231

Is the transcriptome of a mosquito-transmitted parasite different from one which has not passed through a mosquito?

The key reason for asking this question is that parasites which are transmitted by mosquito (MT) are less virulent (severe/harmful) than those which are serially blood passaged (SBP) in the laboratory. Figure 1A shows the malaria life cycle, the red part highlighting the mosquito stage. Figure 1B shows the difference in virulence, measured by blood parasitemia (presence of parasites in the blood), between mosquito-transmitted and serially blood passaged parasites.

Figure 1C shows that increasing numbers of blood passages after mosquito transmission result in increasing virulence, back to around 20% parasitemia. Subsequent mosquito transmission of high-virulence parasites renders them low-virulence again.

We hypothesise that parasites which have been through the mosquito are somehow better able to control the mouse immune system than those which have not. This control would result in lower parasitemia, which is advantageous for the parasite: too high a parasitemia is bad for the mouse and therefore bad for the parasite.


Exercise 1

In this tutorial, you will be analysing five RNA samples, each of which has been sequenced on an Illumina HiSeq sequencing machine. There are two conditions: serially blood-passaged parasites (SBP), with three biological replicates, and mosquito-transmitted parasites (MT), with two biological replicates.

Sample name    Experimental condition               Replicate number
MT1            mosquito transmitted parasites       1
MT2            mosquito transmitted parasites       2
SBP1           serially blood-passaged parasites    1
SBP2           serially blood-passaged parasites    2
SBP3           serially blood-passaged parasites    3

The tutorial files can be found in the data directory. Let's go there now!

Move into the directory containing the tutorial data files.


In [ ]:
cd data

Check to see if the tutorial FASTQ files are there.


In [ ]:
ls *.fastq

If the previous ls command didn't return anything, download and uncompress the tutorial FASTQ files.


In [ ]:
wget ftp://ftp.sanger.ac.uk/pub/project/pathogens/workshops/RNASeq_fq.tar.gz
tar -xf RNASeq_fq.tar.gz
mv RNASeq_tutorial_fastqs/* .
gunzip *.fastq.gz
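
The "check first, download only if needed" logic of the last two steps can also be scripted. The following is a minimal sketch, not part of the original tutorial: `fastq_missing` is a hypothetical helper name, and the demonstration runs in a temporary directory with an empty placeholder file so that nothing is actually downloaded.

```shell
# fastq_missing DIR: succeed (exit status 0) when DIR contains no *.fastq files.
# Hypothetical helper for illustration only.
fastq_missing() {
    for f in "$1"/*.fastq; do
        # If the glob matched nothing, "$f" is the literal pattern and -e fails
        [ -e "$f" ] && return 1
    done
    return 0
}

# Demonstration on a throwaway directory
demo_dir=$(mktemp -d)
if fastq_missing "$demo_dir"; then
    echo "No FASTQ files yet: run the download commands"
fi

touch "$demo_dir/MT1_1.fastq"   # empty placeholder standing in for a real file
if ! fastq_missing "$demo_dir"; then
    echo "FASTQ files present: skip the download"
fi
rm -r "$demo_dir"
```

In the tutorial directory itself, you could call `fastq_missing .` and run the wget, tar, mv and gunzip commands above only when it succeeds.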

The FASTQ files contain the raw sequence reads for each sample. There will typically be four lines per read:

  1. Header
  2. Sequence
  3. Separator (usually a '+')
  4. Encoded quality value
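
To make the four-line layout concrete, here is a small sketch that labels each line of a single read. The read below is made up for illustration and is not taken from the tutorial data. The sketch assumes every read occupies exactly four lines, which is typical but not guaranteed for all FASTQ files.

```shell
# Write a made-up single-read FASTQ file (illustrative values only)
toy=$(mktemp)
printf '%s\n' '@read1' 'ACGTACGT' '+' 'IIIIHHHH' > "$toy"

# NR is the current line number; NR % 4 cycles 1, 2, 3, 0 within each read
labelled=$(awk '
    NR % 4 == 1 { print "header:    " $0 }
    NR % 4 == 2 { print "sequence:  " $0 }
    NR % 4 == 3 { print "separator: " $0 }
    NR % 4 == 0 { print "quality:   " $0 }
' "$toy")
echo "$labelled"
rm "$toy"
```

Note that the quality line has the same number of characters as the sequence line: one encoded quality value per base.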

Take a look at one of the FASTQ files.


In [ ]:
head MT1_1.fastq

You can find out more about the FASTQ format at https://en.wikipedia.org/wiki/FASTQ_format.


Questions

Q1: Why is there more than one FASTQ file per sample?

Hint: think about why there is a MT1_1.fastq and a MT1_2.fastq

Q2: How many reads were generated for the MT1 sample?

Hint: we want the total number of reads from both files (MT1_1.fastq and MT1_2.fastq). Think about the FASTQ format and the number of lines per read, or whether there is anything in the FASTQ header that you could search for and count.


What's next?

For a quick recap of what the tutorial covers and the software you will need, head back to the Introduction.

Otherwise, let's get started with mapping RNA-Seq reads to the genome using HISAT2.