In this tutorial, you will investigate how vector transmission affects gene expression in the malaria parasite. The dataset used in this tutorial and Figure 1 are taken from the following publication:
Vector transmission regulates immune control of Plasmodium virulence
Philip J. Spence, William Jarra, Prisca Lévy, Adam J. Reid, Lia Chappell, Thibaut Brugat, Mandy Sanders, Matthew Berriman and Jean Langhorne
Nature. 2013 Jun 13; 498(7453): 228–231 doi:10.1038/nature12231
This question matters because parasites that are transmitted by mosquito (MT) are less virulent (they cause less severe disease) than those that are serially blood-passaged (SBP) in the laboratory. Figure 1A shows the malaria life cycle, with the mosquito stage highlighted in red. Figure 1B shows the difference in virulence, measured as blood parasitemia (the percentage of red blood cells infected with parasites), between mosquito-transmitted and serially blood-passaged parasites.
Figure 1C shows that virulence increases with the number of blood passages after mosquito transmission, rising back to around 20% parasitemia. Transmitting these high-virulence parasites through a mosquito again renders them low-virulence.
We hypothesise that parasites which have been through the mosquito are somehow better able to control the host (mouse) immune system than those which have not. This control of the immune system would keep parasitemia lower, which is advantageous for the parasite: too high a parasitemia is bad for the mouse and therefore bad for the parasite.
In this tutorial, you will be analysing five RNA samples, each of which has been sequenced on an Illumina HiSeq sequencing machine. There are two conditions: serially blood-passaged parasites (SBP), with three biological replicates, and mosquito-transmitted parasites (MT), with two biological replicates.
Sample name | Experimental condition | Replicate number |
---|---|---|
MT1 | mosquito transmitted parasites | 1 |
MT2 | mosquito transmitted parasites | 2 |
SBP1 | serially blood-passaged parasites | 1 |
SBP2 | serially blood-passaged parasites | 2 |
SBP3 | serially blood-passaged parasites | 3 |
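If you want the design above in a machine-readable form for later steps, one option is a small tab-separated sample sheet. This is a sketch under assumptions, not part of the original tutorial: the file name samples.tsv and the column names sample, condition and replicate are hypothetical choices. It writes the table and then counts the replicates per condition as a quick check.

```shell
# Hypothetical sample sheet mirroring the table above.
# File name (samples.tsv) and column names are illustrative choices.
printf 'sample\tcondition\treplicate\n' > samples.tsv

# printf reuses its format string until all arguments are consumed,
# so each group of three arguments becomes one tab-separated row.
printf '%s\t%s\t%s\n' \
  MT1  MT  1 \
  MT2  MT  2 \
  SBP1 SBP 1 \
  SBP2 SBP 2 \
  SBP3 SBP 3 >> samples.tsv

# Count biological replicates per condition, skipping the header line.
tail -n +2 samples.tsv | cut -f 2 | sort | uniq -c
```

The uniq -c output should report 2 replicates for MT and 3 for SBP, matching the table.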
The tutorial files can be found in the data directory. Let's go there now!
Move into the directory containing the tutorial data files.
In [ ]:
cd data
Check to see if the tutorial FASTQ files are there.
In [ ]:
ls *.fastq
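Each sample should come with two read files (for MT1 these are MT1_1.fastq and MT1_2.fastq), so a slightly stricter check is to look for both files for every sample in the table. The check_pairs function below is a hypothetical helper, and it assumes every sample follows the same <sample>_1.fastq / <sample>_2.fastq naming as MT1. It prints which files were found and returns a non-zero status if any are missing.

```shell
# Hypothetical helper: report whether both read files exist for each sample.
# Assumes the <sample>_1.fastq / <sample>_2.fastq naming used for MT1.
# Returns 1 if any expected file is missing, 0 otherwise.
check_pairs() {
  local status=0 sample mate file
  for sample in "$@"; do
    for mate in 1 2; do
      file="${sample}_${mate}.fastq"
      if [ -f "$file" ]; then
        echo "found:   $file"
      else
        echo "missing: $file"
        status=1
      fi
    done
  done
  return "$status"
}

# Check all five samples from the table.
check_pairs MT1 MT2 SBP1 SBP2 SBP3 || echo "Some FASTQ files are missing; download them as described next."
```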
If the previous ls command didn't return anything, download and uncompress the tutorial FASTQ files.
In [ ]:
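# Download the compressed archive of tutorial reads
wget ftp://ftp.sanger.ac.uk/pub/project/pathogens/workshops/RNASeq_fq.tar.gz
# Unpack the archive
tar -xf RNASeq_fq.tar.gz
# Move the gzipped FASTQ files into the current directory
mv RNASeq_tutorial_fastqs/* .
# Decompress the FASTQ files
gunzip *.fastq.gz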
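The check and the download can be combined so that re-running the cell does not fetch the archive twice. The sketch below wraps the same ls check and download commands in a hypothetical function, fetch_reads_if_missing. It assumes you run it from inside the data directory and that the archive still unpacks into RNASeq_tutorial_fastqs as above.

```shell
# Hypothetical helper: fetch the tutorial reads only if no FASTQ files exist yet.
# Reuses the ls check and the download commands from the steps above.
fetch_reads_if_missing() {
  if ls *.fastq > /dev/null 2>&1; then
    echo "FASTQ files already present; skipping download."
    return 0
  fi
  # Each step runs only if the previous one succeeded.
  wget ftp://ftp.sanger.ac.uk/pub/project/pathogens/workshops/RNASeq_fq.tar.gz &&
    tar -xf RNASeq_fq.tar.gz &&
    mv RNASeq_tutorial_fastqs/* . &&
    gunzip *.fastq.gz
}
```

Defining the function does not download anything. Calling fetch_reads_if_missing from inside data either skips with a message or runs the four download steps, stopping at the first one that fails.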
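The FASTQ files contain the raw sequence reads for each sample. There will typically be four lines per read:
1. a header line starting with @, containing the read identifier (optionally followed by a description)
2. the read sequence
3. a separator line starting with +, optionally followed by the read identifier again
4. the base quality scores, encoded as one ASCII character per base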
Take a look at one of the FASTQ files.
In [ ]:
head MT1_1.fastq
You can find out more about the FASTQ format at https://en.wikipedia.org/wiki/FASTQ_format.
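To see the four-line structure in practice, head -n 4 MT1_1.fastq prints exactly one read. The hypothetical check_fastq function below goes further and uses awk to test every record: line 1 of each record must start with @, line 3 must start with +, and the quality string must be as long as the sequence. It assumes the strict four-lines-per-read layout (no wrapped sequence lines).

```shell
# Hypothetical helper: validate the four-lines-per-read layout of one FASTQ file.
# Prints each problem found; exits non-zero if any check fails.
check_fastq() {
  awk '
    # Line 1 of each record: header must start with "@"
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR; bad = 1 }
    # Line 2: remember the sequence length
    NR % 4 == 2 { seqlen = length($0) }
    # Line 3: separator must start with "+"
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "bad separator at line " NR; bad = 1 }
    # Line 4: quality string must match the sequence length
    NR % 4 == 0 && length($0) != seqlen { print "length mismatch at line " NR; bad = 1 }
    END {
      if (NR % 4 != 0) { print "line count " NR " is not a multiple of 4"; bad = 1 }
      if (!bad) print FILENAME ": " NR / 4 " reads, layout OK"
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

For example, check_fastq MT1_1.fastq prints the number of reads if the file is well formed, or lists the problems it finds otherwise.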
Hint: think about why there is both an MT1_1.fastq and an MT1_2.fastq file.
Hint: we want the total number of reads from both files (MT1_1.fastq and MT1_2.fastq), so think about the FASTQ format and the number of lines per read, or whether there is anything in the FASTQ header you can search for and count.
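One way to follow the line-counting part of this hint is to divide each file's line count by four. The count_reads function below is a hypothetical sketch that does this for every file you pass it and reports the combined total. It assumes every read occupies exactly four lines.

```shell
# Hypothetical helper: count reads in FASTQ files as (number of lines) / 4.
# Assumes the strict four-lines-per-read layout.
count_reads() {
  local total=0 file lines
  for file in "$@"; do
    lines=$(wc -l < "$file")
    echo "$file: $((lines / 4)) reads"
    total=$((total + lines / 4))
  done
  echo "total: $total reads"
}
```

For example, count_reads MT1_1.fastq MT1_2.fastq reports the reads in each mate file and their sum. Counting header lines with grep -c '^@' is tempting but can overcount, because @ is also a valid quality character and a quality line may start with it; searching for a read-name prefix shared by all headers avoids this.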
For a quick recap of what the tutorial covers and the software you will need, head back to the Introduction.
Otherwise, let's get started with mapping RNA-Seq reads to the genome using HISAT2.