Introducing the tutorial dataset

Introduction

Working through this tutorial, you will be investigating IL-22RA1 signalling at the intestinal epithelium. The dataset you will be using for this tutorial forms part of the following publication:

Epithelial IL-22RA1-Mediated Fucosylation Promotes Intestinal Colonization Resistance to an Opportunistic Pathogen
Pham TA, Clare S, Goulding D, Arasteh JM et al.
Cell Host Microbe. 2014 Oct 8;16(4):504-16. doi: 10.1016/j.chom.2014.08.017.
PMID: 25263220

For Sanger pathogen users, you can find this dataset under study id 2319 using the pf commands. Click here for more information.

Background

Cytokines are small, secreted proteins which effect the behavior of other cells. Due to their crucial role in cell signalling, they are often targets of RNA-Seq studies. In this study, the authors were interested in interleukin 22 (IL-22), an important mediator of host mucosal defence, and its receptor, interleukin 22 receptor subunit alpha 1 (IL-22RA1). IL-22 targets receptors on the surface of cells that line the intestines, also known as the intestinal epithelium. It can stimulate these cells to multiply, produce antimicrobial peptides and shed, providing a local defence against colonisation of bacterial and fungal pathogens.

However, the relationship between IL-22 and host defense is complex. While it may be involved in preventing colonisation, in some situations it has been shown to promote colonisation and in others, play no obvious role in susceptibility. So, in this study, the authors generated organoids (small, 3D tissue cultures which mimic the larger organ they represent) from wild type mice and organoids from IL-22RA1 knockout mice (i.e. mice which don't express/produce IL-22RA1). To investigate IL-22RA1 signalling in the intestinal epithelium, they then compared the gene expression from WT and KO organoids stimulated with IL22.

Exercise 1

In this tutorial, you will be analysing 32 RNA samples, each of which has been sequenced on an Illumina HiSeq sequencing machine. There are four conditions: wild type cells with no treatment (WT_Ctrl), wild type cells stimulated with IL22 (WT_IL22), IL-22RA1 knockout cells with no treatment (KO_Ctrl) and IL-22RA1 knockout cells stimulated with IL22 (KO_IL22). There are four biological replicates and two technical replicates for each condition.

Condition Cell type Treatment Biological Replicates Technical Replicates
WT_Ctrl Wild type Control 4 2
WT_IL22 Wild type Control 4 2
KO_Ctrl IL-22RA1 knockout IL22 4 2
KO_IL22 IL-22RA1 knockout IL22 4 2

Note: this is a two factor study, but we must reduce it to a single factor study to run DEAGO

Move into the directory containing the tutorial data files.


In [ ]:
cd data

List the files and folders in the directory.


In [ ]:
ls -l

You should see a directory called counts. This contains the files which have our gene counts (number of reads assigned to each gene ~ gene abundance) for each of the samples, one file per sample. Let's count them.


In [ ]:
ls counts | wc -l

You will also see a file called targets.txt which tells us the relationship between sample and experimental condition.


In [ ]:
head targets.txt

There are two files with similar names, ensembl_mm10.tsv and ensembl_mm10_deago_formatted.tsv, which are the gene annotations. The first contains the annotations as downloaded from Ensembl BioMart, one line per annotation.


In [ ]:
head ensembl_mm10.tsv

The second contains those annotations converted for use with DEAGO, one line per gene.


In [ ]:
head ensembl_mm10_deago_formatted.tsv

These are the input files for DEAGO. We'll take a closer look in the next section of this tutorial.

What's next?

For a quick recap of what the tutorial covers and the software you will need, head back to the Introduction.

Otherwise, let's take a closer look at how to prepare input data.