Once you have your list of differentially expressed (DE) genes, the next step is often to ask whether the DE genes associated with a particular biological process or function. To do this, we can perform a gene ontology (GO) term enrichment analysis. A gene ontology (GO) is a controlled vocabulary which describes gene products from in any organism (i.e. species agnostic). These descriptions are broken down into three categories:
molecular function (MF) what the gene product does
biological process (BP) why the gene product does this
cell component (CC) where the gene product acts
DEAGO uses topGO to compare a target gene list (e.g. your DE genes) to the background (all genes) and identifies GO terms which are significantly enriched or over-represented in your gene list.
The objectives of this part of the tutorial are:
We will need to give DEAGO three bits of information:
To simplest command to run a GO analysis with DEAGO the command would be:
deago -c <counts_directory> -t <targets file> -a <annotation file> --go
To indicate that we want to run a GO analysis, we use the --go
option. Notice here that we must provide a formatted annotation file using the -a
option as we need to know the GO terms assoicated with each gene for the analysis.
However, as before, our count files were generated by featureCounts for this tutorial, so we need to also tell DEAGO the count format with the --count_type
option:
deago -c <counts_directory> -t <targets file> -a <annotation file> --go \
--count_type featurecounts
We're also using the --control
option, as before, which tells DEAGO the condition you want to use as your reference or control, in this case WT_Ctrl.
deago -c <counts_directory> -t <targets file> -a <annotation file> --go\
--count_type featurecounts --control <control>
Once your GO analysis has finished, you should see several new files and directories:
deago.config
deago.rlog
deago_markdown.Rmd
deago_markdown.html
results_<timestammp>
DEAGO also writes the GO term enrichment analysis results tables containing the top 30 significantly enriched GO terms to individual files, split by GO term level (MF and BP), in your timestamped results directory. If possible, there will be three GO tables for each GO term level containing the results for analyses using all genes, up-regulated genes only and down-regulated genes
The output file we're interested in is deago_markdown.html which is your GO analysis report. Go ahead and open it in a web browser (e.g. Chrome, Firefox, IE, Safari...). You can do this by going to "File -> Open" in the top navigation or (if you have Firefox installed, use the command:
In [ ]:
firefox deago_markdown.html
You'll already be familiar with the QC plots
and Pairwise contrasts
sections. Click on Pairwise contrasts
and then go to the contrast subsection for wt_il22_vs_ko_il22
. You should now see six new subsections, 6.7.2 - 6.7.7, which contain your GO term enrichment analysis results for that contrast.
The first three subsections have the prefix GO term enrichment - BP
and contain the results for the Biological Processes (BP). The remaining three subsections have the prefix GO term enrichment - MF
and contain the results for the Molecular Functions (MF).
For each GO level (BP or MF) there is a subsection containing the results from GO term enrichment analyses which were run using all DE genes, up-regulated genes only and down-regulated genes only.
Let's take a look at the BP (upregulated genes only) table for the contrast wt_il22_vs_ko_il22 (subsection 6.7.3). All of the results tables are interactive. Despite performing a single-factor analyis, we can still find the top GO description for up-regulated genes inflammatory response in our results table (GO:0006954).
The paper also mentions Prdm1 which is upregulated by IL-22RA1 signalling and is a susceptibility gene in to inflammatory bowel disease. Let's see whether it's associated with any of the enriched GO terms. Type "prdm1" in the top right search box to search the whole table.
We can see that one GO term matches: GO:0031663 (lipopolysaccharide-mediated signaling pathway).
Scrolling to the right, we can see how the search found this match. For each GO term, the DE genes which are associated with that term are also shown in the table.
In [ ]:
cd data
Each DEAGO analysis should be self-contained, so let's create a new directory for our GO analysis.
In [ ]:
mkdir go_analysis
In [ ]:
cd go_analysis
Now, let's get our GO report.
In [ ]:
deago --build_config -c ../counts -t ../targets.txt \
--count_type featurecounts \
-a ../ensembl_mm10_deago_formatted.tsv \
--control WT_Ctrl -go
In Figure 5C (below), the authors have highlighted four genes: Fut2, Sec1, Fut8 and B4galt1 which are associated with glycosylation.
Answer the following question for all four genes:
Q1: Which biological processes is this upregulated gene associated with?
If you want a recap of input file preparation, head back to running a differential expression (DE) analysis.
Otherwise, let's continue on to troubleshooting.