GO term enrichment

DEAGO uses topGO to identify significantly enriched GO terms. For more information on the topGO methods and workflows, see the topGO vignette.

Remember, each DEAGO analysis should be self-contained (i.e. in a new folder):


In [ ]:
mkdir go_analysis && cd go_analysis

Running a default GO analysis


In [ ]:
deago --build_config -c ../data/counts \
                         -t ../data/targets.txt \
                         -a ../data/ensembl_mm10_deago_formatted.tsv \
                         --go

This will run a QC analysis, a DE analysis and a GO term enrichment analysis with the default settings, which make the following assumptions:

  • That you're using the expression count format which is the output of the Sanger pathogen pipelines (see Input files).
  • That your control condition is the one that comes first when the condition names are sorted alphabetically, because this is how R chooses the control condition by default. For example, if you have three conditions (controlTreatment, badTreatment and worseTreatment), it will assume that the control condition is badTreatment because it comes first alphabetically.
  • That the FDR cutoff (alpha) you want to apply to the adjusted p-values is 0.05.
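To preview which condition would be picked as the control under the alphabetical default, you can sort the condition names yourself. The sketch below is only an illustration using the three example names from the list above. The variable name `control` is hypothetical, and `LC_ALL=C` is an assumption made here to get a plain byte-order sort; R's own ordering can depend on your locale, so treat the result as a quick sanity check and not a guarantee.

```shell
# Example condition names taken from the text above.
# LC_ALL=C forces a simple byte-order sort (an assumption; R may use your locale).
control=$(printf '%s\n' controlTreatment badTreatment worseTreatment \
    | LC_ALL=C sort \
    | head -n 1)

# The first name after sorting is the one that would be assumed to be the control.
echo "Assumed control condition: ${control}"
```

If the name printed is not your real control, set the control condition explicitly as described in Identifying DE genes.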

To define the control condition and FDR cutoff in DEAGO, see Identifying DE genes. For more information on what these mean, see the DESeq2 vignette.

Results

All being well, DEAGO should generate an analysis report: deago_markdown.html. If not, check the log file (deago.rlog) for errors or warnings, or try running the commands from the R markdown file (deago_markdown.Rmd) in R to debug the issue.
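As a starting point for debugging, you could search the log for lines that mention errors or warnings. This is a minimal sketch, not part of DEAGO: the helper name `scan_log` is hypothetical, and it assumes that problems in deago.rlog are reported on lines containing the words "error" or "warning", which may not cover every failure.

```shell
# Hypothetical helper: print log lines mentioning errors or warnings, with line numbers.
scan_log() {
    # -n: show line numbers, -i: ignore case, -E: extended regular expression
    grep -n -i -E 'error|warning' "$1" || echo "No errors or warnings found in $1"
}

# Only scan the log if it exists in the current directory.
if [ -f deago.rlog ]; then
    scan_log deago.rlog
fi
```

The line numbers in the output show where to look in the full log for the surrounding context.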

In addition to the DE analysis results files mentioned in Identifying DE genes, your results directory should contain a series of GO analysis results files. These contain the results for the top 30 significantly enriched GO terms from each analysis. For more information on the GO results file names and their contents, see Output files.


Including gene symbols

By default, the GO analysis results tables list, for each GO term, the associated gene identifiers from the counts file. However, it can be more useful to see gene names, which are often much more recognisable. To do this, the annotation file needs a gene symbol column (see Input files and Preparing an annotation file).

For more information on how your output differs when you include annotations, see Output files.
