DEAGO

Introduction

DEAGO generates user-friendly quality control (QC) and analysis reports for RNA-Seq datasets. These interactive reports open in any web browser, which makes it easy to share information about your analysis with collaborators.

DEAGO uses a Perl wrapper module (Bio-Deago) to generate an HTML report from markdown templates by calling functions from an R package (deago).

There are three main steps to each analysis:

  • Generating QC plots (ggplot2)
  • Identifying differentially expressed (DE) genes (DESeq2)
  • Performing gene ontology (GO) term enrichment analyses (topGO)

User guide

The DEAGO user guide is split into several sections:

  1. Input files
  2. Output files
  3. Preparing an annotation file
  4. Quality control (QC)
  5. Identifying DE genes
  6. GO term enrichment
  7. Bespoke analyses

Tutorial

There is a short DEAGO tutorial which will give you an opportunity to look at example input data, run some DEAGO analyses and look at the output files which are generated.

Learning outcomes

By the end of the tutorial you can expect to be able to:

  • Understand what input files are required for DEAGO and what format they should have
  • Generate a QC report for the RNA-Seq data
  • Generate a DE report that can be used to identify differentially expressed genes
  • Generate a GO term enrichment analysis report

Running the commands in this user guide

You can run the commands in this user guide either directly from the Jupyter notebook (if using Jupyter), or by typing the commands in your terminal window.

Running commands on Jupyter

If you are using Jupyter, command cells (like the ones below) can be run by selecting the cell and either clicking Cell -> Run in the menu above or pressing Ctrl+Enter. Let's give this a try by printing our working directory with the pwd command and listing the files within it with ls. Run the commands in the two cells below.


In [ ]:
pwd

In [ ]:
ls -l

Running commands in the terminal

You can also follow this guide by typing all the commands you see into a terminal window. This is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, print the command you will need by selecting the cell below and then either pressing Ctrl+Enter or choosing Cell -> Run in the menu at the top of the page.


In [ ]:
echo cd $PWD

Now open a new terminal on your computer, type the command that was output by the previous cell and press Enter. The command will look similar to this:

cd /home/manager/pathogen-informatics-training/Notebooks/DEAGO/

Now you can follow the instructions in the guide from here.

Let’s get started!

This user guide assumes that you have deago and Bio-Deago installed on your computer. For download and installation instructions, please see:

Note: For Sanger pathogen users, these are already available on pcs5 and farm3. We also have a separate Sanger pathogen users page which will walk you through preparing your input data from the Sanger pathogen pipelines.
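
Before running DEAGO, it can help to confirm that the executable is visible to your shell. The following is a minimal sketch that assumes the wrapper is installed under the command name deago, as used throughout this guide. The deago_status variable and the messages are illustrative only and are not part of DEAGO.

```shell
# Check whether a command named "deago" is available on the PATH.
# deago_status is an illustrative variable, not part of DEAGO itself.
if command -v deago >/dev/null 2>&1; then
    deago_status="found"
    echo "deago found at: $(command -v deago)"
else
    deago_status="missing"
    echo "deago not found on PATH; check your installation"
fi
```

If the command is reported as missing, revisit the installation instructions before continuing.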

To check that you have installed the software correctly, you can run the following command:


In [ ]:
deago -h

This should return the following help message:

Usage: deago [options]
RNA-Seq differential expression qc and analysis

Main options:
  --output_directory (-o) output directory [.]
  --convert_annotation    convert annotation for use with deago (requires -a)
  --annotation_delim      annotation file delimiter [\t]
  --build_config          build configuration file from command line arguments (see configuration options)
  --config_file           configuration filename or output filename for configuration file if building [./deago.config]
  --markdown_file         output filename for markdown file [./deago_markdown.Rmd]
  --html_file             output filename for html file [./deago_markdown.html]
  -v                      verbose output to STDOUT
  -w                      print version and exit
  -h                      print help message and exit

Configuration options (required):
  -c STR          directory containing count files (absolute path)
  -t STR          targets filename (absolute path)

 Configuration options (optional):
  -r STR          results directory [current working directory]
  -a STR          annotation filename (absolute path)
  -q NUM          qvalue (DESeq2) [0.05]
  --control       name of control condition (must be present in targets file)
  --keep_images   keep images used in report
  --qc            QC only
  --go            GO term enrichment
  --go_levels     BP only, MF only or all [BP|MF|all]
  --count_type    type of count file [expression|featurecounts]
  --count_column  number of column containing count values
  --skip_lines    number of lines to skip in count file
  --count_delim   count file delimiter
  --gene_ids      name of column containing gene ids

DEAGO takes in a configuration file containing key/value pairs [default: ./deago.config]. You can
use your own configuration file with --config_file or specify parameters and let DEAGO build a
configuration file with --build_config (and --config_file if you don't want the default
configuration filename). For more information on configuration parameters run: build_deago_config -h.

DEAGO will then build a master R markdown file (--markdown_file if you don't want the default
markdown filename) from templates which utilize the companion DEAGO R package and the key/value
pairs set out in the configuration file. The R markdown will be processed and used to generate a
HTML report (--html_file if you don't want the default html filename).

To use custom gene names and for GO term enrichment (--go) and annotation file must be provided
(-a). Annotations downloaded from BioMart or those in a similar format can be converted for use
with DEAGO.  For more information run: mart_to_deago -h.
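
As a rough illustration of how the options above fit together, the sketch below assembles a QC-only command that asks DEAGO to build its configuration file from command line arguments. The counts directory and targets filename are hypothetical placeholders, so replace them with the absolute paths to your own data. The command is printed rather than run so that you can inspect it first.

```shell
# Hypothetical input locations (placeholders, not files shipped with DEAGO).
counts_dir="$PWD/counts"
targets_file="$PWD/targets.txt"

# Assemble the command using only options listed in the help message:
#   --build_config  build the configuration file from the arguments
#   -c / -t         count file directory and targets file (absolute paths)
#   --qc            QC only
deago_cmd="deago --build_config -c $counts_dir -t $targets_file --qc"

# Print the command for inspection instead of executing it.
echo "$deago_cmd"
```

Once the paths point at real files, the printed command can be copied into your terminal. The required input file formats are covered in the next section.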

Next: Input files