Output files

For each successful analysis, DEAGO should produce:

  • Configuration file (deago.config - example)
    config file with key/value parameters defining the analysis (only generated when using --build_config)
  • R log (deago.rlog - example)
    log of the R output generated when converting the R markdown to HTML
  • Results directory (result_[timestamp] - example)
    unique results directory containing images (optional), DE analysis and GO analysis output files

Configuration file

DEAGO analyses produce a config file (deago.config) containing tab-delimited key/value pairs to define the parameters for each analysis.

For a featurecounts QC it will look something like this:

count_column    7
count_delim \t
count_type  featurecounts
counts_directory    /path/to/counts
gene_ids    Geneid
go_analysis 0
go_levels   all
keep_images 0
qc_only 1
qvalue  0.05
results_directory   /path/to/qc_results
skip_lines  1
targets_file    /path/to/targets.txt

DEAGO generates this file when the --build_config option is used, building it from the command line parameters you provide:


deago --build_config -c data/counts -t data/targets.txt

Config files are useful for debugging or if you want to re-run an analysis:


deago --config deago.config
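
When debugging, it can help to load the config into a script and inspect the parameters. The following is a minimal Python sketch, assuming the file holds one tab-delimited key/value pair per line as described above. `parse_deago_config` is a hypothetical helper written for this illustration and is not part of DEAGO.

```python
def parse_deago_config(text):
    """Parse tab-delimited key/value lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only; everything after it is the value.
        key, _, value = line.partition("\t")
        params[key.strip()] = value.strip()
    return params


# A few lines from the example config above. "\\t" is the literal
# two-character string \t as it appears in the file, not a tab character.
example = (
    "count_column\t7\n"
    "count_delim\t\\t\n"
    "count_type\tfeaturecounts\n"
    "qc_only\t1\n"
    "qvalue\t0.05\n"
)
config = parse_deago_config(example)
print(config["count_type"], config["qvalue"])
```

For a real file, the same function could be called with the contents of deago.config, e.g. `parse_deago_config(open("deago.config").read())`. All values are kept as strings, so numeric parameters such as qvalue need converting before use.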

R log file

DEAGO produces a log file (deago.rlog) which contains the R output generated when converting the R markdown (deago_markdown.Rmd) into the HTML report (deago_markdown.html).

The R log file is the first place to look if the analysis didn't generate an HTML report and didn't show any command line errors.

For example:

Quitting from lines 206-208 (deago_markdown.Rmd) 
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels
Calls: <Anonymous> ... prepareGOdata -> new -> initialize -> initialize -> .local
In addition: Warning messages:
1: Removed 778 rows containing non-finite values (stat_density). 
2: Removed 778 rows containing non-finite values (stat_density). 

Execution halted
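
A log like the one above can also be checked from a script. The sketch below is an illustration in Python, assuming the log lines follow the pattern of this single example ("Quitting from lines", "Error", "Execution halted"); other failures may produce different messages. `summarise_rlog` is a hypothetical helper, not part of DEAGO.

```python
import re

# Pattern based only on the example log line shown above.
QUIT_LINE = re.compile(r"^Quitting from lines (\d+)-(\d+) \((.+?)\)")


def summarise_rlog(text):
    """Collect the failing markdown lines, error messages and halt status."""
    summary = {"quit": None, "errors": [], "halted": False}
    for raw in text.splitlines():
        line = raw.strip()
        match = QUIT_LINE.match(line)
        if match:
            # (first line, last line, markdown file) of the failing chunk
            summary["quit"] = (int(match.group(1)), int(match.group(2)), match.group(3))
        elif line.startswith("Error"):
            summary["errors"].append(line)
        elif line == "Execution halted":
            summary["halted"] = True
    return summary


log_text = """Quitting from lines 206-208 (deago_markdown.Rmd)
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels
Calls: <Anonymous> ... prepareGOdata -> new -> initialize -> initialize -> .local

Execution halted
"""
summary = summarise_rlog(log_text)
print(summary["quit"], summary["halted"])
```

The line range reported in `summary["quit"]` points at the chunk of deago_markdown.Rmd that failed, which is usually the quickest way to find the step that needs attention.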

R markdown

DEAGO uses markdown templates from Bio-Deago to generate an R markdown file (deago_markdown.Rmd) which is then used to run the analysis.

The R markdown file allows you to modify the analysis if you need to (e.g. if there is a batch effect). You can then generate a new HTML report from the modified markdown file. See Bespoke analyses for more information.

Here is an example of a section from deago_markdown.Rmd which is used to generate a contrast summary table:

...

The summary table below contains the total number of differentially expressed genes and the number of up-regulated
(lfc > 2) and down-regulated (lfc < -2) genes for each contrast (adjusted p-value < 0.05).

```{r contrasts, echo=TRUE}
contrasts <- getContrasts(dds, parameters)
writeContrasts(dds, contrasts, resultsDir)
```

```{r contrast_summary, echo=TRUE}
contrast_summary <- contrastSummary(contrasts, parameters)
datatable(contrast_summary, options = list(dom = 't', colnames=c('contrast', 'up-regulated','down-regulated','total'),
          columnDefs = list(list(className = 'dt-center', targets = 1:ncol(contrast_summary)))))
```

...

See the R Markdown documentation for more information on R markdown files and their format.


HTML report

By default, the HTML analysis report generated by DEAGO will be written to deago_markdown.html.

All DEAGO reports will contain the following sections:

  • Introduction - an overview of the report
  • Pipeline configuration - contents of the configuration file used (deago.config)
  • Imported data summary - contents of the sample/condition mapping (targets) file
  • DESeq2 analysis - commands used to set up the DESeq2 object and analysis
  • QC plots - subsections for each QC type generated (e.g. Total read counts per sample, Principal component analysis (PCA)...)
  • R session - summary of R packages used and their versions (useful for debugging)

There is a panel on the left which allows you to conveniently skip to each section of the report.

DE analyses only

Differential expression analyses will generate an extra section called Pairwise contrasts. This section will have a summary of the number of up-regulated and down-regulated genes per contrast.

When there are 2-4 contrasts, a Venn diagram will be generated showing the overlap of DE genes between contrasts.

The Pairwise contrasts section will contain several subsections, one per contrast. Each contrast subsection contains an MA and volcano plot. The top 5 up- and down-regulated genes are labelled. By default the labels are the gene identifier but, if an annotation file with gene symbols is used, the labels will be the gene symbols instead.

All analysis results tables in the DEAGO reports are interactive. For each contrast there will be a results table for differentially expressed genes which can be searched or filtered.

The DE table is restricted by an FDR cutoff (q < 0.01) and a log2 fold change threshold (>= 2 or <= -2) to keep the report compact. For the full, unfiltered results tables you should look at the contrast files which are written to the results directory.

In this example, the gene identifier is blue. This indicates that the identifier is from Ensembl and if clicked will open a new tab in the web browser which is the latest Ensembl page for that stable gene ID. An annotation has been included with this analysis, so you will see a symbol column which contains the gene symbol(s) associated with each gene.

To search the whole table, use the search box at the top right.

To search or filter an individual column, use the search/filter box at the top of the column. The table can also be ordered by clicking on the column headers.

GO term enrichment analyses only

GO term enrichment analysis reports include all of the sections already mentioned. In addition, the GO analyses generate subsections in the Pairwise contrasts containing the GO term enrichment results tables.

Interactive tables will be generated for both biological processes (BP) and molecular functions (MF). In addition to the BP and MF tables for GO analyses of all DE genes, there are also individual tables for up-regulated and down-regulated genes for both BP and MF.

Here is an example of a GO results table:

Where gene symbols are included in the annotation, DEAGO will include the DE genes associated with each GO term in the symbol column.


Results directory

For each new analysis, DEAGO creates a timestamped results directory (e.g. result_20180314093805).
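
If several analyses have been run in the same location, a script can pick out the most recent results directory. This Python sketch assumes the 14-digit suffix is a YYYYMMDDHHMMSS timestamp, which is inferred from the example name above rather than stated by DEAGO. The helper names and the second directory name in the list are hypothetical.

```python
import re
from datetime import datetime

# Assumed naming pattern: "result_" followed by a 14-digit timestamp.
RESULT_DIR = re.compile(r"result_(\d{14})")


def parse_result_timestamp(name):
    """Return the timestamp encoded in a results directory name, or None."""
    match = RESULT_DIR.fullmatch(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # 14 digits that do not form a valid date and time


def latest_result_dir(names):
    """Return the name with the most recent timestamp, or None if none match."""
    stamped = [(parse_result_timestamp(n), n) for n in names]
    stamped = [(ts, n) for ts, n in stamped if ts is not None]
    return max(stamped)[1] if stamped else None


# In practice `names` could come from os.listdir("."); these are examples.
names = ["result_20180314093805", "result_20180101120000", "deago.config", "deago.rlog"]
print(latest_result_dir(names))
```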

Images

When the --keep_images option is used, DEAGO will create a folder called images in the timestamped results directory. This folder contains higher-quality versions of the QC and DE plots.

DE contrast results tables

If a DE analysis was performed, DEAGO will write the DESeq2 contrast results tables (one file per contrast) to the timestamped results directory.

The files are named [contrast]_q[alpha].txt where alpha is the FDR cutoff that was set at the start of the analysis.
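
The naming scheme can be used to recover the contrast name and FDR cutoff from a file name. The Python sketch below assumes alpha is written as a plain decimal (for example q0.05); the exact formatting is not shown in this documentation, and both the helper and the example file name are hypothetical.

```python
import re

# Assumed form of [contrast]_q[alpha].txt with a decimal alpha.
CONTRAST_FILE = re.compile(r"(?P<contrast>.+)_q(?P<alpha>\d+(?:\.\d+)?)\.txt")


def parse_contrast_filename(filename):
    """Return (contrast, alpha) for a contrast results file name, else None."""
    match = CONTRAST_FILE.fullmatch(filename)
    if match is None:
        return None
    return match.group("contrast"), float(match.group("alpha"))


# Hypothetical file name used for illustration only.
parsed = parse_contrast_filename("wt_il22_vs_wt_ctrl_q0.05.txt")
print(parsed)
```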

Here is an example of a contrast DE results file:

geneID              symbol  ko_ctrl_1.1  ko_ctrl_1.2  ...  wt_il22_4.1  wt_il22_4.2  baseMean    log2FoldChange  lfcSE  stat    pvalue  padj
ENSMUSG00000000001  Gnai3   10999.217    11064.515    ...  13783.985    14419.202    11898.0777  0.150           0.059  2.520   0.011   0.031
ENSMUSG00000000003  Pbsn    0            0            ...  0            0            0           NA              NA     NA      NA      NA
ENSMUSG00000000028  Cdc45   228.360      260.280      ...  81.607       89.288       175.293     -0.037          0.186  -0.198  0.842   0.901

The first column will be the gene identifier and the second will be the gene symbol if an annotation file was used in the analysis. The next columns are the DESeq2 normalised counts for each sample. Finally, there are the DESeq2 contrast results columns: baseMean, log2FoldChange, lfcSE, stat, pvalue, padj.

The DE results files are not filtered and contain the results for all of the genes.
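
Because these files are unfiltered, any cutoffs have to be applied when reading them. The following Python sketch shows one way to do this, assuming the file is tab-delimited with the header shown above and that missing values are written as NA. `filter_de_rows` is a hypothetical helper, and the sample rows are taken from the example above with the per-sample count columns omitted for brevity.

```python
import csv
import io


def filter_de_rows(handle, alpha, lfc):
    """Keep rows with padj < alpha and |log2FoldChange| >= lfc."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Genes without a test result have NA in these columns; skip them.
        if row["padj"] == "NA" or row["log2FoldChange"] == "NA":
            continue
        if float(row["padj"]) < alpha and abs(float(row["log2FoldChange"])) >= lfc:
            kept.append(row)
    return kept


sample = (
    "geneID\tsymbol\tbaseMean\tlog2FoldChange\tlfcSE\tstat\tpvalue\tpadj\n"
    "ENSMUSG00000000001\tGnai3\t11898.0777\t0.150\t0.059\t2.520\t0.011\t0.031\n"
    "ENSMUSG00000000003\tPbsn\t0\tNA\tNA\tNA\tNA\tNA\n"
    "ENSMUSG00000000028\tCdc45\t175.293\t-0.037\t0.186\t-0.198\t0.842\t0.901\n"
)

# With a fold change threshold of 2, none of the three example genes pass.
strict = filter_de_rows(io.StringIO(sample), alpha=0.05, lfc=2.0)
# A deliberately loose threshold, used here only to show a row being kept.
loose = filter_de_rows(io.StringIO(sample), alpha=0.05, lfc=0.1)
print(len(strict), [row["symbol"] for row in loose])
```

For a real contrast file, the function can be passed an open file handle instead of the in-memory sample.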

GO term enrichment results tables

If a GO term enrichment analysis was performed, GO tables will be written to the timestamped results directory.

Results files will be generated for each contrast BP and MF analysis performed using all of the identified DE genes. Additional files will also be produced for the separate GO analyses performed using the up-regulated and down-regulated genes.

Filename                GO level  Type of analysis
[contrast]_BP.tsv       BP        all DE genes
[contrast]_BP_up.tsv    BP        up-regulated genes only
[contrast]_BP_down.tsv  BP        down-regulated genes only
[contrast]_MF.tsv       MF        all DE genes
[contrast]_MF_up.tsv    MF        up-regulated genes only
[contrast]_MF_down.tsv  MF        down-regulated genes only
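
The naming scheme in the table can be generated programmatically, which is handy when checking that all six files exist for a contrast. This is a small Python sketch; `go_result_files` and the contrast name are hypothetical and only mirror the file names listed above.

```python
def go_result_files(contrast):
    """Map (GO level, gene subset) to the expected GO results file name."""
    files = {}
    for level in ("BP", "MF"):
        for subset, suffix in (("all", ""), ("up", "_up"), ("down", "_down")):
            files[(level, subset)] = f"{contrast}_{level}{suffix}.tsv"
    return files


# Hypothetical contrast name used for illustration only.
expected = go_result_files("wt_il22_vs_wt_ctrl")
for key, filename in expected.items():
    print(key, filename)
```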

Each GO results table reports only the top 30 significantly enriched GO terms. An example of the contents of a GO results file would be:

GO.ID       Term                        Annotated  Significant  Expected  Rank in classic Fisher  classic Fisher  elim Fisher  weight01 Fisher  identifiers                                                      symbol
GO:0050662  coenzyme binding            241        53           38.21     62                      0.00726         0.10007      0.01254          ENSMUSG00000000214, ENSMUSG00000000399, ENSMUSG00000000811, ...  Acad8, Acadm, Acbd4, ...
GO:0008047  enzyme activator activity   267        53           42.33     132                     0.04623         0.04623      0.01425          ENSMUSG00000000049, ENSMUSG00000000296, ENSMUSG00000000489, ...  Abr, Acrbp, Ahsa2, ...
GO:0022829  wide pore channel activity  18         7            2.85      83                      0.01585         0.01585      0.01473          ENSMUSG00000002984, ENSMUSG00000005674, ENSMUSG00000008892, ...  Aqp11, Gjb1, Gjb2, ...

If gene symbols were provided in the annotation file, the symbols (and identifiers) associated with the GO terms will be reported in the last two columns.
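
To work with the gene lists in these last two columns, a script can split them into Python lists. The sketch below assumes the .tsv files are tab-delimited with the column names shown above and that the gene lists are comma-separated. `read_go_table` is a hypothetical helper, and the sample row uses only the three identifiers and symbols visible in the example, since the rest are elided there.

```python
import csv
import io


def read_go_table(handle):
    """Read a GO results table and split the gene list columns into lists."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in ("identifiers", "symbol"):
            # The symbol column may be absent if no gene symbols were provided.
            if row.get(column):
                row[column] = [item.strip() for item in row[column].split(",")]
        rows.append(row)
    return rows


header = ["GO.ID", "Term", "Annotated", "Significant", "Expected",
          "Rank in classic Fisher", "classic Fisher", "elim Fisher",
          "weight01 Fisher", "identifiers", "symbol"]
first_row = ["GO:0050662", "coenzyme binding", "241", "53", "38.21", "62",
             "0.00726", "0.10007", "0.01254",
             "ENSMUSG00000000214, ENSMUSG00000000399, ENSMUSG00000000811",
             "Acad8, Acadm, Acbd4"]
sample = "\t".join(header) + "\n" + "\t".join(first_row) + "\n"

rows = read_go_table(io.StringIO(sample))
print(rows[0]["Term"], rows[0]["symbol"])
```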
