For each successful analysis, DEAGO should produce:
- deago.config: the configuration file (only produced when the --build_config option is used)
- deago.rlog: the R log file
- deago_markdown.Rmd: the R markdown file
- deago_markdown.html: the HTML report
- result_[timestamp]: the timestamped results directory

Example deago.config:

```
count_column 7
count_delim \t
count_type featurecounts
counts_directory /path/to/counts
gene_ids Geneid
go_analysis 0
go_levels all
keep_images 0
qc_only 1
qvalue 0.05
results_directory /path/to/qc_results
skip_lines 1
targets_file /path/to/targets.txt
```
DEAGO generates this file when the --build_config option is used. The file is built from the command line parameters you provide:

```bash
deago --build_config -c data/counts -t data/targets.txt
```
Config files are useful for debugging or for re-running an analysis:

```bash
deago --config deago.config
```
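deago.config is plain text with one key/value pair per line, so you can also inspect it from a script. For example, you might want to check which q-value or counts directory an earlier run used. The Python sketch below assumes each line holds a key, then whitespace, then a value, as in the example config. parse_deago_config is a hypothetical helper and is not part of DEAGO.

```python
def parse_deago_config(text: str) -> dict:
    """Parse deago.config-style text into a {key: value} dict.

    Assumption: each non-blank line is a key, whitespace, then the value.
    Values stay as strings, e.g. count_delim keeps the two characters '\\t'.
    """
    config = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        config[key] = parts[1].strip() if len(parts) > 1 else ""
    return config


example = """count_column 7
count_delim \\t
qvalue 0.05
results_directory /path/to/qc_results
"""

config = parse_deago_config(example)
print(config["qvalue"])  # the FDR cutoff recorded for that run
```

To read a real file, pass in the file's contents, for example the result of reading deago.config with Python's open().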
DEAGO produces a log file (deago.rlog) which contains the R output generated when converting the R markdown file (deago_markdown.Rmd) into the HTML report (deago_markdown.html).
The R log file is the first place to look if the analysis did not generate an HTML report and did not show any command line errors.
For example:

```
Quitting from lines 206-208 (deago_markdown.Rmd)
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels
Calls: <Anonymous> ... prepareGOdata -> new -> initialize -> initialize -> .local
In addition: Warning messages:
1: Removed 778 rows containing non-finite values (stat_density).
2: Removed 778 rows containing non-finite values (stat_density).
Execution halted
```
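When the log is long, it can help to pull out just the lines that explain why rendering stopped. This Python sketch filters an R log for lines starting with "Quitting from lines", "Error" or "Execution halted". Those markers are assumptions based on the example log. find_rlog_errors is a hypothetical helper.

```python
def find_rlog_errors(log_text: str) -> list:
    """Return the lines of an R log that usually explain a failed render.

    Assumption: the relevant lines start with one of the markers below,
    as in the example deago.rlog excerpt.
    """
    markers = ("Quitting from lines", "Error", "Execution halted")
    return [line.strip() for line in log_text.splitlines()
            if line.strip().startswith(markers)]


example_log = """Quitting from lines 206-208 (deago_markdown.Rmd)
Error in .local(.Object, ...) : allGenes must be a factor with 2 levels
Calls: <Anonymous> ... prepareGOdata -> new -> initialize -> initialize -> .local
In addition: Warning messages:
1: Removed 778 rows containing non-finite values (stat_density).
Execution halted
"""

for line in find_rlog_errors(example_log):
    print(line)
```

The "Quitting from lines" entry tells you which lines of deago_markdown.Rmd to inspect (lines 206-208 in the example).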
DEAGO uses markdown templates from Bio-Deago to generate an R markdown file (deago_markdown.Rmd), which is then used to run the analysis.
The R markdown file allows you to modify the analysis if you need to (e.g. if there is a batch effect). You can then generate a new HTML report from the modified markdown file. See Bespoke analyses for more information.
Here is an example of a section from deago_markdown.Rmd which is used to generate a contrast summary table:
...
The summary table below contains the total number of differentially expressed genes and the number of up-regulated (lfc > 2) and down-regulated (lfc < -2) genes for each contrast (adjusted p-value < 0.05).
```{r contrasts, echo=TRUE}
contrasts <- getContrasts(dds, parameters)
writeContrasts(dds, contrasts, resultsDir)
```
```{r contrast_summary, echo=TRUE}
contrast_summary <- contrastSummary(contrasts, parameters)
# colnames is an argument of DT::datatable(), not a DataTables option
datatable(contrast_summary, colnames = c('contrast', 'up-regulated', 'down-regulated', 'total'),
          options = list(dom = 't', columnDefs = list(list(className = 'dt-center', targets = 1:ncol(contrast_summary)))))
```
...
See the R Markdown documentation for more information on R markdown files and their format.
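You can check the counting rule described in this template text by hand. The Python sketch below counts up-regulated (lfc > 2) and down-regulated (lfc < -2) genes among those with an adjusted p-value below 0.05. It is not DEAGO's contrastSummary function. Two choices are assumptions made for illustration: the total is treated as up + down, and genes with missing values (NA in DESeq2 output) are skipped.

```python
import math


def summarise_contrast(results, alpha=0.05, lfc_threshold=2.0):
    """Count up- and down-regulated genes from (log2FoldChange, padj) pairs.

    Pairs with a missing value (None or NaN, i.e. NA in DESeq2 output) are skipped.
    """
    up = down = 0
    for lfc, padj in results:
        if lfc is None or padj is None or math.isnan(lfc) or math.isnan(padj):
            continue
        if padj >= alpha:
            continue  # not significant at this FDR cutoff
        if lfc > lfc_threshold:
            up += 1
        elif lfc < -lfc_threshold:
            down += 1
    return {"up-regulated": up, "down-regulated": down, "total": up + down}


# Hypothetical (log2FoldChange, padj) pairs for one contrast.
example = [(3.1, 0.001), (-2.5, 0.01), (2.5, 0.2), (0.15, 0.031), (None, None)]
print(summarise_contrast(example))
```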
By default, the HTML analysis report generated by DEAGO will be written to deago_markdown.html.
All DEAGO reports will contain the following sections:

- Introduction: an overview of the report
- Pipeline configuration: contents of the configuration file used (deago.config)
- Imported data summary: contents of the sample/condition mapping (targets) file
- DESeq2 analysis: commands used to set up the DESeq2 object and analysis
- QC plots: subsections for each QC type generated (e.g. Total read counts per sample, Principal component analysis (PCA)...)
- R session: summary of R packages used and their versions (useful for debugging)

There is a panel on the left which allows you to conveniently skip to each section of the report.
Differential expression analyses will generate an extra section called Pairwise contrasts. This section will have a summary of the number of up-regulated and down-regulated genes per contrast.
When there are 2-4 contrasts, a Venn diagram will be generated showing the overlap of DE genes between contrasts.
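Each region of such a Venn diagram counts the genes that are DE in exactly one combination of contrasts. The Python sketch below computes those region counts from per-contrast sets of DE gene IDs. It illustrates what the diagram shows and is not DEAGO's plotting code. The contrast names and gene IDs are hypothetical.

```python
from collections import Counter


def venn_regions(de_sets: dict) -> Counter:
    """Count DE genes in each exclusive region of a Venn diagram.

    de_sets maps a contrast name to its set of DE gene IDs. Every gene lands
    in exactly one region, keyed by the tuple of contrasts it is DE in.
    """
    names = sorted(de_sets)
    all_genes = set().union(*de_sets.values())
    return Counter(
        tuple(name for name in names if gene in de_sets[name])
        for gene in all_genes
    )


# Hypothetical contrasts and gene IDs, for illustration only.
de_sets = {
    "contrast_A": {"g1", "g2", "g3"},
    "contrast_B": {"g2", "g3", "g4"},
    "contrast_C": {"g3", "g5"},
}
for region, count in sorted(venn_regions(de_sets).items()):
    print(" & ".join(region), count)
```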
The Pairwise contrasts section contains one subsection per contrast. Each contrast subsection contains an MA plot and a volcano plot, on which the top 5 up- and down-regulated genes are labelled. By default the labels are the gene identifiers. If an annotation file with gene symbols is used, the labels will be the gene symbols instead.
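The label choice can be sketched as follows. The ranking rule is an assumption: it takes the largest positive and most negative log2 fold changes among the rows passed in, because the exact criterion DEAGO uses is not described here. The fallback from gene symbol to gene identifier follows the text above. The geneX row is hypothetical.

```python
def top_labels(rows, n=5):
    """Return plot labels for the n most up- and n most down-regulated genes.

    Assumptions: rows are dicts with 'geneID', 'log2FoldChange' (None if NA)
    and optionally 'symbol'; 'top' means the largest positive or most negative
    log2FoldChange. The symbol is preferred when present, else the gene ID.
    """
    scored = [r for r in rows if r.get("log2FoldChange") is not None]
    ordered = sorted(scored, key=lambda r: r["log2FoldChange"])
    down = [r for r in ordered if r["log2FoldChange"] < 0][:n]
    up = [r for r in reversed(ordered) if r["log2FoldChange"] > 0][:n]

    def label(row):
        return row.get("symbol") or row["geneID"]

    return [label(r) for r in up], [label(r) for r in down]


rows = [
    {"geneID": "ENSMUSG00000000001", "symbol": "Gnai3", "log2FoldChange": 0.150},
    {"geneID": "ENSMUSG00000000003", "symbol": "Pbsn", "log2FoldChange": None},
    {"geneID": "ENSMUSG00000000028", "symbol": "Cdc45", "log2FoldChange": -0.037},
    {"geneID": "geneX", "log2FoldChange": 2.4},  # hypothetical, no symbol
]
up, down = top_labels(rows, n=2)
print(up, down)
```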
All analysis results tables in the DEAGO reports are interactive. For each contrast there will be a results table for differentially expressed genes which can be searched or filtered.
The DE table is restricted by the FDR cutoff set for the analysis (q < 0.05 in the example configuration) and a log2 fold change threshold (<= -2 or >= 2) to keep the report compact. For the full, unfiltered results tables, look at the contrast files which are written to the results directory.
In this example, the gene identifier is blue. This indicates that the identifier is from Ensembl. Clicking it opens the latest Ensembl page for that stable gene ID in a new browser tab. An annotation has been included with this analysis, so you will also see a symbol column, which contains the gene symbol(s) associated with each gene.
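The Ensembl link behaviour can be approximated with a pattern match on the identifier. In this Python sketch, two things are assumptions: the ID pattern (ENS, an optional species code, G, 11 digits, optional version) and the https://www.ensembl.org/id/ URL scheme. DEAGO may build its links differently.

```python
import re

# Assumed shape of an Ensembl stable gene ID, e.g. ENSG... (human) or
# ENSMUSG... (mouse), optionally followed by a '.version' suffix.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def ensembl_link(gene_id: str):
    """Return an Ensembl URL for a stable gene ID, or None for other IDs."""
    if not ENSEMBL_GENE_ID.match(gene_id):
        return None
    stable_id = gene_id.split(".")[0]  # drop any version suffix
    return f"https://www.ensembl.org/id/{stable_id}"


print(ensembl_link("ENSMUSG00000000001"))
print(ensembl_link("Gnai3"))  # not an Ensembl ID, so no link
```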
To search the whole table, use the search box at the top right.
To search or filter an individual column, use the search/filter box at the top of the column. The table can also be ordered by clicking on the column headers.
GO term enrichment analysis reports include all of the sections already mentioned. In addition, the GO analyses generate subsections in the Pairwise contrasts section containing the GO term enrichment results tables.
Interactive tables will be generated for both biological processes (BP) and molecular functions (MF). In addition to the BP and MF tables for GO analyses of all DE genes, there are also individual tables for up-regulated and down-regulated genes for both BP and MF.
Here is an example of a GO results table:
Where gene symbols are included in the annotation, DEAGO will include the DE genes associated with each GO term in the symbol column.
For each new analysis, DEAGO creates a timestamped results directory (e.g. result_20180314093805).
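The timestamp makes it easy to find the most recent run. This Python sketch assumes the YYYYmmddHHMMSS layout inferred from the example name, reading result_20180314093805 as 2018-03-14 09:38:05. result_dir_time and latest_result_dir are hypothetical helpers.

```python
from datetime import datetime

PREFIX = "result_"


def result_dir_time(name: str) -> datetime:
    """Parse the timestamp in a 'result_YYYYmmddHHMMSS' directory name.

    Assumption: the format is inferred from the example result_20180314093805.
    """
    if not name.startswith(PREFIX):
        raise ValueError(f"not a timestamped results directory: {name!r}")
    return datetime.strptime(name[len(PREFIX):], "%Y%m%d%H%M%S")


def latest_result_dir(names):
    """Return the most recent results directory name."""
    return max(names, key=result_dir_time)


print(result_dir_time("result_20180314093805"))
```

The timestamp has a fixed width, so sorting the directory names as strings gives the same order. Parsing is still useful when you need the actual date.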
When the --keep_images option is used, DEAGO will create a folder called images in the timestamped results directory. This folder contains higher-quality versions of the QC and DE plots.
If a DE analysis was performed, DEAGO will write the DESeq2 contrast results tables (one file per contrast) to the timestamped results directory.
The files are named [contrast]_q[alpha].txt, where alpha is the FDR cutoff that was set at the start of the analysis.
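The contrast name and FDR cutoff are encoded in the filename, so a script can recover them. The Python sketch below assumes alpha is written as a plain decimal (e.g. 0.05). The contrast name ko_vs_wt is hypothetical.

```python
import re

# [contrast]_q[alpha].txt; assumes alpha is written as a plain decimal.
CONTRAST_FILE = re.compile(r"^(?P<contrast>.+)_q(?P<alpha>\d*\.?\d+)\.txt$")


def parse_contrast_filename(filename: str):
    """Split a contrast results filename into (contrast, alpha), or return None."""
    match = CONTRAST_FILE.match(filename)
    if match is None:
        return None
    return match.group("contrast"), float(match.group("alpha"))


print(parse_contrast_filename("ko_vs_wt_q0.05.txt"))  # hypothetical contrast name
```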
Here is an example of a contrast DE results file:
geneID | symbol | ko_ctrl_1.1 | ko_ctrl_1.2 | ... | wt_il22_4.1 | wt_il22_4.2 | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSMUSG00000000001 | Gnai3 | 10999.217 | 11064.515 | ... | 13783.985 | 14419.202 | 11898.0777 | 0.150 | 0.059 | 2.520 | 0.011 | 0.031 |
ENSMUSG00000000003 | Pbsn | 0 | 0 | ... | 0 | 0 | 0 | NA | NA | NA | NA | NA |
ENSMUSG00000000028 | Cdc45 | 228.360 | 260.280 | ... | 81.607 | 89.288 | 175.293 | -0.037 | 0.186 | -0.198 | 0.842 | 0.901 |
The first column will be the gene identifier and the second will be the gene symbol if an annotation file was used in the analysis. The next columns are the DESeq2 normalised counts for each sample. Finally, there are the DESeq2 contrast results columns: baseMean, log2FoldChange, lfcSE, stat, pvalue, padj.
The DE results files are not filtered and contain the results for all of the genes.
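Because these files are unfiltered, you can reproduce a filtered gene list yourself, for example with the padj and log2FoldChange cutoffs used for the report's DE table. The Python sketch below assumes the file is tab-separated, has a header row and uses NA for missing values. Check your own file before relying on it. The geneX row is hypothetical and only a subset of the columns is included.

```python
import csv
import io


def significant_genes(handle, alpha=0.05, lfc_threshold=2.0):
    """Yield (geneID, log2FoldChange, padj) for rows passing both cutoffs.

    Assumptions: tab-separated text with a header row containing 'geneID',
    'log2FoldChange' and 'padj', and missing values written as 'NA'.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        lfc, padj = row["log2FoldChange"], row["padj"]
        if "NA" in (lfc, padj):
            continue  # e.g. genes with zero counts in every sample
        lfc, padj = float(lfc), float(padj)
        if padj < alpha and abs(lfc) >= lfc_threshold:
            yield row["geneID"], lfc, padj


# Subset of the columns in the example above; the 'geneX' row is hypothetical.
lines = [
    ["geneID", "symbol", "baseMean", "log2FoldChange", "padj"],
    ["ENSMUSG00000000001", "Gnai3", "11898.0777", "0.150", "0.031"],
    ["ENSMUSG00000000003", "Pbsn", "0", "NA", "NA"],
    ["ENSMUSG00000000028", "Cdc45", "175.293", "-0.037", "0.901"],
    ["geneX", "", "523.0", "-2.8", "0.004"],
]
sample = "\n".join("\t".join(fields) for fields in lines) + "\n"
hits = list(significant_genes(io.StringIO(sample)))
print(hits)
```

With a real file, open it with newline="" and pass the file handle in place of the io.StringIO object.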
If a GO term enrichment analysis was performed, GO tables will be written to the timestamped results directory.
Results files will be generated for the BP and MF analyses of each contrast performed using all of the identified DE genes. Additional files will also be produced for the separate GO analyses performed using only the up-regulated or only the down-regulated genes.
Filename | GO level | Type of analysis |
---|---|---|
[contrast]_BP.tsv | BP | all DE genes |
[contrast]_BP_up.tsv | BP | up-regulated genes only |
[contrast]_BP_down.tsv | BP | down-regulated genes only |
[contrast]_MF.tsv | MF | all DE genes |
[contrast]_MF_up.tsv | MF | up-regulated genes only |
[contrast]_MF_down.tsv | MF | down-regulated genes only |
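The naming scheme in this table can be decoded with a regular expression. In this Python sketch, describe_go_file is a hypothetical helper and ko_vs_wt is a hypothetical contrast name.

```python
import re

# [contrast]_BP.tsv, [contrast]_BP_up.tsv, [contrast]_MF_down.tsv, ...
GO_FILE = re.compile(
    r"^(?P<contrast>.+)_(?P<level>BP|MF)(?:_(?P<subset>up|down))?\.tsv$"
)
GENE_SETS = {
    None: "all DE genes",
    "up": "up-regulated genes only",
    "down": "down-regulated genes only",
}


def describe_go_file(filename: str):
    """Map a GO results filename to (contrast, GO level, type of analysis)."""
    match = GO_FILE.match(filename)
    if match is None:
        return None
    return match.group("contrast"), match.group("level"), GENE_SETS[match.group("subset")]


print(describe_go_file("ko_vs_wt_BP_up.tsv"))
```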
Each GO results table reports only the top 30 significantly enriched GO terms. Here is an example of the contents of a GO results file:
GO.ID | Term | Annotated | Significant | Expected | Rank in classic Fisher | classic Fisher | elim Fisher | weight01 Fisher | identifiers | symbol |
---|---|---|---|---|---|---|---|---|---|---|
GO:0050662 | coenzyme binding | 241 | 53 | 38.21 | 62 | 0.00726 | 0.10007 | 0.01254 | ENSMUSG00000000214, ENSMUSG00000000399, ENSMUSG00000000811, ... | Acad8, Acadm, Acbd4, ... |
GO:0008047 | enzyme activator activity | 267 | 53 | 42.33 | 132 | 0.04623 | 0.04623 | 0.01425 | ENSMUSG00000000049, ENSMUSG00000000296, ENSMUSG00000000489, ... | Abr, Acrbp, Ahsa2, ... |
GO:0022829 | wide pore channel activity | 18 | 7 | 2.85 | 83 | 0.01585 | 0.01585 | 0.01473 | ENSMUSG00000002984, ENSMUSG00000005674, ENSMUSG00000008892, ... | Aqp11, Gjb1, Gjb2, ... |
If gene symbols were provided in the annotation file, the symbols (and identifiers) associated with the GO terms will be reported in the last two columns.
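To work with a GO results file outside the report, you can read it as a tab-separated table and split the comma-separated identifiers and symbol columns into lists. In this Python sketch, the column names come from the example table above. The choice of the weight01 Fisher column and the 0.05 cutoff are illustrative. Stripping a leading "<" assumes that very small p-values may be written in the "< 1e-30" style used by topGO's GenTable. Only a subset of the columns and identifiers is included.

```python
import csv
import io


def parse_pvalue(text: str) -> float:
    """Convert a p-value string to float, e.g. '0.01254' or '< 1e-30'."""
    return float(text.replace("<", "").strip())


def split_list(text: str) -> list:
    """Split a comma-separated cell into a list, dropping empty entries."""
    return [item.strip() for item in text.split(",") if item.strip()]


def enriched_terms(handle, column="weight01 Fisher", cutoff=0.05):
    """Return GO terms whose p-value in `column` is below `cutoff`."""
    terms = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if parse_pvalue(row[column]) < cutoff:
            terms.append({
                "GO.ID": row["GO.ID"],
                "Term": row["Term"],
                # Kept as separate lists: the example does not show that
                # identifiers and symbols are paired one-to-one.
                "identifiers": split_list(row["identifiers"]),
                "symbols": split_list(row.get("symbol") or ""),
            })
    return terms


columns = ["GO.ID", "Term", "weight01 Fisher", "identifiers", "symbol"]
rows = [
    ["GO:0050662", "coenzyme binding", "0.01254",
     "ENSMUSG00000000214, ENSMUSG00000000399", "Acad8, Acadm"],
    ["GO:0008047", "enzyme activator activity", "0.01425",
     "ENSMUSG00000000049, ENSMUSG00000000296", "Abr, Acrbp"],
]
sample = "\n".join("\t".join(r) for r in [columns] + rows) + "\n"
for term in enriched_terms(io.StringIO(sample)):
    print(term["GO.ID"], term["Term"], term["symbols"])
```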