Identifying differentially expressed (DE) genes

DEAGO uses DESeq2 to identify differentially expressed genes. For more information on DESeq2 methods and workflows, see the DESeq2 vignette.

Remember, each DEAGO analysis should be self-contained (i.e. in a new folder):



    mkdir de_analysis && cd de_analysis

Running a default DE analysis



    deago --build_config -c ../data/counts \
          -t ../data/targets.txt

This runs a QC analysis and a DE analysis with the default settings, which make three assumptions:

Your counts are in the expression count format produced by the Sanger pathogen pipelines (see Input files).

Your control condition is the one that comes first when the condition names are sorted alphabetically. This is because, by default, R chooses the control condition based on alphabetical order. For example, if you have three conditions (controlTreatment, badTreatment and worseTreatment), badTreatment is treated as the control because it sorts first.

The FDR cutoff (alpha) applied to the adjusted p-values is 0.05.
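To see which condition would be picked as the default control, you can sort the condition names yourself. This is a minimal sketch using the three example names above. It assumes the C locale, whose byte-order sort may differ from R's locale-aware ordering when names mix upper and lower case.

```shell
# Example condition names from the text above.
conditions="controlTreatment
badTreatment
worseTreatment"

# The name that sorts first is the one treated as the control by default.
# LC_ALL=C is an assumption made here to keep the ordering predictable.
default_control=$(printf '%s\n' "$conditions" | LC_ALL=C sort | head -n 1)
echo "$default_control"
```

If the name printed is not your real control, set it explicitly as described in Using different DE analysis parameters.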

To define the control condition and FDR cutoff in DEAGO, see Using different DE analysis parameters. For more information on what these parameters mean, see the DESeq2 vignette.

Results

All being well, DEAGO should generate an analysis report: deago_markdown.html. If it does not, check the log file (deago.rlog) for errors or warnings, or try running the commands from the R markdown file (deago_markdown.Rmd) in R to debug the issue.
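A quick way to scan the log is to search it for lines that mention errors or warnings. This is a rough sketch: the helper name scan_log is hypothetical, and the search terms are only a guess at what a problem line might contain, so an empty result does not guarantee a clean run.

```shell
# Print matching log lines with their line numbers (case-insensitive).
# scan_log is a hypothetical helper, not part of DEAGO.
scan_log() {
    if [ -f "$1" ]; then
        grep -inE 'error|warning' "$1" || echo "no errors or warnings matched in $1"
    else
        echo "log file $1 not found" >&2
        return 1
    fi
}

# Run from the analysis directory that contains deago.rlog.
scan_log deago.rlog || true
```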

In your results directory, you should also see a series of DE analysis results files, one per contrast (e.g. [contrast]_q[alpha].txt), which contain the unfiltered DESeq2 results (i.e. all genes are listed). The alpha (q) reference in the filename is just a reminder of the FDR cutoff you set at the start of the analysis.
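Because the results files list every gene, you may want to pull out only the genes that pass your cutoff. This is a minimal sketch, assuming the file is tab-delimited with a header row that includes a padj column (the DESeq2 name for the adjusted p-value). The toy table and the filter_significant helper are made up for illustration, so check the header of your own file first.

```shell
# Toy results table (made-up values) standing in for a [contrast]_q[alpha].txt file.
results=$(mktemp)
printf 'gene\tlog2FoldChange\tpadj\n' > "$results"
printf 'geneA\t2.1\t0.001\n' >> "$results"
printf 'geneB\t-0.3\t0.40\n' >> "$results"
printf 'geneC\t1.7\tNA\n' >> "$results"

# Keep the header plus rows whose padj is below the cutoff; NA rows are skipped.
# Usage: filter_significant <results file> <alpha>
filter_significant() {
    awk -F '\t' -v alpha="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "padj") col = i; print; next }
        col && $col != "NA" && $col + 0 < alpha
    ' "$1"
}

significant=$(filter_significant "$results" 0.05)
echo "$significant"
rm -f "$results"
```

Looking the column up by name rather than by position keeps the filter working if extra columns, such as annotations, are added to the table.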

For more information on the output files and what they contain see Output files.

Using different DE analysis parameters

You may want to set a different FDR cutoff, control condition or count data type. Here is an example command:



    deago --build_config -c ../data/counts \
          -t ../data/targets.txt \
          --count_type featurecounts \
          --control Ctrl \
          -q 0.01

--control tells DEAGO which condition to use as the reference or control (e.g. Ctrl). The value must be present in the condition column of your targets file.

-q tells DEAGO the FDR cutoff to use in the analysis (e.g. 0.01)

--count_type tells DEAGO the format of the count data (e.g. featurecounts) and sets the values of --count_column, --skip_lines, --gene_ids and --count_delim to match.
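Before passing a value to --control, it can help to confirm that it really appears in the targets file. This is a minimal sketch, assuming the targets file is tab-delimited with a header row containing a column named condition. The toy file contents and the has_condition helper are hypothetical.

```shell
# Toy targets file (made-up sample names); assumed tab-delimited with a header row.
targets=$(mktemp)
printf 'filename\tcondition\treplicate\n' > "$targets"
printf 's1.counts\tCtrl\t1\n' >> "$targets"
printf 's2.counts\tTreated\t1\n' >> "$targets"

# Succeed only if the given value appears in the column headed "condition".
# Usage: has_condition <targets file> <condition name>
has_condition() {
    awk -F '\t' -v want="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "condition") col = i; next }
        col && $col == want { found = 1 }
        END { exit !found }
    ' "$1"
}

if has_condition "$targets" Ctrl; then
    echo "Ctrl found in condition column"
else
    echo "Ctrl not found; check the targets file" >&2
fi
rm -f "$targets"
```

The comparison is an exact, case-sensitive string match, so ctrl and Ctrl would be treated as different conditions.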

As with your QC analysis, you can also use the --keep_images option to generate better quality versions of the DE plots (e.g. MA plots, volcano plots and Venn diagrams).

Including gene symbols

By default, the DE analysis results tables use the gene identifiers from the counts file. However, gene names are often more recognisable and therefore more useful. To include them, you need an annotation file (see Input files and Preparing an annotation file).

To use an annotation file with your DE analysis, use the -a option:



    deago --build_config -c ../data/counts \
          -t ../data/targets.txt \
          -a ../data/ensembl_mm10_deago_formatted.tsv

For more information on how your output differs when you include annotations, see Output files.
