Quality control (QC)

Before you run your differential expression analysis, it's a good idea to first check the quality of your data. This will give you an idea of whether there are any outliers you want to remove or batch effects you need to account for.

Remember, each DEAGO analysis should be self-contained (i.e. in a new folder):


In [ ]:
mkdir qc && cd qc

Running a default QC analysis

To run DEAGO in QC mode:


In [ ]:
deago --build_config -c ../data/counts \
                         -t ../data/targets.txt \
                         --qc

A summary of the options used and what they mean:

  • -c tells DEAGO the location (relative or absolute) of the folder containing your count files
  • -t tells DEAGO the location (relative or absolute) of the sample/condition mapping file
  • --qc tells DEAGO to only run the QC analysis

This assumes that you are using the expression count format produced by the Sanger pathogen pipelines (see Input files).
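Because -c and -t accept relative paths, running from the wrong folder or mistyping a path is an easy mistake to make. The sketch below is a hypothetical shell helper, check_deago_inputs (not part of DEAGO), that checks the count folder exists and is not empty, and that the targets file exists and is not empty, before you start a run. It does not check the contents or format of either input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DEAGO): sanity-check the paths you
# intend to pass to -c and -t before starting a DEAGO run.
check_deago_inputs() {
    local counts_dir="$1"    # value you will pass to -c
    local targets_file="$2"  # value you will pass to -t

    if [ ! -d "$counts_dir" ]; then
        echo "Count folder not found: $counts_dir" >&2
        return 1
    fi
    # An empty count folder almost certainly means the path is wrong.
    if [ -z "$(ls -A "$counts_dir")" ]; then
        echo "Count folder is empty: $counts_dir" >&2
        return 1
    fi
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$targets_file" ]; then
        echo "Targets file missing or empty: $targets_file" >&2
        return 1
    fi
    echo "Inputs look OK: $counts_dir, $targets_file"
}

# Usage, from inside your qc folder:
#   check_deago_inputs ../data/counts ../data/targets.txt && \
#       deago --build_config -c ../data/counts -t ../data/targets.txt --qc
```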

Results

All being well, DEAGO should generate an analysis report, deago_markdown.html. If it does not, check the log file (deago.rlog) for errors or warnings, or try running the commands from the R markdown file (deago_markdown.Rmd) in R to debug the issue.
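A quick way to do that first check from the shell is sketched below. check_deago_run is a hypothetical helper, not part of DEAGO; it relies only on the two file names mentioned above, and the error/warning pattern it searches for is an assumption, since the exact wording of R messages varies.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DEAGO): run after DEAGO finishes.
# Reports whether deago_markdown.html exists and, if not, prints any
# lines in deago.rlog that mention errors or warnings.
check_deago_run() {
    local dir="${1:-.}"  # analysis folder, defaults to the current one

    if [ -f "$dir/deago_markdown.html" ]; then
        echo "Report found: $dir/deago_markdown.html"
        return 0
    fi

    echo "No report found in $dir" >&2
    if [ -f "$dir/deago.rlog" ]; then
        # Case-insensitive search with line numbers; the pattern is a guess.
        grep -i -n -E 'error|warn' "$dir/deago.rlog" >&2
    else
        echo "No log file (deago.rlog) found either" >&2
    fi
    return 1
}
```

For example, run check_deago_run from inside your qc folder once DEAGO has finished; a non-zero return value means the report is missing.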

For more information on the output files and what they contain, see Output files.


Running a QC analysis with featureCounts files

To run DEAGO in QC mode with featureCounts files:


In [ ]:
deago --build_config -c ../data/counts \
                         -t ../data/targets.txt \
                         --count_type featurecounts \
                         --qc

A summary of the options used and what they mean:

  • -c tells DEAGO the location (relative or absolute) of the folder containing your count files
  • -t tells DEAGO the location (relative or absolute) of the sample/condition mapping file
  • --count_type tells DEAGO the format of the count data (e.g. featurecounts), which sets the values for --count_column, --skip_lines, --gene_ids and --count_delim
  • --qc tells DEAGO to only run the QC analysis
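The exact values that --count_type featurecounts sets are not listed here. To illustrate what those four options describe, the sketch below reads a standard single-sample featureCounts table, which has one comment line starting with #, a header row, tab-delimited columns, gene IDs in column 1 and counts in column 7 (after Chr, Start, End, Strand and Length). The function name extract_featurecounts is hypothetical and not part of DEAGO.

```shell
#!/usr/bin/env bash
# Illustration only (not part of DEAGO): print gene ID and count from a
# single-sample featureCounts table, assuming the standard layout:
# one '#' comment line, a header row, tab-delimited columns,
# gene IDs in column 1 and counts in column 7.
extract_featurecounts() {
    awk -F '\t' '
        /^#/ { next }                   # skip the featureCounts comment line
        header == 0 { header = 1; next } # skip the column header row
        { print $1 "\t" $7 }            # gene ID and count
    ' "$1"
}
```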

Getting presentation quality QC plots

The QC plots are small and embedded in the QC report so that the report is easy to share. However, this means they aren't ideal for presentations, posters or publications.

To get higher-quality images:


In [ ]:
deago --build_config -c ../data/counts \
                         -t ../data/targets.txt \
                         --qc \
                         --keep_images

Using the --keep_images option creates an images folder in your results directory containing larger, higher-quality versions of all the plots generated during the analysis.
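The name of the results directory isn't given here, so the hypothetical helper below (list_deago_images, not part of DEAGO) simply searches under the analysis folder for any directory called images and lists what it contains. Run it from your qc folder once the analysis has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DEAGO): list the plots saved by
# --keep_images. The results folder name is not assumed; instead we
# search below the analysis folder for any directory named "images".
list_deago_images() {
    local dir="${1:-.}"  # analysis folder, defaults to the current one
    find "$dir" -type d -name images -print | while read -r img_dir; do
        echo "== $img_dir"
        ls "$img_dir"
    done
}
```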

For more information on the output files and what they contain, see Output files.
