Prelab: Motif discovery and regulatory analysis - I

Welcome to the first of three modules on DNA sequence motifs! To get started, please watch the lecture "Motif Discovery" on Mediasite (the lecture slides are also on Canvas), then answer the following questions.

1) What is the difference between a consensus sequence and a degenerate consensus sequence?

2) Define the term information content in the context of sequence motifs.

3) How does relative entropy account for the background frequencies of nucleotides?

4) What is the goal of a motif discovery approach like Gibbs sampling?

5) What challenge of motif discovery does using matched controls address, and how does it help solve this problem?
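As an optional aid for questions 2 and 3, here is a minimal sketch of how information content and relative entropy could be computed for a small set of aligned sites. The sequences, the AT-rich background, and all function names are hypothetical and chosen only for illustration; they are not taken from the lecture. The sketch uses log base 2 (bits) and applies no pseudocounts or small-sample correction.

```python
import math
from collections import Counter

# Hypothetical aligned binding sites (illustrative data only).
SITES = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]

def column_frequencies(sites):
    """Return one {base: frequency} dict per alignment column."""
    n = len(sites)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*sites)
    ]

def information_content(freqs):
    """Bits of information in one column, assuming a uniform background.

    For DNA the maximum entropy is log2(4) = 2 bits, so the
    information content is 2 minus the observed Shannon entropy.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return 2.0 - entropy

def relative_entropy(freqs, background):
    """Relative entropy (in bits) of one column against a background.

    Each observed base frequency is compared with the background
    frequency of that base instead of a fixed 0.25.
    """
    return sum(p * math.log2(p / background[base]) for base, p in freqs.items())

# Assumed backgrounds for illustration.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
AT_RICH = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}

cols = column_frequencies(SITES)
for i, freqs in enumerate(cols, start=1):
    print(i, information_content(freqs), relative_entropy(freqs, AT_RICH))
```

With a uniform background the two measures agree. With the assumed AT-rich background, a column that is always T scores lower than 2 bits, because T is already common in the background.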

In the classwork, you will be working with the MEME software suite. This is an amazing and extensive software package that allows you to do a wide range of motif discovery and enrichment analyses on just about any type of sequence (DNA, RNA, or protein). There are a lot of tools available, so it is important to know what they are and what they do.

http://meme-suite.org/

The above website has a workflow and pipeline where each tool is listed. If you mouse over a tool, you will get a short description of what it does. It can be a lot to remember, so this page might serve as a quick reference for you in the future.

6) What question do MEME and GLAM2 help the user to address? What is the key difference between the two tools?

7) What question does the tool MAST allow you to address? How does this differ from MEME?