In-Class: Mouse ENCODE & Advanced bedtools

Part I: Mouse on the Genome Browser

Like for humans, you can view mouse ENCODE data using the UCSC genome browser. Follow these steps to look at a gene in the browser, and then answer the following questions

1) Go to the ENCODE website: https://genome.ucsc.edu/ENCODE/

2) Under the 'Mouse Data at UCSC' header, click on 'Genome Browser(mm9)'

3) View the region chr15:35100000-35120000

4) Scroll down to the 'Expression and Regulation' Section and change several of the ENCODE data tracks, including several of the RNA-seq ones, on.

  1. What evidence is there that this region is a gene? What evidence is there that it is not a gene? For hints on what to look for, look at the Enhancer paper from the pre-lab

2) Now click the 'zoom-out 3x' button. Is there stronger evidence of a gene in this larger region? If so, what is it? Discuss at least 2 RNA-seq and 2 chip-seq data sets.

Part II: Comparing marks across tissues and develpment

You want to study how H3K9ac changes throughout development and tissue types. Find, download and unzip the following data sets (in bed broadPeak format) from the mouse encode project (http://www.mouseencode.org/) using the 'data search' link on the homepage. If you are having trouble finding the data, search for the experiment ID and click on the external GEO link.

ChIP-seq of heart (Mus musculus, adult 8 week), H3K9ac, Bing Ren, UCSD, experiment ENCSR000CEM

ChIP-seq of liver (Mus musculus, adult 8 week), H3K9ac, Bing Ren, UCSD, experiment ENCSR000CEQ

1) Using the intersect function combined with the -v option in bedtools (http://bedtools.readthedocs.org/en/latest/content/bedtools-suite.html), how many regions are there that have evidence of H3K9ac in heart and not liver? What about liver and not heart? Include the commands you used to generate your numbers.

Now, download the following replicated peaks dataset (this will be in mm10, which is okay):

ChIP-seq of heart (Mus musculus, postnatal 0 day mouse heart), H3K9ac, Bing Ren, UCSD, experiment ENCSR519DNE

2) What commands would you use to check whether there are more changes in H3K9ac as the mouse develops or between different tissue types?

3) Now run your commands. What is your result? Is this suprising to you?

Part III: Finding Enhancers

From reading the paper in the pre-lab, you know that H3K27ac is a mark of active enhancers. You have a list of genes that you think may be active in liver cells. This list is in liver_genes.bed. You now want to find which genes have active enhancers by seeing which ones are near H3K27ac marks. Download the broad peak file from experiment ENCSR000CDH (from Mouse liver, adult). Using unix commands and bedtools' intersect command in combination with the -u flag answer the following questions.

1) How many total genes are in the file liver_genes.bed? How many regions are in the broad peak file?

2) How many genes overlap at least one H3K27ac mark?

3) However, you know that often H3K27ac occurs upstream of the active gene, so you don't just want to look for genes with H3K27ac marks in the actual gene, but also for genes with marks upstream of the gene. Use the window command in bedtools to increase the size of each genic region in livergenes.bed by 10,000 bases in both directions and perform an overlap analysis. Try running the command with and without the -u flag. What are the differences in the output bed files? How many genes overlap H3K27ac now? Save these genes to Ac_genes.bed.


In [ ]:

4) However, you know that H3K27ac alone is not enough to declare that an enhancer is active. H3K4me3 is also associated with active enhancers. Download the broad peak file from experiment ENCSR000CAP in the Mouse Encode project. Use the window function to find how many genes (+10,000 basepairs in either direction) overlap both H3K4me3 and H3K27ac? Save this list of genes as MeAc_genes.bed. Try running the command with and without the -u flag. What are the differences in the output bed files?

Homework: Finding Active Regions

You also know that H3K9ac is a sign of repressed chromatin. Download the replicated peaks file for mouse H3K9ac data (experiment ENCSR889MYY), then with the intersect command and -v flag, count the number of genes from MeAc_genes.bed that do NOT also have a H3K9ac mark.

1) What is the bedtools command to do this? (3 points)

2) How many genes from MeAc_genes.bed do not have a H3K9ac mark? (2 points)

3) Are you convinced that these genes are truly active genes in embroynic mouse liver? Do you think that there may be some genes in liver_genes.txt that are expressed that we missed? Why or why not? What complexities are we overlooking? (5 points)

Hint: Think about issues such as where different marks should be located relative to the gene (from the Enhancer paper in the prelab), whether bedtool's window command was the ideal one to use, and cell type heterogeneity.