Finding enriched areas using MACS

MACS stands for Model-based Analysis of ChIP-Seq; MACS2 is the second major version of the tool. It was designed for identifying transcription factor binding sites. MACS2 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information from both sequencing tag position and orientation. MACS2 can be used on ChIP-Seq data alone, or together with a control sample to increase specificity.

If you are not already there, change into the data directory.


In [ ]:
cd data

Consult the MACS2 help file to see the options and parameters:


In [ ]:
macs2 --help

In [ ]:
macs2 callpeak --help

The input for MACS2 can be in ELAND, BED, SAM, BAM or BOWTIE format; the --format flag tells MACS2 which one to expect.

Options that you will have to use include:

-t to indicate the input ChIP file

-c to indicate the name of the control file  

--format the tag file format 
(if this option is not set MACS automatically detects which format the file is) 

--name to set the name of the output files  

--gsize to set the mappable genome size 
(with the read length we have, 70% of the genome is a fair estimate)

--call-summits to detect all subpeaks in each enriched region and return their summits

--pvalue the P-value cutoff for peak detection.
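
The 70% rule of thumb for --gsize can be worked out from a table of chromosome lengths. The following is a minimal sketch, not part of the original tutorial: the file name toy.chrom.sizes and the two chromosome lengths in it are made up for illustration, so substitute the lengths of the sequences you actually aligned against.

```shell
# Toy chromosome sizes file (name<TAB>length); the lengths are made up.
printf 'chrA\t120000000\nchrB\t77000000\n' > toy.chrom.sizes

# Sum the chromosome lengths (column 2).
total=$(awk '{ sum += $2 } END { printf "%d", sum }' toy.chrom.sizes)

# Take 70% with integer arithmetic to avoid floating-point rounding.
gsize=$(( total * 7 / 10 ))
echo "total=${total} gsize=${gsize}"
```

The resulting value could then be passed to MACS2 as --gsize "$gsize".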

Now run MACS2 using the following command:


In [ ]:
macs2 callpeak -t PAX5.sorted.bam -c Control.sorted.bam \
--format BAM --name PAX5 --gsize 138000000 --pvalue 1e-3 \
--call-summits
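
Once the run has finished, it is worth checking that the output files were written. With --name PAX5, MACS2 normally produces PAX5_peaks.narrowPeak, PAX5_peaks.xls, PAX5_summits.bed and PAX5_model.r. The sketch below assumes you are still in the directory where the command was run; check_macs2_outputs is a hypothetical helper written for this example, not part of MACS2.

```shell
# Hypothetical helper: report which expected MACS2 outputs exist
# (and are non-empty) for a given --name prefix.
check_macs2_outputs() {
    prefix="$1"
    for suffix in _peaks.narrowPeak _peaks.xls _summits.bed _model.r; do
        if [ -s "${prefix}${suffix}" ]; then
            echo "found    ${prefix}${suffix}"
        else
            echo "missing  ${prefix}${suffix}"
        fi
    done
}

check_macs2_outputs PAX5
```

Any file reported as missing suggests the run did not complete, so check the MACS2 log messages before moving on.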

MACS2 writes its peaks to a file with the .narrowPeak extension (here, PAX5_peaks.narrowPeak). narrowPeak is a BED-based format (BED6+4) describing genomic locations. Many types of genomic data can be represented as (sets of) genomic regions. In the following section we will look into the BED format in more detail, and we will perform simple operations on genomic interval data.
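
As a quick preview, a narrowPeak file has ten tab-separated columns: chromosome, start, end, name, score, strand, signal value, -log10(p-value), -log10(q-value), and the summit offset relative to the start. The sketch below builds a toy file with made-up values (toy_peaks.narrowPeak is not a tutorial file) and uses awk to derive summit positions and count strong peaks; the threshold of 5 is an arbitrary choice for illustration.

```shell
# Two toy narrowPeak records with made-up values (10 tab-separated columns).
printf 'chr1\t1000\t1400\ttoy_peak_1\t250\t.\t12.5\t30.1\t27.0\t180\n'  > toy_peaks.narrowPeak
printf 'chr1\t5000\t5300\ttoy_peak_2\t40\t.\t3.2\t4.5\t2.1\t120\n'     >> toy_peaks.narrowPeak

# Summit position = start (column 2) + summit offset (column 10).
# Prints: peak name, chromosome, summit position, -log10(p-value).
summits=$(awk 'BEGIN { OFS = "\t" } { print $4, $1, $2 + $10, $8 }' toy_peaks.narrowPeak)
echo "$summits"

# Count peaks whose -log10(p-value) (column 8) is at least 5.
strong=$(awk '$8 >= 5 { n++ } END { print n + 0 }' toy_peaks.narrowPeak)
echo "strong peaks: ${strong}"
```

The same awk commands should work on PAX5_peaks.narrowPeak once MACS2 has produced it.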


What's next?

You can head back to aligning the control sample to the genome or continue on to file formats.