Call ANY type of peak from reads stored in fq.gz files. Can call peaks for:
factor: Peak finding for single contact or focal ChIP-Seq experiments or DNase-Seq. This type of analysis is useful for transcription factors, and aims to identify the precise location of DNA-protein contact. This type of peak finding uses a FIXED width peak size, which is automatically estimated from the Tag Autocorrelation.
histone: Peak finding for broad regions of enrichment found in ChIP-Seq experiments for various histone marks. This analysis finds variable-width peaks.
super: Find Super Enhancers in your data
groseq: De novo transcript identification from strand specific GRO-Seq. This attempts to identify transcripts from nascent RNA sequencing reads.
tss Identification of promoter/TSS from 5'RNA-Seq/CAGE or 5'GRO-Seq data.
dnase Adjusted parameters for DNase-Seq peak finding.
mC DNA methylation analysis
In [7]:
#Define data folders, data, and sample name
#remember that fasta should have built bowtie2 indices
annotation="/input_dir/mm9"
tmp="/input_dir/"
input_dir="/input_dir/"
output_dir="/output_dir/"
input_fq_1="h3k27ac_mesc_5M.fq.gz"
input_fq_2=""
sample_name="mesc_h3k27ac_5M"
In [9]:
#align and sort / filter bam using Bowtie2 and "sensitive" settings
#paired end
if [ -z "$input_dir""$input_fq_2" ]; then
echo "Aligning using bowtie2 w/ paired end"
bowtie2 -p 8 --sensitive -x $annotation \
-1 $input_dir$input_fq_1 -2 $input_dir$input_fq_2 | samtools view -bS - > $tmp$sample_name".bam"
else
echo "Aligning using bowtie2 w/ single end"
bowtie2 -p 8 --sensitive -x $annotation \
-U $input_dir$input_fq_1 | samtools view -bS - > $tmp$sample_name".bam"
fi
#sort index and filter for canonical chromosomes
samtools sort -@ 10 $tmp$sample_name".bam" -o $tmp$sample_name"_sorted.bam"
samtools index "$tmp$sample_name""_sorted.bam"
samtools view $tmp$sample_name"_sorted.bam" -hu chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX > "$output_dir$sample_name""_filt_sorted.bam"
samtools index "$output_dir$sample_name""_filt_sorted.bam"
In [ ]:
#Take a peek at the bam
samtools view "$output_dir$sample_name""_filt_sorted.bam"| head
#get alignment stats
samtools flagstat "$output_dir$sample_name""_filt_sorted.bam"
samtools idxstats "$output_dir$sample_name""_filt_sorted.bam"
Homer is a pretty sweet collection of mainly perl scripts which can handle next-gen sequencing data to do things such as call peaks, below we use it to call different types of peaks, more information and a lot of knowledge is Homer peak calling tutorial
In [ ]:
makeTagDirectory $tmp/"$sample_name"_homer_tags "$output_dir$sample_name""_filt_sorted.bam"
In [3]:
findPeaks $tmp/"$sample_name"_homer_tags -style dnase -o auto
pos2bed.pl "$tmp""$sample_name"_homer_tags/peaks.txt > "$output_dir$sample_name""_open_peaks.bed"
In [ ]:
findPeaks $tmp/"$sample_name"_homer_tags -style super -typical $tmp/"$sample_name"_homer_tags/enh_peaks.txt -o auto
#convert regular enhancer peak file
pos2bed.pl $tmp/"$sample_name"_homer_tags/enh_peaks.txt > "$output_dir$sample_name""_enh_peaks.bed"
#save peaks as bed file in output directory
pos2bed.pl "$tmp""$sample_name"_homer_tags/peaks.txt > "$output_dir$sample_name""_se_peaks.bed"
In [4]:
#Switch to python2.7 for MACS2 and NucleoATAC
source activate py27
#Call acessibility peaks, and using those peaks define area around to resolve nucleosomes
macs2 callpeak -t "$output_dir$sample_name""_filt_sorted.bam" --nomodel --shift -37 -f BAMPE \
--extsize 73 --broad --keep-dup all -n $tmp"$sample_name""_MACS"
bedops --range 500:500 --everything $tmp"$sample_name""_MACS_peaks.broadPeak" | bedtools merge -i - > $tmp"$sample_name""_MACS_merged.bed"
nucleoatac run --bed $tmp"$sample_name""_MACS_merged.bed" --bam $output_dir"$sample_name""_filt_sorted.bam" --fasta "$annotation"".fa" --out $tmp"$sample_name""_nucATAC" --cores 8
source deactivate py27