Peakcalling Peak Stats

This notebook analyses the outputs of the peakcalling pipeline that relate to the quality of the peakcalling steps.

There are several stats that you will want collected and graphed. The links below point to the Jupyter notebooks, where you can interact directly with the code, or to the html files, which can be opened in your web browser.

Stats you should be interested in are:

Quality of Bam files for Peakcalling

how many reads input: notebook html
how many reads removed at each step (numbers and percentages): notebook html
how many reads left after filtering: notebook html
how many reads map to each chromosome before filtering: notebook html
how many reads map to each chromosome after filtering: notebook html
X:Y reads ratio: notebook html
insert size distribution after filtering for PE reads: notebook html
samtools flags - check how many reads are in categories they shouldn't be: notebook html
picard stats - check how many reads are in categories they shouldn't be
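
To illustrate the "numbers and percentages" bookkeeping for reads removed at each step, here is a minimal sketch. The step names and read counts are invented for the example and do not come from the pipeline, and `reads_removed_per_step` is a hypothetical helper rather than anything the pipeline provides.

```python
def reads_removed_per_step(counts):
    """Summarise read loss across ordered filtering steps.

    counts is an ordered list of (step, reads_remaining) pairs whose first
    entry is the unfiltered input. Returns one tuple per filtering step:
    (step, reads_removed, percent_of_input_removed).
    """
    total_input = counts[0][1]
    summary = []
    # Pair each step with the one before it to get the reads lost at that step.
    for (_, before), (step, after) in zip(counts, counts[1:]):
        removed = before - after
        percent = 100.0 * removed / total_input if total_input else 0.0
        summary.append((step, removed, percent))
    return summary


# Invented counts, for illustration only.
example_counts = [
    ("input", 1000000),
    ("duplicates removed", 900000),
    ("unmapped removed", 850000),
]

for step, removed, percent in reads_removed_per_step(example_counts):
    print(f"{step}: {removed} reads removed ({percent:.1f}% of input)")
```

Percentages here are expressed relative to the input read count, so they can be summed across steps; expressing them relative to the previous step instead would be an equally reasonable choice.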

Peakcalling stats

Number of peaks called in each sample: notebook html
Number of reads in peaks: notebook html
Size distribution of the peaks
Location of peaks
Correlation of peaks between samples
Other things?
IDR stats
Which peak lists are the best

This notebook takes the sqlite3 database created by CGAT peakcalling_pipeline.py and uses it to plot the statistics listed above.

It assumes the following directory layout:

    location of database = project_folder/csvdb

    location of this notebook = project_folder/notebooks.dir/
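
As a minimal sketch of how this layout could be used, the snippet below resolves the database path relative to the notebook directory and lists the tables it contains. The helper names `database_path` and `list_tables` are illustrative assumptions, not part of the CGAT pipeline, and the table names you see will depend on what the pipeline actually wrote to `csvdb`.

```python
import os
import sqlite3


def database_path(notebook_dir="."):
    """Return the expected location of csvdb, one level above the notebook folder."""
    return os.path.abspath(os.path.join(notebook_dir, "..", "csvdb"))


def list_tables(db_path):
    """Return the names of all tables in the sqlite3 database at db_path."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


db = database_path()
# Only query the database if it is where the layout says it should be;
# sqlite3.connect would otherwise silently create a new, empty file.
if os.path.exists(db):
    print(list_tables(db))
else:
    print("No database found at", db)
```

Checking that the file exists before connecting is a deliberate choice: it makes a misplaced notebook fail visibly instead of plotting from an empty database.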



In [2]:

    
!pwd
!date

/Users/charlotteg/Documents/7_BassonProj/Mar17
Sat 11 Mar 2017 19:50:55 GMT