Notebooks used to create figures shown in "Sequana Coverage: Detection and Characterization of Genome Variations using Running Median and Mixture Models" Desvillechabrol D., Bouchier C., Kennedy S., Cokelaer T.

This notebook reproduces all figures in the paper using sequana 0.6.5

The required data

This notebooks requires as input 3 BED files for the 3 organisms considered. In addition, a genbank and a reference genomes are required (see below).

3 Associated notebooks can be used to reproduce or get the genbank and reference genome files.

All data are posted (with FastQ files) in https://www.synapse.org/#!Synapse:syn10638358/wiki/465309 (you need a login), so we also provide the BED, genbank and genome reference files on this github repository https://github.com/sequana/resources/tree/master/coverage and tht 5 required files can be downloaded and uncompressed as follows:



In [1]:

    
# UNCOMMENT TO DOWNLOAD THE DATA that should be in this directory anyway
#%%bash
#export GITHUB=https://github.com/sequana/resources/raw/master/coverage/
#wget $GITHUB/JB409847.filtered.bed.bz2
#wget $GITHUB/FN433596.filtered.bed.bz2
#wget $GITHUB/S_pombe.filtered.bed.bz2
#wget $GITHUB/FN433596.fasta.bz2
#wget $GITHUB/S_pombe.gbk.bz2
#bunzip2 S_pombe.gbk.bz2
#bunzip2 FN433596.fasta.bz2
#bunzip2 *.filtered.bed.bz2

Check that the md5sum are correct. You should get:



In [2]:

    
#md5sum *filtered.bed FN433596.fasta S_pombe.gbk
# 7f100537578c3e19b79d3c6d5b3a01b1  S_pombe.gbk
# 91773301c18597ed1c58e8b0b5f375fa  FN433596.fasta
# 69555501bafce246427faebe3b3f32fc  FN433596.filtered.bed
# 5a2173339eff5475ed64bd242162e0a7  JB409847.filtered.bed
# 12ba8ca543d8a314772ee4246bcdf2f6  S_pombe.filtered.bed

Install the sequana library

Please, use conda as explained on http://sequana.readthedocs.io/en/master/installation.html



In [2]:

    
from sequana import GenomeCov



In [3]:

    
%pylab inline
matplotlib.rcParams['figure.figsize'] = [10,7]
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['legend.edgecolor'] = "black"
matplotlib.rcParams['legend.fontsize'] = "large"









    



Populating the interactive namespace from numpy and matplotlib

Figure 1



In [5]:

    
bacteria = GenomeCov("FN433596.filtered.bed", low_threshold=-4, high_threshold=4)
data = bacteria.chr_list[0]
data.run(20001, circular=True)









    Out[5]:





<sequana.bedtools.ChromosomeCovMultiChunk at 0x7fbc65dc6c18>



In [6]:

    
cov_mean = mean(data.df["cov"])
high_t = cov_mean + cov_mean * 25 / 100
low_t = cov_mean - cov_mean * 25 / 100
fig, ax = plt.subplots()
ax.set_facecolor("#eeeeee")
pylab.xlim(0, data.df["pos"].iloc[-1])
p1, = pylab.plot(data.df["cov"], color="k",
                 label="Genome coverage", linewidth=1)
p2 = pylab.axhline(cov_mean, color="#0099cc", linewidth=2, label="Mean coverage")
p3 = pylab.axhline(high_t, color="r", linewidth=2, ls="--", 
                   label="Thresholds")
pylab.axhline(low_t, color="r", linewidth=2, ls="--")
pylab.legend([p1, p2, p3], 
             [p1.get_label(), p2.get_label(), p3.get_label()],
             loc="best")
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
pylab.grid()
pylab.tight_layout()
ylim([0, 1600])
pylab.savefig("mean_coverage.png", dpi=200)



In [7]:

    
# Bottom panel: Running median
thr = 4
mu = data.best_gaussian["mu"]
sigma = data.best_gaussian["sigma"]
zcov1 = (thr * sigma + mu) * data.df["rm"]
zcov2 = (-thr * sigma + mu) * data.df["rm"]
data.df["high_threshold"] = zcov1
data.df["low_threshold"] = zcov2

data.plot_coverage(set_ylimits=False, sample=False)
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
pylab.grid(True)
#pylab.tight_layout()
pylab.savefig("running_median.png", dpi=200)

Figure 2 running median versus moving average



In [8]:

    
from sequana import GenomeCov
b = GenomeCov("FN433596.filtered.bed")
c = b.chr_list[0]
# We inject a deleted region of 15000 bases
c.df.loc[1600000:1610000, "cov"] *= 0
# and a region of 220 depth of coverage (hald the mean depth of coverage) closeby (5000 bases apart)
c.df.loc[1615000:1620000, "cov"] = 220

# here is a hack where we compute the median just to create the data structure 
c.running_median(40001, circular=True)
# and replace it with the moving average
c.moving_average(40001, circular=True)
c.df['rm'] = c.df['ma']

c.compute_zscore()
c.plot_rois(1560000, 1650000)
legend(["Coverage", "Moving average", "Thresholds"], loc="lower left" )
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
pylab.xticks([1560000,1580000,1600000,1620000,1640000], 
             [1560000,1580000,1600000,1620000,1640000])
pylab.savefig("moving_average_focus.png", dpi=200)



In [9]:

    
c.running_median(40001, circular=True)
c.compute_zscore()
c.plot_rois(1560000, 1650000)
pylab.xticks([1560000,1580000,1600000,1620000,1640000], 
             [1560000,1580000,1600000,1620000,1640000])
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
pylab.savefig("running_median_focus.png", dpi=200)

Figure Virus genome



In [10]:

    
b2 = GenomeCov("JB409847.filtered.bed", low_threshold=-3., high_threshold=3)
chrom = b2.chr_list[0]
chrom.running_median(5001, circular=True)
chrom.compute_zscore()
chrom.moving_average(5001, circular=True)
chrom.plot_coverage()
ylim([0, 1400])
chrom.df['ma'].plot(label=r"MA_{5001}", color="y")
ax = legend([ "Genome coverage", 
             r"Running median (5001)", 
              "Thresholds $\delta^+$ and $\delta^-$", 
             r"Moving average (5001)"], loc="lower right", fontsize=12, 
                ncol=1, bbox_to_anchor=(0.85,0.))
xlabel("Base position (bp)", fontsize=18)
ylabel("Coverage", fontsize=18)
grid(True)
savefig("comp_ma_vs_rm.png", dpi=200)

Figure 3



In [11]:

    
#binwidth = 0.05
#bins = np.arange(min(data.df["scale"]), max(data.df["scale"]) + binwidth,
#        binwidth)

ax_plot = gca()
ax_plot.set_facecolor("#eeeeee")
ax_plot.set_xlim(0,data.df["pos"].iloc[-1])
ax_plot.plot(data.df["scale"], color="k", label='Normalised genome coverage')
ax_plot.ticklabel_format(axis='x', style='sci', scilimits=(-2,2))
ax_plot.set_xticklabels(["", 0.5, 1, 1.5, 2, 2.5, 3])
ax_plot.set_xlabel(r"Base position (Mbp)", fontsize=18)
ax_plot.set_ylabel("Normalized coverage", fontsize=18)
ax_plot.grid()
ax_plot.axhline(1, lw=2,color="#0099cc")
ax_plot.axhline(1.27, lw=2,color="red", ls="--", label="Threshold at $z=3$")
ax_plot.axhline(0.73, lw=2,color="red", ls="--", label="_nolegend")
ax_plot.axhline(1.36, lw=2,color="red", ls="-.", label="Threshold at $z=4$")
ax_plot.axhline(0.64, lw=2,color="red", ls="-.", label="_nolegend")
legend(loc="upper left")
ylim([0, 3.5])
pylab.savefig("normalize_coverage.png", dpi=200)

Figure 4



In [12]:

    
# Load the data for the virus, bacteria, fungus
virus = GenomeCov("JB409847.filtered.bed")
ch_virus = virus.chr_list[0]
ch_virus.run(4001, circular=True)









    Out[12]:





<sequana.bedtools.ChromosomeCovMultiChunk at 0x7fbc3db1c390>



In [13]:

    
bacteria = GenomeCov("FN433596.filtered.bed")
ch_bacteria = bacteria.chr_list[0]
ch_bacteria.run(20001, circular=True)









    Out[13]:





<sequana.bedtools.ChromosomeCovMultiChunk at 0x7fbc3db22ba8>



In [14]:

    
fungus = GenomeCov("S_pombe.filtered.bed", chunksize=10000000)
ch_fungus = fungus.chr_list[0]
ch_fungus.run(20001)









    Out[14]:





<sequana.bedtools.ChromosomeCovMultiChunk at 0x7fbc3dafb1d0>



In [16]:

    
# The plot itself
ax1 = subplot(3,1,1)
ax2 = subplot(3,1,2)
ax3 = subplot(3,1,3)
axes = [ax1, ax2, ax3]
# common xlim for all
space = linspace(0,2,80)
 
ch_bacteria.mixture_fitting.plot(bins=space,ax=ax1,hist_kw={'color':'#5F9EA0', "edgecolor":"k"})
ch_virus.mixture_fitting.plot(bins=space,ax=ax3,hist_kw={'color':'#5F9EA0', "edgecolor":"k"})
ch_fungus.mixture_fitting.plot(bins=space,ax=ax2,hist_kw={'color':'#5F9EA0', "edgecolor":"k"})

for ax in axes: ax.set_xlim([0, 1.5])
for ax in axes: ax.set_ylim([0, 6])
for ax in axes: ax.grid(True)
for ax in axes: ax.set_facecolor("#eeeeee")

# share the xticks
ax1.set_xticklabels([])
ax2.set_xticklabels([])

ax3.set_xlabel(r"Normalised genome coverage $\widetilde{C}_b$", fontsize=20)

for ax in axes: ax.set_ylabel("PDF", fontsize=16)

text(0.05,15.9, "Bacteria:\n$C_0=\mathcal{N}(1.00, 0.09)$ with $\pi_0=0.95$" +
    "\n$C_1=\mathcal{N}(0.92, 0.29)$ with $\pi_1=0.05$ \n$\Theta=0.974$",
    bbox={"lw":2,"facecolor":"white", "alpha":0.5}, fontsize=16)

text(0.05,8.7, "Funghi: \n$C_0=\mathcal{N}(1.00, 0.16)$ with $\pi_0=0.93$ "+
    "\n$C_1=\mathcal{N}(0.98, 0.30)$ with $\pi_1=0.07$ \n$\Theta=0.990$",
    bbox={"lw":2, "facecolor":"white", "alpha":0.5},fontsize=16)

text(0.05,1.6,"Virus:\n$C_0=\mathcal{N}(1.01, 0.06)$ with $\pi_0=0.79$ \n" +
    "$C_1=\mathcal{N}(0.80, 0.26)$ with $\pi_1=0.21$\n$\Theta=0.845$", 
    bbox={"lw":2, "facecolor":"white","alpha":0.5}, fontsize=16)


for ax in axes:
    ax.get_lines()[2].set_color("red")
    ax.get_lines()[2].set_linestyle("-")
    ax.get_lines()[2].set_linewidth(2)
    ax.get_lines()[1].set_color("black")
    ax.get_lines()[1].set_linewidth(2)
    ax.get_lines()[1].set_linestyle("-")
    ax.get_lines()[0].set_linewidth(1)
    ax.get_lines()[0].set_linestyle("--")
    ax.get_lines()[0].set_color("blue")
ax1.legend(['Mixture model','model $C_0$', 'model $C_1$'], 
           loc="upper right", labelspacing=0, fontsize=14)

pylab.tight_layout()

Figure 5



In [17]:

    
# Double threshold picture. Reusing the bacteria data
region = ch_bacteria.df[2017000:2019000].copy()
region['pos'] -= 2017000
region.index = range(2000)
thr = 8
mu = data.best_gaussian["mu"]
sigma = data.best_gaussian["sigma"]
zcov1 = (thr * sigma + mu) * region["rm"]
zcov2 = ((thr/2) * sigma + mu) * region["rm"]
region["threshold1"] = zcov1
region["threshold2"] = zcov2


fig, ax = plt.subplots()
ax.set_facecolor("#eeeeee")
p0, = pylab.plot(region["cov"], color="k", label="Genome coverage", 
                 linewidth=2)
p1, = pylab.plot(region["rm"], "r", linewidth=2, label="Running median")
p2, = pylab.plot(region["threshold1"], linewidth=2, color="#666666", ls="--",
                 label="First threshold")
p3, = pylab.plot(region["threshold2"], linewidth=2, color="#666666", ls="-.",
                 label="Second threshold")
ax.fill_between(region["pos"], region["cov"], region["threshold2"], 
        where=region["cov"] > region["threshold2"], color="#66cdaa")
ax.fill_between(region["pos"], region["cov"], region["threshold1"], 
        where=region["cov"] > region["threshold1"], color="#088da5")
pylab.legend([p0, p1, p2, p3], 
             [p0.get_label(), p1.get_label(), p2.get_label(), 
              p3.get_label()], loc="upper left")
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
pylab.grid()
pylab.tight_layout()
pylab.axvline(531, lw=2, color="orange", ls="--")
pylab.axvline(1824, lw=2, color="orange", ls="--")
xlim([0,2000])
pylab.savefig("double_threshold.png", dpi=200)

Figure 6



In [18]:

    
b = GenomeCov("FN433596.filtered.bed")
c = b.chr_list[0]
b.compute_gc_content("FN433596.fasta")
c.plot_gc_vs_coverage(Nlevels=6, bins=[80,57], cmap="BrBG")
savefig("hist2d.png", dpi=200)
savefig("hist2d.eps", dpi=200)









    



Computing 2D histogram. Please wait

Figure 7

This figure and others are available in the notbook 09 as well as explanations.



In [42]:

    
# load CNV from cnvnator
from sequana.cnv import CNVnator
cnvnator = CNVnator("cnvnator_events_fig7_bin6.txt").df
cnvnator = CNVnator("cnvnator_events_fig7_bin100.txt").df

# load the BED and ROIs pre-generated using W=100000
b = GenomeCov("FN433596.bed")
chrom = b.chr_list[0]
chrom.run(100001, circular=True, cnv_delta=10)
#chromosome.rois.df.to_csv("rois_100000_10.csv")









    Out[42]:





<sequana.bedtools.ChromosomeCovMultiChunk at 0x7fbc668aa400>



In [41]:

    
def plot_variant(chrom, x1=526000, x2=538000):
    clf()    
    chrom.plot_rois(x1, x2)
    roi = chrom.rois
    low = roi.get_low_rois()
    high = roi.get_high_rois()
    chr = chrom
    highT = (chr.thresholds.high * chr.best_gaussian["sigma"] +
         chr.best_gaussian["mu"]) * chr.df["rm"]
    lowT = (chr.thresholds.low * chr.best_gaussian["sigma"] +
        chr.best_gaussian["mu"]) * chr.df["rm"]

    for k,v in low.iterrows():        
        Y1 = chr.df['cov'].loc[v.start:v.end]
        Y2 = lowT.loc[v.start:v.end] 
        Y1 = Y1.combine(Y2, max) *0
        #if v.start > x1 and v.end<x2:
        #    try:fill_between(range(v.start, v.end+1), Y1, Y2, 
        #                     alpha=0.3, color="green")
        #    except:pass   
    # CNVntator 
    xlim([x1, x2])
    x1, x2 = xlim()
    L = x2-x1
    for i in range(len(cnvnator)):
        this = cnvnator.loc[i]
        axhline(chrom.df['cov'].mean() * this[3], (this.start-x1)/L, 
                (this.end-x1)/L, lw=8, color="yellow", alpha=0.5)
    xlabel("Base position (bp)")
    
plot_variant(chrom, 374000,400000)
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
pylab.savefig("comp_cnv_bin100_2.png", dpi=200)

Figure 8

See notebook 08 on github.com/sequana_resources/coverage

Or run population.py script. You will still need the 6 BED files created in notebook 8 !

Figure 9



In [43]:

    
b6 = GenomeCov("ERR316404.bed")
c6 = b6.chr_list[0]
c6.run(40000)









    Out[43]:





<sequana.bedtools.ChromosomeCovMultiChunk at 0x7fbc62c0d668>



In [51]:

    
#_, axes = subplots(2,1)
#sca(axes[0])
c6.plot_rois(2828000,2829500, clf=False)
axes[0].fill_between([2828489,2829247],0,350, alpha=0.3, color="green")
pylab.xlabel("Base position (bp)", fontsize=18)
pylab.ylabel("Coverage", fontsize=18)
#sca(axes[1])
#c3.plot_rois(2828000,2829500, clf=False)
#axes[1].
fill_between([2828489,2829247],0,350, alpha=0.3, color="green")
savefig("triple_event.png", dpi=200)

Figure (not included)



In [6]:

    
b = GenomeCov("S_pombe.filtered.bed", "S_pombe.gbk")
c = b.chr_list[1]
c.running_median(20001, circular=False)
c.compute_zscore()
c.df = c.df.iloc[1000000:1100000]
c.plot_coverage()
c.df.mapq0.plot(color="orange", zorder=-1, label="Filtered coverage", lw=3)
xlim([1043000,1047000])
grid()
ylim([0,350])
legend()
l = gca().get_legend()
l.set_title("")
xticks([1043000,1044000,1045000,1046000,1047000], 
       [1043000,1044000,1045000,1046000,1047000])
xlabel("Base position (bp)", fontsize=18)
tight_layout()
savefig("excess.png", dpi=200)
savefig("excess.eps", dpi=200)

Figure (not included)



In [7]:

    
b = GenomeCov("S_pombe.filtered.bed", "S_pombe.gbk")
c = b.chr_list[0]
c.running_median(20001, circular=False)
c.compute_zscore()
# To speed up the plotting, let us keep only 200,000 data points
c.df = c.df.iloc[3900000:4100000]



In [10]:

    
from sequana.tools import genbank_features_parser
c.plot_coverage()
c.df.mapq0.plot(color="orange", zorder=-1, label="Filtered coverage", lw=3)
XMIN, XMAX = 4020000,4030000
xlim([XMIN, XMAX])
grid()
YMAX = 300
ylim([0,YMAX])
legend()
l = gca().get_legend()
l.set_title("")
tight_layout()
xlabel("Base position (bp)", fontsize=16)

# highlight repeat regions
gbk = genbank_features_parser("S_pombe.gbk")
repeats = [this for this in gbk["CU329670"] if this['type'] == "repeat_region"\
           and this['gene_start']> XMIN and this['gene_end']<XMAX]
for repeat in repeats:
    fill_between([repeat["gene_start"], repeat["gene_end"]], [YMAX,YMAX],
        color="green", alpha=0.2, zorder=-2)

# highlight CDS retrotransposable
CDSs = [this for this in gbk["CU329670"] if this['type'] == "CDS" and \
        this['gene_start']> XMIN and this['gene_end']<XMAX]
for cds in CDSs:
    if cds['product'] == "retrotransposable element":
        fill_between([cds["gene_start"], cds["gene_end"]], [YMAX, YMAX],
            alpha=0.2, color="blue", zorder=-2)

savefig("repeats.png", dpi=200)
savefig("repeats.eps", dpi=200)

Figure running median performances



In [12]:

    
import pandas as pd



In [32]:

    
### INSTALL blist using pip install blist==1.3.6
import scipy.signal
from sequana.running_median import RunningMedian
from easydev import Timer
import pandas as pd
data = pylab.randn(1000000)
windows = [101,501,1001,5001,10001, 20001, 30001, 40001, 50001, 100001, 
           200001, 400001]
times2 = []
times3 = []
times_pandas = [] 
times_scipy = []

def compute_times(do_scipy=True):
    for window in windows:
        print(window)
        with Timer(times2):
            results = RunningMedian(data, window).run()
        with Timer(times3):
            from blist import blist
            results = RunningMedian(data, window, blist).run()
        with Timer(times_pandas):
            # The DataFrame creation is taken into account here but in the
            # Genomecov, the data is already formatted as a DataFrame, do we
            # want to substract this time ?
            results = pd.DataFrame(data).rolling(window, center=True).median()
        if do_scipy:
            with Timer(times_scipy):
                results = scipy.signal.medfilt(data, window)
            print(times_scipy[-1])
    print(times_scipy)

# Uncomment this to re-run the function, that takes long time especially 
# with scipy set to True
compute_times(do_scipy=False)



In [35]:

    
windows = [101,501,1001,5001,10001, 20001, 30001, 40001, 50001, 100001, 200001, 400001]
times2 = [1.095, 1.285, 1.488, 2.739, 4.335, 7.811, 11.176,
          14.191, 17.533, 33.9416, 59.198, 91.28] 
times3 = [1.1578, 2.121, 3.077, 5.218, 6.695, 7.340, 8.330, 
          8.463, 8.719, 10.699, 10.969, 9.48]
times_scipy = [5.27, 34.05, 75.5, 435.36, 925.13, 1950.7, 
               3046.56, 3901, 4996, 9993, 19986, 40151]
times_pandas =  [0.465, 0.512, 0.572, 0.771, 0.903, 1.046, 1.075,
                 1.175, 1.303, 1.708, 2.151, 2.631]
#times_scipy = times_scipy[0:11]

def plotting():
    f = pylab.figure()
    pylab.semilogy(windows, times2, label="sequana_list", marker="o", lw=2)
    pylab.semilogy(windows, times3, label="sequana_blist", marker="o", lw=2)
    pylab.semilogy(windows, times_scipy, label="SciPy v0.17", marker="o", lw=2)
    pylab.semilogy(windows, times_pandas, label="Pandas", marker="o", lw=2)
    ax = f.get_axes()[0]
    ax.set_facecolor("#eeeeee")

    pylab.xlabel("Window size (W)")
    pylab.ylabel("Time (seconds)")

    pylab.ylim([0.001, 50000])
    pylab.xlim([100, 400010])

    pylab.grid()

    x = np.array(windows)
    pylab.semilogy(x, 3.5e-4*x, "kx--", lw=2, label="O(W)")
    pylab.semilogy(x, 0.8*(pylab.log(x)), "ko:", lw=2, label='O(log(W))') # blist
    pylab.semilogy(x, 0.1*x, "ks-.", label="O(W)") # scipy

    pylab.semilogy(x, list(9e-2*pylab.log(x[0:6])) + list(6e-3*x[6:]**0.5),
        "k--", label=r"O(log(W)) then O($\sqrt{W}$)")

    pylab.axvline(20000, color="r",  linestyle="--")
    pylab.legend(loc="lower right", fontsize="medium", ncol=2)
    pylab.savefig("running_mean_comp.png")
    pylab.loglog()
    pylab.savefig("running_mean_comp_loglog.png")

plotting()

Table1

virus column



In [4]:

    
# Load the data for the virus, bacteria, funghi
virus = GenomeCov("JB409847.filtered.bed")
ch_virus = virus.chr_list[0]
ch_virus.running_median(5001, circular=True)
ch_virus.compute_zscore()



In [15]:

    
# standard metrics
def plot_stats(data):
    stats = data.get_stats()
    stats["mu"] = ["", 1]
    # centralness and mixture model related
    for k,v in data.best_gaussian.items():
        stats[k] = [v]
    stats["centraless"] = [data.get_centralness()]
    stats = pd.DataFrame(stats).T
    return stats
plot_stats(ch_virus)

bacteria



In [18]:

    
bacteria = GenomeCov("FN433596.filtered.bed")
ch_bacteria = bacteria.chr_list[0]
ch_bacteria.running_median(5001, circular=True)
ch_bacteria.compute_zscore()









    



WARNING [sequana]:  
Found mixture model parameters (k=2) where  |mu0-mu1| < sigma0. k=1 could be a better choice.mu0=0.9994826833416014, m1=0.9729187164266592, sigma0=0.07223993143698498, sigma1=0.30810318358668204



In [19]:

    
plot_stats(ch_bacteria)



In [ ]:

    
# fungus case



In [21]:

    
fungus = GenomeCov("S_pombe.filtered.bed")
ch_fungus = fungus.chr_list[0]
ch_fungus.running_median(20001)
ch_fungus.compute_zscore()
plot_stats(ch_fungus)









    





            
               
            
            
            






    














    














    














    














    














    














    









    



WARNING [sequana]:  
Found mixture model parameters (k=2) where  |mu0-mu1| < sigma0. k=1 could be a better choice.mu0=1.0016846179530743, m1=0.993576451176356, sigma0=0.1623235560435524, sigma1=0.3097322232293308






    Out[21]:







  
    
      
      0
    
  
  
    
      BOC
      99.997660
    
    
      CV
      0.189097
    
    
      DOC
      105.326499
    
    
      MAD
      12.000000
    
    
      Median
      105.000000
    
    
      STD
      19.916975
    
    
      centraless
      0.986589
    
    
      mu
      1.001685
    
    
      pi
      0.932079
    
    
      sigma
      0.162324



In [23]:

    
fungus = GenomeCov("S_pombe.filtered.bed")
ch_fungus = fungus.chr_list[0]
ch_fungus.running_median(5001)
ch_fungus.compute_zscore()
plot_stats(ch_fungus)









    





            
               
            
            
            






    














    














    














    














    














    














    









    



WARNING [sequana]:  
Found mixture model parameters (k=2) where  |mu0-mu1| < sigma0. k=1 could be a better choice.mu0=1.0027822620022897, m1=0.9843636781690419, sigma0=0.1568422428323304, sigma1=0.28133013193623313






    Out[23]:







  
    
      
      0
    
  
  
    
      BOC
      99.997660
    
    
      CV
      0.189097
    
    
      DOC
      105.326499
    
    
      MAD
      12.000000
    
    
      Median
      105.000000
    
    
      STD
      19.916975
    
    
      centraless
      0.985587
    
    
      mu
      1.002782
    
    
      pi
      0.909135
    
    
      sigma
      0.156842



In [ ]:

	0
BOC	96.595100
CV	0.254645
DOC	931.310735
MAD	71.000000
Median	988.000000
STD	237.153920
centraless	0.868452
mu	1.011224
pi	0.851098
sigma	0.068569

	0
BOC	98.463859
CV	0.187838
DOC	447.805918
MAD	44.000000
Median	453.000000
STD	84.115099
centraless	0.960712
mu	0.999483
pi	0.962268
sigma	0.072240

	0
BOC	99.997660
CV	0.189097
DOC	105.326499
MAD	12.000000
Median	105.000000
STD	19.916975
centraless	0.986589
mu	1.001685
pi	0.932079
sigma	0.162324