Executed: Mon Mar 27 11:35:09 2017

Duration: 11 seconds.

usALEX-5samples - Template

This notebook is executed through 8-spots paper analysis. For a direct execution, uncomment the cell below.



In [1]:

    
ph_sel_name = "DexDem"



In [2]:

    
data_id = "17d"



In [3]:

    
# ph_sel_name = "all-ph"
# data_id = "7d"

Load software and filenames definitions



In [4]:

    
from fretbursts import *









    



 - Optimized (cython) burst search loaded.
 - Optimized (cython) photon counting loaded.
--------------------------------------------------------------
 You are running FRETBursts (version 0.5.9).

 If you use this software please cite the following paper:

   FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET
   Ingargiola et al. (2016). http://dx.doi.org/10.1371/journal.pone.0160716 

--------------------------------------------------------------



In [5]:

    
init_notebook()
from IPython.display import display

Data folder:



In [6]:

    
data_dir = './data/singlespot/'



In [7]:

    
import os
data_dir = os.path.abspath(data_dir) + '/'
assert os.path.exists(data_dir), "Path '%s' does not exist." % data_dir

List of data files:



In [8]:

    
from glob import glob
file_list = sorted(f for f in glob(data_dir + '*.hdf5') if '_BKG' not in f)
## Selection for POLIMI 2012-11-26 datatset
labels = ['17d', '27d', '7d', '12d', '22d']
files_dict = {lab: fname for lab, fname in zip(labels, file_list)}
files_dict









    Out[8]:





{'12d': '/Users/anto/Google Drive/notebooks/multispot_paper/data/singlespot/007_dsDNA_12d_3nM_green100u_red40u.hdf5',
 '17d': '/Users/anto/Google Drive/notebooks/multispot_paper/data/singlespot/004_dsDNA_17d_green100u_red40u.hdf5',
 '22d': '/Users/anto/Google Drive/notebooks/multispot_paper/data/singlespot/008_dsDNA_22d_500pM_green100u_red40u.hdf5',
 '27d': '/Users/anto/Google Drive/notebooks/multispot_paper/data/singlespot/005_dsDNA_27d_green100u_red40u.hdf5',
 '7d': '/Users/anto/Google Drive/notebooks/multispot_paper/data/singlespot/006_dsDNA_7d_green100u_red40u.hdf5'}



In [9]:

    
ph_sel_map = {'all-ph': Ph_sel('all'), 'Dex': Ph_sel(Dex='DAem'), 
              'DexDem': Ph_sel(Dex='Dem')}
ph_sel = ph_sel_map[ph_sel_name]

data_id, ph_sel_name









    Out[9]:





('17d', 'DexDem')

Data load

Initial loading of the data:



In [10]:

    
d = loader.photon_hdf5(filename=files_dict[data_id])

Laser alternation selection

At this point we have only the timestamps and the detector numbers:



In [11]:

    
d.ph_times_t, d.det_t









    Out[11]:





([array([       1954,        2562,        3108, ..., 48000428736,
         48000442778, 48000447993])],
 [array([1, 1, 1, ..., 1, 0, 0], dtype=uint32)])

We need to define some parameters: donor and acceptor ch, excitation period and donor and acceptor excitiations:



In [12]:

    
d.add(det_donor_accept=(0, 1), alex_period=4000, D_ON=(2850, 580), A_ON=(900, 2580), offset=0)

We should check if everithing is OK with an alternation histogram:



In [13]:

    
plot_alternation_hist(d)

If the plot looks good we can apply the parameters with:



In [14]:

    
loader.alex_apply_period(d)









    



# Total photons (after ALEX selection):   2,453,382
#  D  photons in D+A excitation periods:    756,184
#  A  photons in D+A excitation periods:  1,697,198
# D+A photons in  D  excitation period:   1,462,400
# D+A photons in  A  excitation period:     990,982

Measurements infos

All the measurement data is in the d variable. We can print it:



In [15]:

    
d









    Out[15]:





singlespot_004_dsDNA_17d_green100u_red40u G1.000

Or check the measurements duration:



In [16]:

    
d.time_max









    Out[16]:





600.00559991249997

Compute background

Compute the background using automatic threshold:



In [17]:

    
d.calc_bg(bg.exp_fit, time_s=60, tail_min_us='auto', F_bg=1.7)









    



 - Calculating BG rates ... [DONE]



In [18]:

    
dplot(d, timetrace_bg)









    Out[18]:





<matplotlib.axes._subplots.AxesSubplot at 0x1041e1160>



In [19]:

    
d.rate_m, d.rate_dd, d.rate_ad, d.rate_aa









    Out[19]:





([2286.0248813678868],
 [613.94010117705795],
 [914.22216629179422],
 [742.22554889663013])

Burst search and selection



In [20]:

    
bs_kws = dict(L=10, m=10, F=7, ph_sel=ph_sel)
d.burst_search(**bs_kws)









    



 - Performing burst search (verbose=False) ... - Recomputing background limits for DexDem ... [DONE]
 - Recomputing background limits for all ... [DONE]
 - Fixing  burst data to refer to ph_times_m ... [DONE]
[DONE]
 - Calculating burst periods ...[DONE]
 - Counting D and A ph and calculating FRET ... 
   - Applying background correction.
   - Applying leakage correction.
   - Applying direct excitation correction.
   [DONE Counting D/A]



In [21]:

    
th1 = 30
ds = d.select_bursts(select_bursts.size, th1=30)



In [22]:

    
bursts = (bext.burst_data(ds, include_bg=True, include_ph_index=True)
          .round({'E': 6, 'S': 6, 'bg_d': 3, 'bg_a': 3, 'bg_aa': 3, 'nd': 3, 'na': 3, 'naa': 3, 'nda': 3, 'nt': 3, 'width_ms': 4}))



In [23]:

    
bursts.head()



In [24]:

    
burst_fname = ('results/bursts_usALEX_{sample}_{ph_sel}_F{F:.1f}_m{m}_size{th}.csv'
               .format(sample=data_id, th=th1, **bs_kws))
burst_fname









    Out[24]:





'results/bursts_usALEX_17d_DexDem_F7.0_m10_size30.csv'



In [25]:

    
bursts.to_csv(burst_fname)



In [26]:

    
assert d.dir_ex == 0
assert d.leakage == 0



In [27]:

    
print(d.ph_sel)
dplot(d, hist_fret);









    



DexDem



In [28]:

    
# if data_id in ['7d', '27d']:
#     ds = d.select_bursts(select_bursts.size, th1=20)
# else:
#     ds = d.select_bursts(select_bursts.size, th1=30)



In [29]:

    
ds = d.select_bursts(select_bursts.size, add_naa=False, th1=30)



In [30]:

    
n_bursts_all = ds.num_bursts[0]



In [31]:

    
def select_and_plot_ES(fret_sel, do_sel):
    ds_fret= ds.select_bursts(select_bursts.ES, **fret_sel)
    ds_do = ds.select_bursts(select_bursts.ES, **do_sel)
    bpl.plot_ES_selection(ax, **fret_sel)
    bpl.plot_ES_selection(ax, **do_sel)    
    return ds_fret, ds_do



In [32]:

    
ax = dplot(ds, hist2d_alex, S_max_norm=2, scatter_alpha=0.1)

if data_id == '7d':
    fret_sel = dict(E1=0.60, E2=1.2, S1=0.2, S2=0.9, rect=False)
    do_sel = dict(E1=-0.2, E2=0.5, S1=0.8, S2=2, rect=True)    
    ds_fret, ds_do = select_and_plot_ES(fret_sel, do_sel)
    
elif data_id == '12d':
    fret_sel = dict(E1=0.30,E2=1.2,S1=0.131,S2=0.9, rect=False)
    do_sel = dict(E1=-0.4, E2=0.4, S1=0.8, S2=2, rect=False)
    ds_fret, ds_do = select_and_plot_ES(fret_sel, do_sel)

elif data_id == '17d':
    fret_sel = dict(E1=0.01, E2=0.98, S1=0.14, S2=0.88, rect=False)
    do_sel = dict(E1=-0.4, E2=0.4, S1=0.80, S2=2, rect=False)
    ds_fret, ds_do = select_and_plot_ES(fret_sel, do_sel)

elif data_id == '22d':
    fret_sel = dict(E1=-0.16, E2=0.6, S1=0.2, S2=0.80, rect=False)
    do_sel = dict(E1=-0.2, E2=0.4, S1=0.85, S2=2, rect=True)
    ds_fret, ds_do = select_and_plot_ES(fret_sel, do_sel)    

elif data_id == '27d':
    fret_sel = dict(E1=-0.1, E2=0.5, S1=0.2, S2=0.82, rect=False)
    do_sel = dict(E1=-0.2, E2=0.4, S1=0.88, S2=2, rect=True)
    ds_fret, ds_do = select_and_plot_ES(fret_sel, do_sel)



In [33]:

    
n_bursts_do = ds_do.num_bursts[0]
n_bursts_fret = ds_fret.num_bursts[0]

n_bursts_do, n_bursts_fret









    Out[33]:





(1045, 3276)



In [34]:

    
d_only_frac = 1.*n_bursts_do/(n_bursts_do + n_bursts_fret)
print ('D-only fraction:', d_only_frac)









    



D-only fraction: 0.241842166165



In [35]:

    
dplot(ds_fret, hist2d_alex, scatter_alpha=0.1);



In [36]:

    
dplot(ds_do, hist2d_alex, S_max_norm=2, scatter=False);

Donor Leakage fit

Half-Sample Mode

Fit peak usng the mode computed with the half-sample algorithm (Bickel 2005).



In [37]:

    
def hsm_mode(s):
    """
    Half-sample mode (HSM) estimator of `s`.

    `s` is a sample from a continuous distribution with a single peak.
    
    Reference:
        Bickel, Fruehwirth (2005). arXiv:math/0505419
    """
    s = memoryview(np.sort(s))
    i1 = 0
    i2 = len(s)

    while i2 - i1 > 3:
        n = (i2 - i1) // 2
        w = [s[n-1+i+i1] - s[i+i1] for i in range(n)]
        i1 = w.index(min(w)) + i1
        i2 = i1 + n

    if i2 - i1 == 3:
        if s[i1+1] - s[i1] < s[i2] - s[i1 + 1]:
            i2 -= 1
        elif s[i1+1] - s[i1] > s[i2] - s[i1 + 1]:
            i1 += 1
        else:
            i1 = i2 = i1 + 1

    return 0.5*(s[i1] + s[i2])



In [38]:

    
E_pr_do_hsm = hsm_mode(ds_do.E[0])
print ("%s: E_peak(HSM) = %.2f%%" % (ds.ph_sel, E_pr_do_hsm*100))









    



DexDem: E_peak(HSM) = 9.56%

Gaussian Fit

Fit the histogram with a gaussian:



In [39]:

    
E_fitter = bext.bursts_fitter(ds_do, weights=None)
E_fitter.histogram(bins=np.arange(-0.2, 1, 0.03))



In [40]:

    
E_fitter.fit_histogram(model=mfit.factory_gaussian())
E_fitter.params



In [41]:

    
res = E_fitter.fit_res[0]
res.params.pretty_print()









    



Name          Value      Min      Max   Stderr     Vary     Expr
amplitude    0.9792     -inf      inf 0.009766     True     None
center      0.09503       -1        2 0.0006992     True     None
fwhm          0.143     -inf      inf 0.001647    False 2.3548200*sigma
height        6.434     -inf      inf  0.06417    False 0.3989423*amplitude/max(1.e-15, sigma)
sigma       0.06072        0      inf 0.0006992     True     None



In [42]:

    
E_pr_do_gauss = res.best_values['center']
E_pr_do_gauss









    Out[42]:





0.09503420866404944

KDE maximum



In [43]:

    
bandwidth = 0.03
E_range_do = (-0.1, 0.15)
E_ax = np.r_[-0.2:0.401:0.0002]

E_fitter.calc_kde(bandwidth=bandwidth)
E_fitter.find_kde_max(E_ax, xmin=E_range_do[0], xmax=E_range_do[1])
E_pr_do_kde = E_fitter.kde_max_pos[0]
E_pr_do_kde









    Out[43]:





0.095000000000008467

Leakage summary



In [44]:

    
mfit.plot_mfit(ds_do.E_fitter, plot_kde=True, plot_model=False)
plt.axvline(E_pr_do_hsm, color='m', label='HSM')
plt.axvline(E_pr_do_gauss, color='k', label='Gauss')
plt.axvline(E_pr_do_kde, color='r', label='KDE')
plt.xlim(0, 0.3)
plt.legend()
print('Gauss: %.2f%%\n  KDE: %.2f%%\n  HSM: %.2f%%' % 
      (E_pr_do_gauss*100, E_pr_do_kde*100, E_pr_do_hsm*100))









    



Gauss: 9.50%
  KDE: 9.50%
  HSM: 9.56%

Burst size distribution



In [45]:

    
nt_th1 = 50



In [46]:

    
dplot(ds_fret, hist_size, which='all', add_naa=False)
xlim(-0, 250)
plt.axvline(nt_th1)









    Out[46]:





<matplotlib.lines.Line2D at 0x123dc9ac8>



In [47]:

    
Th_nt = np.arange(35, 120)
nt_th = np.zeros(Th_nt.size)
for i, th in enumerate(Th_nt):
    ds_nt = ds_fret.select_bursts(select_bursts.size, th1=th)
    nt_th[i] = (ds_nt.nd[0]  + ds_nt.na[0]).mean() - th



In [48]:

    
plt.figure()
plot(Th_nt, nt_th)
plt.axvline(nt_th1)









    Out[48]:





<matplotlib.lines.Line2D at 0x123cea978>



In [49]:

    
nt_mean = nt_th[np.where(Th_nt == nt_th1)][0]
nt_mean









    Out[49]:





29.152810284389147

Fret fit

Max position of the Kernel Density Estimation (KDE):



In [50]:

    
E_pr_fret_kde = bext.fit_bursts_kde_peak(ds_fret, bandwidth=bandwidth, weights='size')
E_fitter = ds_fret.E_fitter



In [51]:

    
E_fitter.histogram(bins=np.r_[-0.1:1.1:0.03])
E_fitter.fit_histogram(mfit.factory_gaussian(center=0.5))



In [52]:

    
E_fitter.fit_res[0].params.pretty_print()









    



Name          Value      Min      Max   Stderr     Vary     Expr
amplitude    0.9741     -inf      inf  0.02171     True     None
center       0.4477       -1        2 0.002632     True     None
fwhm         0.2408     -inf      inf 0.006199    False 2.3548200*sigma
height          3.8     -inf      inf  0.08471    False 0.3989423*amplitude/max(1.e-15, sigma)
sigma        0.1023        0      inf 0.002632     True     None



In [53]:

    
fig, ax = plt.subplots(1, 2, figsize=(14, 4.5))
mfit.plot_mfit(E_fitter, ax=ax[0])
mfit.plot_mfit(E_fitter, plot_model=False, plot_kde=True, ax=ax[1])
print('%s\nKDE peak %.2f ' % (ds_fret.ph_sel, E_pr_fret_kde*100))
display(E_fitter.params*100)









    



DexDem
KDE peak 46.34 






    






  
    
      
      amplitude
      center
      sigma
    
  
  
    
      0
      97.412
      44.7667
      10.2266

Weighted mean of $E$ of each burst:



In [54]:

    
ds_fret.fit_E_m(weights='size')









    Out[54]:





array([ 0.4358676])

Gaussian fit (no weights):



In [55]:

    
ds_fret.fit_E_generic(fit_fun=bl.gaussian_fit_hist, bins=np.r_[-0.1:1.1:0.03], weights=None)









    Out[55]:





array([ 0.44494197])

Gaussian fit (using burst size as weights):



In [56]:

    
ds_fret.fit_E_generic(fit_fun=bl.gaussian_fit_hist, bins=np.r_[-0.1:1.1:0.005], weights='size')









    Out[56]:





array([ 0.44754897])



In [57]:

    
E_kde_w = E_fitter.kde_max_pos[0]
E_gauss_w = E_fitter.params.loc[0, 'center']
E_gauss_w_sig = E_fitter.params.loc[0, 'sigma']
E_gauss_w_err = float(E_gauss_w_sig/np.sqrt(ds_fret.num_bursts[0]))
E_gauss_w_fiterr = E_fitter.fit_res[0].params['center'].stderr
E_kde_w, E_gauss_w, E_gauss_w_sig, E_gauss_w_err, E_gauss_w_fiterr









    Out[57]:





(0.46340000000001896,
 0.4476669832059237,
 0.10226625899569441,
 0.0017867361435577153,
 0.0026323496214673619)

Stoichiometry fit

Max position of the Kernel Density Estimation (KDE):



In [58]:

    
S_pr_fret_kde = bext.fit_bursts_kde_peak(ds_fret, burst_data='S', bandwidth=0.03) #weights='size', add_naa=True)
S_fitter = ds_fret.S_fitter



In [59]:

    
S_fitter.histogram(bins=np.r_[-0.1:1.1:0.03])
S_fitter.fit_histogram(mfit.factory_gaussian(), center=0.5)



In [60]:

    
fig, ax = plt.subplots(1, 2, figsize=(14, 4.5))
mfit.plot_mfit(S_fitter, ax=ax[0])
mfit.plot_mfit(S_fitter, plot_model=False, plot_kde=True, ax=ax[1])
print('%s\nKDE peak %.2f ' % (ds_fret.ph_sel, S_pr_fret_kde*100))
display(S_fitter.params*100)









    



DexDem
KDE peak 57.02 






    






  
    
      
      amplitude
      center
      sigma
    
  
  
    
      0
      100.735
      57.4012
      10.9407



In [61]:

    
S_kde = S_fitter.kde_max_pos[0]
S_gauss = S_fitter.params.loc[0, 'center']
S_gauss_sig = S_fitter.params.loc[0, 'sigma']
S_gauss_err = float(S_gauss_sig/np.sqrt(ds_fret.num_bursts[0]))
S_gauss_fiterr = S_fitter.fit_res[0].params['center'].stderr
S_kde, S_gauss, S_gauss_sig, S_gauss_err, S_gauss_fiterr









    Out[61]:





(0.57020000000002202,
 0.5740123603201247,
 0.10940666169217383,
 0.00191148907480451,
 0.0017682063310031167)

The Maximum likelihood fit for a Gaussian population is the mean:



In [62]:

    
S = ds_fret.S[0]
S_ml_fit = (S.mean(), S.std())
S_ml_fit









    Out[62]:





(0.58003379464527438, 0.10564716164598524)

Computing the weighted mean and weighted standard deviation we get:



In [63]:

    
weights = bl.fret_fit.get_weights(ds_fret.nd[0], ds_fret.na[0], weights='size', naa=ds_fret.naa[0], gamma=1.)
S_mean = np.dot(weights, S)/weights.sum()
S_std_dev = np.sqrt(
        np.dot(weights, (S - S_mean)**2)/weights.sum())
S_wmean_fit = [S_mean, S_std_dev]
S_wmean_fit









    Out[63]:





[0.56441467566178327, 0.10118849852916255]

Save data to file



In [64]:

    
sample = data_id

The following string contains the list of variables to be saved. When saving, the order of the variables is preserved.



In [65]:

    
variables = ('sample n_bursts_all n_bursts_do n_bursts_fret '
             'E_kde_w E_gauss_w E_gauss_w_sig E_gauss_w_err E_gauss_w_fiterr '
             'S_kde S_gauss S_gauss_sig S_gauss_err S_gauss_fiterr '
             'E_pr_do_kde E_pr_do_hsm E_pr_do_gauss nt_mean\n')

This is just a trick to format the different variables:



In [66]:

    
variables_csv = variables.replace(' ', ',')
fmt_float = '{%s:.6f}'
fmt_int = '{%s:d}'
fmt_str = '{%s}'
fmt_dict = {**{'sample': fmt_str}, 
            **{k: fmt_int for k in variables.split() if k.startswith('n_bursts')}}
var_dict = {name: eval(name) for name in variables.split()}
var_fmt = ', '.join([fmt_dict.get(name, fmt_float) % name for name in variables.split()]) + '\n'
data_str = var_fmt.format(**var_dict)

print(variables_csv)
print(data_str)









    



sample,n_bursts_all,n_bursts_do,n_bursts_fret,E_kde_w,E_gauss_w,E_gauss_w_sig,E_gauss_w_err,E_gauss_w_fiterr,S_kde,S_gauss,S_gauss_sig,S_gauss_err,S_gauss_fiterr,E_pr_do_kde,E_pr_do_hsm,E_pr_do_gauss,nt_mean

17d, 4501, 1045, 3276, 0.463400, 0.447667, 0.102266, 0.001787, 0.002632, 0.570200, 0.574012, 0.109407, 0.001911, 0.001768, 0.095000, 0.095650, 0.095034, 29.152810



In [67]:

    
# NOTE: The file name should be the notebook name but with .csv extension
with open('results/usALEX-5samples-PR-raw-%s.csv' % ph_sel_name, 'a') as f:
    f.seek(0, 2)
    if f.tell() == 0:
        f.write(variables_csv)
    f.write(data_str)

	E	S	bg_a	bg_aa	bg_d	i_end	i_start	na	naa	nd	nda	nt	size_raw	t_end	t_start	width_ms
0	0.659497	0.686847	2.802	2.346	2.056	319	266	21.198	14.654	10.944	-0.166	46.796	54	0.083104	0.079756	3.3479
1	0.485818	0.628931	2.852	2.387	2.092	783	678	30.148	36.613	31.908	-0.169	98.668	106	0.167744	0.164337	3.4074
2	0.168616	0.934351	4.362	3.650	3.199	1525	1479	5.638	2.350	27.801	-0.259	35.789	47	0.367103	0.361893	5.2105
3	0.442026	0.584341	2.263	1.894	1.660	1702	1644	13.737	22.106	17.340	-0.134	53.182	59	0.386803	0.384100	2.7040
4	0.207736	0.991912	3.170	2.653	2.325	2431	2381	8.830	0.347	33.675	-0.188	42.851	51	0.600688	0.596901	3.7873