This notebook is part of a tutorial series for the FRETBursts burst analysis software.
In this notebook, we present a typical FRETBursts workflow for μs-ALEX smFRET burst analysis. Briefly, we show how to perform background estimation, burst search, burst selection, compute FRET histograms and ALEX histograms, do sub-population selection and finally, FRET efficiency fit.
Before running the notebook, you can click on menu Cell -> All Output -> Clear to clear all previous output. This will avoid mixing output from current execution and the previously saved one.
We start by loading FRETBursts
:
In [ ]:
from fretbursts import *
Note that FRETBursts version string tells you the exact FRETBursts version (and revision) in use. Storing the version in the notebook helps with reproducibility and tracking software regressions.
Next, we initialize the default plot style for the notebook (under the hood it uses seaborn):
In [ ]:
sns = init_notebook()
Note that the previous command has no output. Finally, we print the version of some dependencies:
In [ ]:
import lmfit; lmfit.__version__
In [ ]:
import phconvert; phconvert.__version__
The full list of smFRET measurements used in the FRETBursts tutorials can be found on Figshare.
This is the file we will download:
In [ ]:
url = 'http://files.figshare.com/2182601/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5'
url
variable above to download your own data file.
This is useful if you are executing FRETBursts online and you want to use
your own data file. See
First Steps.
Here, we download the data file and put it in a folder named data
,
inside the notebook folder:
In [ ]:
download_file(url, save_dir='./data')
NOTE: If you modified the
url
variable providing an invalid URL the previous command will fail. In this case, edit the cell containing theurl
and re-try the download.
Use one of the following 2 methods to select a data file.
Here, we can directly define the file name to be loaded:
In [ ]:
filename = "./data/0023uLRpitc_NTP_20dT_0.5GndCl.hdf5"
filename
Now filename
contains the path of the file you just selected.
Run again the previous cell to select a new file. In a following cell
we will check if the file actually exists.
Alternatively, you can select a data file with an "Open File" windows. Note that, since this only works in a local installation, the next commands are commented (so nothing will happen when running the cell).
If you want to try the File Dialog, you need to remove the #
signs:
In [ ]:
# filename = OpenFileDialog()
# filename
Let's check that the file exists:
In [ ]:
import os
if os.path.isfile(filename):
print("Perfect, file found!")
else:
print("Sorry, file:\n%s not found" % filename)
We can finally load the data and store it in a variable called d
:
In [ ]:
d = loader.photon_hdf5(filename)
If you don't get any message, the file is loaded successfully.
We can also set the 3 correction coefficients:
leakage
dir_ex
(ALEX-only)gamma
In [ ]:
d.leakage = 0.11
d.dir_ex = 0.04
d.gamma = 1.
NOTE: at any later moment, after burst search, a simple reassignment of these coefficient will update the burst data with the new correction values.
In [ ]:
d.ph_times_t, d.det_t
We need to define some ALEX parameters:
In [ ]:
d.add(det_donor_accept = (0, 1),
alex_period = 4000,
offset = 700,
D_ON = (2180, 3900),
A_ON = (200, 1800))
Here the parameters are:
det_donor_accept
: donor and acceptor channelsalex_period
: length of excitation period (in timestamps units)D_ON
and A_ON
: donor and acceptor excitation windowsoffset
: the offset between the start of alternation and start of timestamping
(see also Definition of alternation periods).To check that the above parameters are correct, we need to plot the histogram of timestamps (modulo the alternation period) and superimpose the two excitation period definitions to it:
In [ ]:
bpl.plot_alternation_hist(d)
If the previous alternation histogram looks correct, the corresponding definitions of the excitation periods can be applied to the data using the following command:
In [ ]:
loader.alex_apply_period(d)
If the previous histogram does not look right, the parameters in the d.add(...)
cell can be modified and checked by running the histogram plot cell until everything looks fine. Don't forget to apply the
parameters with loader.usalex_apply_period(d)
as a last step.
NOTE: After applying the ALEX parameters a new array of timestamps containing only photons inside the excitation periods is created (name
d.ph_times_m
). To save memory, by default, the old timestamps array (d.ph_times_t
) is deleted. Therefore, in the following, when we talk about all-photon selection we always refer to all photons inside both excitation periods.
The entire measurement data is now stored in the variable d
. Printing it
will give a compact representation containing the file-name and additional parameters:
In [ ]:
d
To check the measurement duration (in seconds) run:
In [ ]:
d.time_max
In [ ]:
dplot(d, timetrace);
Here, dplot
is a generic wrapper (the same for all plots)
that takes care of setting up the figure, title and axis
(in the multispot case dplot
creates multi-panel plot).
The second argument, timetrace
, is the actual plot function.
All the eventual additional arguments passed to dplot
are,
in turn, passed to the plot function (e.g. timetrace
).
If we look at the documentation for timetrace
function we notice that it accepts a long list of arguments.
In python, when an argument is not specified, it will take the default
value specified in the function definition (see previus link).
As an example, to change the bin size (i.e. duration) of the timetrace histogram,
we can look up in the timetrace
documentation
and find that the argument we need to modify is binwidth
(we can also see that the default value is 0.001
seconds).
We can then re-plot the timetrace using a bin of 0.5 ms:
In [ ]:
dplot(d, timetrace, binwidth=0.5e-3);
The timetrace is computed between tmin
and tmax
(by default 0 and 200s),
but as you can see is displayed only between 0 an 1 second, just because these
are the default x-axis limits. The axis limits can be changes by using the
standard matplotlib command plt.xlim()
.
On the other hand, to change the range where the timetrace is computed,
we pass the additional arguments tmin
and tmax
as follows:
In [ ]:
dplot(d, timetrace, binwidth=0.5e-3, tmin=50, tmax=150)
plt.xlim(51, 52);
When using FRETBursts in a notebook, all plots are static by default.
This is because we use the so called inline
backend of matplotlib.
If you want to manipulate figures interactively, you can switch
to the interactive notebook
backend with:
%matplotlib notebook
to go back to inline use:
%matplotlib inline
NOTE: Currently, the notebook
backend is incompatible with the QT backend.
If in a session you activate the notebook
backend, then switching to the QT backend requires
restarting the notebook. Conversely, you can switch between inline
and notebook
or between inline
and qt4
backends in the same session wihtou issues.
See also:
- bpl.timetrace function documentation
- bpl.ratetrace function documentation
- Intensity timetrace and Rate-timetrace, a later section in this tutorial.
As a first step of the analysis, we need to estimate the background.
The assumption is that the background is a Poisson process and therefore
the corresponding inter photon delays are exponentially distributed. Since the
background can change during the measurement, a new estimation is
computed every time_s
seconds (this time is called the background period).
The inter photon delay distribution contains both single-molecule signal and background, the latter being the only one we are interested in and the former being in general much shorter. Therefore, a threshold is needed to discriminate between the exponential tail and the single-molecule peak.
Choosing a threshold and fitting the exponential tail are two different problems. FRETBursts provides several ways to specify the minimum threshold and different functions to fit the exponential tail.
We will go over three different methods in increasing order of complexity (and recommendability)
For more information see:
Data.calc_bg()
background
(e.g. bg
) moduleexp_fitting
module.Let start with a standard Maximum Likelihood (ML) background fit with a minimum tail threshold of 500μs:
In [ ]:
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us=500)
We can look at how the fit looks with:
In [ ]:
dplot(d, hist_bg, show_fit=True)
Note that the fits are not very good. This is understandable because we used a single threshold for all the photon streams, each one having a quite different background.
To improve the fit, we can try specifying a threshold for each channel. This method is bit ad-hoc but it may work well when the thresholds are properly choosen.
In [ ]:
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us=(800, 4000, 1500, 1000, 3000))
In [ ]:
dplot(d, hist_bg, show_fit=True);
For ALEX measurements, the tuple passed to
tail_min_us
in order to define the thresholds needs to contain
5 values corresponding the 5 distinct photon streams (the all-photon stream
Data.ph_streams
attribute:
In [ ]:
d.ph_streams
Finally, is possible to let FRETBursts infer the threshold automatically with:
In [ ]:
d.calc_bg(bg.exp_fit, time_s=1000, tail_min_us='auto', F_bg=1.7)
Which results in the following fit plot:
In [ ]:
dplot(d, hist_bg, show_fit=True);
Under the hood, this method estimates the threshold automatically according to this formula:
threshold_auto = F_bg / coarse_background_rate
where F_bg
is the fit function input argument (by default 1.7)
and coarse_background_rate
is an initial background estimation
performed with fixed threshold. This method is concemptually an
iterative method to compute the threshold that is stopped
after the second iteration (this is usually more than enough for
accurate estimates).
Of the three methods here described, the latter is the recommended one since it works well and without user intervention in a wide range of experimental conditions.
In [ ]:
d.calc_bg(bg.exp_fit, time_s=30, tail_min_us='auto', F_bg=1.7)
In [ ]:
dplot(d, timetrace_bg)
In [ ]:
d.bg
We can also get the average background for each channel:
In [ ]:
d.bg_mean
The first step of burst analysis is the burst search.
We will use the sliding-window algorithm on all photons. Note that "all photons", as mentioned before, means all photons selected in the alternation histogram. An important variation compared to the classical sliding-windows is that the threshold-rate for burst start is computed as a function of the background and changes when the background changes during the measurement.
To perform a burst search evaluating the photon rate with
10 photons (m=10
), and selecting a minimum rate 6 times larger than
the background rate (F=6) calculated with all photons (default):
In [ ]:
d.burst_search(L=10, m=10, F=6)
The previous command performs the burst search, corrects the bursts sizes for background, spectral leakage and direct excitation, and computes $\gamma$-corrected FRET and Stoichiometry.
See the
burst_search
documentation for more details.
We can plot the resulting FRET histogram using the following command:
In [ ]:
dplot(d, hist_fret);
All pre-defined plots follow this pattern:
call the generic dplot()
function, passing 2 parameters:
d
in this case) hist_fret
)In some case we can add other optional parameters to tweak the plot.
All plot functions start with hist_
for histograms,
scatter_
for scatter-plots or timetrace_
for plots as a function
of measurement time. You can use autocompletion to find all
plot function or you can look in bursts_plot.py
where
all plot functions are defined.
Instead of hist_fret
we can use hist_fret_kde
to add a KDE overlay. Also, we can plot a weighted histogram by passing an additional parameter weights
:
In [ ]:
dplot(d, hist_fret, show_kde=True);
dplot(d, hist_fret, show_kde=True, weights='size');
You can experiment with different weighting schema (for all
supported weights see get_weigths()
function in fret_fit.py
).
When we performed the burst search, we specified L=10
without
explaining what this parameter means. L is traditionally the minimum size
(number of photons) for a burst: smaller bursts will be rejected.
By setting L=m (10 in this case) we are deciding to not discard
any burst (because the smallest detected burst has at least m counts).
Selecting the bursts in a second step, by applying a minimum burst size criterion, results in a more accurate and unbiased selection.
For example, we can select bursts with more than 30 photons (after
background, gamma, leakage and direct excitation corrections)
and store the result in a new
Data()
variable ds
:
In [ ]:
ds = d.select_bursts(select_bursts.size, th1=30)
By defaults the burst size includes donor and acceptor photons
during donor excitation. To add acceptor photons during
acceptor excitation (naa
), we add the parameter add_naa=True
:
In [ ]:
ds = d.select_bursts(select_bursts.size, add_naa=True, th1=30)
Similar to plot functions, all selection functions
are defined in select_bursts.py
and you can access them by typing
select_bursts.
and using the TAB key for autocompletion.
See also:
- Burst selection in the documentation. In particular the function
select_bursts.size
andData.select_bursts
.
To replot the FRET histogram after selection (note that now
we are passing ds
to the plot function):
In [ ]:
dplot(ds, hist_fret);
Note how the histogram exhibits much more clearly defined peaks after burst selection.
Under the hood the previous hist_fret
plot creates a MultiFitter
object for $E$ values. This object, stored as ds.E_fitter
, operates
on multi-channel data and computes the histogram, KDE and can fit
the histogram with a model (lmfit.Model).
Now, just for illustration purposes, we fit the previous histogram with 3 Gaussians, using the already created ds.E_fitter
object:
In [ ]:
ds.E_fitter.fit_histogram(mfit.factory_three_gaussians(), verbose=False)
In [ ]:
dplot(ds, hist_fret, show_model=True);
The bin width can be changed with binwidth
argument. Alternatively,
an arbitrary array of bin edges can be passed in bins
(overriding binwidth
).
We can customize the appearance of this plot (type
hist_fret?
for the complete set of arguments).
For example to change from a bar plot to a line-plot
we use the hist_style
argument:
In [ ]:
dplot(ds, hist_fret, show_model=True, hist_style='line')
We can customize the line-plot, bar-plot, the model plot and the KDE plot by passing dictionaries with matplotlib style. The name of the arguments are:
hist_plot_style
: style for the histogram line-plothist_bar_style
: style for the histogram bar-plotmodel_plot_style
: style for the model plotkde_plot_style
: style for the KDE plotAs an example:
In [ ]:
dplot(ds, hist_fret, show_model=True, hist_style='bar', show_kde=True,
kde_plot_style = dict(linewidth=5, color='orange', alpha=0.6),
hist_plot_style = dict(linewidth=3, markersize=8, color='b', alpha=0.6))
plt.legend();
Similarly, we can plot the burst size using all photons
(type hist_size?
to learn about all plot options):
In [ ]:
dplot(ds, hist_size, add_naa=True);
Or plot the burst size histogram for the different components:
In [ ]:
dplot(ds, hist_size_all);
NOTE: The previous plot may generate a benign warning due to the presence of zeroes when switching to log scale. Just ignore it.
A scatterplot of Size vs FRET is created by:
In [ ]:
dplot(ds, scatter_fret_nd_na)
xlim(-1, 2)
We can further select only bursts smaller than 300 photons to get rid of multi-molecule events:
In [ ]:
ds2 = ds.select_bursts(select_bursts.size, th2=300)
and superimpose the two histograms before and after selection to see the difference:
In [ ]:
ax = dplot(ds2, hist_fret, hist_style='bar', show_kde=True,
hist_bar_style = dict(facecolor='r', alpha=0.5,
label='Hist. no large bursts'),
kde_plot_style = dict(lw=3, color='m',
label='KDE no large bursts'))
dplot(ds, hist_fret, ax=ax, hist_style='bar', show_kde=True,
hist_bar_style = dict(label='Hist. with large bursts'),
kde_plot_style = dict(lw=3, label='KDE with large bursts'))
plt.legend();
NOTE: It is not necessarily true that bursts with more that 300 photons represents multiple molecules. To asses the valididty of this assumption it can be useful to plot the peak count rates in each burst. See
hist_burst_phrate
for this kind of plot.
In [ ]:
ds.E_fitter.find_kde_max(np.r_[0:1:0.0002], xmin=0.2, xmax=0.6)
and plot it with show_kde_peak=True
, we also use show_fit_value=True
to show a box with the fitted value:
In [ ]:
dplot(ds, hist_fret, hist_style='line',
show_fit_value=True,
show_kde=True, show_kde_peak=True);
Instead of using the KDE, we can use the peak position as fitted from a gaussian model.
In [ ]:
ds.E_fitter.fit_histogram(mfit.factory_three_gaussians(), verbose=False)
To select which peak to show we use fit_from='p1_center'
:
In [ ]:
dplot(ds, hist_fret, hist_style='line',
show_fit_value=True, fit_from='p2_center',
show_model=True);
The string 'p2_center'
is the name of the parameter of the
gaussian fit that we want to show in the text box. To see all
the parameters of the model we look in:
In [ ]:
ds.E_fitter.params # <-- pandas DataFrame, one row per channel
We can create a simple E-S scatter plot with scatter_alex
:
In [ ]:
dplot(ds, scatter_alex, figsize=(4,4), mew=1, ms=4, mec='black', color='purple');
We can also plot the ALEX histogram with a scatterplot overlay using hist2d_alex
:
In [ ]:
dplot(ds, hist2d_alex);
Finally we can also plot an ALEX histogram and marginals (joint plots) as follow (for more options see: Example - usALEX histogram):
In [ ]:
alex_jointplot(ds);
To get rid of the large donor-only population, we can simply select bursts with at least 10 photons in the acceptor channel (during acceptor excitation). At the same time, with a burst size selection using Dex photons we get rid of the A-only population:
In [ ]:
ds = d.select_bursts(select_bursts.size, th1=20)
ds2 = ds.select_bursts(select_bursts.naa, th1=10)
alex_jointplot(ds2);
As you can see, we remained with only the FRET sub-populations.
See next sections show how to select a region on the E/S histogram to isolate a subpopulation.
#
) from the lines
containing the %matplotlib
command.
To select bursts graphically, we need to open the ALEX histogram in a new (QT) window, drag the mouse to define a selection and have it printed here in the notebook.
Here we describe how to do it in 3 steps.
Step 1 Switch the plot modality to QT (i.e. opens graphs in a separate window):
In [ ]:
# Switches to open plot in external window
#%matplotlib qt
Step 2 Plot the E-S histogram in an external windows. There you can use the mouse to select a region:
In [ ]:
# ALEX histogram with GUI selection enabled
dplot(ds, hist2d_alex, gui_sel=True)
The region selection is printed above.
Step 3 Restore the normal inline plotting (no more external windows).
In [ ]:
# Switch back to show plots inline in the notebook
#%matplotlib inline
To apply a selection based on E-S values, we can paste the values obtained from the previous plot (or we can type them in manually).
The selection function used here is select_bursts.ES
.
The same E and S boundaries can define either a rectangular
or an elliptical selection by using
respectively rect=True
or rect=False
.
Here we use the elliptical selection:
In [ ]:
roi = dict(E1=-0.07, E2=1.17, S1=0.18, S2=0.70, rect=False)
d_fret_mix = ds.select_bursts(select_bursts.ES, **roi)
By plotting the FRET histogram we can double check that the selection has been applied:
In [ ]:
g = alex_jointplot(d_fret_mix)
bpl.plot_ES_selection(g.ax_joint, **roi);
Now we can further separate high- and low FRET sub-populations.
In [ ]:
roi_high_fret = dict(E1=0.65, E2=1.09, S1=-0.13, S2=0.96)
d_high_fret = d_fret_mix.select_bursts(select_bursts.ES, **roi_high_fret)
roi_low_fret = dict(E1=-0.19, E2=0.64, S1=-0.05, S2=0.92)
d_low_fret = d_fret_mix.select_bursts(select_bursts.ES, **roi_low_fret)
In [ ]:
alex_jointplot(d_high_fret);
alex_jointplot(d_low_fret);
We can for example compute the ratio of high- and low-fret bursts:
In [ ]:
d_low_fret.num_bursts / d_high_fret.num_bursts
To plot a burst-width histogram, we use hist_width
instead of hist_fret
:
In [ ]:
dplot(d_low_fret, hist_width)
To use a larger bin size, plots two sub-populations (in different color) and add a legend:
In [ ]:
ax = dplot(d_high_fret, hist_width, bins=(0, 10, 0.2))
dplot(d_low_fret, hist_width, bins=(0, 10, 0.2), ax=ax)
plt.legend(['High-FRET population', 'Low-FRET population']);
Finally, we compute the mean burst width for each subpopulation:
In [ ]:
millisec = d.clk_p * 1e3
mean_b_width_low_fret = d_low_fret.mburst[0].width.mean() * millisec
mean_b_width_high_fret = d_high_fret.mburst[0].width.mean() * millisec
print('Mean burst width: %.1f ms (high-fret), %.1f (low-fret)' %
(mean_b_width_high_fret, mean_b_width_low_fret))
We can fit a FRET distribution to any model. For example,
we will fit the FRET selection (2 FRET sub-populations) with 2 Gaussians.
The fitting model is a Model
object
from the lmfit
library.
The first step, previously performed implicitely by the hist_fret()
plot function, is to create a MultiFitter
object for $E$ and/or $S$, by calling bext.bursts_fitter().
With MultiFitter
objects, we can compute histograms,
KDEs and fit the histogram in one single step, as in the following example:
In [ ]:
E_fitter = bext.bursts_fitter(d_fret_mix, 'E', binwidth=0.03, bandwidth=0.03,
model=mfit.factory_two_gaussians())
S_fitter = bext.bursts_fitter(d_fret_mix, 'S', binwidth=0.03, bandwidth=0.03,
model=mfit.factory_gaussian())
However if we want to modify the model (for example to add a
constrain) we need to perform the fit in a second step.
To skip the fitting, we simply avoid passing a model
:
In [ ]:
E_fitter = bext.bursts_fitter(d_fret_mix, 'E', binwidth=0.03, bandwidth=0.03)
S_fitter = bext.bursts_fitter(d_fret_mix, 'S', binwidth=0.03, bandwidth=0.03)
Now we create a model and initialize the parameters
using mfit.factory_two_gaussians()
(see Model factory functions
for a list of pre-defined models in FRETBursts):
In [ ]:
model = mfit.factory_two_gaussians(add_bridge=True)
We can see the list of parameters, initial values and constraints:
In [ ]:
model.print_param_hints()
We can change initial values and/or constrains (bounds):
In [ ]:
model.set_param_hint('p1_center', value=0.3, min=-0.1, max=0.6)
model.set_param_hint('p2_center', value=0.85, min=0.5, max=1.1)
model.print_param_hints()
Finally, we fit the histogram with one of the supported minimization methods (the default is least-squares):
In [ ]:
E_fitter.fit_histogram(model=model, verbose=False) # default method is 'leastsq'
#E_fitter.fit_histogram(model=model, method='nelder') # example using simplex method
To plot the model with the fitted parameters on top of the FRET
histogram, we add show_model=True
as seen before:
In [ ]:
dplot(d_fret_mix, hist_fret, show_model=True);
The fitting results of lmfit
models are stored in lmfit.ModelResult objects. In FRETBursts, these objects are available in MultiFitter.fit_res
:
In [ ]:
E_fitter.fit_res[0]
Here the [0]
indicates CH=0. This index is used in multispot measurements, in which
there are several channels.
The values of the fitted parameters are available in the best_values
dictionary:
In [ ]:
results = E_fitter.fit_res[0]
In [ ]:
results.params.pretty_print(columns=['value'])
We can also print a complete report of fitted parameters including reduced $\chi^2$, error ranges ($\pm 1 \sigma$) and correlations:
In [ ]:
print(results.fit_report(min_correl=0.5))
To customize the printed report, see lmfit.fit_report() docs.
We can also take a look at the initial parameters:
In [ ]:
results.init_params.pretty_print()
We can compute more accurate confidence-intervals (note that it can take several seconds, see lmfit docs for the method details):
In [ ]:
ci = results.conf_interval()
lmfit.report_ci(ci)
Finally, FRETBursts's
MultiFitter object
(e.g. E_fitter
) stores the fitted parameters in E_fitter.params
as pandas DataFrame:
In [ ]:
E_fitter.params
For more information on fitting see:
To export data computed by FRETBursts, you need to find first where the data is stored or, for data computed on fly, which function/method is used to computed it.
Most data computed by FRETBursts is stored in the Data
object
(see the Data docs for the list of attributes).
For example:
Data.nd
, Data.na
, Data.naa
Data.ph_times_m
All these attributes are list of arrays, one per excitation spot.
This means that, even for single-spot data, we need to use indexing ([0]
)
to obtain the array. For example:
In [ ]:
d.nd[0]
Any numpy array can be saved to disk with the method .tofile()
. For example:
In [ ]:
d.nd[0].tofile('n_dd.csv', sep=',')
The background data is stored in Data.bg
(and Data.bg_mean
) attribute (see Getting the background rates in this notebook). This attribute is dict containing lists containing arrays (scalars) and ca be saved to disk similarly.
In [ ]:
bursts = bext.burst_data(ds)
bursts
NOTE: For multi-spot data use the
ich
argument to get burst data for the different spots.
This table (as any pandas.DataFrame
) can be saved in a CSV text file with:
In [ ]:
bursts.to_csv('bursts.csv')
For an initial inspection of a data file, it is common to compute an intensity timetrace plot (or timetrace for short). We also can compute a similar plot called ratetrace, which does not bin counts but shows the instantaneous count rate. In both cases, it is convenient to scroll along the plot interactively.
In FRETBursts, the two timetrace and ratetrace plots support interactive scrolling. We just need to switch from the inline backend to QT, as we did before.
#
.
Here are the corresponding commands:
In [ ]:
#%matplotlib qt
In [ ]:
#dplot(ds, ratetrace, scroll=True, bursts=True)
In [ ]:
#dplot(ds, timetrace, tmax=600, scroll=True, bursts=True)
In [ ]:
#%matplotlib inline
In [ ]: