This notebook is the second part of the template scan analysis. Its inputs are the output files of the template_scan_analysis.sh shell script, which simply calls the digicam-template command on a defined set of raw-data files for you, so everybody can be sure to get the same results.
Let us quickly check that the three files are really there:
In [ ]:
ls template_scan_dac_*.h5
Okay, now we import all the libraries we are going to need.
In [ ]:
import numpy as np
import h5py
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import interact
from matplotlib import colors
from glob import glob
from scipy.interpolate import CubicSpline
from tqdm import trange
Now we read the histograms into memory, so we have quick and easy access. We also define a few global variables, like x_bin_center, y_bin_center and extent, for plotting and analysing the histograms. We store the histograms in the H dict, using the file names as keys.
In [ ]:
paths = sorted(glob('template_scan_dac_*.h5'))
H = {}
for path in paths:
    with h5py.File(path, 'r') as f:
        dset = f['adc_count_histo']
        H[path] = dset[...]
        extent = dset.attrs['extent']
        x_bin_edges = np.linspace(*extent[:2], dset.shape[1]+1)
        y_bin_edges = np.linspace(*extent[2:], dset.shape[2]+1)
        x_bin_center = (x_bin_edges[1:] + x_bin_edges[:-1]) / 2
        y_bin_center = (y_bin_edges[1:] + y_bin_edges[:-1]) / 2
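If you want to convince yourself of what is inside these files, you can quickly inspect one of them. This is pure inspection, nothing in the analysis depends on it; the comments state the layout I assume based on how the dataset is used above.
In [ ]:
# quick look at the file layout: one 2d histogram per pixel
with h5py.File(paths[0], 'r') as f:
    dset = f['adc_count_histo']
    print(dset.shape)            # expected: (n_pixels, n_x_bins, n_y_bins)
    print(dset.attrs['extent'])  # (x_min, x_max, y_min, y_max)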
To give you a feeling for what we are working with in the next few cells, let us plot just one example histogram:
In [ ]:
pixel_id = 133
for h in H.values():
    plt.imshow(
        h[pixel_id].T,
        origin='lower',
        extent=extent,
        norm=colors.LogNorm()
    )
    plt.colorbar()
    plt.gca().set_aspect('auto')
    plt.xlabel('time around 50% max height [ns]')
    plt.ylabel('normalized amplitude')
    plt.title('example 2d histogram from pixel {}'.format(pixel_id))
    break
None
The next function, analyse_2d_histo_for_pixel, takes one of the histograms we have just seen, calculates its profile (think TProfile, if you are a ROOT user) and fits a cubic spline to the profile (in the region where we think we know it well enough).
Developer remark: This function clearly does more than one thing, hence the generic name "analyse". I think "mode", "mean" and "std" could also be methods of a Histogram2D class; this function would then boil down to calculating the spline, which would look much cleaner.
Also, you can see that this function reaches into the globals y_bin_center and x_bin_center. This is bad as well, as you will see below when I analyse the combined histogram of all 1296 pixels.
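To illustrate the remark, here is a minimal sketch of what such a Histogram2D class could look like. The class and its method names are hypothetical, nothing like it exists in digicampipe; it just regroups the calculations of the function below:
In [ ]:
# Hypothetical sketch only: this class does not exist in digicampipe.
# It regroups the profile calculations as methods, so "analyse" would
# boil down to calling .spline().
class Histogram2D:
    def __init__(self, counts, x_center, y_center):
        self.counts = counts      # shape: (n_x_bins, n_y_bins)
        self.x = x_center
        self.y = y_center

    def n(self):
        return self.counts.sum(axis=-1)

    def mode(self):
        return self.y[self.counts.argmax(axis=-1)]

    def mean(self):
        return (self.counts * self.y[None, :]).sum(axis=-1) / self.n()

    def std(self):
        squared_sum = (self.y[None, :] - self.mean()[:, None])**2
        return np.sqrt((self.counts * squared_sum).sum(axis=-1) / (self.n() - 1))

    def spline(self, min_entries=100):
        # only use bins with "enough" statistics, as in the function below
        g = self.n() >= min_entries
        return CubicSpline(self.x[g], self.mean()[g])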
In [ ]:
def analyse_2d_histo_for_pixel(histogram_2d):
    _h = histogram_2d
    N = _h.sum(axis=-1)
    mode = y_bin_center[_h.argmax(axis=-1)]
    mean = (_h * y_bin_center[None, :]).sum(axis=-1) / N
    squared_sum = (y_bin_center[None, :] - mean[:, None])**2
    std = np.sqrt((_h * squared_sum).sum(axis=-1) / (N - 1))

    # For the spline we only use those bins where we have "enough"
    # statistics. I define "enough" here as 100 entries.
    G = N >= 100
    spl = CubicSpline(x_bin_center[G], mean[G])
    return {
        'mode': mode,
        'mean': mean,
        'std': std,
        'N': N,
        'spline': spl,
        'enough_entries': G,
    }
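Just to show the function in action before the big interactive cell below, here is a quick usage example (the printed numbers are whatever your files give, the pixel id is the one from the example plot above):
In [ ]:
# quick usage example on a single pixel of the first file
result = analyse_2d_histo_for_pixel(H[paths[0]][133])
print('average std: {:.3f}'.format(np.nanmean(result['std'])))
print('spline at t=0 ns: {:.3f}'.format(float(result['spline'](0.0))))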
The interactive function below was useful in the beginning to explore the datasets and see what the problems might be. It grew over time, and as you can see it is quite long. It does, however, not perform any analysis task; it just plots results, so you can ignore it.
In [ ]:
@interact
def plot(pid=(0, 1295)):
    N = len(H)
    fig, ax = plt.subplots(N + 1, figsize=(12, 12), sharex=True)
    splines = []
    for ax_id, (path, h) in enumerate(H.items()):
        result = analyse_2d_histo_for_pixel(h[pid])
        splines.append((
            result['spline'],
            np.nanmean(result['std'])
        ))
        G = result['enough_entries']
        img = ax[ax_id].imshow(
            h[pid].T,
            origin='lower',
            extent=extent,
            norm=colors.LogNorm()
        )
        plt.colorbar(img, ax=ax[ax_id])
        ax[ax_id].errorbar(
            x=x_bin_center[G],
            y=result['mean'][G],
            yerr=result['std'][G],
            fmt='.',
            color='red'
        )
        __x = np.linspace(x_bin_center.min(), x_bin_center.max(), 1000)
        ax[ax_id].plot(__x, result['spline'](__x), '-', color='magenta', lw=1)
        ax[ax_id].set_aspect('auto')
        ax[ax_id].set_ylabel('normalized amplitude')
        ax[ax_id].set_title('Path: {}'.format(path))
        ax[ax_id].grid()
    for spl, average_std in splines:
        __x = np.linspace(x_bin_center.min(), x_bin_center.max(), 1000)
        ax[-1].plot(__x, spl(__x), '-', label='avg std: {:.2f}'.format(average_std))
    ax[-1].grid()
    ax[-1].legend()
    plt.suptitle('Pixel: {}'.format(pid))
    plt.xlabel('time around 50% max height [ns]')
    plt.tight_layout()
None
None
The cell below tries to find the "best" spline for every pixel. You can see above that, depending on the DAC setting, a pixel can saturate, which is visible here as a longer but flatter curve.
Other pixels look into LEDs which are comparatively dim, i.e. at low DAC settings these pixels might see no light at all, while at the highest DAC setting they see enough light to produce a nicely defined template curve.
In order to find the "best" (non-saturating) template I say: if all candidate splines of a pixel have an average standard deviation below 0.05, take the one with the largest maximum amplitude; otherwise take the one with the smallest average standard deviation.
I think this method is not perfect, but at the moment, I have no better idea.
In [ ]:
splines = []
for pid in trange(1296):
    sub_splines = {}
    for path, h in H.items():
        result = analyse_2d_histo_for_pixel(h[pid])
        max_amplitude = result['spline'](np.linspace(0, 20, 50)).max()
        sub_splines[(max_amplitude, np.nanmean(result['std']))] = result['spline']
    keys = list(sub_splines.keys())
    average_stds = np.array([k[-1] for k in keys])
    max_amplitudes = np.array([k[0] for k in keys])
    if (average_stds < 0.05).all():
        splines.append(sub_splines[keys[np.argmax(max_amplitudes)]])
    else:
        splines.append(sub_splines[keys[np.argmin(average_stds)]])
The cell below simply plots the splines of all 1296 pixels into one plot, to see whether we really need one template per pixel. On top of the combined histogram we then draw its profile, calculated exactly as in analyse_2d_histo_for_pixel.
In [ ]:
x = []
y = []
_x = np.linspace(x_bin_center.min(), x_bin_center.max(), 1000)
for spl in splines:
    x.append(_x)
    y.append(spl(_x))
x = np.concatenate(x)
y = np.concatenate(y)
plt.figure(figsize=(18, 12))
histogram_2d, xe, ye, _ = plt.hist2d(
    x,
    y,
    bins=(501, 501),
    range=[extent[:2], extent[2:]],
    norm=colors.LogNorm()
)
plt.grid()
plt.colorbar()

# profile of the combined histogram, same procedure as in
# analyse_2d_histo_for_pixel, but with the new bin centers
_h = histogram_2d
xc = (xe[1:] + xe[:-1]) / 2
yc = (ye[1:] + ye[:-1]) / 2
N = _h.sum(axis=-1)
mode = yc[_h.argmax(axis=-1)]
mean = (_h * yc[None, :]).sum(axis=-1) / N
squared_sum = (yc[None, :] - mean[:, None])**2
std = np.sqrt((_h * squared_sum).sum(axis=-1) / (N-1))
average_std = np.nanmean(std)
# For the spline we only use those bins where we have "enough"
# statistics. I define "enough" here as 100 entries.
G = N >= 100
spl = CubicSpline(xc[G], mean[G])
plt.errorbar(
    x=xc,
    y=mean,
    yerr=std / np.sqrt(1296),
    fmt='.',
    color='red'
)
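The cell above is exactly the duplication the developer remark was pointing at: the profile code exists twice, once reading the global bin centers and once using xc and yc. A minimal sketch of how it could be refactored, assuming we are free to change the signature (the name profile_and_spline is my invention, not existing code):
In [ ]:
# hypothetical refactoring: passing the bin centers explicitly lets the
# same code serve both the per-pixel and the combined histograms
def profile_and_spline(h, x_center, y_center, min_entries=100):
    n = h.sum(axis=-1)
    mean = (h * y_center[None, :]).sum(axis=-1) / n
    squared_sum = (y_center[None, :] - mean[:, None])**2
    std = np.sqrt((h * squared_sum).sum(axis=-1) / (n - 1))
    g = n >= min_entries
    return mean, std, CubicSpline(x_center[g], mean[g])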
And in the cell below, we can see how pulse_SST-1M_pixel_0.dat compares to the average template we got from 1296 different pixels. I find them remarkably similar.
In [ ]:
from digicampipe.utils.utils import get_pulse_shape
plt.figure(figsize=(14, 8))
plt.plot(xc, mean, label='mean of 1296 templates')
plt.plot(
    xc,
    get_pulse_shape(xc, -7.5, np.nanmax(mean), 0),
    label='pulse_SST-1M_pixel_0.dat'
)
plt.xlabel('time around 50% max height [ns]')
plt.ylabel('normalized amplitude')
plt.legend(loc='upper right');
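If you prefer a number over a visual impression, one quick way to quantify the agreement is to look at the largest residual between the two curves (just a check, nothing in the analysis depends on it; it reuses the same get_pulse_shape call as above):
In [ ]:
# quick quantitative comparison of the two curves shown above
residual = mean - get_pulse_shape(xc, -7.5, np.nanmax(mean), 0)
print('max abs difference: {:.3f}'.format(np.nanmax(np.abs(residual))))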