In [1]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')
My goal is to build a script or notebook that will produce bicorrelation plots for the FNPC fission CVT project.
The plots will be 2d histograms of $\Delta t_1$ vs. $\Delta t_2$ for two detectors, where $\Delta t$ for each detector is the time of an interaction relative to the time of the corresponding fission chamber interaction.
Steps in the analysis:
1) Load the data from bicorr1
(produced by bicorr.generate_bicorr
)
2) Make some benchmark plots to make sure the bicorr1
data looks like what we expect
3) Fill a giant matrix with histogram information about detector pair interactions
In [1]:
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy as np
import os
import scipy.io as sio
import sys
import time
import inspect
import pandas as pd
from tqdm import *
# Plot entire array
np.set_printoptions(threshold=np.nan)
In [2]:
import seaborn as sns
sns.set_palette('spectral')
In [3]:
sys.path.append('../scripts/')
import bicorr as bicorr
In [4]:
%load_ext autoreload
%autoreload 2
In [5]:
help(bicorr.load_bicorr)
In [5]:
bicorr_data = bicorr.load_bicorr(1, root_path = '../datar')
In [6]:
bicorr_data.shape
Out[6]:
I want to work with larger data, so I'm going to pull 10000 lines from the bicorr1
file from flux (rather than the tiny bicorr1
file produced from cced1_part
.
I used the same technique as before on flux:
head -n 10000 bicorr1 > bicorr1_part
And then I copied that to datar > 1
In [5]:
bicorr_data = bicorr.load_bicorr(bicorr_path = '../datar/1/bicorr1_part')
In [6]:
bicorr_data.shape
Out[6]:
I can parse this array in the same way I did with the cced
file.
In [9]:
# Print all info from first line
print(bicorr_data[:][0])
# Print event number from line 0
print(bicorr_data['event'][0])
In [11]:
help(bicorr.save_fig_to_folder)
In [7]:
fig_folder = 'fig'
In [13]:
# Which detector pairs fired?
plt.plot(bicorr_data['det1ch'],bicorr_data['det2ch'],'.k')
plt.xlabel('Detector 1 channel')
plt.ylabel('Detector 2 channel')
plt.title('Detector pairs with bicorrelation events')
bicorr.save_fig_to_folder('bicorr_pairs_scatter',fig_folder)
plt.show()
In [14]:
plt.figure(figsize=(7,5))
plt.hist2d(bicorr_data['det1ch'],bicorr_data['det2ch'],bins=np.arange(-0.5,46.5,1),cmin=1,cmap='viridis')
plt.ylim([-.5,46.5])
plt.colorbar()
plt.grid(True, which='both')
plt.xticks([i for i in np.arange(0,46,4)])
plt.yticks([i for i in np.arange(0,46,4)])
plt.xlabel('Detector 1 channel')
plt.ylabel('Detector 2 channel')
plt.title('Frequency of detector pair interactions')
bicorr.save_fig_to_folder('bicorr_pairs_2dhist',fig_folder)
plt.show()
I should take some time to explain the features on this plot. There are a few interesting observables:
In [18]:
# Plot event number vs. line in
plt.plot(bicorr_data['event'])
plt.xlabel('Line number')
plt.ylabel('Event number')
plt.title('Event number vs. line number')
bicorr.save_fig_to_folder('/bicorr_all_evnum',fig_folder)
plt.show()
Try it as a function
In [20]:
bicorr.bicorr_checkpoint_plots(bicorr_data, show_flag=True)
We are going to store the bicorr
events in a giant array of histograms called bicorr_hist_master
, or bhm
for short. The dimensions in the matrix will be:
0: 990 in length: detector pair index (indicated using dictionary det_pair_dict
)
1: 4 in length: interaction type (0=nn, 1=np, 2=pn, 3=pp)
2: $\Delta t_1$ for detector 1
3: $\Delta t_2$ for detector 2
Thus, calling bicorr_hist_master[0][0][:][:]
will produce the 2-d histogram for the 0th detector pair (ch 1 and ch 2) for nn interactions.
Set up this matrix.
In [21]:
help(bicorr.build_dt_bin_edges)
In [6]:
dt_bin_edges, num_dt_bins = bicorr.build_dt_bin_edges(print_flag=True)
In [7]:
# Number of bins in interaction type
num_intn_types = 4 #(0=nn, 1=np, 2=pn, 3=pp)
In [8]:
# What are the unique detector numbers? Use same technique as in bicorr.py
chList, fcList, detList, num_dets, num_det_pairs = bicorr.build_ch_lists(print_flag=True)
In [25]:
# Pre-allocate matrix
# Note: This is very memory intensive... can't run it too many times or python crashes.
bicorr_hist_master = np.zeros((num_det_pairs,num_intn_types,num_dt_bins,num_dt_bins),dtype=np.uint32)
In [14]:
bicorr_hist_master.shape
Out[14]:
How large will this be when we store it to disk?
In [15]:
bicorr_hist_master.nbytes/1e9
Out[15]:
I had originally made the time range go from -200 to 200 ns, making for a 37 GB array. Now I am going from -50 to 200 which cuts it down to 15.84 GB. Now I can have two loaded in memory at a time. The further option still exists to make the time binning coarser to .5 ns instead of .25 ns.
In [11]:
bicorr_hist_master = bicorr.alloc_bhm(num_det_pairs,num_intn_types,num_dt_bins)
I'm updating this with latest methods. I am not going to rerun the first two attempts, so they may not work in the future. I'm only going to update the third attempt, filling the histogram line-by-line.
np.where
to find indices, fill with np.histogram2d
Find the indices of the lines that will fill each detector type (based on detector pair and interaction type).
If this doesn't work, go line by line and fill the bins one at a time. But that's so boring.
In [10]:
# Pre-allocate matrix
# Note: This is very memory intensive... can't run it too many times or python crashes.
bicorr_hist_master = bicorr.alloc_bicorr_hist_master(num_det_pairs,num_intn_types,num_dt_bins)
In [23]:
# Loaded above in Step 1
bicorr_data.shape
Out[23]:
In [24]:
# Store a bunch of indices. Hopefully I won't run out of memory...
# Find event indices for each event type
type0_indices = np.where(np.logical_and(bicorr_data['det1par']==1,bicorr_data['det2par']==1))
type1_indices = np.where(np.logical_and(bicorr_data['det1par']==1,bicorr_data['det2par']==2))
type2_indices = np.where(np.logical_and(bicorr_data['det1par']==2,bicorr_data['det2par']==1))
type3_indices = np.where(np.logical_and(bicorr_data['det1par']==2,bicorr_data['det2par']==2))
In [25]:
np.sum(type0_indices[0].shape+type1_indices[0].shape+type2_indices[0].shape+type3_indices[0].shape)
Out[25]:
How can I quickly return the index?
In [27]:
# Set up dictionary for returning detector pair index
det_df = bicorr.load_det_df()
dict_index_to_pair = det_df['d1d2'].to_dict()
dict_pair_to_index = {v: k for k, v in dict_index_to_pair.items()}
# Loop through all pairs and fill the histogram
start_time = time.time()
count = 0 # Running total of selected indices
count2 = 0 # Running sum of histogram counts
for i1 in np.arange(0, num_dets): # Just run det1ch = 1 for the time being due to memory constraints.
det1ch = detList[i1]
print(str(det1ch)+'---')
for i2 in np.arange(i1+1,num_dets):
det2ch = detList[i2]
det_pair_indices = np.where(np.logical_and(bicorr_data['det1ch']==det1ch,bicorr_data['det2ch']==det2ch))
# type interactions
selected_type0 = np.intersect1d(det_pair_indices, type0_indices)
selected_type1 = np.intersect1d(det_pair_indices, type1_indices)
selected_type2 = np.intersect1d(det_pair_indices, type2_indices)
selected_type3 = np.intersect1d(det_pair_indices, type3_indices)
count = count + np.sum(selected_type0.shape+selected_type1.shape+selected_type2.shape+selected_type3.shape)
# Adding these count2 lines for debugging to search for the error
count2 += np.sum(np.histogram2d(bicorr_data['det1t'][selected_type0],bicorr_data['det2t'][selected_type0],bins=dt_bin_edges)[0])
count2 += np.sum(np.histogram2d(bicorr_data['det1t'][selected_type1],bicorr_data['det2t'][selected_type1],bins=dt_bin_edges)[0])
count2 += np.sum(np.histogram2d(bicorr_data['det1t'][selected_type2],bicorr_data['det2t'][selected_type2],bins=dt_bin_edges)[0])
count2 += np.sum(np.histogram2d(bicorr_data['det1t'][selected_type3],bicorr_data['det2t'][selected_type3],bins=dt_bin_edges)[0])
bicorr_hist_master[dict_pair_to_index[100*det1ch+det2ch],0,:,:] = np.histogram2d(bicorr_data['det1t'][selected_type0],bicorr_data['det2t'][selected_type0],bins=dt_bin_edges)[0]
bicorr_hist_master[dict_pair_to_index[100*det1ch+det2ch],1,:,:] = np.histogram2d(bicorr_data['det1t'][selected_type1],bicorr_data['det2t'][selected_type1],bins=dt_bin_edges)[0]
bicorr_hist_master[dict_pair_to_index[100*det1ch+det2ch],2,:,:] = np.histogram2d(bicorr_data['det1t'][selected_type2],bicorr_data['det2t'][selected_type2],bins=dt_bin_edges)[0]
bicorr_hist_master[dict_pair_to_index[100*det1ch+det2ch],3,:,:] = np.histogram2d(bicorr_data['det1t'][selected_type3],bicorr_data['det2t'][selected_type3],bins=dt_bin_edges)[0]
print('Run time: ', time.time()-start_time)
print('Sum of bicorr_hist_master: ', np.sum(bicorr_hist_master))
print('Running total of type_indices_#: ', count)
print('Running total of histogram sum: ', count2)
This shows that events are getting lost at some point. The total number of events in the histogram do not match the number of events in bicorr_data
. Where are they going? It is possible that those are events outside of the boundaries of dt_bin_edges
. How many events are getting lost?
In [28]:
# How many lines are left out of the histogram?
count-count2
Out[28]:
Are the missing events those in which the time is out of range of the time axes?
In [32]:
t_min = dt_bin_edges[0]; t_max = dt_bin_edges[-1]
print(t_min,t_max)
over1 = np.ndarray.flatten(np.argwhere(bicorr_data['det1t']>=t_max))
under1 = np.ndarray.flatten(np.argwhere(bicorr_data['det1t']<=t_min))
over2 = np.ndarray.flatten(np.argwhere(bicorr_data['det2t']>=t_max))
under2 = np.ndarray.flatten(np.argwhere(bicorr_data['det2t']<=t_min))
overunder = np.concatenate([over1, under1, over2, under2])
np.unique(overunder).shape
Out[32]:
There are more events outside of the range of dt_bin_edges
than are being left out of the histogram. So I am not sure what this means. Last time the numbers were exactly the same.
I believe the problem is in using np.histogram2d, which returns a overflow warning. I believe the problem is in passing in the time stamps as floats, there is an overflow error, which is causing some counts to be lost.
I am not going to spend any more time on this because the next section shows that the line-by-line method is much faster and I will use that method instead.
np.histogramdd
np.histogramdd
, documentation here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html#numpy.histogramdd
Compute the multidimensional histogram of some data.
The four dimensions are:
det1t
det2t
This will take some work, since I will have to figure out how to provide the bins of the first two dimensions. Keep this in mind, but move on to the next method.
In [9]:
bhm = bicorr.alloc_bhm(num_det_pairs,num_intn_types,num_dt_bins)
In [10]:
# Set up dictionary for returning detector pair index
det_df = bicorr.load_det_df()
dict_pair_to_index, dict_index_to_pair = bicorr.build_dict_det_pair(det_df)
In [11]:
# Type index
dict_type_to_index = {11:0, 12:1, 21:2, 22:3}
# Time indices
dt_min = np.min(dt_bin_edges); dt_max = np.max(dt_bin_edges)
dt_step = dt_bin_edges[1]-dt_bin_edges[0]
I will use the following equation to determine the time bin.
$$ \text{bin}(t) = \lfloor \frac{t-t_{min}}{\Delta t}\rfloor$$
In [35]:
count_loss = 0
start_time = time.time()
for i in tqdm(np.arange(bicorr_data.shape[0]),ascii=True):
# What is the corresponding bin number for the four dimensions?
## Detector pair index
pair_i = dict_pair_to_index[bicorr_data[i]['det1ch']*100+bicorr_data[i]['det2ch']]
## Event type index
type_i = dict_type_to_index[bicorr_data[i]['det1par']*10+bicorr_data[i]['det2par']]
## Time 1 index
t1_i = int(np.floor((bicorr_data[i]['det1t']-dt_min)/dt_step))
t1_i_check = np.logical_and(t1_i>=0, t1_i<num_dt_bins) # Within range?
## Time 2 index
t2_i = int(np.floor((bicorr_data[i]['det2t']-dt_min)/dt_step))
t2_i_check = np.logical_and(t2_i>=0, t2_i<num_dt_bins) # Within range?
if np.logical_and(t1_i_check, t2_i_check):
# Increment the corresponding bin
bhm[pair_i,type_i,t1_i,t2_i] += 1
else:
count_loss += 1
print(np.sum(bhm))
print(time.time()-start_time)
In [36]:
count_loss
Out[36]:
This method is much faster so we will use it.
In [37]:
print(inspect.getsource(bicorr.fill_bhm))
In [12]:
bhm = bicorr.alloc_bhm(num_det_pairs,num_intn_types,num_dt_bins)
In [13]:
bhm = bicorr.fill_bhm(bhm, bicorr_data, det_df, dt_bin_edges)
As developed in my notebook implement_sparse_matrix.
In [14]:
sparse_bhm = bicorr.generate_sparse_bhm(bhm)
I'm going to use np.savez
to save several arrays in the same .npz
file.
CAUTION: This will store at 15 GB file to your hard drive. I can verify that it works. No need to repeat it. Instead, use the sparse matrix technique.
In [ ]:
# np.savez('../datar/1/bicorr_hist_master', bicorr_hist_master = bicorr_hist_master, dict_pair_to_index=dict_pair_to_index, dt_bin_edges=dt_bin_edges)
In [15]:
bicorr.save_sparse_bhm(sparse_bhm, det_df, dt_bin_edges, save_folder = '../datar/1')
The steps are:
det_df
root_path
dt_bin_edges
num_det_pairs
, num_intn_types
bhm
bicorr_data
bhm
save_flag
bhm
, dt_bin_edges
In [5]:
help(bicorr.build_bhm)
In [58]:
bhm, dt_bin_edges = bicorr.build_bhm(det_df,1,2,dt_bin_edges =dt_bin_edges, checkpoint_flag = True,save_flag = False, disable_tqdm = True)
bicorr_hist_master.shape
Out[58]:
In [6]:
bhm, dt_bin_edges = bicorr.build_bhm(root_path = '../datar/')
In [8]:
bhm, dt_bin_edges = bicorr.build_bhm(folder_end = 3, root_path = '../datar/')