This notebook describes how to use the provided tools to interface with the data. It goes over the process of installing the tools, retrieving the data, and opening the data within a notebook.
Video Tutorial on how to import tools into Jupyter
If you have worked with python notebooks before, you are probably familiar with the more basic included libraries. While you may not use all of their functionality for every activity, they are a very useful option to have available. We can bring them in with the standard import
command.
In [ ]:
import numpy as np
import matplotlib.pylab as plt
%matplotlib notebook
Now we need the particle physics specific libraries. If you installed the libraries from the command shell properly as shown in the local setup tutorial, then most of the work needed to use these should already be done. From here it should be as simple as using the import
command like for the included libraries.
In [ ]:
import h5hep
import pps_tools as pps
With all of the tools imported, we now need our data files. Included in pps_tools
are several functions for this purpose. Currently all of the files for the activities are located in this Google Drive folder. The function you will need to download files from this Drive is download_drive_file
, for which the argument is simply the file name. You can also use the function download_file_from_google_drive
if you want to save the file you download under a different name, or if for some reason you need to download a file from Google Drive that is not included, in which case you will need to put the file's id tag instead of the filename.
In the case that you want to download a file from a different location online, you would need to use the download_file
function, for which the argument is the url of the file you need. (This tool will not work for Google Drive files).
Once you download a file, it should permenantly reside in the directory you downloaded it to, so you only have to do it once.
In [ ]:
filename = 'dimuons_1000_collisions.hdf5'
pps.download_drive_file(filename)
### Other examples: ###
#pps.download_file_from_google_drive('dimuons_1000_collisions.hdf5','data/file.hdf5')
#pps.download_file_from_google_drive('<google drive file id>','data/file.hdf5')
#pps.download_file('https://github.com/particle-physics-playground/playground/blob/master/data/dimuons_1000_collisions.hdf5')
With the tools imported and files downloaded, you should now have everything you need to start interfacing with the data!
Video overview on data interfacing
The h5hep
tools we used to unpack the data files puts the data all into lists that are accessible with dictionaries. You can view all the of the dictionary entries the data is tagged with using the following command (this is entirely optional, and simply gives you an idea of what kind of data the file may have contained).
In [ ]:
# Print the keys to see what is in the dictionary OPTIONAL
for key in event.keys():
print(key)
To organize the data in a way that makes it easy to find what you need, you will need to use the hep
tools we imported. This can be done in several ways.
The first way uses less commands, but gives you less control over how much data you are using. This command organizes ALL of the data, so if you are using a large data file, this can take a long time. However, it is usually the best choice if you want to use all the data.
NOTE: This command will need to change depending on which experiment your activity is aligned with. For instance, the top quark activity uses CMS tools, so for the experiment
argument in get_collisions
, we put 'CMS'
. The other possible arguments would be 'CLEO'
and 'BaBar'
.
In [ ]:
infile = '../data/dimuons_1000_collisions.hdf5'
collisions = pps.get_collisions(infile,experiment='CMS',verbose=False)
print(len(collisions), " collisions") # This line is optional, and simply tells you how many events are in the file.
This returns a list called collisions
which has all of the collision events as entries. Each event is in turn its own list whose entries are the different types of particles involved in that collision. These are also lists, containing each individual particle of that particular type as entries, which are also lists of the four-momentum and other characteristics of each particle.
This can be a bit complicated until you learn to work with it, so we'll try a visualization as an example:
In [ ]:
second_collision = collisions[1] # the second event (list indexes start at 0)
print("Second event: ",second_collision)
all_muons = second_collision['muons'] # all of the jets in the first event
print("All muons: ",all_muons)
first_muon = all_muons[0] # the first jet in the first event
print("First muon: ",first_muon)
muon_energy = first_muon['e'] # the energy of the first photon
print("First muon's energy: ",muon_energy)
You might notice that each individual event is callable from all collisions by is entry number, as are the individual particles from within their lists of particle types. However, the particle types themselves are only callable from the event list by their names. The characteristics of each particle are also only callable from their lists by the name of the characteristic. The exact dictionary entry needed to call them can be referenced by printing event.keys
as above.
Because get_collisions
puts ALL of the data in a list, to do you analysis, you can simply call everything you need from this one list. For instance, if we wanted to find the energies of all the jets in the entire list of collisions, we could do so using loops:
In [ ]:
energies = []
for collision in collisions: # loops over all the events in the file
jets = collision['jets'] # gets the list of all photons in the event
for jet in jets: # loops over each photon in the current event
e = jet['e'] # gets the energy of the photon
energies.append(e) # puts the energy in a list
Alternatively, you can use the following series of commands to organize the data. It is a little more involved, but gives you more control over the data. For instance, it gives you the ability to only use some of the events rather than all of them, which also would decrease some of the computing time.
In [ ]:
infile = '../data/dimuons_1000_collisions.hdf5'
alldata = pps.get_all_data(infile,verbose=False)
nentries = pps.get_number_of_entries(alldata)
print("# entries: ",nentries) # This optional line tells you how many events are in the file
The above commands do not actually make the data directly usable, we need one more step for that, which is the get_collision
function. This function is different from the get_collisions
function used in the simpler method in that it only pulls out the information of a single event rather than all of them. This means that to get information from multiple events, you will need to use this command in a loop, for which you can define a range that determines what events you actualy want to use.
NOTE: Depending on which activity you are doing, you will have to change the experiment
argument to 'CMS'
, 'CLEO'
, or 'BaBar'
. You will also need to change the entry_number
argument to be the same variable you call in the loop.
In [ ]:
for entry in range(nentries): # This range will loop over ALL of the events
collision = pps.get_collision(alldata,entry_number=entry,experiment='CMS')
for entry in range(0,int(nentries/2)): # This range will loop over the first half of the events
collision = pps.get_collision(alldata,entry_number=entry,experiment='CMS')
for entry in range(int(nentries/2),nentries): # This range will loop over the second half of the events
collision = pps.get_collision(alldata,entry_number=entry,experiment='CMS')
Other than that get_collision
only gets the information from one event rather than all of them, it essentially organizes the information in the same way that get_collisions
does. You can interact with this data the same way you would for any individual event from the big list of events that get_collisions
would give you.
For instance, to find the energies of all jets in the events we were looking at like we did for the simpler method, it would look very similar. However, you will notice that because we already have to loop over each event to use the get_collision
function, we can simply nest the rest of our code within this loop.
In [ ]:
energies = []
for event in range(0,int(nentries/3)): # Loops over first 3rd of all events
collision = pps.get_collision(alldata,entry_number=event,experiment='CMS') # organizes the data so you can interface with it
jets = collision['jets'] # gets the list of all photons in the current event
for jet in jets: # loops over all photons in the event
e = jet['e'] # gets the energy of the photon
energies.append(e) # adds the energy to a list
In [ ]: