Data Selection Cuts

Why do we apply cuts?

A large number of proton-proton collisions occur at the LHC: in 2011 the beams crossed approximately 20 million times per second during operation. Limits on processing capacity and storage mean that only a few thousand of these events can be stored per second. The choice of which events to store is made by the trigger system, which looks for the most interesting events to save.

The processes that we want to study, such as the one in this analysis, are comparatively rare. Events produced by the process we wish to study are known as the signal; other processes that can mimic the signal are known as backgrounds. Selection cuts are applied first in the trigger system and then offline to reduce the data. These cuts are placed on characteristics of the events so as to maximise the number of signal events selected while minimising the number of background events.

The main selection cuts that have been applied to select the data sample you have are described here. The total size of the real data sample that you have is a few million events.

How do we choose the ideal cuts?

It is not possible to cleanly separate the signal and background events. If we apply a very tight selection we can reduce the number of background events to a low level, but this will also reduce the number of signal events selected, which lowers the statistical sensitivity of your measurement. A very loose selection will keep as many signal events as possible but will also leave a large number of background events in your sample. This may bias your measurement, depending on how you treat the background, and will introduce uncertainties from statistical fluctuations in the background.

In practice, for this project you should simply choose a set of kaon identification cuts that produces a large signal peak in the invariant mass distribution over a comparatively small background.
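As a minimal sketch of what "a large peak over a small background" means in practice, the code below applies a kaon identification cut to every daughter track of each candidate and then counts how many surviving candidates fall in a mass window around the peak and in a sideband window away from it. The `Candidate` structure, the `prob_k` field, the window edges and the toy masses are all hypothetical, invented for illustration; they are not the variable names or values of the actual data sample.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """One reconstructed candidate (hypothetical structure).

    mass:   invariant mass of the combined tracks, in MeV/c^2.
    prob_k: kaon-identification probability of each daughter track.
    """
    mass: float
    prob_k: tuple[float, ...]


def passes_kaon_id(cand: Candidate, min_prob_k: float) -> bool:
    # Keep the candidate only if every daughter track looks like a kaon.
    return all(p > min_prob_k for p in cand.prob_k)


def count_peak_and_sideband(candidates, min_prob_k, peak, sideband):
    """Count selected candidates in a peak window and a sideband window.

    peak and sideband are (low, high) mass ranges; low is inclusive,
    high is exclusive.
    """
    selected = [c for c in candidates if passes_kaon_id(c, min_prob_k)]
    n_peak = sum(peak[0] <= c.mass < peak[1] for c in selected)
    n_side = sum(sideband[0] <= c.mass < sideband[1] for c in selected)
    return n_peak, n_side


# Toy candidates with made-up masses and kaon probabilities.
toy = [
    Candidate(5280.0, (0.9, 0.8, 0.95)),
    Candidate(5275.0, (0.9, 0.2, 0.95)),
    Candidate(5500.0, (0.7, 0.6, 0.9)),
    Candidate(5600.0, (0.1, 0.3, 0.2)),
]
print(count_peak_and_sideband(toy, 0.5, (5230, 5330), (5400, 5700)))
```

Comparing the peak and sideband counts as you vary `min_prob_k` gives a rough feel for whether a cut sharpens the peak or merely removes events everywhere; using sidebands in this way is one common approach, not necessarily the one intended for this project.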

Here we discuss how the cuts can be optimised. A common figure of merit used to optimise selection cuts is the significance of the signal, $s/\sqrt{s+b}$, where $s$ is the estimated number of signal events and $b$ is the estimated number of background events. The numerator is the number of signal events; the denominator is the statistical uncertainty on the total number of observed events, signal plus background. The ratio therefore measures how significant the signal is compared with the statistical fluctuations of everything observed. The cuts can be optimised so that this quantity takes its highest possible value.
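The optimisation can be sketched as a scan: for each candidate cut value, estimate how many signal and background events survive, compute $s/\sqrt{s+b}$, and keep the cut with the largest value. The function names and the scan numbers below are invented for illustration; in a real analysis $s$ and $b$ would come from your own estimates (for example from simulation or from fits to the mass distribution).

```python
import math


def significance(s: float, b: float) -> float:
    """Return s / sqrt(s + b); zero when no events survive at all."""
    if s + b <= 0:
        return 0.0
    return s / math.sqrt(s + b)


def best_cut(scan):
    """Return the cut value with the highest significance.

    scan: iterable of (cut_value, s, b) tuples, where s and b are the
    estimated signal and background counts surviving that cut.
    """
    return max(scan, key=lambda row: significance(row[1], row[2]))[0]


# Made-up scan: loose, intermediate and tight cuts.
scan = [
    (0.0, 1000, 9000),  # loose: all signal kept, but huge background
    (0.5, 900, 1600),   # intermediate
    (0.9, 400, 100),    # tight: background small, but signal lost too
]
for cut, s, b in scan:
    print(cut, round(significance(s, b), 2))
print("best cut:", best_cut(scan))
```

With these toy numbers the intermediate cut wins: the loose cut is swamped by background, while the tight cut throws away too much signal, which is the trade-off between tight and loose selections described earlier.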

The data sample that you have been given has had a reasonable set of cuts already applied on the kinematics of the reconstructed tracks and vertices. It has not had any selection on particle identification. You will perform this element of the selection.