by Kyle Cranmer, Dec 17, 2015
This notebook demonstrates the correction for the 1d look-elsewhere effect (LEE) presented in "Trial factors for the look elsewhere effect in high energy physics" by Ofer Vitells and Eilam Gross: http://arxiv.org/abs/arXiv:1005.1891
It is often claimed that when one has two statistically independent searches for the same particle, a peak in one tells you where to look in the other, thus eliminating the look-elsewhere effect. These searches might be from two different experiments (e.g. ATLAS and CMS), two different decay modes, or two different time periods (e.g. run 1 and run 2). There are various flaws in this logic, as stressed by Bob Cousins in these slides. The issue quickly becomes subtle because the intuitive procedure of using the location of the excess in one search to inform where to look in the other is not fully specified. A few things can be said about the intuitive procedure.
In what follows I will explore the behavior of the LEE correction for the combination of two searches (which can be trivially extended to more than two searches).
The starting point is to consider a search for a new particle with signal strength $\mu$ and unknown mass $\nu$ on top of a background distribution described by some nuisance parameters $\theta$. We perform the search by scanning over the mass $\nu$ and calculating the test statistic
\begin{equation} q(\nu) = -2 \log \frac{ \max_{\theta} L(\mu=0, \nu, \theta)}{ \max_{\mu, \theta} L(\mu, \nu, \theta)} \end{equation}Assuming the background-only hypothesis is true, $q(\nu)$ is a chi-square random field (with 1 degree of freedom). That means that, for any fixed point $\nu$, the quantity $q(\nu)$ would have a chi-square distribution if you repeated the experiment many times.
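To make this definition concrete, here is a minimal toy sketch of such a scan (the model, numbers, and helper names are all made up for illustration; the background here is perfectly known, so there are no nuisance parameters $\theta$ and profiling reduces to fitting $\mu$ alone):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# hypothetical toy search: flat background plus a Gaussian bump at mass nu
centers = np.linspace(1., 99., 50)    # bin centers of the mass spectrum
bkg = np.full_like(centers, 20.)      # known flat background (no nuisances)
rng = np.random.default_rng(42)
data = rng.poisson(bkg)               # background-only pseudo-data

def nll(mu, nu, width=2.):
    """negative log-likelihood for signal strength mu at mass nu"""
    lam = bkg + mu * np.exp(-0.5 * ((centers - nu) / width)**2)
    return -np.sum(poisson.logpmf(data, lam))

def q(nu):
    """profile-likelihood-ratio test statistic at mass nu"""
    fit = minimize_scalar(lambda m: nll(m, nu), bounds=(-15., 50.), method='bounded')
    return 2. * (nll(0., nu) - fit.fun)   # >= 0 up to optimizer tolerance
```

Scanning `q` over a grid of `nu` values produces the likelihood scan; a real analysis would also profile the background nuisance parameters in both numerator and denominator.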
The maximum local significance is based on $Z_{local} = \sqrt{q_{max}}$, where $q_{max} = \max_\nu q(\nu)$. The correction from local to global significance is given by:
\begin{equation} p_{global} = p_{local} + N \exp(-(q_{max}-u_0)/2) \end{equation}where $N$ is the average number of upcrossings above the level $u_0$ (i.e., the average number of times $q(\nu)$ crosses from below to above $u_0$). This $N$ characterizes the search: searches with good mass resolution over a large mass range will have large values of $N$, while searches with poor mass resolution will have small values of $N$.
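As a worked numeric example (with made-up numbers): suppose a search with $N = 8$ average upcrossings above $u_0 = 0.5$ observes $q_{max} = 16$, a local significance of $4\sigma$. Using the two-sided p-value convention adopted later in this notebook, the LEE correction dilutes this to roughly $2.7\sigma$ globally:

```python
import numpy as np
from scipy.stats import chi2, norm

q_max, u0, N = 16., 0.5, 8.          # made-up example values

z_local = np.sqrt(q_max)             # local significance, = 4.0
p_local = 1. - chi2.cdf(q_max, 1)    # 2-sided local p-value, ~6.3e-5

# LEE correction from the average number of upcrossings
p_global = p_local + N * np.exp(-(q_max - u0) / 2.)   # ~3.5e-3
z_global = norm.isf(p_global)        # global significance, ~2.7
```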
Creating many likelihood scans from pseudo-experiments (toy Monte Carlo) is somewhat time-consuming, so here we make realizations of a chi-square random field by using a Gaussian Process. The main trick we will use is that a chi-square distribution with one degree of freedom is the same as the distribution of $x^2$ when $x$ is normally distributed. As you might have guessed, a Gaussian Process (GP) is like a chi-square random field, except it is Gaussian-distributed at each point.
Note, the values of the field at different points are not independent; there is some covariance. So if $q(\nu)$ is high at one point, you can expect it to be high nearby. We control this behavior via the GP's kernel, which is the covariance of the underlying Gaussian field $z(\nu) = \hat{\mu}(\nu)/\sigma(\nu)$: $K(\nu, \nu') = \textrm{Cov}[z(\nu), z(\nu')]$, with $q(\nu) = z^2(\nu)$. We can essentially specify the mass resolution of our virtual search via the length scale used in the kernel.
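A quick sanity check of the squaring trick (a minimal sketch with an arbitrary sample size and threshold): the tail of $x^2$ for standard-normal $x$ matches the chi-square survival function with one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # x ~ N(0,1) at a single "mass point"
q = x**2                           # q = x^2 should be chi-square with 1 dof

# compare the empirical tail fraction above u=4 to the chi2(1) survival function
u = 4.
frac = np.mean(q > u)
expected = chi2.sf(u, 1)           # ~0.0455
```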
For more on the theory of Gaussian Processes, the best resource is available for free online: Rasmussen & Williams (2006). We will use george -- a nice Python package for Gaussian Processes (GP).
The next major conceptual pillar for this work is the asymptotic approximation for the likelihood ratio. Asymptotically, assuming background-only ($\mu=0$), the best-fit signal strength $\hat{\mu}$ follows a Gaussian distribution $G(\hat{\mu} \,|\, \mu=0, \sigma)$, where $\sigma^2 = \textrm{Var}[\hat{\mu}]$. With that approximation $q(\nu) = (\hat{\mu}/\sigma)^2$, hence $q(\nu)$ is chi-square distributed, as Wilks's theorem says it should be. In this way of thinking, the GP is generating results for $(\hat{\mu}/\sigma)$ as a function of the mass parameter $\nu$.
This allows us to quickly do combinations on these toy results. Fundamentally, we wish to perform a likelihood combination at every mass point. This is additive in the log-likelihood ratio: \begin{equation} q_{12}(\nu) = q_1(\nu) + q_2(\nu) + const \end{equation}
This is also a chi-square random field. In the asymptotic limit, the likelihood combination is equivalent to:
\begin{equation} \hat{\mu}_{12}(\nu) = \frac{\hat{\mu}_1(\nu) \sigma_2^2(\nu) +\hat{\mu}_2(\nu)\sigma_1^2(\nu)}{\sigma_1^2(\nu)+\sigma_2^2(\nu)} \end{equation}together with variance
\begin{equation} \textrm{Var}[\hat{\mu}_{12}(\nu)] \equiv \sigma_{12}^2(\nu) = \frac{\sigma_1^2(\nu)\,\sigma_2^2(\nu)}{\sigma_1^2(\nu)+\sigma_2^2(\nu)} \end{equation}The important part here is that we can also work out the kernel of the Gaussian process that describes $(\hat{\mu}_{12}/\sigma_{12})(\nu)$. In particular, the covariance between $\nu$ and $\nu'$ of the GP for the combination can be derived from the covariances of the GPs for searches 1 and 2.
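These are just the familiar inverse-variance-weighting formulas. A quick numeric cross-check with made-up values of $\hat{\mu}_i$ and $\sigma_i$ at a single mass point:

```python
import numpy as np

# made-up example values at one mass point
mu1, sigma1 = 1.2, 0.5
mu2, sigma2 = 0.8, 1.0

# combination formulas from the text
mu12 = (mu1 * sigma2**2 + mu2 * sigma1**2) / (sigma1**2 + sigma2**2)
sigma12_sq = sigma1**2 * sigma2**2 / (sigma1**2 + sigma2**2)

# same result written with inverse-variance weights w_i = 1/sigma_i^2
w1, w2 = 1. / sigma1**2, 1. / sigma2**2
assert np.isclose(mu12, (w1 * mu1 + w2 * mu2) / (w1 + w2))
assert np.isclose(sigma12_sq, 1. / (w1 + w2))
```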
Consider the simple case where the two searches have the same constant sensitivity: $\sigma_1(\nu) = \sigma_2(\nu) = \sigma$. Then $\hat{\mu}_{12}(\nu) = [\hat{\mu}_1(\nu)+\hat{\mu}_2(\nu)]/2$ and $\sigma_{12}^2 = \sigma^2/2$. So the kernel for the GP that describes the combination is given by:
\begin{equation} K_{12}(\nu, \nu') = \textrm{Cov}[\hat{\mu}_{12}(\nu)/\sigma_{12},\, \hat{\mu}_{12}(\nu')/\sigma_{12}] = \frac{1}{2} K_{1}(\nu, \nu') + \frac{1}{2} K_{2}(\nu, \nu') \end{equation}Corollary: if two searches have the same mass resolution and statistical power, the effective $N$ needed to calculate the LEE correction for the combination is the same as for the individual searches. (This can be demonstrated with the code below by setting ratio_of_length_scales=1.)
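We can also check this kernel relation directly by Monte Carlo, independently of george: sample $z_1$ and $z_2$ from squared-exponential kernels, form $z_{12} = (z_1+z_2)/\sqrt{2}$, and compare the empirical covariance to $\frac{1}{2}K_1 + \frac{1}{2}K_2$ (a standalone sketch with an arbitrary grid and length scales):

```python
import numpy as np

def expsq_kernel(x, ell):
    """squared-exponential covariance matrix on grid x with length scale ell"""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * d**2 / ell**2)

x = np.linspace(0., 10., 20)
K1, K2 = expsq_kernel(x, 1.0), expsq_kernel(x, 4.0)

rng = np.random.default_rng(1)
n = 20_000
jitter = 1e-9 * np.eye(len(x))   # numerical stability for the factorization
z1 = rng.multivariate_normal(np.zeros(len(x)), K1 + jitter, size=n)
z2 = rng.multivariate_normal(np.zeros(len(x)), K2 + jitter, size=n)

z12 = (z1 + z2) / np.sqrt(2.)    # combined field for equal sensitivities
K_emp = np.cov(z12, rowvar=False)
max_dev = np.max(np.abs(K_emp - 0.5 * (K1 + K2)))   # shrinks as n grows
```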
Note: in what follows, I'll demonstrate this for the simple case, but it can be extended to have separate $\sigma(\nu)$ curves for searches 1 and 2.
We will create three different Gaussian Processes:
gp1 will generate results from experiment 1
gp2 will generate results from experiment 2
gp12 will be shown to have the same behavior as the combination of gp1 and gp2
In [1]:
%pylab inline --no-import-all
#plt.rc('text', usetex=True)
plt.rcParams['figure.figsize'] = (6.0, 6.0)
#plt.rcParams['savefig.dpi'] = 60
In [2]:
import george
from george.kernels import ExpSquaredKernel
from scipy.stats import chi2, norm
In [3]:
length_scale_of_correaltion=1.
ratio_of_length_scales=4.
kernel1 = ExpSquaredKernel(length_scale_of_correaltion, ndim=1)
kernel2 = ExpSquaredKernel(ratio_of_length_scales**2*length_scale_of_correaltion, ndim=1)
kernel12 = 0.5*kernel1+0.5*kernel2
In [4]:
# Create the Gaussian process
# gp = george.GP(kernel)
gp1 = george.GP(kernel1, solver=george.HODLRSolver) #faster
gp2 = george.GP(kernel2, solver=george.HODLRSolver) #faster
gp12 = george.GP(kernel12, solver=george.HODLRSolver) #faster
In [5]:
n_scan_points=250
x = np.linspace(0,100,n_scan_points)
In [6]:
# slow part: pre-compute internal stuff for the GP
gp1.compute(x)
gp2.compute(x)
gp12.compute(x)
Show an example of a realization of the Gaussian process for $z(\nu) = (\hat{\mu}/\sigma)$ and $q(\nu) = z^2(\nu)$
Now let's histogram the values of the random field.
Don't get confused here... if you pick a single point and histogram its value over many realizations, you expect a Gaussian. However, for a single realization, you don't expect the histogram of the field's values to be Gaussian (because of the correlations). Thought experiments: if you make length_scale_of_correaltion very small, then each point is essentially independent and you do expect to see a Gaussian; however, if length_scale_of_correaltion is very large then you expect the field to be nearly constant and the corresponding histogram would be close to a delta function.
In [7]:
# evaluate one realization of the GP
z = gp1.sample(x)
# plot the chi-square random field
plt.subplot(121)
plt.plot(x,z)
plt.ylabel(r'$z(\nu)$')
plt.xlabel(r'$\nu$')
plt.subplot(122)
plt.plot(x,z**2)
plt.ylabel(r'$q(\nu)$')
plt.xlabel(r'$\nu$')
Out[7]:
In [8]:
def q_to_pvalue(q):
return (1.-chi2.cdf(q, 1))/2 #divide by 2 for 1-sided test
def pvalue_to_significance(p):
return -norm.ppf(p)
def significance_to_pvalue(Z):
return 1.-norm.cdf(Z)
In [9]:
def num_upcrossings(z):
"""count number of times adjacent bins change between 0,1"""
return np.sum((z-np.roll(z,1))**2)/2
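To see what this counts, apply it to a small hand-made excursion mask (the helper is repeated here so the snippet is self-contained). The mask below crosses from 0 to 1 twice, and the transitions-divided-by-two trick counts exactly those; note that np.roll treats the scan as circular, which is harmless because the total number of 0-1 transitions around a closed loop is always even.

```python
import numpy as np

def num_upcrossings(z):
    """count number of times adjacent bins change between 0,1"""
    return np.sum((z - np.roll(z, 1))**2) / 2

# hand-made excursion mask with two upcrossings (0 -> 1 transitions)
mask = np.array([0., 1., 1., 0., 0., 1., 0.])
n_up = num_upcrossings(mask)   # -> 2.0
```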
In [10]:
def global_pvalue(u,u0, n):
#return (1.-chi2.cdf(u, 1))/2. + np.exp(-(u-u0)/2)*n #1-sided p-value
return (1.-chi2.cdf(u, 1)) + np.exp(-(u-u0)/2)*n # 2-sided p-value
In [11]:
u1 = 0.5
Check that the code to count upcrossings and the LEE correction are working
In [12]:
n_samples = 1000
n_plots = 3
plt.figure(figsize=(9,n_plots*3))
z_array = gp1.sample(x,n_samples)
n_up = np.zeros(n_samples)
for scan_no, z in enumerate(z_array):
scan = z**2
exc1 = (scan>u1) + 0. #add 0. to convert from bool to double
n_up[scan_no] = num_upcrossings(exc1)
if scan_no < n_plots:
plt.subplot(n_plots,2,2*scan_no+1)
plt.plot(x,scan)
plt.plot([0,100],[u1,u1], c='r')
plt.subplot(n_plots,2,2*scan_no+2)
plt.plot(x,exc1)
plt.ylim(-.1,1.1)
print('experiment %d has %d upcrossings' %(scan_no, n_up[scan_no]))
n_av = np.mean(n_up)
print("average number of upcrossings in %d experiments is %f" %(n_samples, n_av))
In [13]:
u = np.linspace(5,25,100)
global_p = global_pvalue(u,u1,n_av)
In [14]:
n_samples = 10000
z_array = gp1.sample(x,n_samples)
q_max = np.zeros(n_samples)
for scan_no, z in enumerate(z_array):
scan = z**2
q_max[scan_no] = np.max(scan)
In [15]:
bins, edges, patches = plt.hist(q_max, bins=30)
icdf = 1.-np.cumsum(bins/n_samples)
icdf = np.hstack((1.,icdf))
icdf_error = np.sqrt(np.cumsum(bins))/n_samples
icdf_error = np.hstack((0.,icdf_error))
plt.xlabel('$q_{max}$')
plt.ylabel('counts / bin')
Out[15]:
In [16]:
# plot the p-value
plt.plot(edges,icdf, c='r', label='toys')
plt.errorbar(edges,icdf,yerr=icdf_error)
plt.plot(u, global_p, label='prediction')
plt.xlabel('$u$')
plt.ylabel('$P(q_{max} >u)$')
plt.legend(('prediction','toys'))
#plt.ylabel('P(q>u)')
plt.ylim(1E-3,10)
plt.xlim(0,25)
plt.semilogy()
Out[16]:
Wow! That was awesome! Go math!
In [17]:
n_samples = 10000
z_array1 = gp1.sample(x,n_samples)
z_array2 = gp2.sample(x,n_samples)
n_av1, n_av2, n_av12 = 0., 0., 0.
q_max = np.zeros((n_samples,3))
q_10 = np.zeros((n_samples,3))
n_plots = 3
plt.figure(figsize=(9,n_plots*3))
scan_no=0
for z1, z2 in zip(z_array1,z_array2):
scan1 = z1**2
scan2 = z2**2
scan12 = ((z1+z2)**2)/2 # This is where the combination happens
exc1 = (scan1>u1) + 0. #add 0. to convert from bool to double
exc2 = (scan2>u1) + 0. #add 0. to convert from bool to double
exc12 = (scan12>u1) + 0. #add 0. to convert from bool to double
if scan_no < n_plots:
aspect = 1.
#plt.subplot(n_plots,3,3*scan_no+1)
plt.subplot(n_plots,1,1*scan_no+1)
plt.plot(x,scan1, c='r', label='search 1')
#plt.subplot(n_plots,3,3*scan_no+2)
plt.subplot(n_plots,1,1*scan_no+1)
plt.plot(x,scan2, c='g', label='search 2')
#plt.subplot(n_plots,3,3*scan_no+3)
plt.subplot(n_plots,1,1*scan_no+1)
plt.plot(x,scan12, c='b', label='combined')
plt.legend(('search 1', 'search 2', 'combined'))
q_max[scan_no,:] = [np.max(scan1), np.max(scan2), np.max(scan12)]
q_10[scan_no,:] = [scan1[10],scan2[10], scan12[10]]
#print num_upcrossings(exc1)
n_av1 += 1.*num_upcrossings(exc1)/n_samples
n_av2 += 1.*num_upcrossings(exc2)/n_samples
n_av12 += 1.*num_upcrossings(exc12)/n_samples
scan_no +=1
print("n_av search 1, search 2, combined = ", n_av1, n_av2, n_av12)
In [18]:
#Simple scaling:
print("check simple scaling rule: prediction=%f, observed=%f" %(np.sqrt((n_av1**2+n_av2**2)/2), n_av12))
In [19]:
z_array12 = gp12.sample(x,n_samples)
q12_max = np.zeros((n_samples))
n_up = np.zeros(n_samples)
for scan_no, z12 in enumerate(z_array12):
scan12 = (z12)**2
q12_max[scan_no] = np.max(scan12)
n_up[scan_no] = num_upcrossings((scan12 > u1)+0.)
print("average number of upcrossings for combined GP = %f" %(np.mean(n_up)))
Compare $q_{max}$ distribution from direct combination with the prediction from gp12
In [20]:
bins, edges, patches = plt.hist(q_max[:,2], bins=50, alpha=0.1, color='r', label='explicit combination')
bins, edges, patches = plt.hist(q12_max, bins=edges, alpha=0.1, color='b', label='predicted')
plt.ylabel('counts/bin')
plt.xlabel('$q_{max}$')
plt.legend(('explicit combination', 'predicted'))
Out[20]:
In [21]:
u = np.linspace(5,25,100)
global_p = global_pvalue(u,u1,np.mean(n_up))
icdf = 1.-np.cumsum(bins/n_samples)
icdf = np.hstack((1.,icdf))
icdf_error = np.sqrt(np.cumsum(bins))/n_samples
icdf_error = np.hstack((0.,icdf_error))
plt.plot(edges,icdf, c='r', label='toys')
plt.errorbar(edges,icdf,yerr=icdf_error)
plt.plot(u, global_p, label='prediction')
plt.xlabel('$u$')
plt.ylabel('$P(q_{max} >u)$')
plt.legend(('prediction','toys'))
#plt.ylabel('P(q>u)')
plt.ylim(1E-3,10)
plt.xlim(0,25)
plt.semilogy()
Out[21]:
Bingo!
In [ ]: