This technique has been used in the crime prediction literature to select optimal bandwidths, typically at a stage before forming a grid or "hot spots", e.g. [1].
It is thus worth spending some time developing the mathematical model upon which this idea is based. We assume that the events occurring in the "validation period" (e.g. the day after the prediction) are well approximated by an inhomogeneous Poisson point process with overall rate $\lambda$ (the expected number of events in the validation period) and a probability density function $f$, defined on the area of interest, which gives the local intensity. This means that in an area $A$, the expected number of events is $\lambda \int_A f$.
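As a concrete sketch of this model, the following simulates an inhomogeneous Poisson point process with a density that is piecewise constant on a grid. The grid size, cell size and rate $\lambda$ here are hypothetical values chosen only for illustration: draw $N \sim \mathrm{Poisson}(\lambda)$, pick a cell for each event with probability $f \times \text{cell area}$, then place the event uniformly within its cell.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 10x10 grid of unit cells; `f` is a probability density
# (integrates to 1 over the study area), constant within each cell.
cell_size = 1.0
weights = rng.random((10, 10))
f = weights / (weights.sum() * cell_size**2)    # density per unit area

lam = 50.0                                      # overall rate: expected event count
n = rng.poisson(lam)                            # actual number of events

# Each cell is chosen with probability f * cell_area; each event is then
# placed uniformly at random within its chosen cell.
probs = (f * cell_size**2).ravel()
cells = rng.choice(probs.size, size=n, p=probs)
rows, cols = np.unravel_index(cells, f.shape)
xs = (cols + rng.random(n)) * cell_size
ys = (rows + rng.random(n)) * cell_size
```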
Our predictions do not provide $\lambda$ (except perhaps for the SEPP methods, though we discard this information). To compare like with like, we will assume that $f$ is given by a grid: that is, $f$ is constant on each grid cell.
The likelihood, as used by [1], is then $$ \prod_{i=1}^N f(x_i) $$ where $(x_i)_{i=1}^N$ denote the coordinates of the events in the validation period. Mathematically, this is the likelihood, conditional on $N$, the number of events. Typically we look at the log likelihood instead, $$ L_0 = \sum_{i=1}^N \log f(x_i). $$ As noted by [1], if $f(x_i)=0$ then we have observed an impossible event! Thus we set a very small lower bound for $f$.
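A minimal sketch of computing $L_0$ for a gridded prediction, with the lower bound on $f$ applied; the function name and the default floor value are my own choices, not from the source:

```python
import numpy as np

def log_likelihood(f_grid, cells, floor=1e-10):
    """Conditional log likelihood L_0 = sum_i log f(x_i).

    `f_grid` holds the predicted density per grid cell; `cells` is an
    (N, 2) array of (row, col) indices of the cells containing the
    validation-period events.  A cell with zero predicted density would
    make an observed event "impossible" (log 0 = -inf), so we clamp f
    from below by `floor`.
    """
    values = np.maximum(f_grid[cells[:, 0], cells[:, 1]], floor)
    return np.sum(np.log(values))
```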
One potential problem here is that we have a dependence on $N$. For the application in [1], this is unimportant, but when comparing predictions, we should normalise by computing $$ L_1 = \frac{1}{N} \sum_{i=1}^N \log f(x_i). $$ If we wish to compare scores between different studies, then we also need to normalise for the area, leading to $$ L = \log A + \frac{1}{N} \sum_{i=1}^N \log f(x_i), $$ where $A$ is the area of the study zone.
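The fully normalised score $L$ can be computed in the same way; again the function name and floor value are illustrative assumptions:

```python
import numpy as np

def normalised_log_likelihood(f_grid, cells, area, floor=1e-10):
    """Normalised score L = log A + (1/N) sum_i log f(x_i).

    Dividing by N removes the dependence on the number of validation
    events; adding log A makes scores comparable across study regions
    of different sizes.
    """
    values = np.maximum(f_grid[cells[:, 0], cells[:, 1]], floor)
    return np.log(area) + np.sum(np.log(values)) / len(cells)
```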
For example, if the prediction is complete spatial randomness, so that $f = 1/A$ everywhere, then $\log f(x_i) = -\log A$ for every event, and hence $L = 0$ for any observation.
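We can check this numerically; the grid size and event locations below are arbitrary, since under complete spatial randomness the score does not depend on where the events fall:

```python
import numpy as np

rng = np.random.default_rng(0)
area = 25.0                              # 5x5 grid of unit cells
f_grid = np.full((5, 5), 1.0 / area)     # CSR prediction: f = 1/A everywhere

# Any set of observed cells gives the same score:
rows = rng.integers(0, 5, size=100)
cols = rng.integers(0, 5, size=100)
L = np.log(area) + np.mean(np.log(f_grid[rows, cols]))
# L is 0 up to floating-point error, regardless of the observations
```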