Statement of the problem

On a grid

We are given a grid of risks.

We wish to find $N$ hotspots (typically $N$ is fixed, say $N=5$, or at least bounded above, say $N<=8$).
Each hotspot is of a maxmimum size (number of grid cells). Or the total coverage is limited.
At the very least, a "hotspot" must be a contiguous selections of cells.
We might wish to further constrain what we mean by a "hotspot".

This thus becomes an optimisation problem, not really connected to criminology. I wonder if it has been studied in computer science?

On a network

We are given an assignment of risk to each edge of the network.

We wish to find $N$ hotspots (typically $N$ is fixed, say $N=5$, or at least bounded above, say $N<=8$).
Each hotspot is of a maxmimum size (shortest walk which covers all edges in the hotspot). Or the total coverage is limited.
At the very least, a "hotspot" must be a contiguous selections of edges.
We might wish to further constrain what we mean by a "hotspot".

Maximisation

In either case, our goal is to find the selection of cells (or edges) which satisfies the above constraints, and subject to this constraint, maximises the sum of risks. This is probably computationally intractible, and so algorithms which get close to the maximum will be sought.

Goals

Can we find simple sets of additional constraints which give rise to "nice" hotspots?
E.g. we might bias in favour of "compact" hotspots (meaning the ratio of the perimeter to area is low).
Can we find efficient algorithms?
What sort of "hit rate" do the different methods / different risk algorithms give?

Simple example: "greedy grid"

Grid based, will produce $\leq N$ hotspots of a certain total size $M$. Only constraint is that each "hotspot" is a connected collection of cells. Iterative:

If the current number of hotspots is $<N$, select the cell with the highest risk.
Otherwise, select the cell with the highest risk which is adjacent to a current hotspot.
Stop if we have selected $M$ cells, otherwise continue.

Advantages:

$\mathcal O(nM)$ where $n$ is the number of grid cells, in the naivest implementation.

Problems:

No guarantee of producing "compact" hotspots.
No guarantee of solving the maximisation problem: this can happen if there are say $N$ isolated cells with high risk, surrounds by very low risk cells, and disjoint to these a large number of medium risk cells, with $M$ much larger than $N$ (so an optimal solution is to choose many medium risk cells; the algorithm chooses the high risk cells and then add very low risk cells).



In [ ]: