A class of algorithms based on the classical (prospective / retrospective) hotspotting techniques, where the grid is replaced by the edges of a network/graph.
We follow (1), which itself uses (2) for the KDE method.
The resulting "hotspot" is actually a "risk" estimate for each point on the network, where a higher risk corresponds to the belief that future events are more likely. As usual, we generate actual "hotspots" by choosing a "coverage level", say 10%, and then selecting the 10% of the network which is most risky, using the natural length of edges in the network to decide what "10%" means.
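As a concrete sketch of this coverage selection (the function name and data layout are our own illustration):

```python
def select_coverage(segments, coverage=0.1):
    """Select the riskiest part of the network, up to `coverage` of total length.

    `segments` is an iterable of (risk, length) pairs, one per piece of the
    network; this layout is purely illustrative.
    """
    total_length = sum(length for _, length in segments)
    target = coverage * total_length
    chosen, selected = [], 0.0
    # Walk from the most risky segment to the least risky, accumulating length
    for risk, length in sorted(segments, key=lambda seg: seg[0], reverse=True):
        if selected >= target:
            break
        chosen.append((risk, length))
        selected += length
    return chosen
```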
We might naturally want to compare such a prediction with a grid-based prediction. To do this, (1) suggests (although they don't quite use these words) generating a network prediction from a grid prediction. We do this by giving each network point the "risk" of the grid cell it occurs in. Once this is done, we can compare network predictions directly. (For example, to generate 10% coverage, we work from the most risky to the least risky grid cell, adding all the network which intersects each cell, until we have selected 10% of the network by length.)
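A sketch of this comparison procedure, assuming we have already computed, for each grid cell, the lengths of the network pieces intersecting it (the data layout is hypothetical):

```python
def coverage_from_grid(cells, coverage=0.1):
    """Select network coverage from a grid prediction.

    `cells` is a list of (risk, lengths) pairs, one per grid cell, where
    `lengths` holds the lengths of the network pieces intersecting that cell.
    """
    total = sum(sum(lengths) for _, lengths in cells)
    target = coverage * total
    chosen, selected = [], 0.0
    # Work from the most risky grid cell downwards, adding whole cells worth
    # of network, until we reach the target fraction of the network by length
    for risk, lengths in sorted(cells, key=lambda cell: cell[0], reverse=True):
        if selected >= target:
            break
        chosen.extend(lengths)
        selected += sum(lengths)
    return chosen
```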
We follow (2); the summary in (1) is very clear. Let $s'$ be an event on the network. We seek to derive a formula for the (space-like) kernel induced at a point $s$. As a base, we start with a one-dimensional kernel function $f:\mathbb R\rightarrow [0,\infty)$. We assume that $f$ is symmetric, $f(-t) = f(t)$, and it should be normalised, $\int_{\mathbb R} f(t) \ dt = 1$. (Source (2) gives further conditions, but these are details, and will be satisfied by our kernels.) We always suppose our kernels have a finite "support", a value $t_\max$ such that $f(t)=0$ for any $|t|>t_\max$.
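For example, the one-dimensional kernel implicit in the final formula below is the triangular kernel with bandwidth (and support) $h_S$. A minimal sketch:

```python
def triangular_kernel(t, h):
    """Triangular kernel: symmetric, integrates to 1, and zero for |t| > h."""
    return max(h - abs(t), 0) / (h * h)
```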
We consider all paths in the network from $s'$ to $s$, ignoring any "cyclic" paths. This is not entirely specified in (1), so we use this algorithm:
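A minimal sketch of one such enumeration, treating "ignoring cyclic paths" as never revisiting a vertex; that interpretation, the adjacency-list format, and the pruning at the kernel support are our own choices:

```python
def all_simple_paths(graph, start, end, max_length):
    """Yield (path, length) for every vertex-distinct path from `start` to
    `end` of length at most `max_length`.

    `graph` maps each vertex to a list of (neighbour, edge_length) pairs.
    """
    stack = [(start, [start], 0.0)]
    while stack:
        vertex, path, length = stack.pop()
        if vertex == end:
            yield path, length
            continue
        for neighbour, edge_length in graph[vertex]:
            # Skip already-visited vertices (no cycles) and over-long paths
            if neighbour not in path and length + edge_length <= max_length:
                stack.append((neighbour, path + [neighbour], length + edge_length))
```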
(1) considers a "prospective" hotspotting technique, which takes account of time. We consider forming a prediction at time $t$, from events which occurred at times $t_i$, for $i=1,\cdots,n$, at places $s_i$ on the network. We combine a time and a space kernel.
Thus the final network "risk" estimate, at time $t$ and network location $s$, is $$ \lambda(s,t) = \sum_{i \,:\, t_i<t} \frac{1}{h_T} \exp\Big( -\frac{t-t_i}{h_T} \Big) \sum_{p:s_i\rightarrow s,\ l(p) \leq h_S} \frac{h_S - l(p)}{h_S^2(n_1-1)\cdots(n_{m(p)}-1)} $$ where $h_T, h_S$ are the time and space bandwidths, $l(p)$ is the length of the path $p$, and $n_1,\cdots,n_{m(p)}$ are the degrees of the vertices which $p$ passes through (the kernel weight being "equally split" among the other edges at each vertex crossed).
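Putting the pieces together, a sketch of evaluating $\lambda(s,t)$ at one fixed network location $s$; the `paths_to` callable could be built from `all_simple_paths` above, and all the data structures here are illustrative stand-ins:

```python
import math

def risk(events, paths_to, degree, t, h_time, h_space):
    """Estimate lambda(s, t) at a fixed network location s.

    `events` is a list of (t_i, s_i) pairs; `paths_to(s_i)` yields
    (vertices, length) for each path from s_i to s with length <= h_space,
    where `vertices` are the vertices the path passes through; and
    `degree(v)` is the degree of vertex v.
    """
    total = 0.0
    for t_i, s_i in events:
        if t_i >= t:
            continue  # only past events contribute
        time_weight = math.exp(-(t - t_i) / h_time) / h_time
        space_weight = 0.0
        for vertices, length in paths_to(s_i):
            # "Equal split": divide by (degree - 1) at each crossed vertex
            split = 1.0
            for v in vertices:
                split *= degree(v) - 1
            space_weight += (h_space - length) / (h_space ** 2 * split)
        total += time_weight * space_weight
    return total
```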
We are interested in an estimate of the risk at a fixed time $t$. Source (1) suggests sampling $\lambda(s) = \lambda(s,t)$ at various points $s$ across the network, spaced every 30 metres. We make some comments on how to improve this, and on efficient computation:
To find all paths between a point in the middle of an edge between vertices $u_1, u_2$ and a point in the middle of an edge between vertices $u_3, u_4$, we can proceed as follows:
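One natural procedure, sketched below, is to run the vertex-to-vertex enumeration for each of the four combinations of end vertices, extending every path by the two partial edge segments; this reconstruction, and the data layout, are our own:

```python
from itertools import product

def paths_between_midpoints(graph, starts, ends, max_length):
    """Yield paths between two mid-edge points.

    `starts` maps each end vertex of the first edge (u1 and u2) to its
    distance from the first point; `ends` does the same for u3, u4 and the
    second point.  Reuses `all_simple_paths` from above.  The case where
    both points lie on the same edge would need separate handling.
    """
    for (sv, sd), (ev, ed) in product(starts.items(), ends.items()):
        budget = max_length - sd - ed  # length left for the vertex-to-vertex part
        if budget < 0:
            continue
        for vertices, length in all_simple_paths(graph, sv, ev, budget):
            yield vertices, sd + length + ed
```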
Finally, once we have summed all the "risks" for each edge, we should normalise by dividing by the length of the edge.
The above procedure is very slow if the "spatial bandwidth" of the kernel becomes large (and in practical situations, it is large). We have implemented a caching scheme which:
An alternative scheme, as explored more in the Geometry reduction notebook, is to approximate the sum over all paths by:
We again follow (1). Given a grid prediction, we can intersect the grid with the network, and assign the risk of each grid cell to every network part which intersects that cell.
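A sketch of this using `shapely` geometry; the data layout is our own illustration:

```python
from shapely.geometry import box

def network_risk_from_grid(segments, cells):
    """Assign each grid cell's risk to the network pieces intersecting it.

    `segments` is a list of shapely LineStrings; `cells` is a list of
    (xmin, ymin, xmax, ymax, risk) tuples.  Returns (piece, risk) pairs,
    splitting segments which cross cell boundaries.
    """
    pieces = []
    for xmin, ymin, xmax, ymax, risk in cells:
        cell = box(xmin, ymin, xmax, ymax)
        for segment in segments:
            piece = cell.intersection(segment)
            if piece.length > 0:  # ignore empty or touch-only intersections
                pieces.append((piece, risk))
    return pieces
```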
For networks, it is hard(er) to produce toy examples. So instead of giving examples here, we give examples using real data from Chicago.