In [1]:
import numpy as np
import cellgrid
We'll start with 10,000 randomly spaced points inside a 10 x 10 x 10 box.
Note that all coordinates must be inside the box (within 0.0 and 10.0) to use a CellGrid
In [2]:
coords = np.random.random(30000).reshape(10000, 3).astype(np.float32) * 10.0
box = np.ones(3).astype(np.float32) * 10.0
We create a CellGrid by specifying the box, the width of a cell (1.2 in this example), and the coordinates.
In [3]:
cg = cellgrid.CellGrid(box, 1.2, coords)
print cg
The CellGrid object has grouped the coordinates into cells in a 8x8x8 grid. Each cell has a minimum size of 1.2 x 1.2 x 1.2.
The CellGrid then provides access to Cells.
In [4]:
cell = cg[20]
print cell
Cells contain positions and indices. The indices tell you the original identity of the coordinates.
In [5]:
print cell.coordinates
print cell.indices
In [6]:
print 'I live at: ', cell.address
print 'Half my neighbours are: ', cell.half_neighbours
In [7]:
for other_address in cell.half_neighbours:
other_cell = cg[other_address]
print other_cell
Cells can iterate over either "half_neighbours" or "all_neighbours".
In [8]:
print len(cell.half_neighbours)
print len(cell.all_neighbours)
All neighbours returns the address of all 26 neighbouring cells to a given cell. These can be put into a CellGrid to get the Cell at that address.
Half neighbours refer to iterating over all 13 neighbouring cells in one direction.
This pattern prevents pairs being calculated twice.
If you are interested in all pairwise comparisons within a CellGrid, you likely want to use half_neighbours
.
Note that neither list of neighbour addresses includes the cell itself!
In [9]:
n = 0
for c in cg:
ncell = len(c)
n += ncell * (ncell - 1) // 2 # number of pairs within this cell
for other_address in c.half_neighbours:
n += ncell * len(cg[other_address])
print n
Doing a brute force nxn matrix of our 10,000 coordinates would result in 100 million pairwise comparisons, using a CellGrid has reduced this to 2.6 million.
We can now allocate the results arrays for our search. We will calculate both the distances and the identity of the pairs.
In [10]:
distances = np.zeros(n, dtype=np.float32)
pair_indices = np.zeros((n, 2), dtype=np.int)
In [11]:
from cellgrid.cgmath import (
inter_distance_array_withpbc,
intra_distance_array_withpbc,
inter_index_array,
intra_index_array
)
In [12]:
pos = 0 # pointer to where in the array to fill results
for c in cg:
lenc = len(c)
intra_distance_array_withpbc(c.coordinates, box, distances[pos:])
intra_index_array(c.indices, pair_indices[pos:])
pos += lenc * (lenc - 1) // 2
for other_address in c.half_neighbours:
other = cg[other_address]
inter_distance_array_withpbc(c.coordinates, other.coordinates, box, distances[pos:])
inter_index_array(c.indices, other.indices, pair_indices[pos:])
pos += lenc * len(other)
Taking for example the 100th result, this tells us that coordinates 3,152 and 3,093 are separated by 0.374 units of distance.
In [26]:
print distances[100], pair_indices[100]
print np.linalg.norm(coords[3152] - coords[3093])
This method will have found all pairs within 1.2 of each other, but also many pairs that are slightly above this.
The extra pairs can easily be filtered out using a mask.
In [14]:
new_distances, new_pairs = distances[distances < 1.2], pair_indices[distances < 1.2]
In [15]:
len(new_distances)
Out[15]:
In [16]:
print (new_distances < 1.2).all()
We can check that we found all pairs by performing the brute force method to finding pairs.
In [17]:
brute_force = np.zeros(10000 * 10000, dtype=np.float32)
In [18]:
inter_distance_array_withpbc(coords, coords, box, brute_force)
In [19]:
(brute_force < 1.2).sum()
Out[19]:
In [20]:
b2 = brute_force[brute_force < 1.2]
To calculate the number of unique pairs using this approach, we look at all pairs with distance less than 1.2, then subtract 10,000 (the number of coordinates, which are
In [25]:
(len(b2) - 10000) / 2
Out[25]:
In [ ]: