CellGrid tutorial

CellGrids are an object than contains coordinates, which are split into Cells


In [1]:
import numpy as np
import cellgrid

We'll start with 10,000 randomly spaced points inside a 10 x 10 x 10 box.

Note that all coordinates must be inside the box (within 0.0 and 10.0) to use a CellGrid


In [2]:
coords = np.random.random(30000).reshape(10000, 3).astype(np.float32) * 10.0
box = np.ones(3).astype(np.float32) * 10.0

We create a CellGrid by specifying the box, the width of a cell (1.2 in this example), and the coordinates.


In [3]:
cg = cellgrid.CellGrid(box, 1.2, coords)
print cg


<CellGrid with dimensions 8, 8, 8>

The CellGrid object has grouped the coordinates into cells in a 8x8x8 grid. Each cell has a minimum size of 1.2 x 1.2 x 1.2.

The CellGrid then provides access to Cells.


In [4]:
cell = cg[20]
print cell


<Cell at (4, 2, 0) with 14 coords>

Cells contain positions and indices. The indices tell you the original identity of the coordinates.


In [5]:
print cell.coordinates
print cell.indices


[[ 5.76523256  3.29482055  0.66672134]
 [ 5.48817587  2.68501782  0.17203531]
 [ 5.48174429  2.99204135  0.21953689]
 [ 5.63022995  3.58832932  1.09244275]
 [ 5.82996941  2.55967116  0.46783131]
 [ 6.0887394   3.06209278  0.95498252]
 [ 5.36530304  2.55250669  0.09474408]
 [ 5.78432322  3.45284486  0.00706167]
 [ 5.22742748  3.09078646  0.203667  ]
 [ 5.43257761  3.03484559  0.20778763]
 [ 5.18024921  3.36191201  0.12819624]
 [ 6.15215969  2.98257637  0.32198134]
 [ 5.23517132  2.98871493  0.33490485]
 [ 5.83198166  3.21546292  1.23343873]]
[2224  591 8031  325 9925 6296 8608 6862  243 4281 1667  705 1377 4201]

Cells and neighbours

Cells all have an address, which defines their location in cartesian coordinates within the parent CellGrid.


In [6]:
print 'I live at: ', cell.address
print 'Half my neighbours are: ', cell.half_neighbours


I live at:  (4, 2, 0)
Half my neighbours are:  [[ 5  2  0]
 [ 5  3  0]
 [ 4  3  0]
 [ 3  3  0]
 [ 5  2 -1]
 [ 5  3 -1]
 [ 4  3 -1]
 [ 3  3 -1]
 [ 5  2  1]
 [ 5  3  1]
 [ 4  3  1]
 [ 3  3  1]
 [ 4  2  1]]

In [7]:
for other_address in cell.half_neighbours:
    other_cell = cg[other_address]
    print other_cell


<Cell at (5, 2, 0) with 22 coords>
<Cell at (5, 3, 0) with 26 coords>
<Cell at (4, 3, 0) with 17 coords>
<Cell at (3, 3, 0) with 22 coords>
<Cell at (5, 2, 7) with 16 coords>
<Cell at (5, 3, 7) with 17 coords>
<Cell at (4, 3, 7) with 18 coords>
<Cell at (3, 3, 7) with 27 coords>
<Cell at (5, 2, 1) with 22 coords>
<Cell at (5, 3, 1) with 18 coords>
<Cell at (4, 3, 1) with 21 coords>
<Cell at (3, 3, 1) with 22 coords>
<Cell at (4, 2, 1) with 18 coords>

Cells can iterate over either "half_neighbours" or "all_neighbours".


In [8]:
print len(cell.half_neighbours)
print len(cell.all_neighbours)


13
26

All neighbours returns the address of all 26 neighbouring cells to a given cell. These can be put into a CellGrid to get the Cell at that address.

Half neighbours refer to iterating over all 13 neighbouring cells in one direction. This pattern prevents pairs being calculated twice. If you are interested in all pairwise comparisons within a CellGrid, you likely want to use half_neighbours.

Note that neither list of neighbour addresses includes the cell itself!

Finding all pairs within 1.2 of each other

Now that the coordinates have been sorted into cells, all nearby pairs can be found in a more efficient way.

First we need to calculate the total number of pairs we need to calculate.


In [9]:
n = 0
for c in cg:
    ncell = len(c)
    n += ncell * (ncell - 1) // 2  # number of pairs within this cell
    for other_address in c.half_neighbours:
        n += ncell * len(cg[other_address])
        
print n


2634055

Doing a brute force nxn matrix of our 10,000 coordinates would result in 100 million pairwise comparisons, using a CellGrid has reduced this to 2.6 million.

We can now allocate the results arrays for our search. We will calculate both the distances and the identity of the pairs.


In [10]:
distances = np.zeros(n, dtype=np.float32)
pair_indices = np.zeros((n, 2), dtype=np.int)

In [11]:
from cellgrid.cgmath import (
    inter_distance_array_withpbc,
    intra_distance_array_withpbc,
    inter_index_array,
    intra_index_array
)

In [12]:
pos = 0  # pointer to where in the array to fill results
for c in cg:
    lenc = len(c)
    intra_distance_array_withpbc(c.coordinates, box, distances[pos:])
    intra_index_array(c.indices, pair_indices[pos:])
    pos += lenc * (lenc - 1) // 2
    for other_address in c.half_neighbours:
        other = cg[other_address]
        inter_distance_array_withpbc(c.coordinates, other.coordinates, box, distances[pos:])
        inter_index_array(c.indices, other.indices, pair_indices[pos:])
        pos += lenc * len(other)

Taking for example the 100th result, this tells us that coordinates 3,152 and 3,093 are separated by 0.374 units of distance.


In [26]:
print distances[100], pair_indices[100]
print np.linalg.norm(coords[3152] - coords[3093])


0.374015 [3152 3093]
0.374015

This method will have found all pairs within 1.2 of each other, but also many pairs that are slightly above this.

The extra pairs can easily be filtered out using a mask.


In [14]:
new_distances, new_pairs = distances[distances < 1.2], pair_indices[distances < 1.2]

In [15]:
len(new_distances)


Out[15]:
361095

In [16]:
print (new_distances < 1.2).all()


True

We can check that we found all pairs by performing the brute force method to finding pairs.


In [17]:
brute_force = np.zeros(10000 * 10000, dtype=np.float32)

In [18]:
inter_distance_array_withpbc(coords, coords, box, brute_force)

In [19]:
(brute_force < 1.2).sum()


Out[19]:
732190

In [20]:
b2 = brute_force[brute_force < 1.2]

To calculate the number of unique pairs using this approach, we look at all pairs with distance less than 1.2, then subtract 10,000 (the number of coordinates, which are


In [25]:
(len(b2) - 10000) / 2


Out[25]:
361095

In [ ]: