Timing SciPy's compiled kd-tree algorithms

System:

  • MacBook Pro, early 2013
  • 2.4 GHz Intel core i7, 8 GB RAM
  • MacOS X 10.9.5
  • Python 3.4.2 :: Anaconda 2.1.0 (x86_64)

Create some random data:


In [1]:
import numpy as np
from sklearn.datasets import make_classification
data0,y = make_classification(n_samples=10000)
data1,z = make_classification(n_samples=10000)

In [2]:
from ckdtreebench import spatial_015
from ckdtreebench import spatial_016
from ckdtreebench import spatial_PR4890

Benchmark kd-tree construction

SciPy 0.15.x


In [3]:
%timeit spatial_015.cKDTree(data0)


100 loops, best of 3: 2.59 ms per loop

SciPy 0.16.x


In [4]:
%timeit spatial_016.cKDTree(data0, balanced_tree=False)


100 loops, best of 3: 4.57 ms per loop

In [5]:
%timeit spatial_016.cKDTree(data0, balanced_tree=True)


100 loops, best of 3: 5.62 ms per loop

PR4890


In [6]:
%timeit spatial_PR4890.cKDTree(data0, balanced_tree=False)


100 loops, best of 3: 4.47 ms per loop

In [7]:
%timeit spatial_PR4890.cKDTree(data0, balanced_tree=True)


100 loops, best of 3: 5.56 ms per loop

Benchmark k-NN query

SciPy 0.15.x - sliding midpoint


In [8]:
kdtree = spatial_015.cKDTree(data0)

In [9]:
%timeit kdtree.query(data1, k=5)


1 loops, best of 3: 7.05 s per loop

SciPy 0.16.x - sliding midpoint


In [10]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=False)

In [11]:
%timeit kdtree.query(data1, k=5)


1 loops, best of 3: 3.28 s per loop

In [12]:
%timeit kdtree.query(data1, k=5, n_jobs=-1)


1 loops, best of 3: 694 ms per loop

SciPy 0.16.x - median


In [13]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=True)

In [14]:
%timeit kdtree.query(data1, k=5)


1 loops, best of 3: 3.4 s per loop

In [15]:
%timeit kdtree.query(data1, k=5, n_jobs=-1)


1 loops, best of 3: 716 ms per loop

PR4890 - sliding midpoint


In [16]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=False)

In [17]:
%timeit kdtree.query(data1, k=5)


1 loops, best of 3: 3.23 s per loop

In [18]:
%timeit kdtree.query(data1, k=5, n_jobs=-1)


1 loops, best of 3: 690 ms per loop

PR4890 - median


In [19]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=True)

In [20]:
%timeit kdtree.query(data1, k=5)


1 loops, best of 3: 3.35 s per loop

In [21]:
%timeit kdtree.query(data1, k=5, n_jobs=-1)


1 loops, best of 3: 714 ms per loop

Benchmark query_ball_point

SciPy 0.15.x - sliding midpoint


In [22]:
kdtree = spatial_015.cKDTree(data0)

In [23]:
%timeit kdtree.query_ball_point(data1,4.0)


1 loops, best of 3: 7.81 s per loop

SciPy 0.16.x - sliding midpoint


In [24]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=False)

In [25]:
%timeit kdtree.query_ball_point(data1,4.0)


1 loops, best of 3: 5.52 s per loop

SciPy 0.16.x - median


In [26]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=True)

In [27]:
%timeit kdtree.query_ball_point(data1,4.0)


1 loops, best of 3: 5.64 s per loop

PR4890 - sliding midpoint


In [28]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=False)

In [29]:
%timeit kdtree.query_ball_point(data1,4.0)


1 loops, best of 3: 4.52 s per loop

In [30]:
%timeit kdtree.query_ball_point(data1,4.0,n_jobs=-1)


1 loops, best of 3: 1.16 s per loop

PR4890 - median


In [31]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=True)

In [32]:
%timeit kdtree.query_ball_point(data1,4.0)


1 loops, best of 3: 4.7 s per loop

In [33]:
%timeit kdtree.query_ball_point(data1,4.0,n_jobs=-1)


1 loops, best of 3: 1.23 s per loop

Benchmark query_ball_tree

SciPy 0.15.x - sliding midpoint


In [34]:
kdtree = spatial_015.cKDTree(data0)
other = spatial_015.cKDTree(data1)

In [35]:
%timeit kdtree.query_ball_tree(other,4.0)


1 loops, best of 3: 3.6 s per loop

SciPy 0.16.x - sliding midpoint


In [36]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=False)
other = spatial_016.cKDTree(data1, balanced_tree=False)

In [37]:
%timeit kdtree.query_ball_tree(other,4.0)


1 loops, best of 3: 2.68 s per loop

SciPy 0.16.x - median


In [38]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=True)
other = spatial_016.cKDTree(data1, balanced_tree=True)

In [39]:
%timeit kdtree.query_ball_tree(other,4.0)


1 loops, best of 3: 3.05 s per loop

PR4890 - sliding midpoint


In [40]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=False)
other = spatial_PR4890.cKDTree(data1, balanced_tree=False)

In [41]:
%timeit kdtree.query_ball_tree(other,4.0)


1 loops, best of 3: 2.65 s per loop

PR4890 - median


In [42]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=True)
other = spatial_PR4890.cKDTree(data1, balanced_tree=True)

In [43]:
%timeit kdtree.query_ball_tree(other,4.0)


1 loops, best of 3: 2.7 s per loop

Benchmark count_neighbors

SciPy 0.15.x - sliding midpoint


In [44]:
kdtree = spatial_015.cKDTree(data0)
other = spatial_015.cKDTree(data1)

In [45]:
%timeit kdtree.count_neighbors(other,4.0)


1 loops, best of 3: 14 s per loop

SciPy 0.16.x - sliding midpoint


In [46]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=False)
other = spatial_016.cKDTree(data1, balanced_tree=False)

In [47]:
%timeit kdtree.count_neighbors(other,4.0)


1 loops, best of 3: 5.99 s per loop

SciPy 0.16.x - median


In [48]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=True)
other = spatial_016.cKDTree(data1, balanced_tree=True)

In [49]:
%timeit kdtree.count_neighbors(other,4.0)


1 loops, best of 3: 5.87 s per loop

PR4890 - sliding midpoint


In [50]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=False)
other = spatial_PR4890.cKDTree(data1, balanced_tree=False)

In [51]:
%timeit kdtree.count_neighbors(other,4.0)


1 loops, best of 3: 2.48 s per loop

PR4890 - median


In [52]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=True)
other = spatial_PR4890.cKDTree(data1, balanced_tree=True)

In [53]:
%timeit kdtree.count_neighbors(other,4.0)


1 loops, best of 3: 2.52 s per loop

Benchmark query_pairs

SciPy 0.15.x - sliding midpoint


In [54]:
kdtree = spatial_015.cKDTree(data0)

In [55]:
%timeit kdtree.query_pairs(4.0)


1 loops, best of 3: 2 s per loop

SciPy 0.16.x - sliding midpoint


In [56]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=False)

In [57]:
%timeit kdtree.query_pairs(4.0)


1 loops, best of 3: 1.45 s per loop

SciPy 0.16.x - median


In [58]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=True)

In [59]:
%timeit kdtree.query_pairs(4.0)


1 loops, best of 3: 1.51 s per loop

PR4890 - sliding midpoint


In [60]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=False)

In [61]:
%timeit kdtree.query_pairs(4.0, output_type='set')


1 loops, best of 3: 1.45 s per loop

In [62]:
%timeit kdtree.query_pairs(4.0, output_type='ndarray')


1 loops, best of 3: 1.28 s per loop

PR4890 - median


In [63]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=True)

In [64]:
%timeit kdtree.query_pairs(4.0, output_type='set')


1 loops, best of 3: 1.48 s per loop

In [65]:
%timeit kdtree.query_pairs(4.0, output_type='ndarray')


1 loops, best of 3: 1.32 s per loop

Benchmark sparse_distance_matrix

SciPy 0.15.x - sliding midpoint


In [66]:
kdtree = spatial_015.cKDTree(data0)
other = spatial_015.cKDTree(data1)

In [67]:
%timeit kdtree.sparse_distance_matrix(other, 4.0)


1 loops, best of 3: 4.07 s per loop

SciPy 0.16.x - sliding midpoint


In [68]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=False)
other = spatial_016.cKDTree(data1, balanced_tree=False)

In [69]:
%timeit kdtree.sparse_distance_matrix(other, 4.0)


1 loops, best of 3: 3.12 s per loop

SciPy 0.16.x - median


In [70]:
kdtree = spatial_016.cKDTree(data0, balanced_tree=True)
other = spatial_016.cKDTree(data1, balanced_tree=True)

In [71]:
%timeit kdtree.sparse_distance_matrix(other, 4.0)


1 loops, best of 3: 3.14 s per loop

PR4890 - sliding midpoint


In [72]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=False)
other = spatial_PR4890.cKDTree(data1, balanced_tree=False)

In [73]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='dok_matrix')


1 loops, best of 3: 3.11 s per loop

In [74]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='coo_matrix')


1 loops, best of 3: 2.52 s per loop

In [75]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='dict')


1 loops, best of 3: 2.84 s per loop

In [76]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='recarray')


1 loops, best of 3: 2.53 s per loop

PR4890 - median


In [77]:
kdtree = spatial_PR4890.cKDTree(data0, balanced_tree=True)
other = spatial_PR4890.cKDTree(data1, balanced_tree=True)

In [78]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='dok_matrix')


1 loops, best of 3: 3.18 s per loop

In [79]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='coo_matrix')


1 loops, best of 3: 2.58 s per loop

In [80]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='dict')


1 loops, best of 3: 2.89 s per loop

In [81]:
%timeit kdtree.sparse_distance_matrix(other, 4.0, output_type='recarray')


1 loops, best of 3: 2.55 s per loop