1. Write a Python function using for loops to calculate the pairwise distance matrix given two sets of vectors. The function signature should look like this:
np.ndarray <- cdist(xs, ys, dist)
where xs and ys are collections of vectors, and dist is a function that computes a distance metric. Also write a function for the Euclidean distance metric with the signature:
float <- euclidean_distance(x, y)
Example of usage:
coords = [(35.0456, -85.2672),
          (35.1174, -89.9711),
          (35.9728, -83.9422),
          (36.1667, -86.7833)]
cdist(coords, coords, euclidean_distance)
where coords is interpreted as 4 row vectors of dimension 2 each. With the Euclidean metric, the result should be
array([[ 0. , 4.70444794, 1.6171966 , 1.88558331],
[ 4.70444794, 0. , 6.0892811 , 3.35605413],
[ 1.6171966 , 6.0892811 , 0. , 2.84770898],
[ 1.88558331, 3.35605413, 2.84770898, 0. ]])
Time the performance of your function with the Euclidean distance metric using %timeit on the given data sets XA and XB. Then time the library function scipy.spatial.distance.cdist on the same data to see how much speed-up is achievable.
In [11]:
import numpy as np
np.random.seed(123)
n1 = 50
n2 = 100
p = 10
XA = np.random.normal(0, 1, (n1, p))
XB = np.random.normal(0, 1, (n2, p))
In [ ]:
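One possible starting point for exercise 1 is sketched below. It is a minimal, unoptimized version that follows the two signatures given above; the local names (total, result, D) are illustrative choices and not part of the exercise.

```python
import numpy as np


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += (xi - yi) ** 2
    return total ** 0.5


def cdist(xs, ys, dist):
    """Pairwise distance matrix: entry (i, j) is dist(xs[i], ys[j])."""
    result = np.empty((len(xs), len(ys)))
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            result[i, j] = dist(x, y)
    return result


coords = [(35.0456, -85.2672),
          (35.1174, -89.9711),
          (35.9728, -83.9422),
          (36.1667, -86.7833)]
D = cdist(coords, coords, euclidean_distance)
print(D)
```

In a notebook, %timeit cdist(XA, XB, euclidean_distance) gives the baseline timing that the later versions can be compared against.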
2. Write a version cdist_numpy to speed up calculations using vectorization and broadcasting. Check that it gives the correct answers on coords and compare timings.
In [ ]:
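A broadcasting approach to exercise 2 might look like the sketch below. It assumes the Euclidean metric is hard-coded rather than passed in as dist, which is one reading of the exercise and not something the text specifies. The intermediate array has shape (n1, n2, p), so memory use grows with all three dimensions.

```python
import numpy as np


def cdist_numpy(xs, ys):
    """Euclidean pairwise distances via broadcasting (no Python loops)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # (n1, 1, p) - (1, n2, p) broadcasts to (n1, n2, p)
    diff = xs[:, None, :] - ys[None, :, :]
    # Sum squared differences over the last axis, then take the root
    return np.sqrt((diff ** 2).sum(axis=-1))


coords = [(35.0456, -85.2672),
          (35.1174, -89.9711),
          (35.9728, -83.9422),
          (36.1667, -86.7833)]
D = cdist_numpy(coords, coords)

# Same random data as in the setup cell above
np.random.seed(123)
XA = np.random.normal(0, 1, (50, 10))
XB = np.random.normal(0, 1, (100, 10))
D_big = cdist_numpy(XA, XB)
```

Comparing D against the expected matrix with np.allclose is a convenient correctness check before timing.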
3. Write a version cdist_numba using numba JIT. Check that it gives the correct answers on coords and compare timings.
In [ ]:
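For exercise 3, one option is to write the loops explicitly and let numba compile them, as in the sketch below. Two assumptions are made here: the Euclidean metric is inlined rather than passed as an argument, and the inputs are 2-D float arrays rather than lists of tuples. The try/except fallback is only there so the sketch still runs (uncompiled) where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python if numba is unavailable
    def njit(func):
        return func


@njit
def cdist_numba(xs, ys):
    """Euclidean pairwise distances with explicit loops, JIT-compiled."""
    n1, p = xs.shape
    n2 = ys.shape[0]
    result = np.empty((n1, n2))
    for i in range(n1):
        for j in range(n2):
            total = 0.0
            for k in range(p):
                d = xs[i, k] - ys[j, k]
                total += d * d
            result[i, j] = np.sqrt(total)
    return result


coords = np.array([(35.0456, -85.2672),
                   (35.1174, -89.9711),
                   (35.9728, -83.9422),
                   (36.1667, -86.7833)])
D = cdist_numba(coords, coords)
```

The first call includes compilation time, so it is worth calling the function once before timing it with %timeit.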
4. Write a version cdist_cython using Cython AOT (ahead-of-time) compilation. Check that it gives the correct answers on coords and compare timings.
In [ ]: