K-Nearest-Neighbors

1 Unsupervised Nearest Neighbors

NearestNeighbors acts as a uniform interface to three different nearest neighbors algorithms: BallTree, KDTree, and a brute-force algorithm based on routines in sklearn.metrics.pairwise.


In [1]:
from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[-1,-1],[-2, -1],[1,1],[2,1],[3,2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distance, indices = nbrs.kneighbors(X)
indices


Out[1]:
array([[0, 1],
       [1, 0],
       [2, 3],
       [3, 2],
       [4, 3]])

In [3]:
distance


Out[3]:
array([[ 0.        ,  1.        ],
       [ 0.        ,  1.        ],
       [ 0.        ,  1.        ],
       [ 0.        ,  1.        ],
       [ 0.        ,  1.41421356]])

Alternatively, we can use the KDTree or BallTree classes directly to find the nearest neighbors.


In [4]:
from sklearn.neighbors import KDTree
import numpy as np
X = np.array([[-1,-1],[-2, -1],[1,1],[2,1],[3,2]])
kdt=KDTree(X, leaf_size=30, metric='euclidean')
kdt.query(X, k=2, return_distance=False)


Out[4]:
array([[0, 1],
       [1, 0],
       [2, 3],
       [3, 2],
       [4, 3]])

2 Nearest Neighbors Classification

scikit-learn implements two different nearest neighbors classifiers: KNeighborsClassifier implements learning based on the k nearest neighbors of each query point, where k is an integer value specified by the user. RadiusNeighborsClassifier implements learning based on the number of neighbors within a fixed radius r of each training point.
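A minimal sketch of KNeighborsClassifier on toy data (the points and labels below are assumptions chosen for illustration): the query point is assigned the majority class among its k nearest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two points of class 0, three points of class 1
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [3, 2]])
y = np.array([0, 0, 1, 1, 1])

# Fit a 3-nearest-neighbors classifier
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# The 3 nearest neighbors of (2.5, 1.5) are all class 1
print(clf.predict([[2.5, 1.5]]))
```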

3 Nearest Neighbors Regression

scikit-learn provides two different neighbors regressors: KNeighborsRegressor implements learning based on the k nearest neighbors of each query point, while RadiusNeighborsRegressor implements learning based on the neighbors within a fixed radius r of the query point.
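A minimal sketch of KNeighborsRegressor (the 1-D data below is an assumption for illustration): the prediction is the average of the targets of the k nearest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D training data with continuous targets
X = np.array([[0], [1], [2], [3]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Fit a 2-nearest-neighbors regressor
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X, y)

# The 2 nearest neighbors of 1.5 are x=1 (y=0.0) and x=2 (y=1.0),
# so the prediction is their mean, 0.5
print(reg.predict([[1.5]]))
```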

4 Nearest Neighbor Algorithm

  • Brute Force

Computes distances between all pairs of points in the dataset: for $N$ samples in $D$ dimensions, this approach scales as $O[D N^2]$.

  • K-D Tree

Reduces the required number of distance calculations by efficiently encoding aggregate distance information for the samples.

  • Ball Tree

Divide the data into nodes defined by a centroid $C$ and radius $r$, such that each point in the node lies within the hyper-sphere defined by $r$ and $C$.
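The three approaches above are selectable through the algorithm parameter of NearestNeighbors; a small sketch (reusing the toy data from earlier cells) to check that every backend returns the same neighbor indices, differing only in speed characteristics:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [3, 2]])

# Run the same 2-nearest-neighbors query with each backend
results = {}
for alg in ['brute', 'kd_tree', 'ball_tree']:
    nbrs = NearestNeighbors(n_neighbors=2, algorithm=alg).fit(X)
    results[alg] = nbrs.kneighbors(X, return_distance=False)
    print(alg, results[alg][0])
```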