Title: Find Nearest Neighbors
Slug: find_nearest_neighbors
Summary: Find an observation's nearest neighbors in scikit-learn.
Date: 2017-09-19 12:00
Category: Machine Learning
Tags: Nearest Neighbors
Authors: Chris Albon
In [1]:
# Load libraries
from sklearn.neighbors import NearestNeighbors
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np
In [2]:
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
In [3]:
# Create standardizer
standardizer = StandardScaler()
# Standardize features
X_std = standardizer.fit_transform(X)
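Standardizing matters here because Euclidean distance is sensitive to feature scale. As a quick sanity check (a sketch, not part of the original notebook), each standardized column should have a mean of about 0 and a standard deviation of about 1. The names `column_means` and `column_stds` are introduced only for this illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load the iris features and standardize them
X = datasets.load_iris().data
X_std = StandardScaler().fit_transform(X)

# Per-feature mean and (population) standard deviation after scaling
column_means = X_std.mean(axis=0)
column_stds = X_std.std(axis=0)

print(column_means)
print(column_stds)
```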
In [4]:
# Find three nearest neighbors based on euclidean distance (including itself),
# using the standardized features
nn_euclidean = NearestNeighbors(n_neighbors=3, metric='euclidean').fit(X_std)
# Dense matrix where row i has a 1 in the column of each of observation i's 3 nearest neighbors
nearest_neighbors_with_self = nn_euclidean.kneighbors_graph(X_std).toarray()
# Remove the 1s marking each observation as its own nearest neighbor
for i, x in enumerate(nearest_neighbors_with_self):
    x[i] = 0
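An alternative to zeroing out the diagonal by hand, sketched here under the assumption that the queries are the same points the model was fitted on: when `kneighbors_graph()` is called with no query data, scikit-learn returns the neighbors of each fitted point and does not count a point as its own neighbor. Asking for `n_neighbors=2` then gives two true neighbors per observation. The names `nn_two` and `graph_without_self` are illustrative only.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load and standardize the iris features
X_std = StandardScaler().fit_transform(datasets.load_iris().data)

# Ask for two neighbors, since the point itself will be excluded
nn_two = NearestNeighbors(n_neighbors=2, metric='euclidean').fit(X_std)

# No query data passed: each fitted point is excluded from its own neighbors
graph_without_self = nn_two.kneighbors_graph().toarray()

# View the first observation's row of neighbor indicators
print(graph_without_self[0])
```

Each row of `graph_without_self` contains exactly two 1s, and the diagonal is all zeros.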
In [5]:
# View first observation's two nearest neighbors
nearest_neighbors_with_self[0]
Out[5]:
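The indicator row above is mostly zeros, so it can be easier to read the neighbors as indices. A minimal sketch using `kneighbors`, which returns the distances and row indices of the nearest neighbors sorted by increasing distance. Because the query point is also in the fitted data, it shows up as its own nearest neighbor at distance 0. The variable names are illustrative.

```python
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load and standardize the iris features
X_std = StandardScaler().fit_transform(datasets.load_iris().data)

# Fit on the standardized features
nn = NearestNeighbors(n_neighbors=3, metric='euclidean').fit(X_std)

# Query the first observation only (slice keeps the input 2-dimensional)
distances, indices = nn.kneighbors(X_std[:1])

# Row indices of the three nearest neighbors, including the observation itself
print(indices[0])
# Matching euclidean distances, in increasing order
print(distances[0])
```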