Title: Group Observations Using K-Means Clustering
Slug: group_observations_using_clustering
Summary: How to group observations using k-means clustering with scikit-learn in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon
In [1]:
# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import pandas as pd
In [2]:
# Make simulated feature matrix
X, _ = make_blobs(n_samples=50,
                  n_features=2,
                  centers=3,
                  random_state=1)
# Create DataFrame
df = pd.DataFrame(X, columns=['feature_1','feature_2'])
In [3]:
# Make k-means clusterer with three clusters
clusterer = KMeans(n_clusters=3, random_state=1)
# Fit clusterer
clusterer.fit(X)
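Once the clusterer is fit, the learned cluster centers can be inspected directly. The following is a minimal, self-contained sketch that rebuilds the same simulated data; `n_init=10` is an assumption added here so the number of restarts is explicit and does not depend on the installed scikit-learn version.

```python
# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Rebuild the simulated feature matrix used above
X, _ = make_blobs(n_samples=50,
                  n_features=2,
                  centers=3,
                  random_state=1)

# Fit the clusterer (n_init=10 is an assumption, not part of the original cell)
clusterer = KMeans(n_clusters=3, random_state=1, n_init=10).fit(X)

# One row per cluster, one column per feature
centers = clusterer.cluster_centers_

# Sum of squared distances from each observation to its closest center
inertia = clusterer.inertia_

print(centers.shape)
```

`cluster_centers_` has one row per cluster, so its shape here is `(3, 2)`. `inertia_` is the quantity k-means tries to minimize.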
In [4]:
# Predict the cluster of each observation
df['group'] = clusterer.predict(X)
# First few observations
df.head(5)
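To check how the observations were distributed across groups, the assigned labels can be counted. This is a sketch under the same assumptions as the cells above; it uses `fit_predict`, which fits the clusterer and returns the labels in one step, and again sets `n_init=10` explicitly as an assumption.

```python
# Load libraries
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import pandas as pd

# Rebuild the simulated feature matrix and DataFrame
X, _ = make_blobs(n_samples=50,
                  n_features=2,
                  centers=3,
                  random_state=1)
df = pd.DataFrame(X, columns=['feature_1', 'feature_2'])

# Fit the clusterer and assign each observation to a group in one call
clusterer = KMeans(n_clusters=3, random_state=1, n_init=10)
df['group'] = clusterer.fit_predict(X)

# Count the observations in each group
group_sizes = df['group'].value_counts().sort_index()
print(group_sizes)
```

The group labels (0, 1, 2) are arbitrary identifiers chosen by the algorithm; they carry no ordering and need not match the order of the centers used to simulate the data.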