Title: Make Simulated Data For Clustering
Slug: make_simulated_data_for_clustering
Summary: Make a simulated dataset for clustering.
Date: 2017-01-16 12:00
Category: Machine Learning
Tags: Basics
Authors: Chris Albon
Inspired by Python Machine Learning
In [1]:
    
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
    
In [2]:
    
# Make the features (X) and cluster labels (y) with 200 samples,
X, y = make_blobs(n_samples = 200,
                  # two feature variables,
                  n_features = 2,
                  # three clusters,
                  centers = 3,
                  # with a cluster standard deviation of 0.5,
                  cluster_std = 0.5,
                  # shuffled,
                  shuffle = True)
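
A quick sanity check on what `make_blobs` returns can be useful before plotting. The sketch below repeats the call above with one assumption that is not in the original: `random_state = 0` is passed so that the result is reproducible between runs. Without it, every run draws different blobs.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same settings as above; random_state is an added assumption
# so that repeated runs produce the same data.
X, y = make_blobs(n_samples = 200,
                  n_features = 2,
                  centers = 3,
                  cluster_std = 0.5,
                  shuffle = True,
                  random_state = 0)

# X holds one row per sample and one column per feature
print(X.shape)

# y holds one integer cluster label per sample
print(y.shape)

# The labels run from 0 to centers - 1
labels = np.unique(y)
print(labels)

# Samples are split as evenly as possible across the clusters
counts = np.bincount(y)
print(counts)
```

Here `X` is a 200 x 2 array of feature values and `y` is a length-200 array of cluster labels, which can serve as ground truth when evaluating a clustering algorithm.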
    
In [3]:
    
# Create a scatterplot of the first and second features
plt.scatter(X[:,0],
            X[:,1])
# Show the scatterplot
plt.show()
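
The scatterplot above draws every point in the same color, so the clusters are only visible through their position. As a possible variation, the sketch below colors each point by its label in `y`. The `random_state = 0` argument, the `viridis` colormap, and the axis labels are illustrative choices, not part of the original.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Regenerate the data; random_state is an added assumption for reproducibility
X, y = make_blobs(n_samples = 200,
                  n_features = 2,
                  centers = 3,
                  cluster_std = 0.5,
                  shuffle = True,
                  random_state = 0)

# Create a figure and axes to draw on
fig, ax = plt.subplots()

# Plot the first feature against the second, colored by cluster label
points = ax.scatter(X[:,0],
                    X[:,1],
                    c = y,
                    cmap = "viridis")

# Label the axes (names chosen here for illustration)
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
```

Calling `plt.show()` afterwards displays the figure, as in the cell above.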