Title: Preprocessing Iris Data
Slug: preprocessing_iris_data
Summary: Preprocessing the iris data using scikit-learn.
Date: 2016-09-21 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon
In [45]:
from sklearn import datasets
import numpy as np
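# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split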
from sklearn.preprocessing import StandardScaler
In [40]:
# Load the iris data
iris = datasets.load_iris()
# Create a variable for the feature data
X = iris.data
# Create a variable for the target data
y = iris.target
In [47]:
# Randomly split the data into four new datasets: training features, training outcome,
# test features, and test outcome. Set the size of the test data to be 30% of the full dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
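As a quick sanity check on the split above, the sketch below repeats the same calls and inspects the resulting shapes. It assumes a scikit-learn version that provides `sklearn.model_selection`. The iris data has 150 rows and 4 features, so a 30% test split should leave 105 training rows and 45 test rows.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the iris data and split it exactly as in the cell above
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Inspect the shapes of the four resulting datasets
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```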
In [42]:
# Create the standard scaler
sc = StandardScaler()
# Compute the mean and standard deviation based on the training data
sc.fit(X_train)
# Scale the training data to be of mean 0 and of unit variance
X_train_std = sc.transform(X_train)
# Scale the test data using the training mean and standard deviation
# (so the test data is only approximately mean 0 and unit variance)
X_test_std = sc.transform(X_test)
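To make the effect of the scaler concrete, here is a minimal sketch that checks the standardized arrays. It assumes the same split as above; `manual_test_std` is a name introduced here purely for illustration. The training features should come out with mean 0 and standard deviation 1 per column, while the test features are transformed with the training statistics and so will generally be close to, but not exactly, mean 0.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Recreate the split and the scaler from the cells above
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# The same transform written by hand: subtract the training mean and
# divide by the training (population) standard deviation, column by column
manual_test_std = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

# Per-feature statistics of the standardized data
train_means = X_train_std.mean(axis=0)
train_stds = X_train_std.std(axis=0)
test_means = X_test_std.mean(axis=0)

print(train_means, train_stds)
print(test_means)
```

Fitting on the training data only keeps information about the test set out of the preprocessing step, which is why the test columns are not re-centered on their own means.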
In [43]:
# View the first five rows of the test features, non-standardized
X_test[0:5]
Out[43]:
In [44]:
# View the first five rows of the test features, standardized
X_test_std[0:5]
Out[44]: