Title: Split Data Into Training And Test Sets
Slug: split_data_into_training_and_test_sets
Summary: How to split data into training and test sets for machine learning in Python.
Date: 2017-09-15 12:00
Category: Machine Learning
Tags: Model Evaluation
Authors: Chris Albon
In [1]:
# Load libraries
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
In [2]:
# Load the digits dataset
digits = datasets.load_digits()
# Create the features matrix
X = digits.data
# Create the target vector
y = digits.target
In [3]:
# Create training and test sets
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.1,
                                                    random_state=1)
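As a quick sanity check on the split above, the sketch below prints the shapes of the resulting arrays. The second call passes `stratify=y`, which is not used in the original cell and is shown only as an optional variant that keeps class proportions similar in both sets; the `*_s` variable names are illustrative.

```python
# Load libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the digits dataset
digits = datasets.load_digits()
X, y = digits.data, digits.target

# Same split as above: 10% of observations held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=1)

# Every observation lands in exactly one of the two sets
print(X_train.shape, X_test.shape)

# Optional variant (an assumption, not part of the original split):
# stratify on the target so each digit class is represented proportionally
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.1, random_state=1, stratify=y)
print(sorted(set(y_test_s)))
```

Fixing `random_state` makes the split reproducible across runs.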
In [4]:
# Create standardizer
standardizer = StandardScaler()
# Fit standardizer to the training set only, so no test set information leaks into the means and scales
standardizer.fit(X_train)
In [5]:
# Apply the standardizer (fit on the training set) to both training and test sets
X_train_std = standardizer.transform(X_train)
X_test_std = standardizer.transform(X_test)
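One way to confirm the standardizer behaves as intended is to check the transformed arrays directly. The sketch below is a minimal check, not part of the original notebook: it repeats the steps above, then verifies that the training features are centered at zero and that the test set was scaled with the training set's `mean_` and `scale_` rather than its own statistics. The names `train_means` and `manual` are illustrative.

```python
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Recreate the split and the standardizer from the cells above
digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.1, random_state=1)

standardizer = StandardScaler()
standardizer.fit(X_train)
X_train_std = standardizer.transform(X_train)
X_test_std = standardizer.transform(X_test)

# Training features should have (numerically) zero mean after standardizing
train_means = X_train_std.mean(axis=0)
print(np.allclose(train_means, 0))

# The test set is transformed with the training means and scales,
# so it matches a manual computation from the fitted attributes
manual = (X_test - standardizer.mean_) / standardizer.scale_
print(np.allclose(X_test_std, manual))
```

The test set means will generally not be exactly zero, which is expected: the test data is treated as unseen and only reuses what was learned from the training data.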