Title: Loading scikit-learn's Boston Housing Dataset
Slug: loading_scikit-learns_boston_housing-dataset
Summary: Loading the built-in Boston housing dataset in scikit-learn.
Date: 2016-08-31 12:00
Category: Machine Learning
Tags: Basics
Authors: Chris Albon
In [2]:
# Load libraries
from sklearn import datasets
import matplotlib.pyplot as plt
The Boston housing dataset is a well-known dataset from the 1970s. It contains 506 observations of housing prices in the Boston area, described by 13 features, and it is often used in regression examples.
In [4]:
# Load Boston housing dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2
boston = datasets.load_boston()
# Create feature matrix
X = boston.data
# Create target vector
y = boston.target
# View the first observation's feature values
X[0]
Out[4]:
As you can see, the features are not standardized: their values sit on very different scales. This is easier to see if we display the values in fixed-point decimal notation:
In [9]:
# Format each feature value of the first observation as a fixed-point decimal string
['{:f}'.format(x) for x in X[0]]
Out[9]:
Therefore, it is often beneficial, and sometimes required, to standardize the feature values before fitting a model.
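One common way to standardize features is scikit-learn's StandardScaler, which rescales each column to zero mean and unit variance. The following is a minimal sketch rather than part of the original notebook: the small matrix `X_demo` is a hypothetical stand-in with made-up values on very different scales, used so the example runs without loading the Boston data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 4x3 feature matrix whose columns are on very different scales
# (illustrative values only, not rows taken from the Boston data)
X_demo = np.array([
    [0.006, 18.0, 296.0],
    [0.027, 0.0, 242.0],
    [0.032, 0.0, 222.0],
    [0.069, 12.5, 311.0],
])

# Fit the scaler on the data and rescale each column
scaler = StandardScaler()
X_std = scaler.fit_transform(X_demo)

# After scaling, each column should have mean ~0 and standard deviation ~1
print(X_std.mean(axis=0).round(6))
print(X_std.std(axis=0).round(6))
```

In a modeling workflow, the scaler would typically be fit on the training data only and then applied to the test data with `scaler.transform`, so that no information from the test set leaks into the scaling parameters.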