Title: Decision Tree Regression
Slug: decision_tree_regression
Summary: Training a decision tree regressor in scikit-learn.
Date: 2017-09-19 12:00
Category: Machine Learning
Tags: Trees And Forests
Authors: Chris Albon

Preliminaries


In [4]:
# Load libraries
from sklearn.tree import DecisionTreeRegressor
from sklearn import datasets

Load Boston Housing Dataset


In [5]:
# Load data with only two features
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
boston = datasets.load_boston()
X = boston.data[:, 0:2]
y = boston.target

Create Decision Tree

Decision tree regression works similarly to decision tree classification. However, instead of reducing Gini impurity or entropy, candidate splits are scored by how much they reduce the mean squared error (MSE):

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $n$ is the number of observations, $y_i$ is the true value of the target, and $\hat{y}_i$ is the predicted value.
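To make the split criterion concrete, here is a minimal sketch that scores two candidate splits by hand. It assumes each node predicts the mean of its targets and that the child MSEs are weighted by the share of observations in each child. The helper names `node_mse` and `mse_reduction` and the toy data are made up for illustration and are not part of scikit-learn.

```python
def node_mse(targets):
    """MSE of a node that predicts the mean of its own targets."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


def mse_reduction(feature, targets, threshold):
    """Parent MSE minus the size-weighted MSE of the two child nodes."""
    left = [t for f, t in zip(feature, targets) if f <= threshold]
    right = [t for f, t in zip(feature, targets) if f > threshold]
    n = len(targets)
    weighted_child_mse = (
        len(left) / n * node_mse(left) + len(right) / n * node_mse(right)
    )
    return node_mse(targets) - weighted_child_mse


# Toy data (hypothetical): the target jumps when the feature passes 3
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
targets = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

# A threshold between the two clusters versus one that isolates a single point
good_split = mse_reduction(feature, targets, threshold=5.0)
poor_split = mse_reduction(feature, targets, threshold=1.5)

print(good_split, poor_split)
```

The threshold that separates the two clusters removes almost all of the error, so a tree built on this criterion would prefer it over the threshold that isolates a single observation.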


In [6]:
# Create decision tree regressor object
regr = DecisionTreeRegressor(random_state=0)

Train Model


In [7]:
# Train model
model = regr.fit(X, y)

Create Observation To Predict


In [8]:
# Make new observation
observation = [[0.02, 16]]

# Predict observation's value
model.predict(observation)


Out[8]:
array([ 33.])
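Where the predicted value comes from

With the default squared-error criterion, a regression tree should predict the mean target of the training observations that share a leaf with the new observation. The sketch below checks this on synthetic data rather than the Boston data. The names `X_demo`, `y_demo`, and `new_obs`, and the choice of `max_depth=2`, are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (hypothetical): the target depends mostly on the first feature
rng = np.random.RandomState(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(0, 1, size=50)

# A shallow tree, so each leaf holds several training observations
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, y_demo)

# Find the leaf the new observation falls into
new_obs = np.array([[2.0, 5.0]])
leaf_id = tree.apply(new_obs)[0]

# Average the targets of the training observations in that same leaf
train_leaves = tree.apply(X_demo)
leaf_mean = y_demo[train_leaves == leaf_id].mean()

# The prediction should match the leaf mean
prediction = tree.predict(new_obs)[0]
print(prediction, leaf_mean)
```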