Learning Objectives
In this lab, you explore and analyze data using a pairplot, train a single decision tree, use it to make predictions and evaluate them, and compare the decision tree model to a random forest. Recall that the decision tree algorithm belongs to the family of supervised learning algorithms, and that decision trees can be used for both classification and regression problems. The goal of using a decision tree is to create a model that predicts the class or value of the target variable by learning simple decision rules inferred from prior data (the training data).
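To make the classification/regression point concrete, here is a minimal sketch on made-up data. The arrays `X_toy`, `y_class` and `y_value` are illustrative only and not part of the lab; the sketch assumes scikit-learn 0.21 or newer (for `export_text`). Each tree is limited to a single split so the learned decision rule is easy to read.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

# Hypothetical toy data: one feature x = 0..9.
X_toy = np.arange(10).reshape(-1, 1)
y_class = (X_toy.ravel() > 5).astype(int)  # class label: 0 for x <= 5, 1 for x > 5
y_value = 2.0 * X_toy.ravel()              # continuous target: 2 * x

# Same tree-building idea, two kinds of target.
clf = DecisionTreeClassifier(max_depth=1).fit(X_toy, y_class)
reg = DecisionTreeRegressor(max_depth=1).fit(X_toy, y_value)

# The classifier learns one threshold rule on x.
print(export_text(clf, feature_names=["x"]))

print(clf.predict([[2], [8]]))  # -> class 0 for x=2, class 1 for x=8
print(reg.predict([[2], [8]]))  # -> 4.0 and 14.0, the mean target in each leaf
```

The classifier predicts a class label per leaf, while the regressor predicts the average target value of the training rows that landed in that leaf.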
Each learning objective corresponds to a #TODO in this student lab notebook. Try to complete this notebook first, and then review the solution notebook.
In [1]:
# Recursively give the jupyter user and group ownership of the training-data-analyst directory.
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst
In [ ]:
# Ensure the right version of TensorFlow is installed.
!pip freeze | grep tensorflow==2.1 || pip install tensorflow==2.1
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [ ]:
df = pd.read_csv('../kyphosis.csv')
In [ ]:
df.head()
Lab Task #1: Examine a pairplot of this small dataset.
In [ ]:
# TODO 1
# TODO -- Your code here.
In [ ]:
from sklearn.model_selection import train_test_split
In [ ]:
# Features: every column except the target label.
X = df.drop('Kyphosis', axis=1)
# Target: the Kyphosis label.
y = df['Kyphosis']
In [ ]:
# Hold out 30% of the rows as a test set; the remaining 70% are used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
Lab Task #2: Train a single decision tree.
In [ ]:
from sklearn.tree import DecisionTreeClassifier
In [ ]:
dtree = DecisionTreeClassifier()
In [ ]:
# TODO 2
# TODO -- Your code here.
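Training the tree amounts to calling `fit` on the training split. The sketch below is self-contained and uses `make_classification` as a synthetic stand-in for the kyphosis features. The names `X_demo`, `X_tr`, `y_tr` and `tree` are hypothetical, and in the lab the call would be on `dtree` with `X_train` and `y_train`. It also shows that an unpruned tree typically fits its own training data perfectly, which is why the held-out test score matters.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 100 rows, 3 informative features, 2 classes.
X_demo, y_demo = make_classification(
    n_samples=100, n_features=3, n_informative=3, n_redundant=0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_tr, y_tr)  # learn the split rules from the training rows only

print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
print("train accuracy:", tree.score(X_tr, y_tr))  # unpruned tree: fits training rows exactly
print("test accuracy:", tree.score(X_te, y_te))
```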
Lab Task #3: Evaluate our decision tree.
In [ ]:
predictions = dtree.predict(X_test)
In [ ]:
from sklearn.metrics import classification_report, confusion_matrix
In [ ]:
# TODO 3a
# TODO -- Your code here.
In [ ]:
# TODO 3b
# TODO -- Your code here.
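For TODO 3a and 3b, both metrics take the true labels first and the predicted labels second. In the lab those would be `y_test` and `predictions`. The tiny hand-made label lists below are hypothetical and only there so the numbers can be checked by hand.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for six patients.
y_true = ["absent", "absent", "absent", "present", "present", "absent"]
y_pred = ["absent", "present", "absent", "present", "absent", "absent"]

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=["absent", "present"])
print(cm)

# Per-class precision, recall, F1 score and support.
print(classification_report(y_true, y_pred))
```

Here 3 of the 4 true "absent" rows and 1 of the 2 true "present" rows are predicted correctly, so the matrix is `[[3, 1], [1, 1]]` and recall for "present" is 0.5.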
In [ ]:
from IPython.display import Image
from io import StringIO
from sklearn.tree import export_graphviz
import pydot

# Feature names in the same order as the columns the tree was trained on.
features = list(X.columns)
features
In [ ]:
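# Write the fitted tree in Graphviz DOT format to an in-memory text buffer.
dot_data = StringIO()
export_graphviz(dtree, out_file=dot_data, feature_names=features, filled=True, rounded=True)
# pydot returns a list of graphs; render the first one as a PNG image.
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())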
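If Graphviz or pydot is not available, scikit-learn's `plot_tree` (0.21 or newer) draws a similar tree diagram with matplotlib alone. This is a minimal sketch on synthetic data. `small_tree` and the feature names are hypothetical, and in the lab you would pass `dtree` and `features` instead.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic stand-in data with three features.
X_demo, y_demo = make_classification(
    n_samples=60, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
names = ["feature_0", "feature_1", "feature_2"]  # hypothetical column names

# A shallow tree keeps the drawing readable.
small_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)

fig, ax = plt.subplots(figsize=(8, 5))
# filled=True colors nodes by majority class; rounded=True rounds the node boxes.
annotations = plot_tree(small_tree, feature_names=names, filled=True, rounded=True, ax=ax)
plt.show()
```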
Lab Task #4: Compare the decision tree model to a random forest.
In [ ]:
from sklearn.ensemble import RandomForestClassifier
# An ensemble of 100 decision trees, fit on the same training split as the single tree.
rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(X_train, y_train)
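Conceptually, a random forest trains many decision trees, each on a bootstrap sample of the rows (drawn with replacement), with each split considering only a random subset of the features, and then combines their predictions. The loop below is a simplified, hand-rolled sketch of that idea on synthetic data, not scikit-learn's implementation. The names `trees`, `proba` and `forest_pred` are hypothetical. Like `RandomForestClassifier.predict`, it combines trees by averaging their class-probability estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with two classes labeled 0 and 1.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=1)
rng = np.random.default_rng(1)

trees = []
for i in range(25):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features="sqrt": each split considers a random subset of the features.
    t = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(t.fit(X_demo[idx], y_demo[idx]))

# Average the per-tree class probabilities, then pick the most probable class.
proba = np.mean([t.predict_proba(X_demo) for t in trees], axis=0)
forest_pred = proba.argmax(axis=1)
print("training accuracy of the hand-rolled forest:", (forest_pred == y_demo).mean())
```

Averaging many decorrelated trees reduces the variance that makes a single unpruned tree overfit.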
In [ ]:
rfc_pred = rfc.predict(X_test)
In [ ]:
# TODO 4a
# TODO -- Your code here.
In [ ]:
# TODO 4b
# TODO -- Your code here.
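For TODO 4a and 4b, the random forest is evaluated exactly like the single tree, using `rfc_pred` in place of `predictions`. The self-contained sketch below puts both models side by side on the same synthetic split so their scores can be compared directly. `X_demo`, `models` and `results` are hypothetical names, and `make_classification` stands in for the kyphosis data, so its numbers say nothing about the lab's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a fixed train/test split shared by both models.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.30, random_state=7)

models = {
    "decision tree": DecisionTreeClassifier(random_state=7),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=7),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)  # fit on train, predict on held-out rows
    results[name] = accuracy_score(y_te, pred)
    print(name, "accuracy:", round(results[name], 3))
    print(confusion_matrix(y_te, pred))
```

Fixing `random_state` on the split and both models makes the comparison repeatable. With the lab's unseeded split, scores will vary from run to run.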
Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.