Learning Objectives
In this lab, you explore and analyze data using a pairplot, train a single Decision Tree, predict with and evaluate the Decision Tree, and compare the Decision Tree model to a Random Forest. Recall that the Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike some other supervised learning algorithms, it can be used to solve both regression and classification problems. Put simply, the goal of using a Decision Tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (the training data).
Each learning objective will correspond to a #TODO in the student lab notebook -- try to complete that notebook first before reviewing this solution notebook.
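To make "learning simple decision rules" concrete, here is a minimal pure-Python sketch of how one rule of the form `value <= threshold` could be chosen by minimizing Gini impurity, which is the default split criterion of scikit-learn's `DecisionTreeClassifier`. The helper names (`gini`, `best_threshold`) and the toy ages and labels are hypothetical and are not taken from kyphosis.csv; the library's real search is considerably more optimized than this.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' rule."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    # Start from the impurity of the unsplit data; a split must improve on it.
    best = (None, gini(labels))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data for illustration only.
toy_ages = [12, 24, 36, 96, 120, 150]
toy_labels = ["absent", "absent", "absent", "present", "present", "absent"]

threshold, impurity = best_threshold(toy_ages, toy_labels)
print(f"Best rule: Age <= {threshold} (weighted Gini = {impurity:.3f})")
```

A full tree repeats this search on each resulting subset, over every feature, until a stopping condition is met.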
In [1]:
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst
In [ ]:
# Ensure the right version of Tensorflow is installed.
!pip freeze | grep tensorflow==2.1 || pip install tensorflow==2.1
In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [ ]:
df = pd.read_csv('../kyphosis.csv')
In [ ]:
df.head()
Out[ ]:
In [ ]:
# TODO 1
sns.pairplot(df,hue='Kyphosis',palette='Set1')
Out[ ]:
In [ ]:
from sklearn.model_selection import train_test_split
In [ ]:
X = df.drop('Kyphosis',axis=1)
y = df['Kyphosis']
In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
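The split above draws a different random partition on every run, so the metrics printed later will vary between runs. One possible way to make the split repeatable, and to keep the class proportions similar in both parts, is sketched below using the `random_state` and `stratify` parameters of `train_test_split`. The seed value 42 and the toy data are arbitrary assumptions for illustration, not part of the original lab.

```python
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 4 positive and 16 negative samples.
X_demo = [[i] for i in range(20)]
y_demo = ["present"] * 4 + ["absent"] * 16

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

print(len(X_train_demo), len(X_test_demo))
print("present in test set:", y_test_demo.count("present"))
```

Whether to stratify depends on the dataset; it mostly matters when one class is rare.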
In [ ]:
from sklearn.tree import DecisionTreeClassifier
In [ ]:
dtree = DecisionTreeClassifier()
In [ ]:
# TODO 2
dtree.fit(X_train,y_train)
Out[ ]:
In [ ]:
predictions = dtree.predict(X_test)
In [ ]:
from sklearn.metrics import classification_report,confusion_matrix
In [ ]:
# TODO 3a
print(classification_report(y_test,predictions))
In [ ]:
# TODO 3b
print(confusion_matrix(y_test,predictions))
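The classification report and the confusion matrix describe the same predictions. As a rough sketch of how the two relate, the function below derives per-class precision, recall, and F1 from a confusion matrix laid out the way `confusion_matrix` returns it (rows are true labels, columns are predicted labels). The function name `per_class_metrics` and the counts in `demo_matrix` are hypothetical; they are not results from this lab.

```python
def per_class_metrics(matrix, labels):
    """Compute precision, recall, and F1 per class from a confusion matrix.

    matrix[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        false_neg = sum(matrix[i]) - true_pos                  # rest of row i
        false_pos = sum(row[i] for row in matrix) - true_pos   # rest of column i
        precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
        recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical counts for illustration only.
demo_labels = ["absent", "present"]
demo_matrix = [[17, 3],
               [2, 3]]

demo_metrics = per_class_metrics(demo_matrix, demo_labels)
for label, values in demo_metrics.items():
    print(label, {name: round(value, 2) for name, value in values.items()})
```

Reading the matrix this way shows which class the model confuses, which a single accuracy number hides.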
In [ ]:
from IPython.display import Image
from io import StringIO  # sklearn.externals.six is unavailable in recent scikit-learn releases
from sklearn.tree import export_graphviz
import pydot
features = list(X.columns)  # feature names in the same order used for training
features
Out[ ]:
In [ ]:
dot_data = StringIO()
export_graphviz(dtree, out_file=dot_data,feature_names=features,filled=True,rounded=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
Out[ ]:
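If Graphviz or pydot is not installed, the learned rules can also be inspected as plain text. The sketch below uses `sklearn.tree.export_text` on a tiny tree; the single feature name `Age` and the four toy samples are assumptions chosen for illustration, and the same call should work on `dtree` with the notebook's feature list.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: one feature, two clearly separated classes.
X_toy = [[1], [2], [10], [11]]
y_toy = ["absent", "absent", "present", "present"]

toy_tree = DecisionTreeClassifier(random_state=0)
toy_tree.fit(X_toy, y_toy)

# export_text renders the fitted tree as indented if/else style rules.
rules = export_text(toy_tree, feature_names=["Age"])
print(rules)
```

The text form is less visual than the PNG, but it needs no extra system packages and is easy to paste into a report.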
In [ ]:
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(X_train, y_train)
Out[ ]:
In [ ]:
rfc_pred = rfc.predict(X_test)
In [ ]:
# TODO 4a
print(confusion_matrix(y_test,rfc_pred))
In [ ]:
# TODO 4b
print(classification_report(y_test,rfc_pred))
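A single train/test split on a small dataset can favor either model by chance. One possible way to make the Decision Tree versus Random Forest comparison steadier is k-fold cross-validation, sketched below with `cross_val_score`. The synthetic data from `make_classification`, the fold count, and the seeds are assumptions for illustration; in the lab you would pass `X` and `y` instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the kyphosis dataset).
X_synth, y_synth = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 folds for each model.
cv_scores = {
    name: cross_val_score(model, X_synth, y_synth, cv=5).mean()
    for name, model in models.items()
}

for name, score in cv_scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
```

Averaging over folds reduces the influence of any one lucky or unlucky split, at the cost of fitting each model several times.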
Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.