Lab: Exploring Titanic Passenger Survival with Decision Trees

Getting Started

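In the guided project, you explored the Titanic survival data and made predictions about which passengers survived. In that project you built a decision tree by hand: at each stage you chose the feature most strongly related to survival and split the passengers on it. Conveniently, this is essentially how a decision tree works. At every node it picks the feature (and split point) that best separates survivors from non-survivors, as measured by an impurity criterion such as Gini impurity. In this lab, we will make this process much faster by using the decision tree implementation in sklearn (scikit-learn).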
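To make "best separates survivors from non-survivors" concrete, here is a minimal sketch of how a tree might score candidate splits. It assumes Gini impurity as the criterion (the default for scikit-learn's `DecisionTreeClassifier`) and uses a hypothetical eight-passenger toy sample, not the real CSV. The helper names `gini` and `weighted_gini` are illustrative only. For each candidate feature, passengers are grouped by the feature's values and the size-weighted impurity of the groups is computed; the feature with the lowest value wins. scikit-learn itself searches binary threshold splits rather than one branch per value, but for two-valued features like these the two views coincide.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, feature, target="Survived"):
    """Size-weighted Gini impurity after grouping rows by each value of `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(labels) / n * gini(labels) for labels in groups.values())


# Hypothetical toy passengers, NOT taken from titanic_data.csv
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
]

# Impurity before any split, then after splitting on each candidate feature
print(gini([row["Survived"] for row in rows]))  # 0.46875
scores = {feature: weighted_gini(rows, feature) for feature in ["Sex", "Pclass"]}
print(scores)  # {'Sex': 0.1875, 'Pclass': 0.4375}

# Split on the feature whose child groups are purest (lowest weighted impurity)
best = min(scores, key=scores.get)
print(best)  # Sex
```

A real tree repeats this search recursively inside each resulting group until a stopping rule (such as a maximum depth) is reached.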

We'll start by loading the dataset and displaying a few rows.


In [1]:
# Import libraries necessary for this project
import random

import numpy as np
import pandas as pd
from IPython.display import display  # Allows the use of display() for DataFrames

# Render matplotlib plots inline in the notebook
%matplotlib inline

# Set random seeds for reproducibility. random.seed() only seeds Python's
# built-in RNG; NumPy's global RNG (used by scikit-learn when no
# random_state is passed) needs its own seed.
random.seed(42)
np.random.seed(42)

# Load the dataset
in_file = 'titanic_data.csv'
full_data = pd.read_csv(in_file)

# Print the first few entries of the RMS Titanic data
display(full_data.head())


   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S

   PassengerId  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
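The table above is the same data with the `Survived` column removed, which keeps the label apart from the features a model learns from. A minimal sketch of that step with pandas, using a hypothetical three-row toy frame in place of `titanic_data.csv`; the variable names `outcomes` and `features_raw` are illustrative assumptions, not necessarily the ones the lab uses.

```python
import pandas as pd

# Hypothetical toy frame with a few of the Titanic columns
full_data = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})

# Keep the label in its own Series...
outcomes = full_data["Survived"]
# ...and drop it from the features so the model never sees the answer
features_raw = full_data.drop(columns="Survived")

print(list(features_raw.columns))  # ['PassengerId', 'Pclass', 'Sex']
print(outcomes.tolist())           # [0, 1, 1]
```

`drop` returns a new DataFrame, so `full_data` itself still contains `Survived`.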

   PassengerId  Pclass  Age   SibSp  Parch  Fare     Name_Abbing, Mr. Anthony  Name_Abbott, Mr. Rossmore Edward  Name_Abbott, Mrs. Stanton (Rosa Hunt)  Name_Abelson, Mr. Samuel  ...  Cabin_F G73  Cabin_F2  Cabin_F33  Cabin_F38  Cabin_F4  Cabin_G6  Cabin_T  Embarked_C  Embarked_Q  Embarked_S
0  1            3       22.0  1      0      7.2500   0                         0                                 0                                      0                         ...  0            0         0          0          0         0         0        0           0           1
1  2            1       38.0  1      0      71.2833  0                         0                                 0                                      0                         ...  0            0         0          0          0         0         0        1           0           0
2  3            3       26.0  0      0      7.9250   0                         0                                 0                                      0                         ...  0            0         0          0          0         0         0        0           0           1
3  4            1       35.0  1      0      53.1000  0                         0                                 0                                      0                         ...  0            0         0          0          0         0         0        0           0           1
4  5            3       35.0  0      0      8.0500   0                         0                                 0                                      0                         ...  0            0         0          0          0         0         0        0           0           1

5 rows × 1730 columns
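The 1730-column output above has each text column expanded into one indicator column per distinct value (`Name_...`, `Cabin_...`, `Embarked_...`), which is what one-hot encoding produces; scikit-learn's decision trees expect numeric input, so string columns have to be converted somehow. A minimal sketch with `pandas.get_dummies` on a hypothetical toy frame; that the lab used exactly this call is an assumption.

```python
import pandas as pd

# Hypothetical toy feature frame (Survived already removed)
features_raw = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Age": [22.0, 38.0, None],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
})

# One-hot encode the string columns; numeric columns pass through unchanged.
# New columns are named <column>_<value>, e.g. Sex_female.
features = pd.get_dummies(features_raw)
print(list(features.columns))
# ['Pclass', 'Age', 'Sex_female', 'Sex_male', 'Embarked_C', 'Embarked_S']

# get_dummies does not touch missing values in numeric columns,
# so the NaN in Age is still there and may need imputing before fitting
print(features["Age"].isna().sum())  # 1
```

Encoding a near-unique column such as `Name` creates roughly one column per passenger, which explains most of the width of the table above.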