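First, inspecting the data shows the detailed record kept for each passenger.
A first look at the raw data suggests that survival is related to passenger class (a proxy for social status), sex, age, the number of siblings and spouses aboard, and the number of parents and children aboard. Based on this initial guess, the following questions are posed and analyzed: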
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
%matplotlib inline
filename = './titanic-data.csv'
titanic_df = pd.read_csv(filename)
titanic_df.describe()
Out[2]:
In [26]:
# Forward fill: replace each missing value with the value from the previous row.
titanic_df = titanic_df.ffill()
titanic_df.describe()
Out[26]:
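To make the effect of the forward fill used above concrete, here is a minimal sketch on a toy Series. The values are made up for illustration and are not taken from `titanic-data.csv`. The median fill is shown only as one common alternative for comparison, not as the method this notebook uses.

```python
import numpy as np
import pandas as pd

# Toy ages (hypothetical values, not from titanic-data.csv).
ages = pd.Series([22.0, np.nan, 38.0, np.nan, np.nan, 54.0])

# Forward fill copies the previous non-missing value down the column,
# so the result depends on the order of the rows.
filled = ages.ffill()
print(filled.tolist())

# For comparison: filling with the median does not depend on row order.
median_filled = ages.fillna(ages.median())
print(median_filled.tolist())
```

Because a forward-filled value is simply whatever the previous row held, a filled age says nothing about the passenger it is assigned to. That is worth keeping in mind when reading the age-based results.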
In [36]:
# Number of passengers in each class.
sort_pclass = titanic_df.groupby('Pclass')['PassengerId'].count()
print(sort_pclass)
# Share of passengers per class as a pie chart.
sort_pclass.plot(kind='pie', autopct='%.0f%%')
plt.title('Pclass VS Count')
plt.show()
In [37]:
# Survived is 0/1, so its mean per class is the survival rate.
Pclass_survived = titanic_df.groupby('Pclass')['Survived'].mean()
Pclass_survived.plot.bar()
In [46]:
# Number of passengers of each sex.
sort_sex = titanic_df.groupby('Sex')['PassengerId'].count()
print(sort_sex)
# Survival rate by sex.
Sex_survived = titanic_df.groupby('Sex')['Survived'].mean()
print(Sex_survived)
Sex_survived.plot.bar()
In [47]:
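# Bin ages into 10-year intervals. Note: range(0, 80, 10) gives edges 0..70,
# so ages above 70 fall outside every bin and are excluded from the groups.
titanic_df['Age_bins'] = pd.cut(titanic_df['Age'], list(range(0, 80, 10)))
# Survival rate and passenger count per age bin.
Age_survived = titanic_df.groupby('Age_bins')['Survived'].mean()
Sort_survived = titanic_df.groupby('Age_bins')['Survived'].count()
print(Age_survived)
print(Sort_survived)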
In [48]:
Age_survived.plot(kind='bar', stacked=True)
Out[48]:
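The age bins above are built with `pd.cut` over `range(0, 80, 10)`. The sketch below uses made-up ages to show how `pd.cut` assigns values to right-closed intervals and what happens to values beyond the last edge. The wider edge list `range(0, 90, 10)` is an assumption shown for comparison, not what the notebook uses.

```python
import pandas as pd

# Toy ages (hypothetical values chosen to hit the bin boundaries).
ages = pd.Series([5, 10, 35, 70, 71, 80])

# Same edges as the notebook: 0, 10, ..., 70.
edges = list(range(0, 80, 10))
bins = pd.cut(ages, edges)  # intervals are right-closed: (0, 10], (10, 20], ...
print(bins)

# Values above the last edge (70) get no bin and become NaN.
n_unbinned = int(bins.isna().sum())
print(n_unbinned)

# Extending the edges to 80 keeps the oldest passengers in a (70, 80] bin.
wider = pd.cut(ages, list(range(0, 90, 10)))
print(int(wider.isna().sum()))
```

Since `groupby` skips rows whose bin is NaN, any passenger older than 70 does not contribute to the per-bin survival rates computed with the original edges.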
In [42]:
# Number of passengers for each siblings/spouses count.
sort_SibSp = titanic_df.groupby('SibSp')['PassengerId'].count()
print(sort_SibSp)
sort_SibSp.plot(kind='pie', autopct='%.0f%%')
plt.title('SibSp VS Count')
plt.show()
In [52]:
# Survival rate for each siblings/spouses count.
SibSp_survived = titanic_df.groupby('SibSp')['Survived'].mean()
print(SibSp_survived)
SibSp_survived.plot.bar()
Out[52]:
In [49]:
# Number of passengers for each parents/children count.
sort_Parch = titanic_df.groupby('Parch')['PassengerId'].count()
print(sort_Parch)
sort_Parch.plot(kind='pie', autopct='%.0f%%')
plt.title('Parch VS Count')
plt.show()
In [53]:
# Survival rate for each parents/children count.
Parch_survived = titanic_df.groupby('Parch')['Survived'].mean()
print(Parch_survived)
Parch_survived.plot.bar()
Out[53]:
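The results of the analysis largely match the initial guesses. The survival rate of women is more than three times that of men. The breakdown by cabin class shows that first- and second-class passengers survived at higher rates, which plausibly reflects the preferential treatment of wealthier passengers at the time; one may speculate that more lifeboats were accessible from the first- and second-class areas. As for age, the only clear findings are that children aged 0 to 10 had the highest survival rate and the elderly the lowest, which is in line with expectations.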
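A statement such as "more than three times" can be checked directly by dividing one group's survival rate by another's. The sketch below does this on a toy table. Both the helper name `rate_ratio` and the data are hypothetical and serve only to illustrate the calculation; the same call could be applied to `titanic_df`.

```python
import pandas as pd

def rate_ratio(df, group_col, target_col, numerator, denominator):
    """Ratio of the mean of target_col between two groups of group_col."""
    rates = df.groupby(group_col)[target_col].mean()
    return rates[numerator] / rates[denominator]

# Toy data (hypothetical, not from titanic-data.csv):
# 3 of 4 women survive, 1 of 4 men survives.
toy = pd.DataFrame({
    'Sex': ['female'] * 4 + ['male'] * 4,
    'Survived': [1, 1, 1, 0, 1, 0, 0, 0],
})

ratio = rate_ratio(toy, 'Sex', 'Survived', 'female', 'male')
print(ratio)
```

On the real data, `rate_ratio(titanic_df, 'Sex', 'Survived', 'female', 'male')` would give the factor behind the comparison of women and men stated above.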