Highlights:
In [82]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
def plot_correlation_map(df):
    # Use the frame passed in (not the global `data`) and keep numeric columns only.
    corr = df.select_dtypes(include='number').corr()
    _, ax = plt.subplots(figsize=(12, 10))
    cmap = sns.diverging_palette(220, 10, as_cmap=True)
    _ = sns.heatmap(
        corr,
        cmap=cmap,
        square=True,
        cbar_kws={'shrink': .9},
        ax=ax,
        annot=True,  # print each coefficient inside its cell
        annot_kws={'fontsize': 12}
    )
In [83]:
data = pd.read_csv('titanic.csv', na_filter=False)
data.head(20)
Out[83]:
In [84]:
print(data.columns.values)
In [85]:
data.describe()
Out[85]:
What can we do in those cases?
Who has not seen the Titanic film?
In [86]:
plot_correlation_map(data)
In [87]:
sns.countplot(x='Pclass', hue='Survived', data=data)
Out[87]:
In [88]:
sns.countplot(x='Sex', hue='Survived', data=data)
Out[88]:
In [89]:
sns.countplot(x='Embarked', hue='Survived', data=data)
Out[89]:
In [90]:
data.head(10)
Out[90]:
Tip: can we detect married passengers?
Or can we use the title? Mr, Master, etc.
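One way to act on this tip is to pull the title out of each passenger name. The sketch below assumes the names follow the layout "Surname, Title. Given names" (for example "Braund, Mr. Owen Harris"); the helpers extract_title and is_married_woman are hypothetical names introduced here, and treating "Mrs" as a marker of marriage is only a heuristic.

```python
import re

# Assumed layout: "Surname, Title. Given names", e.g. "Braund, Mr. Owen Harris".
# The title is whatever sits between the first comma and the following period.
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def extract_title(name):
    """Return the title found in a passenger name, or '' if none is found."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else ""


def is_married_woman(name):
    """Heuristic only: treat the title 'Mrs' as a marker of a married woman."""
    return extract_title(name) == "Mrs"


print(extract_title("Braund, Mr. Owen Harris"))
print(is_married_woman("Cumings, Mrs. John Bradley (Florence Briggs Thayer)"))
```

In the notebook this could be applied with data['Title'] = data['Name'].apply(extract_title), assuming the frame has a Name column in that layout.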
In [91]:
data.head(10)
Out[91]:
Tip: can we use it to detect the size of the family?
Can you guess the probability of survival for a singleton, a small family and a large family?
In [92]:
data['Family'] = data['Parch'] + data['SibSp'] + 1  # the passenger plus relatives aboard
data.loc[data['Family'] == 1, 'FamilySize'] = 'singleton'
data.loc[(data['Family'] > 1) & (data['Family'] < 5), 'FamilySize'] = 'small'
data.loc[data['Family'] > 4, 'FamilySize'] = 'large'
sns.countplot(x='FamilySize', hue='Survived', data=data)
Out[92]:
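The count plot shows raw counts; to put a number on the survival question, the survival rate per family-size group can be computed directly. The sketch below is a minimal illustration on a toy frame (the rows are made up for the example and are not taken from titanic.csv), and it uses pd.cut as an alternative way to get the same three bins as the cell above.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from titanic.csv.
toy = pd.DataFrame({
    "SibSp":    [0, 1, 0, 4, 1, 0],
    "Parch":    [0, 0, 0, 2, 1, 0],
    "Survived": [0, 1, 1, 0, 1, 0],
})

toy["Family"] = toy["Parch"] + toy["SibSp"] + 1

# Same bins as the cell above: 1 -> singleton, 2 to 4 -> small, 5 or more -> large.
toy["FamilySize"] = pd.cut(
    toy["Family"],
    bins=[0, 1, 4, float("inf")],
    labels=["singleton", "small", "large"],
)

# The mean of a 0/1 column is the fraction of survivors in each group.
survival_rate = toy.groupby("FamilySize", observed=True)["Survived"].mean()
print(survival_rate)
```

On the real data, the same groupby line can be run on data once the FamilySize column exists, provided Survived was read as a numeric 0/1 column.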