Titanic Data Exploration

Overview
Initial Exploration and Plotting
Exploratory Analysis and Feature Engineering
- Names
- Family Size
- Tickets
- Fares
- Cabins
- Embarkment
- Ages



In [334]:

    
import matplotlib.pyplot as plt
import scipy.stats as st
import seaborn as sns
import pandas as pd
import numpy as np

%matplotlib inline

train = pd.read_csv('train.csv', index_col='PassengerId')
test = pd.read_csv('test.csv', index_col='PassengerId')

Initial Exploration and Plotting



In [3]:

    
train.head()









    Out[3]:

             Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
PassengerId
1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S



In [4]:

    
train.info()









    



<class 'pandas.core.frame.DataFrame'>
Int64Index: 891 entries, 1 to 891
Data columns (total 11 columns):
Survived    891 non-null int64
Pclass      891 non-null int64
Name        891 non-null object
Sex         891 non-null object
Age         714 non-null float64
SibSp       891 non-null int64
Parch       891 non-null int64
Ticket      891 non-null object
Fare        891 non-null float64
Cabin       204 non-null object
Embarked    889 non-null object
dtypes: float64(2), int64(4), object(5)
memory usage: 83.5+ KB



In [5]:

    
test.info()









    



<class 'pandas.core.frame.DataFrame'>
Int64Index: 418 entries, 892 to 1309
Data columns (total 10 columns):
Pclass      418 non-null int64
Name        418 non-null object
Sex         418 non-null object
Age         332 non-null float64
SibSp       418 non-null int64
Parch       418 non-null int64
Ticket      418 non-null object
Fare        417 non-null float64
Cabin       91 non-null object
Embarked    418 non-null object
dtypes: float64(2), int64(3), object(5)
memory usage: 35.9+ KB



In [362]:

    
plt.figure(1, figsize=(6, 6))
sns.barplot(x='Sex', y='Survived', data=train)
plt.show()



In [343]:

    
# Ages and log-scaled fares, split by outcome (s_ = survived, d_ = died)
s_ages = train.loc[train['Survived'] == 1, 'Age'].dropna()
d_ages = train.loc[train['Survived'] == 0, 'Age'].dropna()
# Adding 1 before taking the log keeps any zero fares finite: log(0 + 1) = 0
s_fares = train.loc[train['Survived'] == 1, 'Fare'].add(1).apply(np.log).dropna()
d_fares = train.loc[train['Survived'] == 0, 'Fare'].add(1).apply(np.log).dropna()

# 2x3 grid: survival rate by Pclass, SibSp, Embarked and Parch,
# plus age and log-fare distributions (C0 = died, C1 = survived)
plt.figure(2, figsize=(12, 8))
plt.subplot(231)
sns.barplot(x='Pclass', y='Survived', data=train)
plt.subplot(234)
sns.barplot(x='Embarked', y='Survived', data=train)
plt.subplot(233)
sns.barplot(x='SibSp', y='Survived', data=train)
plt.subplot(236)
sns.barplot(x='Parch', y='Survived', data=train)
plt.subplot(232)
sns.distplot(d_ages, color='C0')
sns.distplot(s_ages, color='C1')
plt.subplot(235)
sns.distplot(d_fares, color='C0')
sns.distplot(s_fares, color='C1')
plt.show()
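
The fare distributions in the cell above are plotted on a log scale after adding 1 to each fare. A minimal sketch of why the shift helps, assuming some fares may be zero; the sample values are the first two fares from `train.head()` plus a hypothetical zero fare:

```python
import math

# First two fares from train.head(), plus a hypothetical zero fare
fares = [7.25, 71.2833, 0.0]

# log1p(x) computes log(1 + x), the same transform as .add(1).apply(np.log)
log_fares = [math.log1p(fare) for fare in fares]

for fare, log_fare in zip(fares, log_fares):
    print(f'{fare:8.4f} -> {log_fare:.4f}')
```

The zero fare maps to 0, whereas a plain log of zero is undefined (`math.log` raises an error and `np.log` returns `-inf`). On a whole column, `np.log1p` applies the same transform in one call.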

Exploratory Analysis and Feature Engineering

Here, we'll explore the features of the dataset. Since Sex and Pclass are rather clear-cut and have been explored in many other kernels, we will not explore those for now. We'll explore the related features SibSp and Parch together, as a "family size" feature group. Because Age has a large number of missing values, we will explore it last: after looking at the other features, we may come up with strategies for imputation.

Finally, we'll create several derived features if necessary.

Names

This doesn't seem like a very promising feature, but take a look:



In [351]:

    
train['Name'].head()









    Out[351]:





PassengerId
1                              Braund, Mr. Owen Harris
2    Cumings, Mrs. John Bradley (Florence Briggs Th...
3                               Heikkinen, Miss. Laina
4         Futrelle, Mrs. Jacques Heath (Lily May Peel)
5                             Allen, Mr. William Henry
Name: Name, dtype: object

The names are consistently formatted as (last), (title). (first) (middle). Since there are only a handful of distinct titles (versus the largely unique names), we'll extract the title:



In [352]:

    
# Raw string avoids invalid-escape warnings; captures the text between ", " and the first "."
train['Title'] = train['Name'].str.extract(r',\s(.*?)[.]', expand=False)
print(train['Title'].unique())









    



['Mr' 'Mrs' 'Miss' 'Master' 'Don' 'Rev' 'Dr' 'Mme' 'Ms' 'Major' 'Lady'
 'Sir' 'Mlle' 'Col' 'Capt' 'the Countess' 'Jonkheer']



In [353]:

    
# Same pattern as for the training set, written as a raw string
test['Title'] = test['Name'].str.extract(r',\s(.*?)[.]', expand=False)
print(test['Title'].unique())









    



['Mr' 'Mrs' 'Miss' 'Master' 'Ms' 'Col' 'Rev' 'Dr' 'Dona']
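
To see what the extraction pattern captures, here is a minimal sketch that applies the same regular expression with the standard `re` module to a few names that appear in this notebook's outputs. The helper name `extract_title` is an assumption made for illustration; it is not part of the notebook.

```python
import re

# Same pattern as the notebook cells: the text between ", " and the first "."
TITLE_PATTERN = re.compile(r',\s(.*?)[.]')

def extract_title(name):
    """Return the title embedded in a passenger name, or None if absent."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None

names = [
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Rothes, the Countess. of (Lucy Noel Martha Dye...',
    'Carter, Master. William Thornton II',
]
titles = [extract_title(name) for name in names]
print(titles)
```

The lazy `.*?` stops at the first period after the comma, which is why a multi-word title such as "the Countess" is captured whole while the rest of the name is ignored.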

To start with, let's get an idea of how many passengers are holding each title.



In [366]:

    
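# Title counts for women (left) and men (right)
plt.figure(3, figsize=(14, 4))
plt.subplot(121)
# The Series is passed as x= by keyword, which newer seaborn versions expect
sns.countplot(x=train.loc[train['Sex'] == 'female', 'Title'])
plt.subplot(122)
sns.countplot(x=train.loc[train['Sex'] == 'male', 'Title'])
plt.show()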

The low counts for most titles suggest grouping the rarer ones. We'll do so as follows (there are no hard rules, so we'll use some judgment):

- Merge Mme. into Mrs. and Mlle. into Miss.
- Merge Lady, the Countess, and Dona (from the test set) into a category of noblewomen.
- Merge Don, Sir, and Jonkheer into a category of noblemen.
- Merge Col, Capt, and Major into a category of military.

For 'Ms.', we'll look at the woman's age and also check her party.



In [121]:

    
train[train['Title'] == 'Ms']









    Out[121]:

             Survived  Pclass  Name                       Sex     Age   SibSp  Parch  Ticket  Fare  Cabin  Embarked  Title
PassengerId
444          1         2       Reynaldo, Ms. Encarnacion  female  28.0  0      0      230434  13.0  NaN    S         Ms

Since she is relatively young and traveling alone, we'll throw her in with the "Miss" group.



In [359]:

    
title_map = {'Mr': 'Mr',
             'Mrs': 'Mrs',
             'Miss': 'Miss',
             'Master': 'Master',
             'Dr': 'Dr',
             'Rev': 'Rev',
             'Don': 'mnoble',
             'Sir': 'mnoble',
             'Jonkheer': 'mnoble',
             'Lady': 'fnoble',
             'the Countess': 'fnoble',
             'Dona': 'fnoble',
             'Col': 'mil',
             'Capt': 'mil',
             'Major': 'mil',
             'Mme': 'Mrs',
             'Mlle': 'Miss',
             'Ms': 'Miss'}

train['AdjTitle'] = train['Title'].map(title_map)
test['AdjTitle'] = test['Title'].map(title_map)
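
Because `Series.map` with a dictionary yields NaN for any value that is not a key, it is worth confirming that every observed title is covered. A minimal sketch, assuming the title lists printed earlier and the keys of `title_map` are copied over by hand:

```python
# Titles printed by the two unique() calls earlier in the notebook
train_titles = ['Mr', 'Mrs', 'Miss', 'Master', 'Don', 'Rev', 'Dr', 'Mme', 'Ms',
                'Major', 'Lady', 'Sir', 'Mlle', 'Col', 'Capt', 'the Countess',
                'Jonkheer']
test_titles = ['Mr', 'Mrs', 'Miss', 'Master', 'Ms', 'Col', 'Rev', 'Dr', 'Dona']

# Keys of title_map, copied from the cell above
mapped_titles = {'Mr', 'Mrs', 'Miss', 'Master', 'Dr', 'Rev', 'Don', 'Sir',
                 'Jonkheer', 'Lady', 'the Countess', 'Dona', 'Col', 'Capt',
                 'Major', 'Mme', 'Mlle', 'Ms'}

# Any title listed here would become NaN in AdjTitle
unmapped = sorted((set(train_titles) | set(test_titles)) - mapped_titles)
print(unmapped)
```

With the DataFrames at hand, the equivalent check would be `train['AdjTitle'].isnull().sum()` and `test['AdjTitle'].isnull().sum()`, both of which should be zero.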

Let's see how these titles did:



In [365]:

    
plt.figure(4, figsize=(8, 4))
plt.subplot(121)
sns.barplot(x='AdjTitle', y='Survived', data=train[train['Sex'] == 'female'])
plt.subplot(122)
sns.barplot(x='AdjTitle', y='Survived', data=train[train['Sex'] == 'male'])
plt.show()

For women, it seems pretty clear-cut: the women with nobility titles survived (as did most women overall). Men with titles other than Mr (all except Rev) seem to do better on average, but the estimates are highly variable because those groups are small. Since the gender-based model, in which all women live and all men die, attains over 76% accuracy, the hard part of our model seems to be picking out the few male survivors.
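
The gender-based baseline mentioned above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `gender_baseline_accuracy` is hypothetical, and the sample consists only of the five rows shown by `train.head()`, so its score says nothing about the full training set.

```python
def gender_baseline_accuracy(sexes, survived):
    """Accuracy of the rule 'every woman survives, every man dies'."""
    predictions = [1 if sex == 'female' else 0 for sex in sexes]
    correct = sum(pred == actual for pred, actual in zip(predictions, survived))
    return correct / len(survived)

# Sex and Survived for PassengerId 1-5, copied from train.head()
sexes = ['male', 'female', 'female', 'female', 'male']
survived = [0, 1, 1, 1, 0]

accuracy = gender_baseline_accuracy(sexes, survived)
print(accuracy)
```

On the full DataFrame, the same quantity would be `((train['Sex'] == 'female').astype(int) == train['Survived']).mean()`.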

Family Size

Here we'll work with the SibSp and Parch features, which describe family size. To look for lone travelers, we'll first look at the distribution of their sum, separated by gender. The sum counts relatives aboard and excludes the passenger, so a value of 0 means traveling without family:



In [364]:

    
train['FamSize'] = train['SibSp'] + train['Parch']
test['FamSize'] = test['SibSp'] + test['Parch']

plt.figure(5, figsize=(8, 4))
plt.subplot(121)
# The Series is passed as x= by keyword, which newer seaborn versions expect
sns.countplot(x=train.loc[train['Sex'] == 'female', 'FamSize'])
plt.subplot(122)
sns.countplot(x=train.loc[train['Sex'] == 'male', 'FamSize'])
plt.show()



In [367]:

    
# FamSize was already computed in the previous cell.
# Grid of FamSize counts: rows are female/male, columns are Pclass 1-3
plt.figure(6, figsize=(12, 8))
for row, sex in enumerate(['female', 'male']):
    for pclass in (1, 2, 3):
        plt.subplot(2, 3, 3 * row + pclass)
        subset = train[(train['Sex'] == sex) & (train['Pclass'] == pclass)]
        sns.countplot(x=subset['FamSize'])
plt.show()

How did this impact survival?



In [368]:

    
plt.figure(7, figsize=(8, 4))
plt.subplot(121)
sns.barplot(x='FamSize', y='Survived', data=train[train['Sex'] == 'female'])
plt.subplot(122)
sns.barplot(x='FamSize', y='Survived', data=train[train['Sex'] == 'male'])
plt.show()



In [369]:

    
# Grid of survival rates by FamSize: rows are female/male, columns are Pclass 1-3
plt.figure(9, figsize=(12, 8))
for row, sex in enumerate(['female', 'male']):
    for pclass in (1, 2, 3):
        plt.subplot(2, 3, 3 * row + pclass)
        subset = train[(train['Sex'] == sex) & (train['Pclass'] == pclass)]
        sns.barplot(x='FamSize', y='Survived', data=subset)
plt.show()

Let's ignore (for the moment) possible effects from passenger class. We can then draw the following conclusions:

Most men traveled alone. Those with families were generally in smaller ones. A large number of men traveled alone in third class; they had very low survival chances.
Many women traveled alone, but not as many as men. Larger groups consisted mostly of women.
In first and second class:
- Women seem to have roughly the same survival chance, independent of family size.
- Men with larger family sizes seem to have relatively higher chances of survival.
In third class:
- Women and men seem to have relatively higher chances of survival, up to a family size of 3.
- Women and men with a family size of 4 or higher had drastically lower odds of survival.
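
The plots above amount to a survival rate per (Sex, Pclass, FamSize) group. A minimal sketch of that tabulation, assuming a hypothetical helper `survival_rate_by` and using only a few third-class rows that appear in this notebook's ticket output further down; the sample is far too small to support any conclusion by itself.

```python
from collections import defaultdict

def survival_rate_by(rows, keys):
    """Mean of 'Survived' for each combination of the given keys."""
    totals = defaultdict(lambda: [0, 0])  # group -> [survivors, passengers]
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group][0] += row['Survived']
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}

# Third-class rows copied from the notebook's ticket_dupes output
rows = [
    {'Sex': 'male',   'Pclass': 3, 'FamSize': 4, 'Survived': 0},  # Ford, Mr. William Neal
    {'Sex': 'female', 'Pclass': 3, 'FamSize': 4, 'Survived': 0},  # Ford, Miss. Robina Maggie
    {'Sex': 'female', 'Pclass': 3, 'FamSize': 4, 'Survived': 0},  # Ford, Miss. Doolina Margaret
    {'Sex': 'female', 'Pclass': 3, 'FamSize': 4, 'Survived': 0},  # Ford, Mrs. Edward
    {'Sex': 'female', 'Pclass': 3, 'FamSize': 2, 'Survived': 1},  # Sandstrom, Miss. Marguerite Rut
    {'Sex': 'female', 'Pclass': 3, 'FamSize': 2, 'Survived': 1},  # Sandstrom, Mrs. Hjalmar
    {'Sex': 'female', 'Pclass': 3, 'FamSize': 1, 'Survived': 1},  # Hakkarainen, Mrs. Pekka Pietari
    {'Sex': 'male',   'Pclass': 3, 'FamSize': 1, 'Survived': 0},  # Hakkarainen, Mr. Pekka Pietari
]

rates = survival_rate_by(rows, ['Sex', 'Pclass', 'FamSize'])
for group, rate in sorted(rates.items()):
    print(group, rate)
```

With the DataFrame itself, the same table would come from `train.groupby(['Sex', 'Pclass', 'FamSize'])['Survived'].mean()`.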

Tickets

One thing we can do with ticket numbers is scan for duplicates:



In [283]:

    
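# Keep every passenger whose ticket number appears more than once, then
# index by (Ticket, PassengerId) so holders of the same ticket sit together
ticket_dupes = (train[train['Ticket'].duplicated(keep=False)]
                .set_index('Ticket', append=True)
                .swaplevel(0, 1)
                .sort_index())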
ticket_dupes









    Out[283]:

                               Survived  Pclass  Name                                               Sex     Age    SibSp  Parch  Fare      Cabin    Embarked  Title         PTitle  Child  TicketSize  AdjFare    LogFare   AdjTitle  FamSize
Ticket            PassengerId
110152            258          1         1       Cherry, Miss. Gladys                               female  30.00  0      0      86.5000   B77      S         Miss          Miss    False  3           28.833333  3.395626  Miss      0
                  505          1         1       Maioni, Miss. Roberta                              female  16.00  0      0      86.5000   B79      S         Miss          Miss    False  3           28.833333  3.395626  Miss      0
                  760          1         1       Rothes, the Countess. of (Lucy Noel Martha Dye...  female  33.00  0      0      86.5000   B77      S         the Countess  fnoble  False  3           28.833333  3.395626  fnoble    0
110413            263          0         1       Taussig, Mr. Emil                                  male    52.00  1      1      79.6500   E67      S         Mr            Mr      False  3           26.550000  3.316003  Mr        2
                  559          1         1       Taussig, Mrs. Emil (Tillie Mandelbaum)             female  39.00  1      1      79.6500   E67      S         Mrs           Mrs     False  3           26.550000  3.316003  Mrs       2
                  586          1         1       Taussig, Miss. Ruth                                female  18.00  0      2      79.6500   E68      S         Miss          Miss    False  3           26.550000  3.316003  Miss      2
110465            111          0         1       Porter, Mr. Walter Chamberlain                     male    47.00  0      0      52.0000   C110     S         Mr            Mr      False  2           26.000000  3.295837  Mr        0
                  476          0         1       Clifford, Mr. George Quincy                        male    NaN    0      0      52.0000   A14      S         Mr            Mr      False  2           26.000000  3.295837  Mr        0
111361            330          1         1       Hippach, Miss. Jean Gertrude                       female  16.00  0      1      57.9792   B18      C         Miss          Miss    False  2           28.989600  3.400851  Miss      1
                  524          1         1       Hippach, Mrs. Louis Albert (Ida Sophia Fischer)    female  44.00  0      1      57.9792   B18      C         Mrs           Mrs     False  2           28.989600  3.400851  Mrs       1
113505            167          1         1       Chibnall, Mrs. (Edith Martha Bowerman)             female  NaN    0      1      55.0000   E33      S         Mrs           Mrs     False  2           27.500000  3.349904  Mrs       1
                  357          1         1       Bowerman, Miss. Elsie Edith                        female  22.00  0      1      55.0000   E33      S         Miss          Miss    False  2           27.500000  3.349904  Miss      1
113572            62           1         1       Icard, Miss. Amelie                                female  38.00  0      0      80.0000   B28      NaN       Miss          Miss    False  2           40.000000  3.713572  Miss      0
                  830          1         1       Stone, Mrs. George Nelson (Martha Evelyn)          female  62.00  0      0      80.0000   B28      NaN       Mrs           Mrs     False  2           40.000000  3.713572  Mrs       0
113760            391          1         1       Carter, Mr. William Ernest                         male    36.00  1      2      120.0000  B96 B98  S         Mr            Mr      False  4           30.000000  3.433987  Mr        3
                  436          1         1       Carter, Miss. Lucile Polk                          female  14.00  1      2      120.0000  B96 B98  S         Miss          Miss    False  4           30.000000  3.433987  Miss      3
                  764          1         1       Carter, Mrs. William Ernest (Lucile Polk)          female  36.00  1      2      120.0000  B96 B98  S         Mrs           Mrs     False  4           30.000000  3.433987  Mrs       3
                  803          1         1       Carter, Master. William Thornton II                male    11.00  1      2      120.0000  B96 B98  S         Master        Master  True   4           30.000000  3.433987  Master    3
113776            152          1         1       Pears, Mrs. Thomas (Edith Wearne)                  female  22.00  1      0      66.6000   C2       S         Mrs           Mrs     False  2           33.300000  3.535145  Mrs       1
                  337          0         1       Pears, Mr. Thomas Clinton                          male    29.00  1      0      66.6000   C2       S         Mr            Mr      False  2           33.300000  3.535145  Mr        1
113781            298          0         1       Allison, Miss. Helen Loraine                       female  2.00   1      2      151.5500  C22 C26  S         Miss          Miss    True   4           37.887500  3.660673  Miss      3
                  306          1         1       Allison, Master. Hudson Trevor                     male    0.92   1      2      151.5500  C22 C26  S         Master        Master  True   4           37.887500  3.660673  Master    3
                  499          0         1       Allison, Mrs. Hudson J C (Bessie Waldo Daniels)    female  25.00  1      2      151.5500  C22 C26  S         Mrs           Mrs     False  4           37.887500  3.660673  Mrs       3
                  709          1         1       Cleaver, Miss. Alice                               female  22.00  0      0      151.5500  NaN      S         Miss          Miss    False  4           37.887500  3.660673  Miss      0
113789            36           0         1       Holverson, Mr. Alexander Oskar                     male    42.00  1      0      52.0000   NaN      S         Mr            Mr      False  2           26.000000  3.295837  Mr        1
                  384          1         1       Holverson, Mrs. Alexander Oskar (Mary Aline To...  female  35.00  1      0      52.0000   NaN      S         Mrs           Mrs     False  2           26.000000  3.295837  Mrs       1
113798            271          0         1       Cairns, Mr. Alexander                              male    NaN    0      0      31.0000   NaN      S         Mr            Mr      False  2           15.500000  2.803360  Mr        0
                  843          1         1       Serepeca, Miss. Augusta                            female  30.00  0      0      31.0000   NaN      C         Miss          Miss    False  2           15.500000  2.803360  Miss      0
113803            4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.00  1      0      53.1000   C123     S         Mrs           Mrs     False  2           26.550000  3.316003  Mrs       1
                  138          0         1       Futrelle, Mr. Jacques Heath                        male    37.00  1      0      53.1000   C123     S         Mr            Mr      False  2           26.550000  3.316003  Mr        1
...               ...          ...       ...     ...                                                ...     ...    ...    ...    ...       ...      ...       ...           ...     ...    ...         ...        ...       ...       ...
PC 17758          506          0         1       Penasco y Castellana, Mr. Victor de Satode         male    18.00  1      0      108.9000  C65      C         Mr            Mr      False  2           54.450000  4.015482  Mr        1
PC 17760          270          1         1       Bissette, Miss. Amelia                             female  35.00  0      0      135.6333  C99      S         Miss          Miss    False  3           45.211100  3.833220  Miss      0
                  326          1         1       Young, Miss. Marie Grice                           female  36.00  0      0      135.6333  C32      C         Miss          Miss    False  3           45.211100  3.833220  Miss      0
                  374          0         1       Ringhini, Mr. Sante                                male    22.00  0      0      135.6333  NaN      C         Mr            Mr      False  3           45.211100  3.833220  Mr        0
PC 17761          538          1         1       LeRoy, Miss. Bertha                                female  30.00  0      0      106.4250  NaN      C         Miss          Miss    False  2           53.212500  3.992912  Miss      0
                  545          0         1       Douglas, Mr. Walter Donald                         male    50.00  1      0      106.4250  C86      C         Mr            Mr      False  2           53.212500  3.992912  Mr        1
PP 9549           11           1         3       Sandstrom, Miss. Marguerite Rut                    female  4.00   1      1      16.7000   G6       S         Miss          Miss    True   2           8.350000   2.235376  Miss      2
                  395          1         3       Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengt...  female  24.00  0      2      16.7000   G6       S         Mrs           Mrs     False  2           8.350000   2.235376  Mrs       2
S.C./PARIS 2079   818          0         2       Mallet, Mr. Albert                                 male    31.00  1      1      37.0042   NaN      C         Mr            Mr      False  2           18.502100  2.970522  Mr        2
                  828          1         2       Mallet, Master. Andre                              male    1.00   0      2      37.0042   NaN      C         Master        Master  True   2           18.502100  2.970522  Master    2
S.O./P.P. 3       773          0         2       Mack, Mrs. (Mary)                                  female  57.00  0      0      10.5000   E77      S         Mrs           Mrs     False  2           5.250000   1.832581  Mrs       0
                  842          0         2       Mudd, Mr. Thomas Charles                           male    16.00  0      0      10.5000   NaN      S         Mr            Mr      False  2           5.250000   1.832581  Mr        0
S.O.C. 14879      73           0         2       Hood, Mr. Ambrose Jr                               male    21.00  0      0      73.5000   NaN      S         Mr            Mr      False  5           14.700000  2.753661  Mr        0
                  121          0         2       Hickman, Mr. Stanley George                        male    21.00  2      0      73.5000   NaN      S         Mr            Mr      False  5           14.700000  2.753661  Mr        2
                  386          0         2       Davies, Mr. Charles Henry                          male    18.00  0      0      73.5000   NaN      S         Mr            Mr      False  5           14.700000  2.753661  Mr        0
                  656          0         2       Hickman, Mr. Leonard Mark                          male    24.00  2      0      73.5000   NaN      S         Mr            Mr      False  5           14.700000  2.753661  Mr        2
                  666          0         2       Hickman, Mr. Lewis                                 male    32.00  2      0      73.5000   NaN      S         Mr            Mr      False  5           14.700000  2.753661  Mr        2
SC/Paris 2123     44           1         2       Laroche, Miss. Simonne Marie Anne Andree           female  3.00   1      2      41.5792   NaN      C         Miss          Miss    True   3           13.859733  2.698655  Miss      3
                  609          1         2       Laroche, Mrs. Joseph (Juliette Marie Louise La...  female  22.00  1      2      41.5792   NaN      C         Mrs           Mrs     False  3           13.859733  2.698655  Mrs       3
                  686          0         2       Laroche, Mr. Joseph Philippe Lemercier             male    25.00  1      2      41.5792   NaN      C         Mr            Mr      False  3           13.859733  2.698655  Mr        3
STON/O2. 3101279  143          1         3       Hakkarainen, Mrs. Pekka Pietari (Elin Matilda ...  female  24.00  1      0      15.8500   NaN      S         Mrs           Mrs     False  2           7.925000   2.188856  Mrs       1
                  404          0         3       Hakkarainen, Mr. Pekka Pietari                     male    28.00  1      0      15.8500   NaN      S         Mr            Mr      False  2           7.925000   2.188856  Mr        1
W./C. 6607        784          0         3       Johnston, Mr. Andrew G                             male    NaN    1      2      23.4500   NaN      S         Mr            Mr      False  2           11.725000  2.543569  Mr        3
                  889          0         3       Johnston, Miss. Catherine Helen "Carrie"           female  NaN    1      2      23.4500   NaN      S         Miss          Miss    False  2           11.725000  2.543569  Miss      3
W./C. 6608        87           0         3       Ford, Mr. William Neal                             male    16.00  1      3      34.3750   NaN      S         Mr            Mr      False  4           8.593750   2.261112  Mr        4
                  148          0         3       Ford, Miss. Robina Maggie "Ruby"                   female  9.00   2      2      34.3750   NaN      S         Miss          Miss    True   4           8.593750   2.261112  Miss      4
                  437          0         3       Ford, Miss. Doolina Margaret "Daisy"               female  21.00  2      2      34.3750   NaN      S         Miss          Miss    False  4           8.593750   2.261112  Miss      4
                  737          0         3       Ford, Mrs. Edward (Margaret Ann Watson)            female  48.00  1      3      34.3750   NaN      S         Mrs           Mrs     False  4           8.593750   2.261112  Mrs       4
WE/P 5735         541          1         1       Crosby, Miss. Harriet R                            female  36.00  0      2      71.0000   B22      S         Miss          Miss    False  2           35.500000  3.597312  Miss      2
                  746          0         1       Crosby, Capt. Edward Gifford                       male    70.00  1      1      71.0000   B22      S         Capt          mil     False  2           35.500000  3.597312  mil       2

344 rows × 18 columns
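
The output above includes TicketSize, AdjFare and LogFare, which are not defined in this part of the notebook. The displayed numbers are consistent with AdjFare being Fare divided by TicketSize and LogFare being log(1 + AdjFare); that reading is an inference from the values, not the author's stated definition. A quick check on four rows copied from the output:

```python
import math

# (Fare, TicketSize, AdjFare, LogFare) copied from rows of the output above
samples = [
    (86.5, 3, 28.833333, 3.395626),   # ticket 110152
    (79.65, 3, 26.55, 3.316003),      # ticket 110413
    (52.0, 2, 26.0, 3.295837),        # ticket 110465
    (120.0, 4, 30.0, 3.433987),       # ticket 113760
]

# Compare the displayed values with the assumed formulas, to display precision
consistent = all(
    math.isclose(fare / size, adj_fare, abs_tol=1e-5)
    and math.isclose(math.log1p(adj_fare), log_fare, abs_tol=1e-5)
    for fare, size, adj_fare, log_fare in samples
)
print(consistent)
```

Under that reading, AdjFare is a per-person fare for passengers who share a ticket.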

We can check whether holders of duplicate tickets are likely to share cabins, fares, family size and embark location.



In [290]:

    
dupe_counts = ticket_dupes.reset_index().groupby('Ticket')[['Fare', 'Cabin', 'Embarked', 'FamSize']].nunique()
dupe_counts.describe()
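
What the `groupby(...).nunique()` call computes can be sketched without pandas: for each ticket, count the distinct non-missing values in a column. The helper `distinct_per_ticket` is hypothetical, `None` stands in for NaN, and the rows are copied from this notebook's outputs for three tickets.

```python
from collections import defaultdict

def distinct_per_ticket(rows, column):
    """Count distinct non-missing values of `column` for each ticket."""
    seen = defaultdict(set)
    for row in rows:
        values = seen[row['Ticket']]   # registers the ticket even if every value is missing
        if row[column] is not None:    # missing values are skipped, as nunique() does by default
            values.add(row[column])
    return {ticket: len(values) for ticket, values in seen.items()}

rows = [
    {'Ticket': '7534',   'Fare': 9.2167, 'Cabin': None,   'Embarked': 'S'},  # Osen
    {'Ticket': '7534',   'Fare': 9.8458, 'Cabin': None,   'Embarked': 'S'},  # Gustafsson
    {'Ticket': '113798', 'Fare': 31.0,   'Cabin': None,   'Embarked': 'S'},  # Cairns
    {'Ticket': '113798', 'Fare': 31.0,   'Cabin': None,   'Embarked': 'C'},  # Serepeca
    {'Ticket': '113803', 'Fare': 53.1,   'Cabin': 'C123', 'Embarked': 'S'},  # Futrelle, Mrs.
    {'Ticket': '113803', 'Fare': 53.1,   'Cabin': 'C123', 'Embarked': 'S'},  # Futrelle, Mr.
]

fare_counts = distinct_per_ticket(rows, 'Fare')
cabin_counts = distinct_per_ticket(rows, 'Cabin')
embarked_counts = distinct_per_ticket(rows, 'Embarked')
print(fare_counts, cabin_counts, embarked_counts)
```

A count of 1 means all holders of the ticket agree on that column, and a count above 1 flags a ticket worth inspecting. A count of 0 for Cabin only means every cabin on that ticket is missing, so it should not be read as agreement.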

It seems like most of them did. Let's take a look at the fares:



In [304]:

    
ticket_dupes.loc[dupe_counts[dupe_counts['Fare'] > 1].index.values]









    Out[304]:

                     Survived  Pclass  Name                           Sex     Age   SibSp  Parch  Fare    Cabin  Embarked  Title  PTitle  Child  TicketSize  AdjFare  LogFare   AdjTitle  FamSize
Ticket  PassengerId
7534    139          0         3       Osen, Mr. Olaf Elon            male    16.0  0      0      9.2167  NaN    S         Mr     Mr      False  2           4.60835  1.724257  Mr        0
        877          0         3       Gustafsson, Mr. Alfred Ossian  male    20.0  0      0      9.8458  NaN    S         Mr     Mr      False  2           4.92290  1.778826  Mr        0

Only one pair of fares differs, and not by much (9.2167 versus 9.8458). For all we know, this could be an entry error, but let's ignore this for now. Let's look at embark locations:



In [306]:

    
ticket_dupes.loc[dupe_counts[dupe_counts['Embarked'] > 1].index.values]









    Out[306]:

                       Survived  Pclass  Name                      Sex     Age   SibSp  Parch  Fare      Cabin  Embarked  Title  PTitle  Child  TicketSize  AdjFare  LogFare  AdjTitle  FamSize
Ticket    PassengerId
113798    271          0         1       Cairns, Mr. Alexander     male    NaN   0      0      31.0000   NaN    S         Mr     Mr      False  2           15.5000  2.80336  Mr        0
          843          1         1       Serepeca, Miss. Augusta   female  30.0  0      0      31.0000   NaN    C         Miss   Miss    False  2           15.5000  2.80336  Miss      0
PC 17760  270          1         1       Bissette, Miss. Amelia    female  35.0  0      0      135.6333  C99    S         Miss   Miss    False  3           45.2111  3.83322  Miss      0
          326          1         1       Young, Miss. Marie Grice  female  36.0  0      0      135.6333  C32    C         Miss   Miss    False  3           45.2111  3.83322  Miss      0
          374          0         1       Ringhini, Mr. Sante       male    22.0  0      0      135.6333  NaN    C         Mr     Mr      False  3           45.2111  3.83322  Mr        0

Only two tickets! Though these could be mistakes, it is plausible that the holders did board at different locations, since they do not appear to be related to each other. Let's look at the last two variables:



In [307]:

    
ticket_dupes.loc[dupe_counts[dupe_counts['FamSize'] > 1].index.values]









    Out[307]:







  
    
      
      
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Fare
      Cabin
      Embarked
      Title
      PTitle
      Child
      TicketSize
      AdjFare
      LogFare
      AdjTitle
      FamSize
    
    
      Ticket
      PassengerId
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      113781
      298
      0
      1
      Allison, Miss. Helen Loraine
      female
      2.00
      1
      2
      151.5500
      C22 C26
      S
      Miss
      Miss
      True
      4
      37.887500
      3.660673
      Miss
      3
    
    
      306
      1
      1
      Allison, Master. Hudson Trevor
      male
      0.92
      1
      2
      151.5500
      C22 C26
      S
      Master
      Master
      True
      4
      37.887500
      3.660673
      Master
      3
    
    
      499
      0
      1
      Allison, Mrs. Hudson J C (Bessie Waldo Daniels)
      female
      25.00
      1
      2
      151.5500
      C22 C26
      S
      Mrs
      Mrs
      False
      4
      37.887500
      3.660673
      Mrs
      3
    
    
      709
      1
      1
      Cleaver, Miss. Alice
      female
      22.00
      0
      0
      151.5500
      NaN
      S
      Miss
      Miss
      False
      4
      37.887500
      3.660673
      Miss
      0
    
    
      11767
      311
      1
      1
      Hays, Miss. Margaret Bechstein
      female
      24.00
      0
      0
      83.1583
      C54
      C
      Miss
      Miss
      False
      2
      41.579150
      3.751365
      Miss
      0
    
    
      880
      1
      1
      Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
      female
      56.00
      0
      1
      83.1583
      C50
      C
      Mrs
      Mrs
      False
      2
      41.579150
      3.751365
      Mrs
      1
    
    
      12749
      521
      1
      1
      Perreault, Miss. Anne
      female
      30.00
      0
      0
      93.5000
      B73
      S
      Miss
      Miss
      False
      2
      46.750000
      3.865979
      Miss
      0
    
    
      821
      1
      1
      Hays, Mrs. Charles Melville (Clara Jennings Gr...
      female
      52.00
      1
      1
      93.5000
      B69
      S
      Mrs
      Mrs
      False
      2
      46.750000
      3.865979
      Mrs
      2
    
    
      13502
      276
      1
      1
      Andrews, Miss. Kornelia Theodosia
      female
      63.00
      1
      0
      77.9583
      D7
      S
      Miss
      Miss
      False
      3
      25.986100
      3.295322
      Miss
      1
    
    
      628
      1
      1
      Longley, Miss. Gretchen Fiske
      female
      21.00
      0
      0
      77.9583
      D9
      S
      Miss
      Miss
      False
      3
      25.986100
      3.295322
      Miss
      0
    
    
      766
      1
      1
      Hogeboom, Mrs. John C (Anna Andrews)
      female
      51.00
      1
      0
      77.9583
      D11
      S
      Mrs
      Mrs
      False
      3
      25.986100
      3.295322
      Mrs
      1
    
    
      16966
      320
      1
      1
      Spedden, Mrs. Frederic Oakley (Margaretta Corn...
      female
      40.00
      1
      1
      134.5000
      E34
      C
      Mrs
      Mrs
      False
      2
      67.250000
      4.223177
      Mrs
      2
    
    
      338
      1
      1
      Burns, Miss. Elizabeth Margaret
      female
      41.00
      0
      0
      134.5000
      E40
      C
      Miss
      Miss
      False
      2
      67.250000
      4.223177
      Miss
      0
    
    
      17421
      307
      1
      1
      Fleming, Miss. Margaret
      female
      NaN
      0
      0
      110.8833
      NaN
      C
      Miss
      Miss
      False
      4
      27.720825
      3.357622
      Miss
      0
    
    
      551
      1
      1
      Thayer, Mr. John Borland Jr
      male
      17.00
      0
      2
      110.8833
      C70
      C
      Mr
      Mr
      False
      4
      27.720825
      3.357622
      Mr
      2
    
    
      582
      1
      1
      Thayer, Mrs. John Borland (Marian Longstreth M...
      female
      39.00
      1
      1
      110.8833
      C68
      C
      Mrs
      Mrs
      False
      4
      27.720825
      3.357622
      Mrs
      2
    
    
      699
      0
      1
      Thayer, Mr. John Borland
      male
      49.00
      1
      1
      110.8833
      C68
      C
      Mr
      Mr
      False
      4
      27.720825
      3.357622
      Mr
      2
    
    
      19877
      291
      1
      1
      Barber, Miss. Ellen "Nellie"
      female
      26.00
      0
      0
      78.8500
      NaN
      S
      Miss
      Miss
      False
      2
      39.425000
      3.699448
      Miss
      0
    
    
      742
      0
      1
      Cavendish, Mr. Tyrell William
      male
      36.00
      1
      0
      78.8500
      C46
      S
      Mr
      Mr
      False
      2
      39.425000
      3.699448
      Mr
      1
    
    
      19928
      246
      0
      1
      Minahan, Dr. William Edward
      male
      44.00
      2
      0
      90.0000
      C78
      Q
      Dr
      Dr
      False
      2
      45.000000
      3.828641
      Dr
      2
    
    
      413
      1
      1
      Minahan, Miss. Daisy E
      female
      33.00
      1
      0
      90.0000
      C78
      Q
      Miss
      Miss
      False
      2
      45.000000
      3.828641
      Miss
      1
    
    
      24160
      690
      1
      1
      Madill, Miss. Georgette Alexandra
      female
      15.00
      0
      1
      211.3375
      B5
      S
      Miss
      Miss
      False
      3
      70.445833
      4.268940
      Miss
      1
    
    
      731
      1
      1
      Allen, Miss. Elisabeth Walton
      female
      29.00
      0
      0
      211.3375
      B5
      S
      Miss
      Miss
      False
      3
      70.445833
      4.268940
      Miss
      0
    
    
      780
      1
      1
      Robert, Mrs. Edward Scott (Elisabeth Walton Mc...
      female
      43.00
      0
      1
      211.3375
      B3
      S
      Mrs
      Mrs
      False
      3
      70.445833
      4.268940
      Mrs
      1
    
    
      243847
      218
      0
      2
      Jacobsohn, Mr. Sidney Samuel
      male
      42.00
      1
      0
      27.0000
      NaN
      S
      Mr
      Mr
      False
      2
      13.500000
      2.674149
      Mr
      1
    
    
      601
      1
      2
      Jacobsohn, Mrs. Sidney Samuel (Amy Frances Chr...
      female
      24.00
      2
      1
      27.0000
      NaN
      S
      Mrs
      Mrs
      False
      2
      13.500000
      2.674149
      Mrs
      3
    
    
      248727
      597
      1
      2
      Leitch, Miss. Jessie Wills
      female
      NaN
      0
      0
      33.0000
      NaN
      S
      Miss
      Miss
      False
      3
      11.000000
      2.484907
      Miss
      0
    
    
      721
      1
      2
      Harper, Miss. Annie Jessie "Nina"
      female
      6.00
      0
      1
      33.0000
      NaN
      S
      Miss
      Miss
      True
      3
      11.000000
      2.484907
      Miss
      1
    
    
      849
      0
      2
      Harper, Rev. John
      male
      28.00
      0
      1
      33.0000
      NaN
      S
      Rev
      Rev
      False
      3
      11.000000
      2.484907
      Rev
      1
    
    
      29106
      408
      1
      2
      Richards, Master. William Rowe
      male
      3.00
      1
      1
      18.7500
      NaN
      S
      Master
      Master
      True
      3
      6.250000
      1.981001
      Master
      2
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      371110
      518
      0
      3
      Ryan, Mr. Patrick
      male
      NaN
      0
      0
      24.1500
      NaN
      Q
      Mr
      Mr
      False
      3
      8.050000
      2.202765
      Mr
      0
    
    
      769
      0
      3
      Moran, Mr. Daniel J
      male
      NaN
      1
      0
      24.1500
      NaN
      Q
      Mr
      Mr
      False
      3
      8.050000
      2.202765
      Mr
      1
    
    
      A/4 48871
      566
      0
      3
      Davies, Mr. Alfred J
      male
      24.00
      2
      0
      24.1500
      NaN
      S
      Mr
      Mr
      False
      2
      12.075000
      2.570702
      Mr
      2
    
    
      812
      0
      3
      Lester, Mr. James
      male
      39.00
      0
      0
      24.1500
      NaN
      S
      Mr
      Mr
      False
      2
      12.075000
      2.570702
      Mr
      0
    
    
      PC 17485
      310
      1
      1
      Francatelli, Miss. Laura Mabel
      female
      30.00
      0
      0
      56.9292
      E36
      C
      Miss
      Miss
      False
      2
      28.464600
      3.383190
      Miss
      0
    
    
      600
      1
      1
      Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")
      male
      49.00
      1
      0
      56.9292
      A20
      C
      Sir
      mnoble
      False
      2
      28.464600
      3.383190
      mnoble
      1
    
    
      PC 17569
      32
      1
      1
      Spencer, Mrs. William Augustus (Marie Eugenie)
      female
      NaN
      1
      0
      146.5208
      B78
      C
      Mrs
      Mrs
      False
      2
      73.260400
      4.307578
      Mrs
      1
    
    
      196
      1
      1
      Lurette, Miss. Elise
      female
      58.00
      0
      0
      146.5208
      B80
      C
      Miss
      Miss
      False
      2
      73.260400
      4.307578
      Miss
      0
    
    
      PC 17572
      53
      1
      1
      Harper, Mrs. Henry Sleeper (Myna Haxtun)
      female
      49.00
      1
      0
      76.7292
      D33
      C
      Mrs
      Mrs
      False
      3
      25.576400
      3.280024
      Mrs
      1
    
    
      646
      1
      1
      Harper, Mr. Henry Sleeper
      male
      48.00
      1
      0
      76.7292
      D33
      C
      Mr
      Mr
      False
      3
      25.576400
      3.280024
      Mr
      1
    
    
      682
      1
      1
      Hassab, Mr. Hammad
      male
      27.00
      0
      0
      76.7292
      D49
      C
      Mr
      Mr
      False
      3
      25.576400
      3.280024
      Mr
      0
    
    
      PC 17582
      269
      1
      1
      Graham, Mrs. William Thompson (Edith Junkins)
      female
      58.00
      0
      1
      153.4625
      C125
      S
      Mrs
      Mrs
      False
      3
      51.154167
      3.954204
      Mrs
      1
    
    
      333
      0
      1
      Graham, Mr. George Edward
      male
      38.00
      0
      1
      153.4625
      C91
      S
      Mr
      Mr
      False
      3
      51.154167
      3.954204
      Mr
      1
    
    
      610
      1
      1
      Shutes, Miss. Elizabeth W
      female
      40.00
      0
      0
      153.4625
      C125
      S
      Miss
      Miss
      False
      3
      51.154167
      3.954204
      Miss
      0
    
    
      PC 17611
      335
      1
      1
      Frauenthal, Mrs. Henry William (Clara Heinshei...
      female
      NaN
      1
      0
      133.6500
      NaN
      S
      Mrs
      Mrs
      False
      2
      66.825000
      4.216931
      Mrs
      1
    
    
      661
      1
      1
      Frauenthal, Dr. Henry William
      male
      50.00
      2
      0
      133.6500
      NaN
      S
      Dr
      Dr
      False
      2
      66.825000
      4.216931
      Dr
      2
    
    
      PC 17755
      259
      1
      1
      Ward, Miss. Anna
      female
      35.00
      0
      0
      512.3292
      NaN
      C
      Miss
      Miss
      False
      3
      170.776400
      5.146194
      Miss
      0
    
    
      680
      1
      1
      Cardeza, Mr. Thomas Drake Martinez
      male
      36.00
      0
      1
      512.3292
      B51 B53 B55
      C
      Mr
      Mr
      False
      3
      170.776400
      5.146194
      Mr
      1
    
    
      738
      1
      1
      Lesurer, Mr. Gustave J
      male
      35.00
      0
      0
      512.3292
      B101
      C
      Mr
      Mr
      False
      3
      170.776400
      5.146194
      Mr
      0
    
    
      PC 17757
      381
      1
      1
      Bidois, Miss. Rosalie
      female
      42.00
      0
      0
      227.5250
      NaN
      C
      Miss
      Miss
      False
      4
      56.881250
      4.058393
      Miss
      0
    
    
      558
      0
      1
      Robbins, Mr. Victor
      male
      NaN
      0
      0
      227.5250
      NaN
      C
      Mr
      Mr
      False
      4
      56.881250
      4.058393
      Mr
      0
    
    
      701
      1
      1
      Astor, Mrs. John Jacob (Madeleine Talmadge Force)
      female
      18.00
      1
      0
      227.5250
      C62 C64
      C
      Mrs
      Mrs
      False
      4
      56.881250
      4.058393
      Mrs
      1
    
    
      717
      1
      1
      Endres, Miss. Caroline Louise
      female
      38.00
      0
      0
      227.5250
      C45
      C
      Miss
      Miss
      False
      4
      56.881250
      4.058393
      Miss
      0
    
    
      PC 17761
      538
      1
      1
      LeRoy, Miss. Bertha
      female
      30.00
      0
      0
      106.4250
      NaN
      C
      Miss
      Miss
      False
      2
      53.212500
      3.992912
      Miss
      0
    
    
      545
      0
      1
      Douglas, Mr. Walter Donald
      male
      50.00
      1
      0
      106.4250
      C86
      C
      Mr
      Mr
      False
      2
      53.212500
      3.992912
      Mr
      1
    
    
      S.O.C. 14879
      73
      0
      2
      Hood, Mr. Ambrose Jr
      male
      21.00
      0
      0
      73.5000
      NaN
      S
      Mr
      Mr
      False
      5
      14.700000
      2.753661
      Mr
      0
    
    
      121
      0
      2
      Hickman, Mr. Stanley George
      male
      21.00
      2
      0
      73.5000
      NaN
      S
      Mr
      Mr
      False
      5
      14.700000
      2.753661
      Mr
      2
    
    
      386
      0
      2
      Davies, Mr. Charles Henry
      male
      18.00
      0
      0
      73.5000
      NaN
      S
      Mr
      Mr
      False
      5
      14.700000
      2.753661
      Mr
      0
    
    
      656
      0
      2
      Hickman, Mr. Leonard Mark
      male
      24.00
      2
      0
      73.5000
      NaN
      S
      Mr
      Mr
      False
      5
      14.700000
      2.753661
      Mr
      2
    
    
      666
      0
      2
      Hickman, Mr. Lewis
      male
      32.00
      2
      0
      73.5000
      NaN
      S
      Mr
      Mr
      False
      5
      14.700000
      2.753661
      Mr
      2
    
  

72 rows × 18 columns



In [308]:

    
ticket_dupes.loc[dupe_counts[dupe_counts['Cabin'] > 1].index.values]









    Out[308]:







  
    
      
      
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Fare
      Cabin
      Embarked
      Title
      PTitle
      Child
      TicketSize
      AdjFare
      LogFare
      AdjTitle
      FamSize
    
    
      Ticket
      PassengerId
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      110152
      258
      1
      1
      Cherry, Miss. Gladys
      female
      30.0
      0
      0
      86.5000
      B77
      S
      Miss
      Miss
      False
      3
      28.833333
      3.395626
      Miss
      0
    
    
      505
      1
      1
      Maioni, Miss. Roberta
      female
      16.0
      0
      0
      86.5000
      B79
      S
      Miss
      Miss
      False
      3
      28.833333
      3.395626
      Miss
      0
    
    
      760
      1
      1
      Rothes, the Countess. of (Lucy Noel Martha Dye...
      female
      33.0
      0
      0
      86.5000
      B77
      S
      the Countess
      fnoble
      False
      3
      28.833333
      3.395626
      fnoble
      0
    
    
      110413
      263
      0
      1
      Taussig, Mr. Emil
      male
      52.0
      1
      1
      79.6500
      E67
      S
      Mr
      Mr
      False
      3
      26.550000
      3.316003
      Mr
      2
    
    
      559
      1
      1
      Taussig, Mrs. Emil (Tillie Mandelbaum)
      female
      39.0
      1
      1
      79.6500
      E67
      S
      Mrs
      Mrs
      False
      3
      26.550000
      3.316003
      Mrs
      2
    
    
      586
      1
      1
      Taussig, Miss. Ruth
      female
      18.0
      0
      2
      79.6500
      E68
      S
      Miss
      Miss
      False
      3
      26.550000
      3.316003
      Miss
      2
    
    
      110465
      111
      0
      1
      Porter, Mr. Walter Chamberlain
      male
      47.0
      0
      0
      52.0000
      C110
      S
      Mr
      Mr
      False
      2
      26.000000
      3.295837
      Mr
      0
    
    
      476
      0
      1
      Clifford, Mr. George Quincy
      male
      NaN
      0
      0
      52.0000
      A14
      S
      Mr
      Mr
      False
      2
      26.000000
      3.295837
      Mr
      0
    
    
      11767
      311
      1
      1
      Hays, Miss. Margaret Bechstein
      female
      24.0
      0
      0
      83.1583
      C54
      C
      Miss
      Miss
      False
      2
      41.579150
      3.751365
      Miss
      0
    
    
      880
      1
      1
      Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
      female
      56.0
      0
      1
      83.1583
      C50
      C
      Mrs
      Mrs
      False
      2
      41.579150
      3.751365
      Mrs
      1
    
    
      12749
      521
      1
      1
      Perreault, Miss. Anne
      female
      30.0
      0
      0
      93.5000
      B73
      S
      Miss
      Miss
      False
      2
      46.750000
      3.865979
      Miss
      0
    
    
      821
      1
      1
      Hays, Mrs. Charles Melville (Clara Jennings Gr...
      female
      52.0
      1
      1
      93.5000
      B69
      S
      Mrs
      Mrs
      False
      2
      46.750000
      3.865979
      Mrs
      2
    
    
      13502
      276
      1
      1
      Andrews, Miss. Kornelia Theodosia
      female
      63.0
      1
      0
      77.9583
      D7
      S
      Miss
      Miss
      False
      3
      25.986100
      3.295322
      Miss
      1
    
    
      628
      1
      1
      Longley, Miss. Gretchen Fiske
      female
      21.0
      0
      0
      77.9583
      D9
      S
      Miss
      Miss
      False
      3
      25.986100
      3.295322
      Miss
      0
    
    
      766
      1
      1
      Hogeboom, Mrs. John C (Anna Andrews)
      female
      51.0
      1
      0
      77.9583
      D11
      S
      Mrs
      Mrs
      False
      3
      25.986100
      3.295322
      Mrs
      1
    
    
      16966
      320
      1
      1
      Spedden, Mrs. Frederic Oakley (Margaretta Corn...
      female
      40.0
      1
      1
      134.5000
      E34
      C
      Mrs
      Mrs
      False
      2
      67.250000
      4.223177
      Mrs
      2
    
    
      338
      1
      1
      Burns, Miss. Elizabeth Margaret
      female
      41.0
      0
      0
      134.5000
      E40
      C
      Miss
      Miss
      False
      2
      67.250000
      4.223177
      Miss
      0
    
    
      17421
      307
      1
      1
      Fleming, Miss. Margaret
      female
      NaN
      0
      0
      110.8833
      NaN
      C
      Miss
      Miss
      False
      4
      27.720825
      3.357622
      Miss
      0
    
    
      551
      1
      1
      Thayer, Mr. John Borland Jr
      male
      17.0
      0
      2
      110.8833
      C70
      C
      Mr
      Mr
      False
      4
      27.720825
      3.357622
      Mr
      2
    
    
      582
      1
      1
      Thayer, Mrs. John Borland (Marian Longstreth M...
      female
      39.0
      1
      1
      110.8833
      C68
      C
      Mrs
      Mrs
      False
      4
      27.720825
      3.357622
      Mrs
      2
    
    
      699
      0
      1
      Thayer, Mr. John Borland
      male
      49.0
      1
      1
      110.8833
      C68
      C
      Mr
      Mr
      False
      4
      27.720825
      3.357622
      Mr
      2
    
    
      24160
      690
      1
      1
      Madill, Miss. Georgette Alexandra
      female
      15.0
      0
      1
      211.3375
      B5
      S
      Miss
      Miss
      False
      3
      70.445833
      4.268940
      Miss
      1
    
    
      731
      1
      1
      Allen, Miss. Elisabeth Walton
      female
      29.0
      0
      0
      211.3375
      B5
      S
      Miss
      Miss
      False
      3
      70.445833
      4.268940
      Miss
      0
    
    
      780
      1
      1
      Robert, Mrs. Edward Scott (Elisabeth Walton Mc...
      female
      43.0
      0
      1
      211.3375
      B3
      S
      Mrs
      Mrs
      False
      3
      70.445833
      4.268940
      Mrs
      1
    
    
      35273
      216
      1
      1
      Newell, Miss. Madeleine
      female
      31.0
      1
      0
      113.2750
      D36
      C
      Miss
      Miss
      False
      3
      37.758333
      3.657346
      Miss
      1
    
    
      394
      1
      1
      Newell, Miss. Marjorie
      female
      23.0
      1
      0
      113.2750
      D36
      C
      Miss
      Miss
      False
      3
      37.758333
      3.657346
      Miss
      1
    
    
      660
      0
      1
      Newell, Mr. Arthur Webster
      male
      58.0
      0
      2
      113.2750
      D48
      C
      Mr
      Mr
      False
      3
      37.758333
      3.657346
      Mr
      2
    
    
      PC 17485
      310
      1
      1
      Francatelli, Miss. Laura Mabel
      female
      30.0
      0
      0
      56.9292
      E36
      C
      Miss
      Miss
      False
      2
      28.464600
      3.383190
      Miss
      0
    
    
      600
      1
      1
      Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")
      male
      49.0
      1
      0
      56.9292
      A20
      C
      Sir
      mnoble
      False
      2
      28.464600
      3.383190
      mnoble
      1
    
    
      PC 17569
      32
      1
      1
      Spencer, Mrs. William Augustus (Marie Eugenie)
      female
      NaN
      1
      0
      146.5208
      B78
      C
      Mrs
      Mrs
      False
      2
      73.260400
      4.307578
      Mrs
      1
    
    
      196
      1
      1
      Lurette, Miss. Elise
      female
      58.0
      0
      0
      146.5208
      B80
      C
      Miss
      Miss
      False
      2
      73.260400
      4.307578
      Miss
      0
    
    
      PC 17572
      53
      1
      1
      Harper, Mrs. Henry Sleeper (Myna Haxtun)
      female
      49.0
      1
      0
      76.7292
      D33
      C
      Mrs
      Mrs
      False
      3
      25.576400
      3.280024
      Mrs
      1
    
    
      646
      1
      1
      Harper, Mr. Henry Sleeper
      male
      48.0
      1
      0
      76.7292
      D33
      C
      Mr
      Mr
      False
      3
      25.576400
      3.280024
      Mr
      1
    
    
      682
      1
      1
      Hassab, Mr. Hammad
      male
      27.0
      0
      0
      76.7292
      D49
      C
      Mr
      Mr
      False
      3
      25.576400
      3.280024
      Mr
      0
    
    
      PC 17582
      269
      1
      1
      Graham, Mrs. William Thompson (Edith Junkins)
      female
      58.0
      0
      1
      153.4625
      C125
      S
      Mrs
      Mrs
      False
      3
      51.154167
      3.954204
      Mrs
      1
    
    
      333
      0
      1
      Graham, Mr. George Edward
      male
      38.0
      0
      1
      153.4625
      C91
      S
      Mr
      Mr
      False
      3
      51.154167
      3.954204
      Mr
      1
    
    
      610
      1
      1
      Shutes, Miss. Elizabeth W
      female
      40.0
      0
      0
      153.4625
      C125
      S
      Miss
      Miss
      False
      3
      51.154167
      3.954204
      Miss
      0
    
    
      PC 17593
      140
      0
      1
      Giglio, Mr. Victor
      male
      24.0
      0
      0
      79.2000
      B86
      C
      Mr
      Mr
      False
      2
      39.600000
      3.703768
      Mr
      0
    
    
      790
      0
      1
      Guggenheim, Mr. Benjamin
      male
      46.0
      0
      0
      79.2000
      B82 B84
      C
      Mr
      Mr
      False
      2
      39.600000
      3.703768
      Mr
      0
    
    
      PC 17755
      259
      1
      1
      Ward, Miss. Anna
      female
      35.0
      0
      0
      512.3292
      NaN
      C
      Miss
      Miss
      False
      3
      170.776400
      5.146194
      Miss
      0
    
    
      680
      1
      1
      Cardeza, Mr. Thomas Drake Martinez
      male
      36.0
      0
      1
      512.3292
      B51 B53 B55
      C
      Mr
      Mr
      False
      3
      170.776400
      5.146194
      Mr
      1
    
    
      738
      1
      1
      Lesurer, Mr. Gustave J
      male
      35.0
      0
      0
      512.3292
      B101
      C
      Mr
      Mr
      False
      3
      170.776400
      5.146194
      Mr
      0
    
    
      PC 17757
      381
      1
      1
      Bidois, Miss. Rosalie
      female
      42.0
      0
      0
      227.5250
      NaN
      C
      Miss
      Miss
      False
      4
      56.881250
      4.058393
      Miss
      0
    
    
      558
      0
      1
      Robbins, Mr. Victor
      male
      NaN
      0
      0
      227.5250
      NaN
      C
      Mr
      Mr
      False
      4
      56.881250
      4.058393
      Mr
      0
    
    
      701
      1
      1
      Astor, Mrs. John Jacob (Madeleine Talmadge Force)
      female
      18.0
      1
      0
      227.5250
      C62 C64
      C
      Mrs
      Mrs
      False
      4
      56.881250
      4.058393
      Mrs
      1
    
    
      717
      1
      1
      Endres, Miss. Caroline Louise
      female
      38.0
      0
      0
      227.5250
      C45
      C
      Miss
      Miss
      False
      4
      56.881250
      4.058393
      Miss
      0
    
    
      PC 17760
      270
      1
      1
      Bissette, Miss. Amelia
      female
      35.0
      0
      0
      135.6333
      C99
      S
      Miss
      Miss
      False
      3
      45.211100
      3.833220
      Miss
      0
    
    
      326
      1
      1
      Young, Miss. Marie Grice
      female
      36.0
      0
      0
      135.6333
      C32
      C
      Miss
      Miss
      False
      3
      45.211100
      3.833220
      Miss
      0
    
    
      374
      0
      1
      Ringhini, Mr. Sante
      male
      22.0
      0
      0
      135.6333
      NaN
      C
      Mr
      Mr
      False
      3
      45.211100
      3.833220
      Mr
      0

We have many more duplicated tickets here. It is plausible that different families shared a ticket, or that a family travelled with servants or maids. For family size, it is also worth remembering that unmarried partners do not count toward SibSp.
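The per-ticket comparison above can be sketched on a toy frame. This is only an illustration: the definitions of `ticket_dupes` and `dupe_counts` are not shown in this section, so the names `toy`, `distinct_per_ticket`, `multi_cabin` and `spread` are hypothetical, and ticket 'X1' is made up. The sketch counts distinct cabins per ticket, keeps the tickets spread over more than one cabin, and computes family size as SibSp + Parch, which is consistent with the FamSize values in the tables above.

```python
import pandas as pd

# Toy rows: ticket 110413 follows the Taussig rows shown above; 'X1' is hypothetical.
toy = pd.DataFrame({
    'Ticket': ['110413', '110413', '110413', 'X1', 'X1'],
    'Cabin': ['E67', 'E67', 'E68', 'D33', 'D33'],
    'SibSp': [1, 1, 0, 0, 0],
    'Parch': [1, 1, 2, 0, 0],
})

# Family size: siblings/spouses plus parents/children (the passenger is not counted).
toy['FamSize'] = toy['SibSp'] + toy['Parch']

# Number of distinct non-null cabins recorded for each ticket.
distinct_per_ticket = toy.groupby('Ticket')['Cabin'].nunique()

# Tickets whose passengers are spread over more than one cabin.
multi_cabin = distinct_per_ticket[distinct_per_ticket > 1].index
spread = toy[toy['Ticket'].isin(multi_cabin)]
print(spread)
```

Note that `nunique()` ignores missing cabins, so a ticket with one known cabin and several NaN values would not be flagged by this sketch.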

Fares

We see that Fare is a highly right-skewed variable.



In [315]:

    
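plt.figure(10)
# distplot is deprecated since seaborn 0.11; histplot with a KDE overlay is the closest replacement
sns.histplot(train['Fare'], kde=True, stat='density')
plt.show()
# Sample skewness of Fare (positive values mean a longer right tail)
st.skew(train['Fare'])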









    














    











    Out[315]:





4.7792532923723545
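For reference, here is a small sketch of what this number measures. With its default `bias=True`, `st.skew` returns the Fisher-Pearson coefficient g1 = m3 / m2**1.5, where m2 and m3 are the second and third central moments of the sample. The `fares` array below is a hypothetical right-skewed sample standing in for `train['Fare']`, not the real data.

```python
import numpy as np
import scipy.stats as st

# Hypothetical right-skewed sample standing in for train['Fare'].
rng = np.random.default_rng(0)
fares = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

# Biased sample skewness: third central moment over the second to the power 1.5.
centered = fares - fares.mean()
m2 = np.mean(centered ** 2)
m3 = np.mean(centered ** 3)
manual_skew = m3 / m2 ** 1.5

# scipy's default (bias=True) uses the same formula.
scipy_skew = st.skew(fares)
print(manual_skew, scipy_skew)
```

A symmetric distribution has a skew near 0, so a value well above 0 points to a long right tail.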

Let's look at the outlier fares above 200:



In [323]:

    
train[train['Fare'] > 200]









    Out[323]:







  
    
      
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Cabin
      Embarked
      Title
      PTitle
      Child
      TicketSize
      AdjFare
      LogFare
      AdjTitle
      FamSize
    
    
      PassengerId
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      28
      0
      1
      Fortune, Mr. Charles Alexander
      male
      19.0
      3
      2
      19950
      263.0000
      C23 C25 C27
      S
      Mr
      Mr
      False
      4
      65.750000
      4.200954
      Mr
      5
    
    
      89
      1
      1
      Fortune, Miss. Mabel Helen
      female
      23.0
      3
      2
      19950
      263.0000
      C23 C25 C27
      S
      Miss
      Miss
      False
      4
      65.750000
      4.200954
      Miss
      5
    
    
      119
      0
      1
      Baxter, Mr. Quigg Edmond
      male
      24.0
      0
      1
      PC 17558
      247.5208
      B58 B60
      C
      Mr
      Mr
      False
      2
      123.760400
      4.826395
      Mr
      1
    
    
      259
      1
      1
      Ward, Miss. Anna
      female
      35.0
      0
      0
      PC 17755
      512.3292
      NaN
      C
      Miss
      Miss
      False
      3
      170.776400
      5.146194
      Miss
      0
    
    
      300
      1
      1
      Baxter, Mrs. James (Helene DeLaudeniere Chaput)
      female
      50.0
      0
      1
      PC 17558
      247.5208
      B58 B60
      C
      Mrs
      Mrs
      False
      2
      123.760400
      4.826395
      Mrs
      1
    
    
      312
      1
      1
      Ryerson, Miss. Emily Borie
      female
      18.0
      2
      2
      PC 17608
      262.3750
      B57 B59 B63 B66
      C
      Miss
      Miss
      False
      2
      131.187500
      4.884221
      Miss
      4
    
    
      342
      1
      1
      Fortune, Miss. Alice Elizabeth
      female
      24.0
      3
      2
      19950
      263.0000
      C23 C25 C27
      S
      Miss
      Miss
      False
      4
      65.750000
      4.200954
      Miss
      5
    
    
      378
      0
      1
      Widener, Mr. Harry Elkins
      male
      27.0
      0
      2
      113503
      211.5000
      C82
      C
      Mr
      Mr
      False
      1
      211.500000
      5.358942
      Mr
      2
    
    
      381
      1
      1
      Bidois, Miss. Rosalie
      female
      42.0
      0
      0
      PC 17757
      227.5250
      NaN
      C
      Miss
      Miss
      False
      4
      56.881250
      4.058393
      Miss
      0
    
    
      439
      0
      1
      Fortune, Mr. Mark
      male
      64.0
      1
      4
      19950
      263.0000
      C23 C25 C27
      S
      Mr
      Mr
      False
      4
      65.750000
      4.200954
      Mr
      5
    
    
      528
      0
      1
      Farthing, Mr. John
      male
      NaN
      0
      0
      PC 17483
      221.7792
      C95
      S
      Mr
      Mr
      False
      1
      221.779200
      5.406181
      Mr
      0
    
    
      558
      0
      1
      Robbins, Mr. Victor
      male
      NaN
      0
      0
      PC 17757
      227.5250
      NaN
      C
      Mr
      Mr
      False
      4
      56.881250
      4.058393
      Mr
      0
    
    
      680
      1
      1
      Cardeza, Mr. Thomas Drake Martinez
      male
      36.0
      0
      1
      PC 17755
      512.3292
      B51 B53 B55
      C
      Mr
      Mr
      False
      3
      170.776400
      5.146194
      Mr
      1
    
    
      690
      1
      1
      Madill, Miss. Georgette Alexandra
      female
      15.0
      0
      1
      24160
      211.3375
      B5
      S
      Miss
      Miss
      False
      3
      70.445833
      4.268940
      Miss
      1
    
    
      701
      1
      1
      Astor, Mrs. John Jacob (Madeleine Talmadge Force)
      female
      18.0
      1
      0
      PC 17757
      227.5250
      C62 C64
      C
      Mrs
      Mrs
      False
      4
      56.881250
      4.058393
      Mrs
      1
    
    
      717
      1
      1
      Endres, Miss. Caroline Louise
      female
      38.0
      0
      0
      PC 17757
      227.5250
      C45
      C
      Miss
      Miss
      False
      4
      56.881250
      4.058393
      Miss
      0
    
    
      731
      1
      1
      Allen, Miss. Elisabeth Walton
      female
      29.0
      0
      0
      24160
      211.3375
      B5
      S
      Miss
      Miss
      False
      3
      70.445833
      4.268940
      Miss
      0
    
    
      738
      1
      1
      Lesurer, Mr. Gustave J
      male
      35.0
      0
      0
      PC 17755
      512.3292
      B101
      C
      Mr
      Mr
      False
      3
      170.776400
      5.146194
      Mr
      0
    
    
      743
      1
      1
      Ryerson, Miss. Susan Parker "Suzette"
      female
      21.0
      2
      2
      PC 17608
      262.3750
      B57 B59 B63 B66
      C
      Miss
      Miss
      False
      2
      131.187500
      4.884221
      Miss
      4
    
    
      780
      1
      1
      Robert, Mrs. Edward Scott (Elisabeth Walton Mc...
      female
      43.0
      0
      1
      24160
      211.3375
      B3
      S
      Mrs
      Mrs
      False
      3
      70.445833
      4.268940
      Mrs
      1

We see that almost all of these high fares go with shared cabins and shared tickets. Let's test the theory that 'Fare' is a group fare covering every passenger on the same ticket number, rather than a fare per passenger:



In [371]:

    
train['TicketSize'] = train['Ticket'].value_counts()[train['Ticket']].values
test['TicketSize'] = test['Ticket'].value_counts()[test['Ticket']].values
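As a sketch of the same idea, the group size per ticket can also be computed with `groupby(...).transform('count')`, which keeps the original index without going through `.values`. The second half is a variation the notebook does not use: it assumes that a travelling party may be split between the train and test files, and counts tickets over both frames combined. The frames `train_toy` and `test_toy` and the column `TicketSizeAll` are hypothetical.

```python
import pandas as pd

# Hypothetical miniature train/test frames (not the real data).
train_toy = pd.DataFrame({'Ticket': ['A', 'A', 'B', 'C']}, index=[1, 2, 3, 4])
test_toy = pd.DataFrame({'Ticket': ['B', 'D']}, index=[5, 6])

# Passengers per ticket within one frame, aligned to the original rows.
train_toy['TicketSize'] = train_toy.groupby('Ticket')['Ticket'].transform('count')

# Variation (an assumption, not the notebook's method): count over both frames,
# so a group split between train and test is not undercounted.
all_counts = pd.concat([train_toy['Ticket'], test_toy['Ticket']]).value_counts()
train_toy['TicketSizeAll'] = train_toy['Ticket'].map(all_counts)
test_toy['TicketSizeAll'] = test_toy['Ticket'].map(all_counts)
print(train_toy)
print(test_toy)
```

In this toy example ticket 'B' has size 1 when counted in the training frame alone, but size 2 once the test frame is included.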



In [372]:

    
plt.figure(11, figsize=(12, 4))
plt.subplot(131)
sns.regplot(x='TicketSize', y='Fare', data=train[train['Pclass'] == 1])
plt.subplot(132)
sns.regplot(x='TicketSize', y='Fare', data=train[train['Pclass'] == 2])
plt.subplot(133)
sns.regplot(x='TicketSize', y='Fare', data=train[train['Pclass'] == 3])
plt.show()

Let's assume that the relationship between fare and ticket size is linear. We'll divide each fare by its ticket size and look at the skew of the adjusted fare within each class:



In [373]:

    
train['AdjFare'] = train['Fare'].div(train['TicketSize'])
g = sns.FacetGrid(train, col='Pclass')
g = g.map(plt.hist, 'AdjFare')
plt.show()
train.groupby('Pclass')['AdjFare'].apply(st.skew)









    












    Out[373]:





Pclass
1    3.120576
2    1.040021
3    2.319343
Name: AdjFare, dtype: float64

The adjusted fare is still right skewed in every class (skewness of roughly 3.1, 1.0 and 2.3 for classes 1, 2 and 3). If we want, we can later apply a square root transform; for now, we will leave Fare as is.
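
To illustrate what such a transform does to skewness, here is a small sketch on made-up fares (the values are hypothetical). It compares the raw values with a square root and a `log1p` transform. The `LogFare` column that appears in later outputs looks consistent with `log1p(AdjFare)` (for example, `log1p(40.0)` is about 3.713572), but that is an inference from the printed values, not something the notebook states.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed fares (made-up values, not taken from train.csv).
fares = pd.Series([7.25, 7.9, 8.05, 10.5, 13.0, 26.0, 26.55, 30.0, 52.0, 211.5])

# Series.skew() is pandas' bias-corrected sample skewness, so its values
# differ slightly from scipy.stats.skew called with default arguments.
skew_raw = fares.skew()
skew_sqrt = np.sqrt(fares).skew()
skew_log = np.log1p(fares).skew()

print(f"raw:  {skew_raw:.2f}")
print(f"sqrt: {skew_sqrt:.2f}")
print(f"log:  {skew_log:.2f}")
```

Both transforms compress the long right tail; the logarithm compresses it more strongly than the square root.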

Cabins

Only a fraction of the passengers have known cabin information. We'll create a feature called CabinKnown that indicates if the cabin is given. Let's see if a known cabin is related to survival:



In [11]:

    
train['CabinKnown'] = train['Cabin'].notnull()
pd.crosstab(train['CabinKnown'], train['Survived'])
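
Raw counts are hard to compare when the two groups differ in size. A minimal sketch on made-up data (the frame `toy` is hypothetical) shows how `normalize='index'` turns each row of the crosstab into proportions, so that the `1` column reads directly as a survival rate:

```python
import pandas as pd

# Hypothetical toy data (not from train.csv): 4 passengers with a known cabin,
# 6 without.
toy = pd.DataFrame({
    'CabinKnown': [True, True, True, True,
                   False, False, False, False, False, False],
    'Survived':   [1, 1, 1, 0,
                   1, 0, 0, 0, 0, 0],
})

# Raw counts, as in the cell above.
counts = pd.crosstab(toy['CabinKnown'], toy['Survived'])

# normalize='index' divides each row by its total, giving per-group rates.
rates = pd.crosstab(toy['CabinKnown'], toy['Survived'], normalize='index')
print(counts)
print(rates)
```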



In [39]:

    
plt.figure(2)
sns.barplot(x='CabinKnown', y='Survived', data=train)
plt.show()

Let's also search for duplicate cabins, since that may indicate party size and help impute missing values.



In [14]:

    
train[(train['Cabin'].duplicated(keep=False)) & (train['Cabin'].notnull())].set_index('Cabin', append=True).swaplevel(0, 1).sort_index()









    Out[14]:







  
    
      
      
Cabin	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Embarked	CabinKnown
B18	330	1	1	Hippach, Miss. Jean Gertrude	female	16.0	0	1	111361	57.9792	C	True
B18	524	1	1	Hippach, Mrs. Louis Albert (Ida Sophia Fischer)	female	44.0	0	1	111361	57.9792	C	True
B20	691	1	1	Dick, Mr. Albert Adrian	male	31.0	1	0	17474	57.0000	S	True
B20	782	1	1	Dick, Mrs. Albert Adrian (Vera Gillespie)	female	17.0	1	0	17474	57.0000	S	True
B22	541	1	1	Crosby, Miss. Harriet R	female	36.0	0	2	WE/P 5735	71.0000	S	True
B22	746	0	1	Crosby, Capt. Edward Gifford	male	70.0	1	1	WE/P 5735	71.0000	S	True
B28	62	1	1	Icard, Miss. Amelie	female	38.0	0	0	113572	80.0000	NaN	True
B28	830	1	1	Stone, Mrs. George Nelson (Martha Evelyn)	female	62.0	0	0	113572	80.0000	NaN	True
B35	370	1	1	Aubart, Mme. Leontine Pauline	female	24.0	0	0	PC 17477	69.3000	C	True
B35	642	1	1	Sagesser, Mlle. Emma	female	24.0	0	0	PC 17477	69.3000	C	True
B49	292	1	1	Bishop, Mrs. Dickinson H (Helen Walton)	female	19.0	1	0	11967	91.0792	C	True
B49	485	1	1	Bishop, Mr. Dickinson H	male	25.0	1	0	11967	91.0792	C	True
B5	690	1	1	Madill, Miss. Georgette Alexandra	female	15.0	0	1	24160	211.3375	S	True
B5	731	1	1	Allen, Miss. Elisabeth Walton	female	29.0	0	0	24160	211.3375	S	True
B51 B53 B55	680	1	1	Cardeza, Mr. Thomas Drake Martinez	male	36.0	0	1	PC 17755	512.3292	C	True
B51 B53 B55	873	0	1	Carlsson, Mr. Frans Olof	male	33.0	0	0	695	5.0000	S	True
B57 B59 B63 B66	312	1	1	Ryerson, Miss. Emily Borie	female	18.0	2	2	PC 17608	262.3750	C	True
B57 B59 B63 B66	743	1	1	Ryerson, Miss. Susan Parker "Suzette"	female	21.0	2	2	PC 17608	262.3750	C	True
B58 B60	119	0	1	Baxter, Mr. Quigg Edmond	male	24.0	0	1	PC 17558	247.5208	C	True
B58 B60	300	1	1	Baxter, Mrs. James (Helene DeLaudeniere Chaput)	female	50.0	0	1	PC 17558	247.5208	C	True
B77	258	1	1	Cherry, Miss. Gladys	female	30.0	0	0	110152	86.5000	S	True
B77	760	1	1	Rothes, the Countess. of (Lucy Noel Martha Dye...	female	33.0	0	0	110152	86.5000	S	True
B96 B98	391	1	1	Carter, Mr. William Ernest	male	36.0	1	2	113760	120.0000	S	True
B96 B98	436	1	1	Carter, Miss. Lucile Polk	female	14.0	1	2	113760	120.0000	S	True
B96 B98	764	1	1	Carter, Mrs. William Ernest (Lucile Polk)	female	36.0	1	2	113760	120.0000	S	True
B96 B98	803	1	1	Carter, Master. William Thornton II	male	11.0	1	2	113760	120.0000	S	True
C123	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	S	True
C123	138	0	1	Futrelle, Mr. Jacques Heath	male	37.0	1	0	113803	53.1000	S	True
C124	332	0	1	Partner, Mr. Austen	male	45.5	0	0	113043	28.5000	S	True
C124	712	0	1	Klaber, Mr. Herman	male	NaN	0	0	113028	26.5500	S	True
...	...	...	...	...	...	...	...	...	...	...	...	...
E101	304	1	2	Keane, Miss. Nora A	female	NaN	0	0	226593	12.3500	Q	True
E101	718	1	2	Troutt, Miss. Edwina Celia "Winnie"	female	27.0	0	0	34218	10.5000	S	True
E121	752	1	3	Moor, Master. Meier	male	6.0	0	1	392096	12.4750	S	True
E121	824	1	3	Moor, Mrs. (Beila)	female	27.0	0	1	392096	12.4750	S	True
E24	702	1	1	Silverthorne, Mr. Spencer Victor	male	35.0	0	0	PC 17475	26.2875	S	True
E24	708	1	1	Calderhead, Mr. Edward Pennington	male	42.0	0	0	PC 17476	26.2875	S	True
E25	513	1	1	McGough, Mr. James Robert	male	36.0	0	0	PC 17473	26.2875	S	True
E25	573	1	1	Flynn, Mr. John Irwin ("Irving")	male	36.0	0	0	PC 17474	26.3875	S	True
E33	167	1	1	Chibnall, Mrs. (Edith Martha Bowerman)	female	NaN	0	1	113505	55.0000	S	True
E33	357	1	1	Bowerman, Miss. Elsie Edith	female	22.0	0	1	113505	55.0000	S	True
E44	435	0	1	Silvey, Mr. William Baird	male	50.0	1	0	13507	55.9000	S	True
E44	578	1	1	Silvey, Mrs. William Baird (Alice Munger)	female	39.0	1	0	13507	55.9000	S	True
E67	263	0	1	Taussig, Mr. Emil	male	52.0	1	1	110413	79.6500	S	True
E67	559	1	1	Taussig, Mrs. Emil (Tillie Mandelbaum)	female	39.0	1	1	110413	79.6500	S	True
E8	725	1	1	Chambers, Mr. Norman Campbell	male	27.0	1	0	113806	53.1000	S	True
E8	810	1	1	Chambers, Mrs. Norman Campbell (Bertha Griggs)	female	33.0	1	0	113806	53.1000	S	True
F G73	76	0	3	Moen, Mr. Sigurd Hansen	male	25.0	0	0	348123	7.6500	S	True
F G73	716	0	3	Soholt, Mr. Peter Andreas Lauritz Andersen	male	19.0	0	0	348124	7.6500	S	True
F2	149	0	2	Navratil, Mr. Michel ("Louis M Hoffman")	male	36.5	0	2	230080	26.0000	S	True
F2	194	1	2	Navratil, Master. Michel M	male	3.0	1	1	230080	26.0000	S	True
F2	341	1	2	Navratil, Master. Edmond Roger	male	2.0	1	1	230080	26.0000	S	True
F33	67	1	2	Nye, Mrs. (Elizabeth Ramell)	female	29.0	0	0	C.A. 29395	10.5000	S	True
F33	346	1	2	Brown, Miss. Amelia "Mildred"	female	24.0	0	0	248733	13.0000	S	True
F33	517	1	2	Lemore, Mrs. (Amelia Milley)	female	34.0	0	0	C.A. 34260	10.5000	S	True
F4	184	1	2	Becker, Master. Richard F	male	1.0	2	1	230136	39.0000	S	True
F4	619	1	2	Becker, Miss. Marion Louise	female	4.0	2	1	230136	39.0000	S	True
G6	11	1	3	Sandstrom, Miss. Marguerite Rut	female	4.0	1	1	PP 9549	16.7000	S	True
G6	206	0	3	Strom, Miss. Telma Matilda	female	2.0	0	1	347054	10.4625	S	True
G6	252	0	3	Strom, Mrs. Wilhelm (Elna Matilda Persson)	female	29.0	1	1	347054	10.4625	S	True
G6	395	1	3	Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengt...	female	24.0	0	2	PP 9549	16.7000	S	True
    
  

103 rows × 11 columns
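
The filter used for this table combines two conditions, and the second one matters: pandas treats missing values as duplicates of each other, so without the `notnull()` test every passenger with an unknown cabin would be listed too. A minimal sketch on made-up data (the frame `toy` is hypothetical) shows both effects:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two passengers share cabin 'B5', one is alone in 'C85',
# and two have no recorded cabin.
toy = pd.DataFrame(
    {'Cabin': ['B5', np.nan, 'C85', 'B5', np.nan]},
    index=pd.Index([1, 2, 3, 4, 5], name='PassengerId'),
)

# keep=False flags every member of a repeated value, not just the later ones.
# NaN counts as a repeated value too, hence the extra notnull() filter.
dup_any = toy['Cabin'].duplicated(keep=False)
shared = toy[dup_any & toy['Cabin'].notnull()]

print(dup_any.tolist())       # the two NaN rows are flagged as duplicates
print(shared.index.tolist())  # only the passengers who really share a cabin
```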

Embark Location

Here are the two passengers with a missing embarkation port:



In [329]:

    
train[train['Embarked'].isnull()]









    Out[329]:







  
    
      
PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked	Title	PTitle	Child	TicketSize	AdjFare	LogFare	AdjTitle	FamSize
62	1	1	Icard, Miss. Amelie	female	38.0	0	0	113572	80.0	B28	NaN	Miss	Miss	False	2	40.0	3.713572	Miss	0
830	1	1	Stone, Mrs. George Nelson (Martha Evelyn)	female	62.0	0	0	113572	80.0	B28	NaN	Mrs	Mrs	False	2	40.0	3.713572	Mrs	0

Let's go by the ticket number: tickets follow several distinct formats, so the format may well correlate with the embarkation port. Both passengers share ticket 113572, so we'll scan all tickets that begin with '113' and see if there is a pattern:



In [333]:

    
train.loc[train['Ticket'].str.startswith('113'), 'Embarked'].value_counts()









    Out[333]:





S    41
C     4
Name: Embarked, dtype: int64

It seems the overwhelming majority of passengers with '113' tickets (41 of 45) boarded at Southampton. 'S' it is!
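
The notebook does not show the fill itself at this point. As a hedged sketch of how the decision could be applied, on made-up data (the frame `toy` and its values are hypothetical), one can take the most common port among tickets with the same prefix and fill only the missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train (values are made up).
toy = pd.DataFrame({
    'Ticket':   ['113572', '113803', '113760', 'PC 17599', '113781'],
    'Embarked': [np.nan,   'S',      'S',      'C',        'S'],
})

# Most common port among tickets sharing the '113' prefix.
# mode() ignores missing values by default.
same_prefix = toy['Ticket'].str.startswith('113')
most_common = toy.loc[same_prefix, 'Embarked'].mode()[0]

# Fill only the missing entries; known ports are left untouched.
toy['Embarked'] = toy['Embarked'].fillna(most_common)
print(toy)
```

Here `fillna` touches every missing port. In the training data that is safe only because both missing values sit on the same '113' ticket.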

Age

Let's start by taking a look at the distribution of ages. Since survival and gender are so strongly correlated, we'll split up age by gender as well.



In [212]:

    
fs_ages = train.loc[(train['Survived'] == 1) & (train['Sex'] == "female"), 'Age'].dropna()
fd_ages = train.loc[(train['Survived'] == 0) & (train['Sex'] == "female"), 'Age'].dropna()
ms_ages = train.loc[(train['Survived'] == 1) & (train['Sex'] == "male"), 'Age'].dropna()
md_ages = train.loc[(train['Survived'] == 0) & (train['Sex'] == "male"), 'Age'].dropna()

plt.figure(10, figsize=(9, 9))
plt.subplot(211)
sns.distplot(fs_ages, bins=range(81), kde=False, color='C1')
sns.distplot(fd_ages, bins=range(81), kde=False, color='C0', axlabel='Female Age')
plt.subplot(212)
sns.distplot(ms_ages, bins=range(81), kde=False, color='C1')
sns.distplot(md_ages, bins=range(81), kde=False, color='C0', axlabel='Male Age')
plt.show()

There's an obvious dichotomy in both graphs. Teenage and older males had a very poor survival rate compared with younger males; it seems that, back in the day, teenage boys were not considered "children."

For females, age seems to matter much less. Girls below a cutoff somewhere around 11 to 15 survived at only about a 50/50 rate (perhaps because very young children depended on others?).
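
The histograms can be complemented with a table of survival rates per sex and age band. The sketch below uses made-up passengers, and the band edges (13 and 18) are illustrative assumptions rather than values chosen by the notebook:

```python
import pandas as pd

# Hypothetical toy data (made-up passengers, not from train.csv).
toy = pd.DataFrame({
    'Sex':      ['male', 'male', 'male', 'male',
                 'female', 'female', 'female', 'female'],
    'Age':      [4.0, 11.0, 16.0, 40.0,
                 5.0, 14.0, 30.0, 50.0],
    'Survived': [1, 1, 0, 0,
                 1, 1, 1, 0],
})

# Bin ages into bands; right=False makes each band include its lower edge.
bands = pd.cut(toy['Age'], bins=[0, 13, 18, 81], right=False,
               labels=['0-12', '13-17', '18+'])

# The mean of a 0/1 column is the survival rate within each (Sex, band) group.
rates = toy.groupby(['Sex', bands], observed=True)['Survived'].mean()
print(rates)
```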

To find a good cutoff point for "child" versus "adult," we can zoom in on passengers whose age is strictly between 10 and 15.



In [148]:

    
train.loc[(train['Age'] < 15) & (train['Age'] > 10), ['Age', 'Survived', 'Sex']].sort_values(['Sex', 'Age'])









    Out[148]:







  
    
      
PassengerId	Age	Survived	Sex
543	11.0	0	female
447	13.0	1	female
781	13.0	1	female
10	14.0	1	female
15	14.0	0	female
40	14.0	1	female
436	14.0	1	female
112	14.5	0	female
60	11.0	0	male
732	11.0	0	male
803	11.0	1	male
126	12.0	1	male
684	14.0	0	male
687	14.0	0	male

We can treat anyone aged 12 or below as a "child" and anyone aged 13 or above as an "adult." This captures the border cases of the two surviving 13-year-old girls and the surviving 12-year-old boy.



In [149]:

    
train['Child'] = train['Age'] <= 12
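
One detail of this comparison is worth spelling out: only 714 of the 891 training rows have a known age, and any comparison with NaN evaluates to False. Passengers of unknown age are therefore marked as not being children. The sketch below demonstrates this on made-up ages; the nullable variant at the end is an assumption for illustration, not what the notebook does:

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages, including one missing value.
ages = pd.Series([2.0, 12.0, 13.0, np.nan, 40.0], name='Age')

# Any comparison with NaN is False, so a passenger of unknown age is
# labelled "not a child" rather than "unknown".
child = ages <= 12
print(child.tolist())

# One possible alternative (an assumption, not the notebook's approach):
# keep the uncertainty explicit with a nullable boolean column.
child_nullable = child.astype('boolean')
child_nullable[ages.isna()] = pd.NA
print(child_nullable.tolist())
```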

Missing Ages

We observed earlier that no titles are missing. One way to impute missing ages is to look at how ages are distributed within each title:



In [203]:

    
train.loc[train['AdjTitle'] == 'Master', 'Age'].describe()









    Out[203]:





count    36.000000
mean      4.574167
std       3.619872
min       0.420000
25%       1.000000
50%       3.500000
75%       8.000000
max      12.000000
Name: Age, dtype: float64



In [204]:

    
train.loc[train['AdjTitle'] == 'Mr', 'Age'].describe()









    Out[204]:





count    398.000000
mean      32.368090
std       12.708793
min       11.000000
25%       23.000000
50%       30.000000
75%       39.000000
max       80.000000
Name: Age, dtype: float64

So every 'Master' with a known age is no more than 12! This puts him in the (luckier) basket of male children, increasing his survival odds. Similarly, most (but not all) males who are 'Mr' are above 12, making them less likely to survive.



In [198]:

    
train.loc[train['Title'] == 'Miss', 'Age'].describe()









    Out[198]:





count    146.000000
mean      21.773973
std       12.990292
min        0.750000
25%       14.125000
50%       21.000000
75%       30.000000
max       63.000000
Name: Age, dtype: float64
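
One way these per-title distributions could be turned into an imputation is to fill each missing age with the median age of the passenger's title. This is a sketch under that assumption, on made-up data; the column name `AdjTitle` follows the notebook, while `AgeFilled` and the values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are made up.
toy = pd.DataFrame({
    'AdjTitle': ['Master', 'Master', 'Master', 'Mr', 'Mr', 'Mr', 'Miss', 'Miss'],
    'Age':      [2.0, 6.0, np.nan, 25.0, 35.0, np.nan, 20.0, np.nan],
})

# Median age per title, broadcast back to every row of that title.
# NaN ages are skipped when the median is computed.
title_median = toy.groupby('AdjTitle')['Age'].transform('median')

# Keep known ages; fall back to the title median only where Age is missing.
toy['AgeFilled'] = toy['Age'].fillna(title_median)
print(toy)
```

For 'Master' this works well because the ages are tightly bounded. For 'Miss' the ages range from under 1 to 63, so a single median is a much coarser guess.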
