Import Libraries
The Data
Missing Data
- Cufflinks for plots
Data Cleaning
Converting Categorical Features
Train Test Split
Training and Predicting
Evaluation
Great Job!
Validate against test dataset

Logistic Regression with Python

For this lecture we will be working with the Titanic Data Set from Kaggle. This is a very famous data set and very often is a student's first step in machine learning!

We'll be trying to predict a classification- survival or deceased. Let's begin our understanding of implementing Logistic Regression in Python for classification.

We'll use a "semi-cleaned" version of the titanic data set, if you use the data set hosted directly on Kaggle, you may need to do some additional cleaning not shown in this lecture notebook.

Import Libraries

Let's import some libraries to get started!



In [1]:

    
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

The Data

Let's start by reading in the titanic_train.csv file into a pandas dataframe.



In [2]:

    
train = pd.read_csv('titanic_train.csv')



In [3]:

    
train.head(25)









    Out[3]:







  
    
      
      PassengerId
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Cabin
      Embarked
    
  
  
    
      0
      1
      0
      3
      Braund, Mr. Owen Harris
      male
      22.0
      1
      0
      A/5 21171
      7.2500
      NaN
      S
    
    
      1
      2
      1
      1
      Cumings, Mrs. John Bradley (Florence Briggs Th...
      female
      38.0
      1
      0
      PC 17599
      71.2833
      C85
      C
    
    
      2
      3
      1
      3
      Heikkinen, Miss. Laina
      female
      26.0
      0
      0
      STON/O2. 3101282
      7.9250
      NaN
      S
    
    
      3
      4
      1
      1
      Futrelle, Mrs. Jacques Heath (Lily May Peel)
      female
      35.0
      1
      0
      113803
      53.1000
      C123
      S
    
    
      4
      5
      0
      3
      Allen, Mr. William Henry
      male
      35.0
      0
      0
      373450
      8.0500
      NaN
      S
    
    
      5
      6
      0
      3
      Moran, Mr. James
      male
      NaN
      0
      0
      330877
      8.4583
      NaN
      Q
    
    
      6
      7
      0
      1
      McCarthy, Mr. Timothy J
      male
      54.0
      0
      0
      17463
      51.8625
      E46
      S
    
    
      7
      8
      0
      3
      Palsson, Master. Gosta Leonard
      male
      2.0
      3
      1
      349909
      21.0750
      NaN
      S
    
    
      8
      9
      1
      3
      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
      female
      27.0
      0
      2
      347742
      11.1333
      NaN
      S
    
    
      9
      10
      1
      2
      Nasser, Mrs. Nicholas (Adele Achem)
      female
      14.0
      1
      0
      237736
      30.0708
      NaN
      C
    
    
      10
      11
      1
      3
      Sandstrom, Miss. Marguerite Rut
      female
      4.0
      1
      1
      PP 9549
      16.7000
      G6
      S
    
    
      11
      12
      1
      1
      Bonnell, Miss. Elizabeth
      female
      58.0
      0
      0
      113783
      26.5500
      C103
      S
    
    
      12
      13
      0
      3
      Saundercock, Mr. William Henry
      male
      20.0
      0
      0
      A/5. 2151
      8.0500
      NaN
      S
    
    
      13
      14
      0
      3
      Andersson, Mr. Anders Johan
      male
      39.0
      1
      5
      347082
      31.2750
      NaN
      S
    
    
      14
      15
      0
      3
      Vestrom, Miss. Hulda Amanda Adolfina
      female
      14.0
      0
      0
      350406
      7.8542
      NaN
      S
    
    
      15
      16
      1
      2
      Hewlett, Mrs. (Mary D Kingcome)
      female
      55.0
      0
      0
      248706
      16.0000
      NaN
      S
    
    
      16
      17
      0
      3
      Rice, Master. Eugene
      male
      2.0
      4
      1
      382652
      29.1250
      NaN
      Q
    
    
      17
      18
      1
      2
      Williams, Mr. Charles Eugene
      male
      NaN
      0
      0
      244373
      13.0000
      NaN
      S
    
    
      18
      19
      0
      3
      Vander Planke, Mrs. Julius (Emelia Maria Vande...
      female
      31.0
      1
      0
      345763
      18.0000
      NaN
      S
    
    
      19
      20
      1
      3
      Masselmani, Mrs. Fatima
      female
      NaN
      0
      0
      2649
      7.2250
      NaN
      C
    
    
      20
      21
      0
      2
      Fynney, Mr. Joseph J
      male
      35.0
      0
      0
      239865
      26.0000
      NaN
      S
    
    
      21
      22
      1
      2
      Beesley, Mr. Lawrence
      male
      34.0
      0
      0
      248698
      13.0000
      D56
      S
    
    
      22
      23
      1
      3
      McGowan, Miss. Anna "Annie"
      female
      15.0
      0
      0
      330923
      8.0292
      NaN
      Q
    
    
      23
      24
      1
      1
      Sloper, Mr. William Thompson
      male
      28.0
      0
      0
      113788
      35.5000
      A6
      S
    
    
      24
      25
      0
      3
      Palsson, Miss. Torborg Danira
      female
      8.0
      3
      1
      349909
      21.0750
      NaN
      S

Exploratory Data Analysis

Let's begin some exploratory data analysis! We'll start by checking out missing data!

Missing Data

We can use seaborn to create a simple heatmap to see where we are missing data!



In [4]:

    
sns.heatmap(train.isnull(),yticklabels=False,cbar=False,cmap='viridis')









    Out[4]:





<matplotlib.axes._subplots.AxesSubplot at 0x1a166095f8>

Roughly 20 percent of the Age data is missing. The proportion of Age missing is likely small enough for reasonable replacement with some form of imputation. Looking at the Cabin column, it looks like we are just missing too much of that data to do something useful with at a basic level. We'll probably drop this later, or change it to another feature like "Cabin Known: 1 or 0"

Let's continue on by visualizing some more of the data! Check out the video for full explanations over these plots, this code is just to serve as reference.



In [5]:

    
sns.set_style('whitegrid')
sns.countplot(x='Survived',data=train,palette='RdBu_r')









    Out[5]:





<matplotlib.axes._subplots.AxesSubplot at 0x1a16bbd240>



In [9]:

    
# sns.set_style('whitegrid')
sns.countplot(x='Survived',hue='Sex',data=train,palette='RdBu_r')









    Out[9]:





<matplotlib.axes._subplots.AxesSubplot at 0x2550eaab748>



In [10]:

    
# sns.set_style('whitegrid')
sns.countplot(x='Survived',hue='Pclass',data=train,palette='rainbow')









    Out[10]:





<matplotlib.axes._subplots.AxesSubplot at 0x2550fbbb358>



In [11]:

    
sns.distplot(train['Age'].dropna(),kde=False,color='darkred',bins=30)









    Out[11]:





<matplotlib.axes._subplots.AxesSubplot at 0x2550fc64748>



In [12]:

    
train['Age'].hist(bins=30,color='darkred',alpha=0.7)









    Out[12]:





<matplotlib.axes._subplots.AxesSubplot at 0x2550fde4b70>



In [13]:

    
sns.countplot(x='SibSp',data=train)









    Out[13]:





<matplotlib.axes._subplots.AxesSubplot at 0x2550ff16208>



In [14]:

    
train['Fare'].hist(color='green',bins=40,figsize=(8,4))









    Out[14]:





<matplotlib.axes._subplots.AxesSubplot at 0x2550ff817b8>

Cufflinks for plots

Let's take a quick moment to show an example of cufflinks!



In [15]:

    
import plotly_express as pex



In [17]:

    
pex.histogram(data_frame=train, x='Fare', nbins=30)

Data Cleaning

We want to fill in missing age data instead of just dropping the missing age data rows. One way to do this is by filling in the mean age of all the passengers (imputation). However we can be smarter about this and check the average age by passenger class. For example:



In [6]:

    
plt.figure(figsize=(12, 7))
sns.boxplot(x='Pclass',y='Age',data=train,palette='winter')









    Out[6]:





<matplotlib.axes._subplots.AxesSubplot at 0x1a16bfa7b8>

We can see the wealthier passengers in the higher classes tend to be older, which makes sense. We'll use these average age values to impute based on Pclass for Age.



In [7]:

    
def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]
    
    if pd.isnull(Age):

        if Pclass == 1:
            return 37

        elif Pclass == 2:
            return 29

        else:
            return 24

    else:
        return Age

Now apply that function!



In [8]:

    
train['Age'] = train[['Age','Pclass']].apply(impute_age,axis=1)

Now let's check that heat map again!



In [9]:

    
sns.heatmap(train.isnull(),yticklabels=False,cbar=False,cmap='viridis')









    Out[9]:





<matplotlib.axes._subplots.AxesSubplot at 0x1a16c95908>

Great! Let's go ahead and drop the Cabin column and the row in Embarked that is NaN.



In [10]:

    
train.drop('Cabin',axis=1,inplace=True)



In [11]:

    
train.head(50)









    Out[11]:







  
    
      
      PassengerId
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Embarked
    
  
  
    
      0
      1
      0
      3
      Braund, Mr. Owen Harris
      male
      22.0
      1
      0
      A/5 21171
      7.2500
      S
    
    
      1
      2
      1
      1
      Cumings, Mrs. John Bradley (Florence Briggs Th...
      female
      38.0
      1
      0
      PC 17599
      71.2833
      C
    
    
      2
      3
      1
      3
      Heikkinen, Miss. Laina
      female
      26.0
      0
      0
      STON/O2. 3101282
      7.9250
      S
    
    
      3
      4
      1
      1
      Futrelle, Mrs. Jacques Heath (Lily May Peel)
      female
      35.0
      1
      0
      113803
      53.1000
      S
    
    
      4
      5
      0
      3
      Allen, Mr. William Henry
      male
      35.0
      0
      0
      373450
      8.0500
      S
    
    
      5
      6
      0
      3
      Moran, Mr. James
      male
      24.0
      0
      0
      330877
      8.4583
      Q
    
    
      6
      7
      0
      1
      McCarthy, Mr. Timothy J
      male
      54.0
      0
      0
      17463
      51.8625
      S
    
    
      7
      8
      0
      3
      Palsson, Master. Gosta Leonard
      male
      2.0
      3
      1
      349909
      21.0750
      S
    
    
      8
      9
      1
      3
      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
      female
      27.0
      0
      2
      347742
      11.1333
      S
    
    
      9
      10
      1
      2
      Nasser, Mrs. Nicholas (Adele Achem)
      female
      14.0
      1
      0
      237736
      30.0708
      C
    
    
      10
      11
      1
      3
      Sandstrom, Miss. Marguerite Rut
      female
      4.0
      1
      1
      PP 9549
      16.7000
      S
    
    
      11
      12
      1
      1
      Bonnell, Miss. Elizabeth
      female
      58.0
      0
      0
      113783
      26.5500
      S
    
    
      12
      13
      0
      3
      Saundercock, Mr. William Henry
      male
      20.0
      0
      0
      A/5. 2151
      8.0500
      S
    
    
      13
      14
      0
      3
      Andersson, Mr. Anders Johan
      male
      39.0
      1
      5
      347082
      31.2750
      S
    
    
      14
      15
      0
      3
      Vestrom, Miss. Hulda Amanda Adolfina
      female
      14.0
      0
      0
      350406
      7.8542
      S
    
    
      15
      16
      1
      2
      Hewlett, Mrs. (Mary D Kingcome)
      female
      55.0
      0
      0
      248706
      16.0000
      S
    
    
      16
      17
      0
      3
      Rice, Master. Eugene
      male
      2.0
      4
      1
      382652
      29.1250
      Q
    
    
      17
      18
      1
      2
      Williams, Mr. Charles Eugene
      male
      29.0
      0
      0
      244373
      13.0000
      S
    
    
      18
      19
      0
      3
      Vander Planke, Mrs. Julius (Emelia Maria Vande...
      female
      31.0
      1
      0
      345763
      18.0000
      S
    
    
      19
      20
      1
      3
      Masselmani, Mrs. Fatima
      female
      24.0
      0
      0
      2649
      7.2250
      C
    
    
      20
      21
      0
      2
      Fynney, Mr. Joseph J
      male
      35.0
      0
      0
      239865
      26.0000
      S
    
    
      21
      22
      1
      2
      Beesley, Mr. Lawrence
      male
      34.0
      0
      0
      248698
      13.0000
      S
    
    
      22
      23
      1
      3
      McGowan, Miss. Anna "Annie"
      female
      15.0
      0
      0
      330923
      8.0292
      Q
    
    
      23
      24
      1
      1
      Sloper, Mr. William Thompson
      male
      28.0
      0
      0
      113788
      35.5000
      S
    
    
      24
      25
      0
      3
      Palsson, Miss. Torborg Danira
      female
      8.0
      3
      1
      349909
      21.0750
      S
    
    
      25
      26
      1
      3
      Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
      female
      38.0
      1
      5
      347077
      31.3875
      S
    
    
      26
      27
      0
      3
      Emir, Mr. Farred Chehab
      male
      24.0
      0
      0
      2631
      7.2250
      C
    
    
      27
      28
      0
      1
      Fortune, Mr. Charles Alexander
      male
      19.0
      3
      2
      19950
      263.0000
      S
    
    
      28
      29
      1
      3
      O'Dwyer, Miss. Ellen "Nellie"
      female
      24.0
      0
      0
      330959
      7.8792
      Q
    
    
      29
      30
      0
      3
      Todoroff, Mr. Lalio
      male
      24.0
      0
      0
      349216
      7.8958
      S
    
    
      30
      31
      0
      1
      Uruchurtu, Don. Manuel E
      male
      40.0
      0
      0
      PC 17601
      27.7208
      C
    
    
      31
      32
      1
      1
      Spencer, Mrs. William Augustus (Marie Eugenie)
      female
      37.0
      1
      0
      PC 17569
      146.5208
      C
    
    
      32
      33
      1
      3
      Glynn, Miss. Mary Agatha
      female
      24.0
      0
      0
      335677
      7.7500
      Q
    
    
      33
      34
      0
      2
      Wheadon, Mr. Edward H
      male
      66.0
      0
      0
      C.A. 24579
      10.5000
      S
    
    
      34
      35
      0
      1
      Meyer, Mr. Edgar Joseph
      male
      28.0
      1
      0
      PC 17604
      82.1708
      C
    
    
      35
      36
      0
      1
      Holverson, Mr. Alexander Oskar
      male
      42.0
      1
      0
      113789
      52.0000
      S
    
    
      36
      37
      1
      3
      Mamee, Mr. Hanna
      male
      24.0
      0
      0
      2677
      7.2292
      C
    
    
      37
      38
      0
      3
      Cann, Mr. Ernest Charles
      male
      21.0
      0
      0
      A./5. 2152
      8.0500
      S
    
    
      38
      39
      0
      3
      Vander Planke, Miss. Augusta Maria
      female
      18.0
      2
      0
      345764
      18.0000
      S
    
    
      39
      40
      1
      3
      Nicola-Yarred, Miss. Jamila
      female
      14.0
      1
      0
      2651
      11.2417
      C
    
    
      40
      41
      0
      3
      Ahlin, Mrs. Johan (Johanna Persdotter Larsson)
      female
      40.0
      1
      0
      7546
      9.4750
      S
    
    
      41
      42
      0
      2
      Turpin, Mrs. William John Robert (Dorothy Ann ...
      female
      27.0
      1
      0
      11668
      21.0000
      S
    
    
      42
      43
      0
      3
      Kraeff, Mr. Theodor
      male
      24.0
      0
      0
      349253
      7.8958
      C
    
    
      43
      44
      1
      2
      Laroche, Miss. Simonne Marie Anne Andree
      female
      3.0
      1
      2
      SC/Paris 2123
      41.5792
      C
    
    
      44
      45
      1
      3
      Devaney, Miss. Margaret Delia
      female
      19.0
      0
      0
      330958
      7.8792
      Q
    
    
      45
      46
      0
      3
      Rogers, Mr. William John
      male
      24.0
      0
      0
      S.C./A.4. 23567
      8.0500
      S
    
    
      46
      47
      0
      3
      Lennon, Mr. Denis
      male
      24.0
      1
      0
      370371
      15.5000
      Q
    
    
      47
      48
      1
      3
      O'Driscoll, Miss. Bridget
      female
      24.0
      0
      0
      14311
      7.7500
      Q
    
    
      48
      49
      0
      3
      Samaan, Mr. Youssef
      male
      24.0
      2
      0
      2662
      21.6792
      C
    
    
      49
      50
      0
      3
      Arnold-Franchi, Mrs. Josef (Josefine Franchi)
      female
      18.0
      1
      0
      349237
      17.8000
      S



In [12]:

    
train.shape









    Out[12]:





(891, 11)



In [13]:

    
train.dropna(inplace=True)



In [14]:

    
train.shape









    Out[14]:





(889, 11)

Converting Categorical Features

We'll need to convert categorical features to dummy variables using pandas! Otherwise our machine learning algorithm won't be able to directly take in those features as inputs.



In [15]:

    
train.info()









    



<class 'pandas.core.frame.DataFrame'>
Int64Index: 889 entries, 0 to 890
Data columns (total 11 columns):
PassengerId    889 non-null int64
Survived       889 non-null int64
Pclass         889 non-null int64
Name           889 non-null object
Sex            889 non-null object
Age            889 non-null float64
SibSp          889 non-null int64
Parch          889 non-null int64
Ticket         889 non-null object
Fare           889 non-null float64
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(4)
memory usage: 83.3+ KB



In [16]:

    
sex = pd.get_dummies(train['Sex'],drop_first=True)
embark = pd.get_dummies(train['Embarked'],drop_first=True)



In [17]:

    
embark.head()



In [18]:

    
sex.head()



In [19]:

    
train.drop(['Sex','Embarked','Name','Ticket'],axis=1,inplace=True)



In [20]:

    
train = pd.concat([train,sex,embark],axis=1)



In [21]:

    
train.head()









    Out[21]:







  
    
      
      PassengerId
      Survived
      Pclass
      Age
      SibSp
      Parch
      Fare
      male
      Q
      S
    
  
  
    
      0
      1
      0
      3
      22.0
      1
      0
      7.2500
      1
      0
      1
    
    
      1
      2
      1
      1
      38.0
      1
      0
      71.2833
      0
      0
      0
    
    
      2
      3
      1
      3
      26.0
      0
      0
      7.9250
      0
      0
      1
    
    
      3
      4
      1
      1
      35.0
      1
      0
      53.1000
      0
      0
      1
    
    
      4
      5
      0
      3
      35.0
      0
      0
      8.0500
      1
      0
      1

Great! Our data is ready for our model!

Building a Logistic Regression model

Let's start by splitting our data into a training set and test set (there is another test.csv file that you can play around with in case you want to use all this data for training).

Train Test Split



In [22]:

    
from sklearn.model_selection import train_test_split



In [23]:

    
X_train, X_test, y_train, y_test = train_test_split(train.drop('Survived',axis=1), 
                                                    train['Survived'], test_size=0.30, 
                                                    random_state=101)

Training and Predicting



In [24]:

    
from sklearn.linear_model import LogisticRegression



In [29]:

    
logmodel = LogisticRegression()
logmodel.verbose = 1



In [30]:

    
logmodel.fit(X_train,y_train)









    



[LibLinear]





    



/Users/atma6951/anaconda3/envs/pychakras/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)






    Out[30]:





LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=1, warm_start=False)



In [31]:

    
logmodel.coef_









    Out[31]:





array([[ 4.10170317e-04, -7.83334719e-01, -2.61257205e-02,
        -2.09907780e-01, -9.55518385e-02,  4.63201983e-03,
        -2.33696636e+00, -1.21716646e-02, -2.02780740e-01]])



In [32]:

    
logmodel.intercept_









    Out[32]:





array([3.36140356])



In [52]:

    
predictions = logmodel.predict(X_test)

Let's move on to evaluate our model!

Evaluation

We can check precision,recall,f1-score using classification report!



In [42]:

    
from sklearn.metrics import classification_report



In [43]:

    
print(classification_report(y_test,predictions))









    



             precision    recall  f1-score   support

          0       0.81      0.93      0.86       163
          1       0.85      0.65      0.74       104

avg / total       0.82      0.82      0.81       267

Not so bad! You might want to explore other feature engineering and the other titanic_text.csv file, some suggestions for feature engineering:

Try grabbing the Title (Dr.,Mr.,Mrs,etc..) from the name as a feature
Maybe the Cabin letter could be a feature
Is there any info you can get from the ticket?

Great Job!

Validate against test dataset



In [38]:

    
test_df = pd.read_csv('titanic_test.csv')
test_df.head()









    Out[38]:







  
    
      
      PassengerId
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Cabin
      Embarked
    
  
  
    
      0
      892
      3
      Kelly, Mr. James
      male
      34.5
      0
      0
      330911
      7.8292
      NaN
      Q
    
    
      1
      893
      3
      Wilkes, Mrs. James (Ellen Needs)
      female
      47.0
      1
      0
      363272
      7.0000
      NaN
      S
    
    
      2
      894
      2
      Myles, Mr. Thomas Francis
      male
      62.0
      0
      0
      240276
      9.6875
      NaN
      Q
    
    
      3
      895
      3
      Wirz, Mr. Albert
      male
      27.0
      0
      0
      315154
      8.6625
      NaN
      S
    
    
      4
      896
      3
      Hirvonen, Mrs. Alexander (Helga E Lindqvist)
      female
      22.0
      1
      1
      3101298
      12.2875
      NaN
      S



In [39]:

    
test_df.shape









    Out[39]:





(418, 11)



In [48]:

    
test_df.iloc[0]









    Out[48]:





PassengerId                 892
Pclass                        3
Name           Kelly, Mr. James
Sex                        male
Age                        34.5
SibSp                         0
Parch                         0
Ticket                   330911
Fare                     7.8292
Cabin                       NaN
Embarked                      Q
Name: 0, dtype: object



In [ ]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S
5	6	0	3	Moran, Mr. James	male	NaN	0	0	330877	8.4583	NaN	Q
6	7	0	1	McCarthy, Mr. Timothy J	male	54.0	0	0	17463	51.8625	E46	S
7	8	0	3	Palsson, Master. Gosta Leonard	male	2.0	3	1	349909	21.0750	NaN	S
8	9	1	3	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	female	27.0	0	2	347742	11.1333	NaN	S
9	10	1	2	Nasser, Mrs. Nicholas (Adele Achem)	female	14.0	1	0	237736	30.0708	NaN	C
10	11	1	3	Sandstrom, Miss. Marguerite Rut	female	4.0	1	1	PP 9549	16.7000	G6	S
11	12	1	1	Bonnell, Miss. Elizabeth	female	58.0	0	0	113783	26.5500	C103	S
12	13	0	3	Saundercock, Mr. William Henry	male	20.0	0	0	A/5. 2151	8.0500	NaN	S
13	14	0	3	Andersson, Mr. Anders Johan	male	39.0	1	5	347082	31.2750	NaN	S
14	15	0	3	Vestrom, Miss. Hulda Amanda Adolfina	female	14.0	0	0	350406	7.8542	NaN	S
15	16	1	2	Hewlett, Mrs. (Mary D Kingcome)	female	55.0	0	0	248706	16.0000	NaN	S
16	17	0	3	Rice, Master. Eugene	male	2.0	4	1	382652	29.1250	NaN	Q
17	18	1	2	Williams, Mr. Charles Eugene	male	NaN	0	0	244373	13.0000	NaN	S
18	19	0	3	Vander Planke, Mrs. Julius (Emelia Maria Vande...	female	31.0	1	0	345763	18.0000	NaN	S
19	20	1	3	Masselmani, Mrs. Fatima	female	NaN	0	0	2649	7.2250	NaN	C
20	21	0	2	Fynney, Mr. Joseph J	male	35.0	0	0	239865	26.0000	NaN	S
21	22	1	2	Beesley, Mr. Lawrence	male	34.0	0	0	248698	13.0000	D56	S
22	23	1	3	McGowan, Miss. Anna "Annie"	female	15.0	0	0	330923	8.0292	NaN	Q
23	24	1	1	Sloper, Mr. William Thompson	male	28.0	0	0	113788	35.5000	A6	S
24	25	0	3	Palsson, Miss. Torborg Danira	female	8.0	3	1	349909	21.0750	NaN	S

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.8292	NaN	Q
1	893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	7.0000	NaN	S
2	894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	9.6875	NaN	Q
3	895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	8.6625	NaN	S
4	896	3	Hirvonen, Mrs. Alexander (Helga E Lindqvist)	female	22.0	1	1	3101298	12.2875	NaN	S

Table of Contents