Predict survival on the Titanic



In [ ]:

About:

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.



In [ ]:



In [ ]:



In [59]:

    
# #Import Python Libraries
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
%matplotlib inline



In [ ]:



In [10]:

    
# #Read the Training Dataset into a Pandas Dataframe
titanic_train = pd.read_csv("train.csv")



In [9]:

    
type(titanic_train)









    Out[9]:





pandas.core.frame.DataFrame



In [ ]:



In [20]:

    
# #Overview of the data
titanic_train









    Out[20]:






  
    
      
      PassengerId
      Survived
      Pclass
      Name
      Sex
      Age
      SibSp
      Parch
      Ticket
      Fare
      Cabin
      Embarked
    
  
  
    
      0
      1
      0
      3
      Braund, Mr. Owen Harris
      male
      22
      1
      0
      A/5 21171
      7.2500
      NaN
      S
    
    
      1
      2
      1
      1
      Cumings, Mrs. John Bradley (Florence Briggs Th...
      female
      38
      1
      0
      PC 17599
      71.2833
      C85
      C
    
    
      2
      3
      1
      3
      Heikkinen, Miss. Laina
      female
      26
      0
      0
      STON/O2. 3101282
      7.9250
      NaN
      S
    
    
      3
      4
      1
      1
      Futrelle, Mrs. Jacques Heath (Lily May Peel)
      female
      35
      1
      0
      113803
      53.1000
      C123
      S
    
    
      4
      5
      0
      3
      Allen, Mr. William Henry
      male
      35
      0
      0
      373450
      8.0500
      NaN
      S
    
    
      5
      6
      0
      3
      Moran, Mr. James
      male
      NaN
      0
      0
      330877
      8.4583
      NaN
      Q
    
    
      6
      7
      0
      1
      McCarthy, Mr. Timothy J
      male
      54
      0
      0
      17463
      51.8625
      E46
      S
    
    
      7
      8
      0
      3
      Palsson, Master. Gosta Leonard
      male
      2
      3
      1
      349909
      21.0750
      NaN
      S
    
    
      8
      9
      1
      3
      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
      female
      27
      0
      2
      347742
      11.1333
      NaN
      S
    
    
      9
      10
      1
      2
      Nasser, Mrs. Nicholas (Adele Achem)
      female
      14
      1
      0
      237736
      30.0708
      NaN
      C
    
    
      10
      11
      1
      3
      Sandstrom, Miss. Marguerite Rut
      female
      4
      1
      1
      PP 9549
      16.7000
      G6
      S
    
    
      11
      12
      1
      1
      Bonnell, Miss. Elizabeth
      female
      58
      0
      0
      113783
      26.5500
      C103
      S
    
    
      12
      13
      0
      3
      Saundercock, Mr. William Henry
      male
      20
      0
      0
      A/5. 2151
      8.0500
      NaN
      S
    
    
      13
      14
      0
      3
      Andersson, Mr. Anders Johan
      male
      39
      1
      5
      347082
      31.2750
      NaN
      S
    
    
      14
      15
      0
      3
      Vestrom, Miss. Hulda Amanda Adolfina
      female
      14
      0
      0
      350406
      7.8542
      NaN
      S
    
    
      15
      16
      1
      2
      Hewlett, Mrs. (Mary D Kingcome)
      female
      55
      0
      0
      248706
      16.0000
      NaN
      S
    
    
      16
      17
      0
      3
      Rice, Master. Eugene
      male
      2
      4
      1
      382652
      29.1250
      NaN
      Q
    
    
      17
      18
      1
      2
      Williams, Mr. Charles Eugene
      male
      NaN
      0
      0
      244373
      13.0000
      NaN
      S
    
    
      18
      19
      0
      3
      Vander Planke, Mrs. Julius (Emelia Maria Vande...
      female
      31
      1
      0
      345763
      18.0000
      NaN
      S
    
    
      19
      20
      1
      3
      Masselmani, Mrs. Fatima
      female
      NaN
      0
      0
      2649
      7.2250
      NaN
      C
    
    
      20
      21
      0
      2
      Fynney, Mr. Joseph J
      male
      35
      0
      0
      239865
      26.0000
      NaN
      S
    
    
      21
      22
      1
      2
      Beesley, Mr. Lawrence
      male
      34
      0
      0
      248698
      13.0000
      D56
      S
    
    
      22
      23
      1
      3
      McGowan, Miss. Anna "Annie"
      female
      15
      0
      0
      330923
      8.0292
      NaN
      Q
    
    
      23
      24
      1
      1
      Sloper, Mr. William Thompson
      male
      28
      0
      0
      113788
      35.5000
      A6
      S
    
    
      24
      25
      0
      3
      Palsson, Miss. Torborg Danira
      female
      8
      3
      1
      349909
      21.0750
      NaN
      S
    
    
      25
      26
      1
      3
      Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
      female
      38
      1
      5
      347077
      31.3875
      NaN
      S
    
    
      26
      27
      0
      3
      Emir, Mr. Farred Chehab
      male
      NaN
      0
      0
      2631
      7.2250
      NaN
      C
    
    
      27
      28
      0
      1
      Fortune, Mr. Charles Alexander
      male
      19
      3
      2
      19950
      263.0000
      C23 C25 C27
      S
    
    
      28
      29
      1
      3
      O'Dwyer, Miss. Ellen "Nellie"
      female
      NaN
      0
      0
      330959
      7.8792
      NaN
      Q
    
    
      29
      30
      0
      3
      Todoroff, Mr. Lalio
      male
      NaN
      0
      0
      349216
      7.8958
      NaN
      S
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      861
      862
      0
      2
      Giles, Mr. Frederick Edward
      male
      21
      1
      0
      28134
      11.5000
      NaN
      S
    
    
      862
      863
      1
      1
      Swift, Mrs. Frederick Joel (Margaret Welles Ba...
      female
      48
      0
      0
      17466
      25.9292
      D17
      S
    
    
      863
      864
      0
      3
      Sage, Miss. Dorothy Edith "Dolly"
      female
      NaN
      8
      2
      CA. 2343
      69.5500
      NaN
      S
    
    
      864
      865
      0
      2
      Gill, Mr. John William
      male
      24
      0
      0
      233866
      13.0000
      NaN
      S
    
    
      865
      866
      1
      2
      Bystrom, Mrs. (Karolina)
      female
      42
      0
      0
      236852
      13.0000
      NaN
      S
    
    
      866
      867
      1
      2
      Duran y More, Miss. Asuncion
      female
      27
      1
      0
      SC/PARIS 2149
      13.8583
      NaN
      C
    
    
      867
      868
      0
      1
      Roebling, Mr. Washington Augustus II
      male
      31
      0
      0
      PC 17590
      50.4958
      A24
      S
    
    
      868
      869
      0
      3
      van Melkebeke, Mr. Philemon
      male
      NaN
      0
      0
      345777
      9.5000
      NaN
      S
    
    
      869
      870
      1
      3
      Johnson, Master. Harold Theodor
      male
      4
      1
      1
      347742
      11.1333
      NaN
      S
    
    
      870
      871
      0
      3
      Balkic, Mr. Cerin
      male
      26
      0
      0
      349248
      7.8958
      NaN
      S
    
    
      871
      872
      1
      1
      Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
      female
      47
      1
      1
      11751
      52.5542
      D35
      S
    
    
      872
      873
      0
      1
      Carlsson, Mr. Frans Olof
      male
      33
      0
      0
      695
      5.0000
      B51 B53 B55
      S
    
    
      873
      874
      0
      3
      Vander Cruyssen, Mr. Victor
      male
      47
      0
      0
      345765
      9.0000
      NaN
      S
    
    
      874
      875
      1
      2
      Abelson, Mrs. Samuel (Hannah Wizosky)
      female
      28
      1
      0
      P/PP 3381
      24.0000
      NaN
      C
    
    
      875
      876
      1
      3
      Najib, Miss. Adele Kiamie "Jane"
      female
      15
      0
      0
      2667
      7.2250
      NaN
      C
    
    
      876
      877
      0
      3
      Gustafsson, Mr. Alfred Ossian
      male
      20
      0
      0
      7534
      9.8458
      NaN
      S
    
    
      877
      878
      0
      3
      Petroff, Mr. Nedelio
      male
      19
      0
      0
      349212
      7.8958
      NaN
      S
    
    
      878
      879
      0
      3
      Laleff, Mr. Kristo
      male
      NaN
      0
      0
      349217
      7.8958
      NaN
      S
    
    
      879
      880
      1
      1
      Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
      female
      56
      0
      1
      11767
      83.1583
      C50
      C
    
    
      880
      881
      1
      2
      Shelley, Mrs. William (Imanita Parrish Hall)
      female
      25
      0
      1
      230433
      26.0000
      NaN
      S
    
    
      881
      882
      0
      3
      Markun, Mr. Johann
      male
      33
      0
      0
      349257
      7.8958
      NaN
      S
    
    
      882
      883
      0
      3
      Dahlberg, Miss. Gerda Ulrika
      female
      22
      0
      0
      7552
      10.5167
      NaN
      S
    
    
      883
      884
      0
      2
      Banfield, Mr. Frederick James
      male
      28
      0
      0
      C.A./SOTON 34068
      10.5000
      NaN
      S
    
    
      884
      885
      0
      3
      Sutehall, Mr. Henry Jr
      male
      25
      0
      0
      SOTON/OQ 392076
      7.0500
      NaN
      S
    
    
      885
      886
      0
      3
      Rice, Mrs. William (Margaret Norton)
      female
      39
      0
      5
      382652
      29.1250
      NaN
      Q
    
    
      886
      887
      0
      2
      Montvila, Rev. Juozas
      male
      27
      0
      0
      211536
      13.0000
      NaN
      S
    
    
      887
      888
      1
      1
      Graham, Miss. Margaret Edith
      female
      19
      0
      0
      112053
      30.0000
      B42
      S
    
    
      888
      889
      0
      3
      Johnston, Miss. Catherine Helen "Carrie"
      female
      NaN
      1
      2
      W./C. 6607
      23.4500
      NaN
      S
    
    
      889
      890
      1
      1
      Behr, Mr. Karl Howell
      male
      26
      0
      0
      111369
      30.0000
      C148
      C
    
    
      890
      891
      0
      3
      Dooley, Mr. Patrick
      male
      32
      0
      0
      370376
      7.7500
      NaN
      Q
    
  

891 rows × 12 columns



In [ ]:



In [22]:

    
# #Shape of the Dataframe - The dataframe contains 891 Rows and 12 Columns
titanic_train.shape









    Out[22]:





(891, 12)



In [ ]:

Description about the 12 columns/features in the dataframe

PassengerId - Numerical - Contains a unique id for each passenger(auto-incremented)
Survived - Categorical - 0 = Didn't Survive | 1 = Survived

Pclass(Passenger Class) - Categorical - 1 = 1st | 2 = 2nd | 3 = 3rd;

Pclass serves as a proxy for socio-economic status - 1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower

Name - Name
Sex - Categorical - Male | Female
Age - Numeical
SibSp(Number of Siblings/Spouses Aboard)

Sibling: Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard Titanic

Spouse: Husband or Wife of Passenger Aboard Titanic (Mistresses and Fiances Ignored)

Parch(Number of Parents/Children Aboard)

Parent: Mother or Father of Passenger Aboard Titanic

Child: Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic

Ticket - Ticket Number
Fare - Passenger Fare
Cabin - Cabin
Embarked - Port of Embarkation - C = Cherbourg | Q = Queenstown | S = Southampton



In [ ]:



In [24]:

    
# #The essence of the problem is that we are trying to predict whether a passenger aboard the titanic survived or not
# #depending on the various features in this training dataset. Therefore, depending on independent variables such as 
# #- Pclass, Sex, Age etc. our goal is to predict the dependent variable - Survived; that contains the value 0 for 
# #Didn't Survive and 1 for Survived. Furthermore, after building the model we will check this prediction on the Test dataset.



In [ ]:



In [31]:

    
pd.value_counts(titanic_train["Survived"])









    Out[31]:





0    549
1    342
dtype: int64



In [ ]:



In [35]:

    
# #There are certain columns in our dataset that might be very helpful as features and can therefore be dropped.
titanic_train_reduced = titanic_train.drop(["Name", "Ticket", "Cabin"], axis = 1)



In [68]:

    
# #To clean the dataset, we could use methods such as multiple imputation to fill in the NA elements.
# #BUT, to start with we can have an extremely clean dataset by dropping all the rows which contains one or more NA elements.
titanic_train_cleaned = titanic_train_reduced.dropna()



In [70]:

    
# #Convert Categorical Variables into Numerical Variables
titanic_train_cleaned.Sex = titanic_train_cleaned.Sex.apply(lambda sex: 1 if sex == "male" else 0)
titanic_train_cleaned









    Out[70]:






  
    
      
      PassengerId
      Survived
      Pclass
      Sex
      Age
      SibSp
      Parch
      Fare
      Embarked
    
  
  
    
      0
      1
      0
      3
      0
      22
      1
      0
      7.2500
      S
    
    
      1
      2
      1
      1
      0
      38
      1
      0
      71.2833
      C
    
    
      2
      3
      1
      3
      0
      26
      0
      0
      7.9250
      S
    
    
      3
      4
      1
      1
      0
      35
      1
      0
      53.1000
      S
    
    
      4
      5
      0
      3
      0
      35
      0
      0
      8.0500
      S
    
    
      6
      7
      0
      1
      0
      54
      0
      0
      51.8625
      S
    
    
      7
      8
      0
      3
      0
      2
      3
      1
      21.0750
      S
    
    
      8
      9
      1
      3
      0
      27
      0
      2
      11.1333
      S
    
    
      9
      10
      1
      2
      0
      14
      1
      0
      30.0708
      C
    
    
      10
      11
      1
      3
      0
      4
      1
      1
      16.7000
      S
    
    
      11
      12
      1
      1
      0
      58
      0
      0
      26.5500
      S
    
    
      12
      13
      0
      3
      0
      20
      0
      0
      8.0500
      S
    
    
      13
      14
      0
      3
      0
      39
      1
      5
      31.2750
      S
    
    
      14
      15
      0
      3
      0
      14
      0
      0
      7.8542
      S
    
    
      15
      16
      1
      2
      0
      55
      0
      0
      16.0000
      S
    
    
      16
      17
      0
      3
      0
      2
      4
      1
      29.1250
      Q
    
    
      18
      19
      0
      3
      0
      31
      1
      0
      18.0000
      S
    
    
      20
      21
      0
      2
      0
      35
      0
      0
      26.0000
      S
    
    
      21
      22
      1
      2
      0
      34
      0
      0
      13.0000
      S
    
    
      22
      23
      1
      3
      0
      15
      0
      0
      8.0292
      Q
    
    
      23
      24
      1
      1
      0
      28
      0
      0
      35.5000
      S
    
    
      24
      25
      0
      3
      0
      8
      3
      1
      21.0750
      S
    
    
      25
      26
      1
      3
      0
      38
      1
      5
      31.3875
      S
    
    
      27
      28
      0
      1
      0
      19
      3
      2
      263.0000
      S
    
    
      30
      31
      0
      1
      0
      40
      0
      0
      27.7208
      C
    
    
      33
      34
      0
      2
      0
      66
      0
      0
      10.5000
      S
    
    
      34
      35
      0
      1
      0
      28
      1
      0
      82.1708
      C
    
    
      35
      36
      0
      1
      0
      42
      1
      0
      52.0000
      S
    
    
      37
      38
      0
      3
      0
      21
      0
      0
      8.0500
      S
    
    
      38
      39
      0
      3
      0
      18
      2
      0
      18.0000
      S
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      856
      857
      1
      1
      0
      45
      1
      1
      164.8667
      S
    
    
      857
      858
      1
      1
      0
      51
      0
      0
      26.5500
      S
    
    
      858
      859
      1
      3
      0
      24
      0
      3
      19.2583
      C
    
    
      860
      861
      0
      3
      0
      41
      2
      0
      14.1083
      S
    
    
      861
      862
      0
      2
      0
      21
      1
      0
      11.5000
      S
    
    
      862
      863
      1
      1
      0
      48
      0
      0
      25.9292
      S
    
    
      864
      865
      0
      2
      0
      24
      0
      0
      13.0000
      S
    
    
      865
      866
      1
      2
      0
      42
      0
      0
      13.0000
      S
    
    
      866
      867
      1
      2
      0
      27
      1
      0
      13.8583
      C
    
    
      867
      868
      0
      1
      0
      31
      0
      0
      50.4958
      S
    
    
      869
      870
      1
      3
      0
      4
      1
      1
      11.1333
      S
    
    
      870
      871
      0
      3
      0
      26
      0
      0
      7.8958
      S
    
    
      871
      872
      1
      1
      0
      47
      1
      1
      52.5542
      S
    
    
      872
      873
      0
      1
      0
      33
      0
      0
      5.0000
      S
    
    
      873
      874
      0
      3
      0
      47
      0
      0
      9.0000
      S
    
    
      874
      875
      1
      2
      0
      28
      1
      0
      24.0000
      C
    
    
      875
      876
      1
      3
      0
      15
      0
      0
      7.2250
      C
    
    
      876
      877
      0
      3
      0
      20
      0
      0
      9.8458
      S
    
    
      877
      878
      0
      3
      0
      19
      0
      0
      7.8958
      S
    
    
      879
      880
      1
      1
      0
      56
      0
      1
      83.1583
      C
    
    
      880
      881
      1
      2
      0
      25
      0
      1
      26.0000
      S
    
    
      881
      882
      0
      3
      0
      33
      0
      0
      7.8958
      S
    
    
      882
      883
      0
      3
      0
      22
      0
      0
      10.5167
      S
    
    
      883
      884
      0
      2
      0
      28
      0
      0
      10.5000
      S
    
    
      884
      885
      0
      3
      0
      25
      0
      0
      7.0500
      S
    
    
      885
      886
      0
      3
      0
      39
      0
      5
      29.1250
      Q
    
    
      886
      887
      0
      2
      0
      27
      0
      0
      13.0000
      S
    
    
      887
      888
      1
      1
      0
      19
      0
      0
      30.0000
      S
    
    
      889
      890
      1
      1
      0
      26
      0
      0
      30.0000
      C
    
    
      890
      891
      0
      3
      0
      32
      0
      0
      7.7500
      Q
    
  

712 rows × 9 columns



In [ ]:



In [71]:

    
# #From 891 rows in our original dataset, we have come down to 712 "clean" rows
titanic_train_cleaned.shape









    Out[71]:





(712, 9)



In [ ]:



In [ ]:



In [ ]:



In [72]:

    
titanic_train_cleaned.groupby([titanic_train_cleaned.Survived, titanic_train_cleaned.Sex]).size()









    Out[72]:





Survived  Sex
0         0      424
1         0      288
dtype: int64



In [73]:

    
# #Percentage of Women who survived
195*100/(195+64)









    Out[73]:





75



In [74]:

    
# #Percentage of Men who survived
93*100/(93+360)









    Out[74]:





20



In [ ]:



In [90]:

    
# #TESTING DATASET
titanic_test = pd.read_csv("test.csv")



In [92]:

    
titanic_test_reduced = titanic_test.drop(["Name", "Ticket", "Cabin"], axis = 1)
titanic_test_cleaned = titanic_test_reduced.dropna()
titanic_test_cleaned.Sex = titanic_test_cleaned.Sex.apply(lambda sex: 1 if sex == "male" else 0)
titanic_test_cleaned









    Out[92]:






  
    
      
      PassengerId
      Pclass
      Sex
      Age
      SibSp
      Parch
      Fare
      Embarked
    
  
  
    
      0
      892
      3
      1
      34.5
      0
      0
      7.8292
      Q
    
    
      1
      893
      3
      0
      47.0
      1
      0
      7.0000
      S
    
    
      2
      894
      2
      1
      62.0
      0
      0
      9.6875
      Q
    
    
      3
      895
      3
      1
      27.0
      0
      0
      8.6625
      S
    
    
      4
      896
      3
      0
      22.0
      1
      1
      12.2875
      S
    
    
      5
      897
      3
      1
      14.0
      0
      0
      9.2250
      S
    
    
      6
      898
      3
      0
      30.0
      0
      0
      7.6292
      Q
    
    
      7
      899
      2
      1
      26.0
      1
      1
      29.0000
      S
    
    
      8
      900
      3
      0
      18.0
      0
      0
      7.2292
      C
    
    
      9
      901
      3
      1
      21.0
      2
      0
      24.1500
      S
    
    
      11
      903
      1
      1
      46.0
      0
      0
      26.0000
      S
    
    
      12
      904
      1
      0
      23.0
      1
      0
      82.2667
      S
    
    
      13
      905
      2
      1
      63.0
      1
      0
      26.0000
      S
    
    
      14
      906
      1
      0
      47.0
      1
      0
      61.1750
      S
    
    
      15
      907
      2
      0
      24.0
      1
      0
      27.7208
      C
    
    
      16
      908
      2
      1
      35.0
      0
      0
      12.3500
      Q
    
    
      17
      909
      3
      1
      21.0
      0
      0
      7.2250
      C
    
    
      18
      910
      3
      0
      27.0
      1
      0
      7.9250
      S
    
    
      19
      911
      3
      0
      45.0
      0
      0
      7.2250
      C
    
    
      20
      912
      1
      1
      55.0
      1
      0
      59.4000
      C
    
    
      21
      913
      3
      1
      9.0
      0
      1
      3.1708
      S
    
    
      23
      915
      1
      1
      21.0
      0
      1
      61.3792
      C
    
    
      24
      916
      1
      0
      48.0
      1
      3
      262.3750
      C
    
    
      25
      917
      3
      1
      50.0
      1
      0
      14.5000
      S
    
    
      26
      918
      1
      0
      22.0
      0
      1
      61.9792
      C
    
    
      27
      919
      3
      1
      22.5
      0
      0
      7.2250
      C
    
    
      28
      920
      1
      1
      41.0
      0
      0
      30.5000
      S
    
    
      30
      922
      2
      1
      50.0
      1
      0
      26.0000
      S
    
    
      31
      923
      2
      1
      24.0
      2
      0
      31.5000
      S
    
    
      32
      924
      3
      0
      33.0
      1
      2
      20.5750
      S
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      381
      1273
      3
      1
      26.0
      0
      0
      7.8792
      Q
    
    
      383
      1275
      3
      0
      19.0
      1
      0
      16.1000
      S
    
    
      385
      1277
      2
      0
      24.0
      1
      2
      65.0000
      S
    
    
      386
      1278
      3
      1
      24.0
      0
      0
      7.7750
      S
    
    
      387
      1279
      2
      1
      57.0
      0
      0
      13.0000
      S
    
    
      388
      1280
      3
      1
      21.0
      0
      0
      7.7500
      Q
    
    
      389
      1281
      3
      1
      6.0
      3
      1
      21.0750
      S
    
    
      390
      1282
      1
      1
      23.0
      0
      0
      93.5000
      S
    
    
      391
      1283
      1
      0
      51.0
      0
      1
      39.4000
      S
    
    
      392
      1284
      3
      1
      13.0
      0
      2
      20.2500
      S
    
    
      393
      1285
      2
      1
      47.0
      0
      0
      10.5000
      S
    
    
      394
      1286
      3
      1
      29.0
      3
      1
      22.0250
      S
    
    
      395
      1287
      1
      0
      18.0
      1
      0
      60.0000
      S
    
    
      396
      1288
      3
      1
      24.0
      0
      0
      7.2500
      Q
    
    
      397
      1289
      1
      0
      48.0
      1
      1
      79.2000
      C
    
    
      398
      1290
      3
      1
      22.0
      0
      0
      7.7750
      S
    
    
      399
      1291
      3
      1
      31.0
      0
      0
      7.7333
      Q
    
    
      400
      1292
      1
      0
      30.0
      0
      0
      164.8667
      S
    
    
      401
      1293
      2
      1
      38.0
      1
      0
      21.0000
      S
    
    
      402
      1294
      1
      0
      22.0
      0
      1
      59.4000
      C
    
    
      403
      1295
      1
      1
      17.0
      0
      0
      47.1000
      S
    
    
      404
      1296
      1
      1
      43.0
      1
      0
      27.7208
      C
    
    
      405
      1297
      2
      1
      20.0
      0
      0
      13.8625
      C
    
    
      406
      1298
      2
      1
      23.0
      1
      0
      10.5000
      S
    
    
      407
      1299
      1
      1
      50.0
      1
      1
      211.5000
      C
    
    
      409
      1301
      3
      0
      3.0
      1
      1
      13.7750
      S
    
    
      411
      1303
      1
      0
      37.0
      1
      0
      90.0000
      Q
    
    
      412
      1304
      3
      0
      28.0
      0
      0
      7.7750
      S
    
    
      414
      1306
      1
      0
      39.0
      0
      0
      108.9000
      C
    
    
      415
      1307
      3
      1
      38.5
      0
      0
      7.2500
      S
    
  

331 rows × 8 columns



In [ ]:



In [ ]:



In [ ]:



In [75]:

    
# #LOGISTIC REGRESSION



In [76]:

    
model_1 = linear_model.LogisticRegression()



In [80]:

    
model_1_dependent_vars = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]



In [84]:

    
titanic_train_cleaned[model_1_dependent_vars].values









    Out[84]:





array([[  3.    ,   0.    ,  22.    ,   1.    ,   0.    ,   7.25  ],
       [  1.    ,   0.    ,  38.    ,   1.    ,   0.    ,  71.2833],
       [  3.    ,   0.    ,  26.    ,   0.    ,   0.    ,   7.925 ],
       ..., 
       [  1.    ,   0.    ,  19.    ,   0.    ,   0.    ,  30.    ],
       [  1.    ,   0.    ,  26.    ,   0.    ,   0.    ,  30.    ],
       [  3.    ,   0.    ,  32.    ,   0.    ,   0.    ,   7.75  ]])



In [ ]:



In [85]:

    
model_1.fit(titanic_train_cleaned[model_1_dependent_vars].values, titanic_train_cleaned.Survived.values)









    Out[85]:





LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr',
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0)



In [97]:

    
model_1_result = model_1.predict(titanic_test_cleaned[model_1_dependent_vars])
model_1_result









    Out[97]:





array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
       1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1,
       1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0], dtype=int64)



In [ ]:



In [98]:

    
len(model_1_result)









    Out[98]:





331



In [ ]:

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22	1	0	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38	1	0	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26	0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35	1	0	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35	0	0	373450	8.0500	NaN	S
5	6	0	3	Moran, Mr. James	male	NaN	0	0	330877	8.4583	NaN	Q
6	7	0	1	McCarthy, Mr. Timothy J	male	54	0	0	17463	51.8625	E46	S
7	8	0	3	Palsson, Master. Gosta Leonard	male	2	3	1	349909	21.0750	NaN	S
8	9	1	3	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	female	27	0	2	347742	11.1333	NaN	S
9	10	1	2	Nasser, Mrs. Nicholas (Adele Achem)	female	14	1	0	237736	30.0708	NaN	C
10	11	1	3	Sandstrom, Miss. Marguerite Rut	female	4	1	1	PP 9549	16.7000	G6	S
11	12	1	1	Bonnell, Miss. Elizabeth	female	58	0	0	113783	26.5500	C103	S
12	13	0	3	Saundercock, Mr. William Henry	male	20	0	0	A/5. 2151	8.0500	NaN	S
13	14	0	3	Andersson, Mr. Anders Johan	male	39	1	5	347082	31.2750	NaN	S
14	15	0	3	Vestrom, Miss. Hulda Amanda Adolfina	female	14	0	0	350406	7.8542	NaN	S
15	16	1	2	Hewlett, Mrs. (Mary D Kingcome)	female	55	0	0	248706	16.0000	NaN	S
16	17	0	3	Rice, Master. Eugene	male	2	4	1	382652	29.1250	NaN	Q
17	18	1	2	Williams, Mr. Charles Eugene	male	NaN	0	0	244373	13.0000	NaN	S
18	19	0	3	Vander Planke, Mrs. Julius (Emelia Maria Vande...	female	31	1	0	345763	18.0000	NaN	S
19	20	1	3	Masselmani, Mrs. Fatima	female	NaN	0	0	2649	7.2250	NaN	C
20	21	0	2	Fynney, Mr. Joseph J	male	35	0	0	239865	26.0000	NaN	S
21	22	1	2	Beesley, Mr. Lawrence	male	34	0	0	248698	13.0000	D56	S
22	23	1	3	McGowan, Miss. Anna "Annie"	female	15	0	0	330923	8.0292	NaN	Q
23	24	1	1	Sloper, Mr. William Thompson	male	28	0	0	113788	35.5000	A6	S
24	25	0	3	Palsson, Miss. Torborg Danira	female	8	3	1	349909	21.0750	NaN	S
25	26	1	3	Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...	female	38	1	5	347077	31.3875	NaN	S
26	27	0	3	Emir, Mr. Farred Chehab	male	NaN	0	0	2631	7.2250	NaN	C
27	28	0	1	Fortune, Mr. Charles Alexander	male	19	3	2	19950	263.0000	C23 C25 C27	S
28	29	1	3	O'Dwyer, Miss. Ellen "Nellie"	female	NaN	0	0	330959	7.8792	NaN	Q
29	30	0	3	Todoroff, Mr. Lalio	male	NaN	0	0	349216	7.8958	NaN	S
...	...	...	...	...	...	...	...	...	...	...	...	...
861	862	0	2	Giles, Mr. Frederick Edward	male	21	1	0	28134	11.5000	NaN	S
862	863	1	1	Swift, Mrs. Frederick Joel (Margaret Welles Ba...	female	48	0	0	17466	25.9292	D17	S
863	864	0	3	Sage, Miss. Dorothy Edith "Dolly"	female	NaN	8	2	CA. 2343	69.5500	NaN	S
864	865	0	2	Gill, Mr. John William	male	24	0	0	233866	13.0000	NaN	S
865	866	1	2	Bystrom, Mrs. (Karolina)	female	42	0	0	236852	13.0000	NaN	S
866	867	1	2	Duran y More, Miss. Asuncion	female	27	1	0	SC/PARIS 2149	13.8583	NaN	C
867	868	0	1	Roebling, Mr. Washington Augustus II	male	31	0	0	PC 17590	50.4958	A24	S
868	869	0	3	van Melkebeke, Mr. Philemon	male	NaN	0	0	345777	9.5000	NaN	S
869	870	1	3	Johnson, Master. Harold Theodor	male	4	1	1	347742	11.1333	NaN	S
870	871	0	3	Balkic, Mr. Cerin	male	26	0	0	349248	7.8958	NaN	S
871	872	1	1	Beckwith, Mrs. Richard Leonard (Sallie Monypeny)	female	47	1	1	11751	52.5542	D35	S
872	873	0	1	Carlsson, Mr. Frans Olof	male	33	0	0	695	5.0000	B51 B53 B55	S
873	874	0	3	Vander Cruyssen, Mr. Victor	male	47	0	0	345765	9.0000	NaN	S
874	875	1	2	Abelson, Mrs. Samuel (Hannah Wizosky)	female	28	1	0	P/PP 3381	24.0000	NaN	C
875	876	1	3	Najib, Miss. Adele Kiamie "Jane"	female	15	0	0	2667	7.2250	NaN	C
876	877	0	3	Gustafsson, Mr. Alfred Ossian	male	20	0	0	7534	9.8458	NaN	S
877	878	0	3	Petroff, Mr. Nedelio	male	19	0	0	349212	7.8958	NaN	S
878	879	0	3	Laleff, Mr. Kristo	male	NaN	0	0	349217	7.8958	NaN	S
879	880	1	1	Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)	female	56	0	1	11767	83.1583	C50	C
880	881	1	2	Shelley, Mrs. William (Imanita Parrish Hall)	female	25	0	1	230433	26.0000	NaN	S
881	882	0	3	Markun, Mr. Johann	male	33	0	0	349257	7.8958	NaN	S
882	883	0	3	Dahlberg, Miss. Gerda Ulrika	female	22	0	0	7552	10.5167	NaN	S
883	884	0	2	Banfield, Mr. Frederick James	male	28	0	0	C.A./SOTON 34068	10.5000	NaN	S
884	885	0	3	Sutehall, Mr. Henry Jr	male	25	0	0	SOTON/OQ 392076	7.0500	NaN	S
885	886	0	3	Rice, Mrs. William (Margaret Norton)	female	39	0	5	382652	29.1250	NaN	Q
886	887	0	2	Montvila, Rev. Juozas	male	27	0	0	211536	13.0000	NaN	S
887	888	1	1	Graham, Miss. Margaret Edith	female	19	0	0	112053	30.0000	B42	S
888	889	0	3	Johnston, Miss. Catherine Helen "Carrie"	female	NaN	1	2	W./C. 6607	23.4500	NaN	S
889	890	1	1	Behr, Mr. Karl Howell	male	26	0	0	111369	30.0000	C148	C
890	891	0	3	Dooley, Mr. Patrick	male	32	0	0	370376	7.7500	NaN	Q

	PassengerId	Pclass	Sex	Age	SibSp	Parch	Fare	Embarked
0	892	3	1	34.5	0	0	7.8292	Q
1	893	3	0	47.0	1	0	7.0000	S
2	894	2	1	62.0	0	0	9.6875	Q
3	895	3	1	27.0	0	0	8.6625	S
4	896	3	0	22.0	1	1	12.2875	S
5	897	3	1	14.0	0	0	9.2250	S
6	898	3	0	30.0	0	0	7.6292	Q
7	899	2	1	26.0	1	1	29.0000	S
8	900	3	0	18.0	0	0	7.2292	C
9	901	3	1	21.0	2	0	24.1500	S
11	903	1	1	46.0	0	0	26.0000	S
12	904	1	0	23.0	1	0	82.2667	S
13	905	2	1	63.0	1	0	26.0000	S
14	906	1	0	47.0	1	0	61.1750	S
15	907	2	0	24.0	1	0	27.7208	C
16	908	2	1	35.0	0	0	12.3500	Q
17	909	3	1	21.0	0	0	7.2250	C
18	910	3	0	27.0	1	0	7.9250	S
19	911	3	0	45.0	0	0	7.2250	C
20	912	1	1	55.0	1	0	59.4000	C
21	913	3	1	9.0	0	1	3.1708	S
23	915	1	1	21.0	0	1	61.3792	C
24	916	1	0	48.0	1	3	262.3750	C
25	917	3	1	50.0	1	0	14.5000	S
26	918	1	0	22.0	0	1	61.9792	C
27	919	3	1	22.5	0	0	7.2250	C
28	920	1	1	41.0	0	0	30.5000	S
30	922	2	1	50.0	1	0	26.0000	S
31	923	2	1	24.0	2	0	31.5000	S
32	924	3	0	33.0	1	2	20.5750	S
...	...	...	...	...	...	...	...	...
381	1273	3	1	26.0	0	0	7.8792	Q
383	1275	3	0	19.0	1	0	16.1000	S
385	1277	2	0	24.0	1	2	65.0000	S
386	1278	3	1	24.0	0	0	7.7750	S
387	1279	2	1	57.0	0	0	13.0000	S
388	1280	3	1	21.0	0	0	7.7500	Q
389	1281	3	1	6.0	3	1	21.0750	S
390	1282	1	1	23.0	0	0	93.5000	S
391	1283	1	0	51.0	0	1	39.4000	S
392	1284	3	1	13.0	0	2	20.2500	S
393	1285	2	1	47.0	0	0	10.5000	S
394	1286	3	1	29.0	3	1	22.0250	S
395	1287	1	0	18.0	1	0	60.0000	S
396	1288	3	1	24.0	0	0	7.2500	Q
397	1289	1	0	48.0	1	1	79.2000	C
398	1290	3	1	22.0	0	0	7.7750	S
399	1291	3	1	31.0	0	0	7.7333	Q
400	1292	1	0	30.0	0	0	164.8667	S
401	1293	2	1	38.0	1	0	21.0000	S
402	1294	1	0	22.0	0	1	59.4000	C
403	1295	1	1	17.0	0	0	47.1000	S
404	1296	1	1	43.0	1	0	27.7208	C
405	1297	2	1	20.0	0	0	13.8625	C
406	1298	2	1	23.0	1	0	10.5000	S
407	1299	1	1	50.0	1	1	211.5000	C
409	1301	3	0	3.0	1	1	13.7750	S
411	1303	1	0	37.0	1	0	90.0000	Q
412	1304	3	0	28.0	0	0	7.7750	S
414	1306	1	0	39.0	0	0	108.9000	C
415	1307	3	1	38.5	0	0	7.2500	S