Imagine yourself on Titanic. You heard the news - the ship is sinking! Will you survive to tell the story?

Background

According to Wikipedia the ship carried 2,224 passengers and crew. Titanic carried lifeboats for only 1,178 people. At 11:40 ship time she hit an iceberg. The disaster resulted in more than 1,500 lost lifes. During the evacuation "women and children first" policy was adopted.

Getting our hands Tidy

Let's see what our environment looks like:


In [1]:
version


               _                           
platform       x86_64-apple-darwin16.7.0   
arch           x86_64                      
os             darwin16.7.0                
system         x86_64, darwin16.7.0        
status                                     
major          3                           
minor          4.2                         
year           2017                        
month          09                          
day            28                          
svn rev        73368                       
language       R                           
version.string R version 3.4.2 (2017-09-28)
nickname       Short Summer                

Importing the data

The training and test data is provided by Kaggle in csv format. Description of the variables is also available. Let's load it into R and have a peek.


In [2]:
library(tidyverse)
library(forcats) # factors munging
library(stringr) # string manipulation
library(ggthemes) # visualization
library(scales) # visualization
library(party) # random forest
library(nnet) # neural nets
library(caret) # ML
library(VIM) # missing data


Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag():    dplyr, stats

Attaching package: ‘scales’

The following object is masked from ‘package:purrr’:

    discard

The following object is masked from ‘package:readr’:

    col_factor

Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich

Attaching package: ‘strucchange’

The following object is masked from ‘package:stringr’:

    boundary

Loading required package: lattice

Attaching package: ‘caret’

The following object is masked from ‘package:purrr’:

    lift

Loading required package: colorspace
Loading required package: data.table

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

    between, first, last

The following object is masked from ‘package:purrr’:

    transpose

VIM is ready to use. 
 Since version 4.0.0 the GUI is in its own package VIMGUI.

          Please use the package to use the new (and old) GUI.

Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues

Attaching package: ‘VIM’

The following object is masked from ‘package:datasets’:

    sleep


In [3]:
train <- read_csv("data/train.csv")


Parsed with column specification:
cols(
  PassengerId = col_integer(),
  Survived = col_integer(),
  Pclass = col_integer(),
  Name = col_character(),
  Sex = col_character(),
  Age = col_double(),
  SibSp = col_integer(),
  Parch = col_integer(),
  Ticket = col_character(),
  Fare = col_double(),
  Cabin = col_character(),
  Embarked = col_character()
)

891 rows and 12 columns in our training set. But who the passengers really are? Let's delve a bit deeper...

Visualize (skip transform)


In [4]:
glimpse(train)


Observations: 891
Variables: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
$ Survived    <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0...
$ Pclass      <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3...
$ Name        <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley ...
$ Sex         <chr> "male", "female", "female", "female", "male", "male", "...
$ Age         <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 1...
$ SibSp       <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1...
$ Parch       <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0...
$ Ticket      <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", ...
$ Fare        <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.86...
$ Cabin       <chr> NA, "C85", NA, "C123", NA, NA, "E46", NA, NA, NA, "G6",...
$ Embarked    <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", ...

In [5]:
head(train)


PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NA S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NA S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NA S
6 0 3 Moran, Mr. James male NA 0 0 330877 8.4583 NA Q

In [6]:
summary(train)


  PassengerId       Survived          Pclass          Name          
 Min.   :  1.0   Min.   :0.0000   Min.   :1.000   Length:891        
 1st Qu.:223.5   1st Qu.:0.0000   1st Qu.:2.000   Class :character  
 Median :446.0   Median :0.0000   Median :3.000   Mode  :character  
 Mean   :446.0   Mean   :0.3838   Mean   :2.309                     
 3rd Qu.:668.5   3rd Qu.:1.0000   3rd Qu.:3.000                     
 Max.   :891.0   Max.   :1.0000   Max.   :3.000                     
                                                                    
     Sex                 Age            SibSp           Parch       
 Length:891         Min.   : 0.42   Min.   :0.000   Min.   :0.0000  
 Class :character   1st Qu.:20.12   1st Qu.:0.000   1st Qu.:0.0000  
 Mode  :character   Median :28.00   Median :0.000   Median :0.0000  
                    Mean   :29.70   Mean   :0.523   Mean   :0.3816  
                    3rd Qu.:38.00   3rd Qu.:1.000   3rd Qu.:0.0000  
                    Max.   :80.00   Max.   :8.000   Max.   :6.0000  
                    NA's   :177                                     
    Ticket               Fare           Cabin             Embarked        
 Length:891         Min.   :  0.00   Length:891         Length:891        
 Class :character   1st Qu.:  7.91   Class :character   Class :character  
 Mode  :character   Median : 14.45   Mode  :character   Mode  :character  
                    Mean   : 32.20                                        
                    3rd Qu.: 31.00                                        
                    Max.   :512.33                                        
                                                                          

Survival rate


In [7]:
summarise(train, SurvivalRate = sum(Survived) / nrow(train))


SurvivalRate
0.3838384

Model

Our first model is pretty simple - we predict that everyone perish:


In [8]:
test <- read_csv("data/test.csv")
model <- tibble(PassengerID = test$PassengerId, Survived = 0)
write_csv(model, "results/baseline.csv")


Parsed with column specification:
cols(
  PassengerId = col_integer(),
  Pclass = col_integer(),
  Name = col_character(),
  Sex = col_character(),
  Age = col_double(),
  SibSp = col_integer(),
  Parch = col_integer(),
  Ticket = col_character(),
  Fare = col_double(),
  Cabin = col_character(),
  Embarked = col_character()
)

How much gender affects survival rate?

Transform

Change survived to yes and no:


In [9]:
titanic <- train %>%
                mutate(Survived = factor(Survived)) %>%
                mutate(Survived = fct_recode(Survived, "No" = "0", "Yes" = "1"))

Normalize the sex column:


In [10]:
titanic <- titanic %>%
        mutate(Sex = factor(Sex)) %>%
        mutate(Sex = fct_recode(Sex, "Female" = "female", "Male" = "male"))

Visualize


In [11]:
options(repr.plot.width=7, repr.plot.height=5)

In [12]:
ggplot(titanic, aes(Sex, fill=Survived)) +
            geom_bar(position = "fill") +
            ylab("Survival Rate") +
            geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
            ggtitle("Survival Rate by Gender") +
            theme_hc() +
            scale_colour_hc()


Model

Our second model is simple, still. We predict that all (and only) women survive.


In [13]:
model <- tibble(PassengerID = test$PassengerId, Survived = ifelse(test$Sex == 'female', 1, 0))
write_csv(model, "results/females_survive.csv")

What social status does passangers have?

Transform

Let's try to extract person titles and try to group them based on that


In [14]:
titanic <- titanic %>%
    mutate(Title = str_sub(Name, str_locate(Name, ",")[ , 1] + 2, str_locate(Name, "\\.")[ , 1] - 1))

Visualize


In [15]:
titanic %>% group_by(Title) %>%
              summarise(count = n()) %>%
              arrange(desc(count))


Titlecount
Mr 517
Miss 182
Mrs 125
Master 40
Dr 7
Rev 6
Col 2
Major 2
Mlle 2
Capt 1
Don 1
Jonkheer 1
Lady 1
Mme 1
Ms 1
Sir 1
the Countess 1

Are mothers more likely to survive?

Transform


In [16]:
titanic <- titanic %>% 
    mutate(Mother = factor(
        ifelse(
            c(
                titanic$Title == "Mrs" | 
                titanic$Title == "Mme" | 
                titanic$Title == "the Countess" | 
                titanic$Title == "Dona" | 
                titanic$Title == "Lady"
            ) & titanic$Parch > 0, 
            "Yes", "No"
            )
        )
    )

Visualize


In [17]:
ggplot(titanic, aes(x = Mother, fill = Survived)) +
    geom_bar(position = "fill") +
    ylab("Survival Rate") +
    geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
    ggtitle("Survival Rate by Motherhood Status") +
    theme_hc() +
    scale_colour_hc()


Does social status influence the chance of survival?

Transform


In [18]:
titanic <- titanic %>%
    mutate(Title = factor(Title)) %>%
    mutate(Title = fct_collapse(Title, 
                                "Miss" = c("Mlle", "Ms"), 
                                "Mrs" = "Mme",
                                "Ranked" = c( "Major", "Dr", "Capt", "Col", "Rev"),
                                "Royalty" = c("Lady", "the Countess", "Don", "Sir", "Jonkheer")))

Visualize


In [19]:
ggplot(titanic, aes(x = Title, fill = Survived)) +
        geom_bar(position = "fill") +
        ylab("Survival Rate") +
        geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
        ggtitle("Survival Rate by Title") +
        theme_hc() +
        scale_colour_hc()


Does being part of a family help?

Let define 3 types of families - Large (more than 4 members), Medium (more than 1 but less than 5 members) and Single (1 member).

Transform


In [20]:
titanic <- titanic %>% 
    mutate(FamilySize = SibSp + Parch + 1) %>% 
    mutate(FamilyType = 
        factor(
            ifelse(FamilySize > 4, 
                "Large", 
                ifelse(FamilySize == 1, 
                    "Single", 
                    "Medium"
                    )
              )
        )
    )

Visualize


In [21]:
ggplot(titanic, aes(x = FamilyType, fill = Survived)) +
    geom_bar(position = "fill") +
    ylab("Survival Rate") +
    geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) + 
    ggtitle ("Survival Rate by Family Group Size") +
    theme_hc() +
    scale_colour_hc()


How likely are high class passangers to survive?

Visualize


In [22]:
ggplot(titanic, aes(x = Pclass, fill = Survived)) +
    geom_bar(position = "fill") +
    ylab("Survival Rate") +
    geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
    ggtitle("Survival Rates by Passenger Class") +
    theme_hc() +
    scale_colour_hc()



In [23]:
ggplot(titanic, aes(x = log(Fare), fill = Survived)) +
    geom_density(alpha = 0.4)  + 
    ggtitle("Survival Rates by Fare (log)") +
    theme_hc() +
    scale_colour_hc()


Warning message:
“Removed 15 rows containing non-finite values (stat_density).”

Are younger passangers more likely to survive?

Transform

Let break the age into three groups: Child (under 18), Adult and OAP (over 60):


In [24]:
titanic <- titanic %>% 
        mutate(LifeStage = factor(ifelse(Age < 18, "Child", ifelse(Age <= 60, "Adult", "OAP"))))

Visualize


In [25]:
ggplot(titanic, aes(x = LifeStage, fill = Survived)) +
      geom_bar(position = "fill") +
      ylab("Survival Rate") +
      geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
      ggtitle("Survival Rates by Life Stage")


Model

Let's build a generalized linear model (GLM) using our data:


In [26]:
titanic <- titanic %>% 
        mutate(Fare = ifelse(Fare == 0, 0.001, Fare)) %>%
        mutate(LogFare = log(Fare))

In [27]:
train_data <- select(titanic, Survived, Pclass, Sex, LifeStage, LogFare)

In [28]:
train_glm <- glm(Survived ~ Pclass + Sex + LifeStage + LogFare, 
                 family = binomial, 
                 data = train_data)

In [29]:
summary(train_glm)


Call:
glm(formula = Survived ~ Pclass + Sex + LifeStage + LogFare, 
    family = binomial, data = train_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.6190  -0.6935  -0.4073   0.7228   2.2509  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     3.505726   0.519538   6.748 1.50e-11 ***
Pclass         -1.146695   0.143291  -8.003 1.22e-15 ***
SexMale        -2.510750   0.208371 -12.049  < 2e-16 ***
LifeStageChild  1.044437   0.274183   3.809 0.000139 ***
LifeStageOAP   -1.045240   0.593733  -1.760 0.078331 .  
LogFare        -0.001359   0.082806  -0.016 0.986902    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 964.52  on 713  degrees of freedom
Residual deviance: 653.27  on 708  degrees of freedom
  (177 observations deleted due to missingness)
AIC: 665.27

Number of Fisher Scoring iterations: 4

Prediction

First, we have to apply all our transformations to the test data:


In [30]:
titanic_test <- test %>%
        mutate(Sex = factor(Sex)) %>%
        mutate(Sex = fct_recode(Sex, "Female" = "female", "Male" = "male")) %>%
        mutate(LifeStage = factor(ifelse(Age < 18, "Child", ifelse(Age <= 60, "Adult", "OAP")))) %>%
        mutate(Fare = ifelse(Fare == 0, 0.001, Fare)) %>%
        mutate(LogFare = log(Fare))

In [31]:
test_data <- select(titanic_test, Pclass, Sex, LifeStage, LogFare)

In [32]:
p_hats <- predict.glm(train_glm, newdata = test_data, type = "response", na.action = na.pass)

In [33]:
survived_hat <- ifelse(is.na(p_hats) | p_hats <= 0.5, 0, 1)

In [34]:
glm_results <- tibble(PassengerID = test$PassengerId, Survived = survived_hat)
write_csv(glm_results, "results/glm.csv")

Can we do better?

Model

Let's try with Conditional Random Forest model:


In [35]:
cf_model <- cforest(Survived ~ Pclass + Sex + LifeStage + LogFare,
                 data = train_data, 
                 controls = cforest_unbiased(ntree = 1000, mtry = 3))

Evaluation


In [36]:
table(predict(cf_model), train_data$Survived)


     
       No Yes
  No  513 113
  Yes  36 229

In [37]:
varimp(cf_model)


Pclass
0.0870336391437309
Sex
0.197434250764526
LifeStage
0.0191253822629969
LogFare
0.0219755351681957

Prediction


In [38]:
cf_predictions <- predict(cf_model, test_data, OOB=TRUE, type="response")
cf_predictions <- ifelse(cf_predictions == "No", 0, 1)

In [39]:
cf_results <- tibble(PassengerID = test$PassengerId, Survived = cf_predictions)
write_csv(cf_results, "results/crf.csv")

Model

Finally, let's use a Neural Network:


In [40]:
nn_model <- nnet(Survived ~ Pclass + Sex + LifeStage + LogFare, 
                 data = train_data, size = 2,
                 linout = FALSE, maxit = 10000)


# weights:  15
initial  value 501.127651 
iter  10 value 355.317817
iter  20 value 321.836629
iter  30 value 315.214916
iter  40 value 314.476622
iter  50 value 308.777809
iter  60 value 301.235215
iter  70 value 301.034584
iter  80 value 300.285095
iter  90 value 298.957481
iter 100 value 294.629577
iter 110 value 294.595563
iter 120 value 294.580210
iter 130 value 294.558582
final  value 294.556974 
converged

In [41]:
nn_predictions <- predict(nn_model, test_data, type = "class")
nn_predictions <- ifelse(is.na(nn_predictions) | nn_predictions == "No", 0, 1)

In [42]:
nn_results <- tibble(PassengerID = test$PassengerId, Survived = nn_predictions)
write_csv(model, "results/nn.csv")

Can we do better?

Do we have missing data and how much of it is missing?


In [43]:
aggr(train, prop = FALSE, combined = FALSE, numbers = TRUE, sortVars = TRUE, sortCombs = TRUE)


 Variables sorted by number of missings: 
    Variable Count
       Cabin   687
         Age   177
    Embarked     2
 PassengerId     0
    Survived     0
      Pclass     0
        Name     0
         Sex     0
       SibSp     0
       Parch     0
      Ticket     0
        Fare     0

Imputing data

We can replace the missing values with some sensible ones:


In [45]:
impute_data <- function(data, columns = colnames(table)) {
    kNN(data, columns, weightDist = TRUE, 10)
}

In [46]:
train_imputed <- impute_data(train, c("Age", "Embarked"))


Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”

In [47]:
head(train_imputed, 20)


PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAge_impEmbarked_imp
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NA S FALSE FALSE
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C FALSE FALSE
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NA S FALSE FALSE
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S FALSE FALSE
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NA S FALSE FALSE
6 0 3 Moran, Mr. James male 21 0 0 330877 8.4583 NA Q TRUE FALSE
7 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S FALSE FALSE
8 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NA S FALSE FALSE
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NA S FALSE FALSE
10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NA C FALSE FALSE
11 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S FALSE FALSE
12 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S FALSE FALSE
13 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NA S FALSE FALSE
14 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NA S FALSE FALSE
15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NA S FALSE FALSE
16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NA S FALSE FALSE
17 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NA Q FALSE FALSE
18 1 2 Williams, Mr. Charles Eugene male 29 0 0 244373 13.0000 NA S TRUE FALSE
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)female 31 1 0 345763 18.0000 NA S FALSE FALSE
20 1 3 Masselmani, Mrs. Fatima female 26 0 0 2649 7.2250 NA C TRUE FALSE

In [48]:
head(train, 20)


PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NA S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NA S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NA S
6 0 3 Moran, Mr. James male NA 0 0 330877 8.4583 NA Q
7 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S
8 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NA S
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NA S
10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NA C
11 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S
12 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S
13 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NA S
14 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NA S
15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NA S
16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NA S
17 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NA Q
18 1 2 Williams, Mr. Charles Eugene male NA 0 0 244373 13.0000 NA S
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)female 31 1 0 345763 18.0000 NA S
20 1 3 Masselmani, Mrs. Fatima female NA 0 0 2649 7.2250 NA C

In [49]:
aggr(train_imputed, prop = FALSE, combined = FALSE, numbers = TRUE, sortVars = TRUE, sortCombs = TRUE)


 Variables sorted by number of missings: 
     Variable Count
        Cabin   687
  PassengerId     0
     Survived     0
       Pclass     0
         Name     0
          Sex     0
          Age     0
        SibSp     0
        Parch     0
       Ticket     0
         Fare     0
     Embarked     0
      Age_imp     0
 Embarked_imp     0

Modeling with missing data

Things ain't gonna change much when building a model using imputed data:


In [50]:
transform_data <- function(data) {
    data %>%
        mutate(Sex = factor(Sex)) %>%
        mutate(Sex = fct_recode(Sex, "Female" = "female", "Male" = "male")) %>%
        mutate(LifeStage = factor(ifelse(Age < 18, "Child", ifelse(Age <= 60, "Adult", "OAP")))) %>%
        mutate(Fare = ifelse(Fare == 0, 0.001, Fare)) %>%
        mutate(LogFare = log(Fare))
}

In [51]:
train_imputed <- train_imputed %>%
        mutate(Survived = factor(Survived)) %>%
        mutate(Survived = fct_recode(Survived, "No" = "0", "Yes" = "1"))

train_data_imputed <- 
    transform_data(train_imputed) %>%
    select(Survived, Pclass, Sex, LifeStage, LogFare)

In [52]:
test_data_imputed <- 
    transform_data(impute_data(test, c("Age", "Embarked"))) %>%
    select(Pclass, Sex, LifeStage, LogFare)


Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”

In [53]:
cf_model <- cforest(Survived ~ Pclass + Sex + LifeStage + LogFare,
                 data = train_data_imputed, 
                 controls = cforest_unbiased(ntree = 1000, mtry = 3))

In [54]:
cf_predictions <- predict(cf_model, test_data_imputed, OOB=TRUE, type="response")
cf_predictions <- ifelse(cf_predictions == "No", 0, 1)

In [55]:
cf_results <- tibble(PassengerID = test$PassengerId, Survived = cf_predictions)
write_csv(cf_results, "results/crf_imputed.csv")

Assessing accuracy localy


In [67]:
train_index <- createDataPartition(train_data_imputed$Survived, p=0.8, list=FALSE)
cv_train <- train_data_imputed[train_index, ]
cv_validation <- train_data_imputed[-train_index, ]

In [68]:
cat("total rows:", nrow(train_data_imputed), "train rows:", nrow(cv_train), "validation rows", nrow(cv_validation))


total rows: 891 train rows: 714 validation rows 177

In [69]:
cv_nn_model <- nnet(Survived ~ Pclass + Sex + LifeStage + LogFare, 
                 data = cv_train, size = 2,
                 linout = FALSE, maxit = 10000)


# weights:  15
initial  value 508.353244 
iter  10 value 322.103304
iter  20 value 306.206454
iter  30 value 302.414138
iter  40 value 296.958258
iter  50 value 295.519315
iter  60 value 295.046188
iter  70 value 294.988406
iter  80 value 294.971424
iter  90 value 294.940541
final  value 294.928958 
converged

In [70]:
cv_predictions <- predict(cv_nn_model, cv_validation, type = "class")

In [78]:
caret::confusionMatrix(table(cv_predictions, cv_validation$Survived), positive='Yes')


Confusion Matrix and Statistics

              
cv_predictions  No Yes
           No  106  40
           Yes   3  28
                                         
               Accuracy : 0.7571         
                 95% CI : (0.687, 0.8183)
    No Information Rate : 0.6158         
    P-Value [Acc > NIR] : 4.823e-05      
                                         
                  Kappa : 0.428          
 Mcnemar's Test P-Value : 4.021e-08      
                                         
            Sensitivity : 0.4118         
            Specificity : 0.9725         
         Pos Pred Value : 0.9032         
         Neg Pred Value : 0.7260         
             Prevalence : 0.3842         
         Detection Rate : 0.1582         
   Detection Prevalence : 0.1751         
      Balanced Accuracy : 0.6921         
                                         
       'Positive' Class : Yes            
                                         

In [77]:
?caret::confusionMatrix