Imagine yourself on Titanic. You heard the news - the ship is sinking! Will you survive to tell the story?
According to Wikipedia the ship carried 2,224 passengers and crew. Titanic carried lifeboats for only 1,178 people. At 11:40 ship time she hit an iceberg. The disaster resulted in more than 1,500 lost lifes. During the evacuation "women and children first" policy was adopted.
Let's see what our environment looks like:
In [1]:
version
_
platform x86_64-apple-darwin16.7.0
arch x86_64
os darwin16.7.0
system x86_64, darwin16.7.0
status
major 3
minor 4.2
year 2017
month 09
day 28
svn rev 73368
language R
version.string R version 3.4.2 (2017-09-28)
nickname Short Summer
The training and test data is provided by Kaggle in csv format. Description of the variables is also available. Let's load it into R and have a peek.
In [2]:
library(tidyverse)
library(forcats) # factors munging
library(stringr) # string manipulation
library(ggthemes) # visualization
library(scales) # visualization
library(party) # random forest
library(nnet) # neural nets
library(caret) # ML
library(VIM) # missing data
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ---------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
Attaching package: ‘scales’
The following object is masked from ‘package:purrr’:
discard
The following object is masked from ‘package:readr’:
col_factor
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Loading required package: sandwich
Attaching package: ‘strucchange’
The following object is masked from ‘package:stringr’:
boundary
Loading required package: lattice
Attaching package: ‘caret’
The following object is masked from ‘package:purrr’:
lift
Loading required package: colorspace
Loading required package: data.table
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’:
between, first, last
The following object is masked from ‘package:purrr’:
transpose
VIM is ready to use.
Since version 4.0.0 the GUI is in its own package VIMGUI.
Please use the package to use the new (and old) GUI.
Suggestions and bug-reports can be submitted at: https://github.com/alexkowa/VIM/issues
Attaching package: ‘VIM’
The following object is masked from ‘package:datasets’:
sleep
In [3]:
train <- read_csv("data/train.csv")
Parsed with column specification:
cols(
PassengerId = col_integer(),
Survived = col_integer(),
Pclass = col_integer(),
Name = col_character(),
Sex = col_character(),
Age = col_double(),
SibSp = col_integer(),
Parch = col_integer(),
Ticket = col_character(),
Fare = col_double(),
Cabin = col_character(),
Embarked = col_character()
)
891 rows and 12 columns in our training set. But who the passengers really are? Let's delve a bit deeper...
In [4]:
glimpse(train)
Observations: 891
Variables: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
$ Survived <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0...
$ Pclass <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3...
$ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley ...
$ Sex <chr> "male", "female", "female", "female", "male", "male", "...
$ Age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 1...
$ SibSp <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1...
$ Parch <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0...
$ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", ...
$ Fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.86...
$ Cabin <chr> NA, "C85", NA, "C123", NA, NA, "E46", NA, NA, NA, "G6",...
$ Embarked <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", ...
In [5]:
head(train)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NA S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NA S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NA S
6 0 3 Moran, Mr. James male NA 0 0 330877 8.4583 NA Q
In [6]:
summary(train)
PassengerId Survived Pclass Name
Min. : 1.0 Min. :0.0000 Min. :1.000 Length:891
1st Qu.:223.5 1st Qu.:0.0000 1st Qu.:2.000 Class :character
Median :446.0 Median :0.0000 Median :3.000 Mode :character
Mean :446.0 Mean :0.3838 Mean :2.309
3rd Qu.:668.5 3rd Qu.:1.0000 3rd Qu.:3.000
Max. :891.0 Max. :1.0000 Max. :3.000
Sex Age SibSp Parch
Length:891 Min. : 0.42 Min. :0.000 Min. :0.0000
Class :character 1st Qu.:20.12 1st Qu.:0.000 1st Qu.:0.0000
Mode :character Median :28.00 Median :0.000 Median :0.0000
Mean :29.70 Mean :0.523 Mean :0.3816
3rd Qu.:38.00 3rd Qu.:1.000 3rd Qu.:0.0000
Max. :80.00 Max. :8.000 Max. :6.0000
NA's :177
Ticket Fare Cabin Embarked
Length:891 Min. : 0.00 Length:891 Length:891
Class :character 1st Qu.: 7.91 Class :character Class :character
Mode :character Median : 14.45 Mode :character Mode :character
Mean : 32.20
3rd Qu.: 31.00
Max. :512.33
In [7]:
summarise(train, SurvivalRate = sum(Survived) / nrow(train))
SurvivalRate
0.3838384
In [8]:
test <- read_csv("data/test.csv")
model <- tibble(PassengerID = test$PassengerId, Survived = 0)
write_csv(model, "results/baseline.csv")
Parsed with column specification:
cols(
PassengerId = col_integer(),
Pclass = col_integer(),
Name = col_character(),
Sex = col_character(),
Age = col_double(),
SibSp = col_integer(),
Parch = col_integer(),
Ticket = col_character(),
Fare = col_double(),
Cabin = col_character(),
Embarked = col_character()
)
In [9]:
titanic <- train %>%
mutate(Survived = factor(Survived)) %>%
mutate(Survived = fct_recode(Survived, "No" = "0", "Yes" = "1"))
Normalize the sex column:
In [10]:
titanic <- titanic %>%
mutate(Sex = factor(Sex)) %>%
mutate(Sex = fct_recode(Sex, "Female" = "female", "Male" = "male"))
In [11]:
options(repr.plot.width=7, repr.plot.height=5)
In [12]:
ggplot(titanic, aes(Sex, fill=Survived)) +
geom_bar(position = "fill") +
ylab("Survival Rate") +
geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
ggtitle("Survival Rate by Gender") +
theme_hc() +
scale_colour_hc()
In [13]:
model <- tibble(PassengerID = test$PassengerId, Survived = ifelse(test$Sex == 'female', 1, 0))
write_csv(model, "results/females_survive.csv")
In [14]:
titanic <- titanic %>%
mutate(Title = str_sub(Name, str_locate(Name, ",")[ , 1] + 2, str_locate(Name, "\\.")[ , 1] - 1))
In [15]:
titanic %>% group_by(Title) %>%
summarise(count = n()) %>%
arrange(desc(count))
Title count
Mr 517
Miss 182
Mrs 125
Master 40
Dr 7
Rev 6
Col 2
Major 2
Mlle 2
Capt 1
Don 1
Jonkheer 1
Lady 1
Mme 1
Ms 1
Sir 1
the Countess 1
In [16]:
titanic <- titanic %>%
mutate(Mother = factor(
ifelse(
c(
titanic$Title == "Mrs" |
titanic$Title == "Mme" |
titanic$Title == "the Countess" |
titanic$Title == "Dona" |
titanic$Title == "Lady"
) & titanic$Parch > 0,
"Yes", "No"
)
)
)
In [17]:
ggplot(titanic, aes(x = Mother, fill = Survived)) +
geom_bar(position = "fill") +
ylab("Survival Rate") +
geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
ggtitle("Survival Rate by Motherhood Status") +
theme_hc() +
scale_colour_hc()
In [18]:
titanic <- titanic %>%
mutate(Title = factor(Title)) %>%
mutate(Title = fct_collapse(Title,
"Miss" = c("Mlle", "Ms"),
"Mrs" = "Mme",
"Ranked" = c( "Major", "Dr", "Capt", "Col", "Rev"),
"Royalty" = c("Lady", "the Countess", "Don", "Sir", "Jonkheer")))
In [19]:
ggplot(titanic, aes(x = Title, fill = Survived)) +
geom_bar(position = "fill") +
ylab("Survival Rate") +
geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
ggtitle("Survival Rate by Title") +
theme_hc() +
scale_colour_hc()
In [20]:
titanic <- titanic %>%
mutate(FamilySize = SibSp + Parch + 1) %>%
mutate(FamilyType =
factor(
ifelse(FamilySize > 4,
"Large",
ifelse(FamilySize == 1,
"Single",
"Medium"
)
)
)
)
In [21]:
ggplot(titanic, aes(x = FamilyType, fill = Survived)) +
geom_bar(position = "fill") +
ylab("Survival Rate") +
geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
ggtitle ("Survival Rate by Family Group Size") +
theme_hc() +
scale_colour_hc()
In [22]:
ggplot(titanic, aes(x = Pclass, fill = Survived)) +
geom_bar(position = "fill") +
ylab("Survival Rate") +
geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
ggtitle("Survival Rates by Passenger Class") +
theme_hc() +
scale_colour_hc()
In [23]:
ggplot(titanic, aes(x = log(Fare), fill = Survived)) +
geom_density(alpha = 0.4) +
ggtitle("Survival Rates by Fare (log)") +
theme_hc() +
scale_colour_hc()
Warning message:
“Removed 15 rows containing non-finite values (stat_density).”
In [24]:
titanic <- titanic %>%
mutate(LifeStage = factor(ifelse(Age < 18, "Child", ifelse(Age <= 60, "Adult", "OAP"))))
In [25]:
ggplot(titanic, aes(x = LifeStage, fill = Survived)) +
geom_bar(position = "fill") +
ylab("Survival Rate") +
geom_hline(yintercept = (sum(train$Survived)/nrow(train)), col = "white", lty = 2) +
ggtitle("Survival Rates by Life Stage")
In [26]:
titanic <- titanic %>%
mutate(Fare = ifelse(Fare == 0, 0.001, Fare)) %>%
mutate(LogFare = log(Fare))
In [27]:
train_data <- select(titanic, Survived, Pclass, Sex, LifeStage, LogFare)
In [28]:
train_glm <- glm(Survived ~ Pclass + Sex + LifeStage + LogFare,
family = binomial,
data = train_data)
In [29]:
summary(train_glm)
Call:
glm(formula = Survived ~ Pclass + Sex + LifeStage + LogFare,
family = binomial, data = train_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6190 -0.6935 -0.4073 0.7228 2.2509
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.505726 0.519538 6.748 1.50e-11 ***
Pclass -1.146695 0.143291 -8.003 1.22e-15 ***
SexMale -2.510750 0.208371 -12.049 < 2e-16 ***
LifeStageChild 1.044437 0.274183 3.809 0.000139 ***
LifeStageOAP -1.045240 0.593733 -1.760 0.078331 .
LogFare -0.001359 0.082806 -0.016 0.986902
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 964.52 on 713 degrees of freedom
Residual deviance: 653.27 on 708 degrees of freedom
(177 observations deleted due to missingness)
AIC: 665.27
Number of Fisher Scoring iterations: 4
In [30]:
titanic_test <- test %>%
mutate(Sex = factor(Sex)) %>%
mutate(Sex = fct_recode(Sex, "Female" = "female", "Male" = "male")) %>%
mutate(LifeStage = factor(ifelse(Age < 18, "Child", ifelse(Age <= 60, "Adult", "OAP")))) %>%
mutate(Fare = ifelse(Fare == 0, 0.001, Fare)) %>%
mutate(LogFare = log(Fare))
In [31]:
test_data <- select(titanic_test, Pclass, Sex, LifeStage, LogFare)
In [32]:
p_hats <- predict.glm(train_glm, newdata = test_data, type = "response", na.action = na.pass)
In [33]:
survived_hat <- ifelse(is.na(p_hats) | p_hats <= 0.5, 0, 1)
In [34]:
glm_results <- tibble(PassengerID = test$PassengerId, Survived = survived_hat)
write_csv(glm_results, "results/glm.csv")
Can we do better?
In [35]:
cf_model <- cforest(Survived ~ Pclass + Sex + LifeStage + LogFare,
data = train_data,
controls = cforest_unbiased(ntree = 1000, mtry = 3))
In [36]:
table(predict(cf_model), train_data$Survived)
No Yes
No 513 113
Yes 36 229
In [37]:
varimp(cf_model)
- Pclass
- 0.0870336391437309
- Sex
- 0.197434250764526
- LifeStage
- 0.0191253822629969
- LogFare
- 0.0219755351681957
In [38]:
cf_predictions <- predict(cf_model, test_data, OOB=TRUE, type="response")
cf_predictions <- ifelse(cf_predictions == "No", 0, 1)
In [39]:
cf_results <- tibble(PassengerID = test$PassengerId, Survived = cf_predictions)
write_csv(cf_results, "results/crf.csv")
In [40]:
nn_model <- nnet(Survived ~ Pclass + Sex + LifeStage + LogFare,
data = train_data, size = 2,
linout = FALSE, maxit = 10000)
# weights: 15
initial value 501.127651
iter 10 value 355.317817
iter 20 value 321.836629
iter 30 value 315.214916
iter 40 value 314.476622
iter 50 value 308.777809
iter 60 value 301.235215
iter 70 value 301.034584
iter 80 value 300.285095
iter 90 value 298.957481
iter 100 value 294.629577
iter 110 value 294.595563
iter 120 value 294.580210
iter 130 value 294.558582
final value 294.556974
converged
In [41]:
nn_predictions <- predict(nn_model, test_data, type = "class")
nn_predictions <- ifelse(is.na(nn_predictions) | nn_predictions == "No", 0, 1)
In [42]:
nn_results <- tibble(PassengerID = test$PassengerId, Survived = nn_predictions)
write_csv(model, "results/nn.csv")
In [43]:
aggr(train, prop = FALSE, combined = FALSE, numbers = TRUE, sortVars = TRUE, sortCombs = TRUE)
Variables sorted by number of missings:
Variable Count
Cabin 687
Age 177
Embarked 2
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
SibSp 0
Parch 0
Ticket 0
Fare 0
In [45]:
impute_data <- function(data, columns = colnames(table)) {
kNN(data, columns, weightDist = TRUE, 10)
}
In [46]:
train_imputed <- impute_data(train, c("Age", "Embarked"))
Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”
In [47]:
head(train_imputed, 20)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Age_imp Embarked_imp
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NA S FALSE FALSE
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C FALSE FALSE
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NA S FALSE FALSE
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S FALSE FALSE
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NA S FALSE FALSE
6 0 3 Moran, Mr. James male 21 0 0 330877 8.4583 NA Q TRUE FALSE
7 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S FALSE FALSE
8 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NA S FALSE FALSE
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NA S FALSE FALSE
10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NA C FALSE FALSE
11 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S FALSE FALSE
12 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S FALSE FALSE
13 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NA S FALSE FALSE
14 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NA S FALSE FALSE
15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NA S FALSE FALSE
16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NA S FALSE FALSE
17 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NA Q FALSE FALSE
18 1 2 Williams, Mr. Charles Eugene male 29 0 0 244373 13.0000 NA S TRUE FALSE
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) female 31 1 0 345763 18.0000 NA S FALSE FALSE
20 1 3 Masselmani, Mrs. Fatima female 26 0 0 2649 7.2250 NA C TRUE FALSE
In [48]:
head(train, 20)
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
1 0 3 Braund, Mr. Owen Harris male 22 1 0 A/5 21171 7.2500 NA S
2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female 38 1 0 PC 17599 71.2833 C85 C
3 1 3 Heikkinen, Miss. Laina female 26 0 0 STON/O2. 3101282 7.9250 NA S
4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 0 113803 53.1000 C123 S
5 0 3 Allen, Mr. William Henry male 35 0 0 373450 8.0500 NA S
6 0 3 Moran, Mr. James male NA 0 0 330877 8.4583 NA Q
7 0 1 McCarthy, Mr. Timothy J male 54 0 0 17463 51.8625 E46 S
8 0 3 Palsson, Master. Gosta Leonard male 2 3 1 349909 21.0750 NA S
9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 2 347742 11.1333 NA S
10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 0 237736 30.0708 NA C
11 1 3 Sandstrom, Miss. Marguerite Rut female 4 1 1 PP 9549 16.7000 G6 S
12 1 1 Bonnell, Miss. Elizabeth female 58 0 0 113783 26.5500 C103 S
13 0 3 Saundercock, Mr. William Henry male 20 0 0 A/5. 2151 8.0500 NA S
14 0 3 Andersson, Mr. Anders Johan male 39 1 5 347082 31.2750 NA S
15 0 3 Vestrom, Miss. Hulda Amanda Adolfina female 14 0 0 350406 7.8542 NA S
16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55 0 0 248706 16.0000 NA S
17 0 3 Rice, Master. Eugene male 2 4 1 382652 29.1250 NA Q
18 1 2 Williams, Mr. Charles Eugene male NA 0 0 244373 13.0000 NA S
19 0 3 Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) female 31 1 0 345763 18.0000 NA S
20 1 3 Masselmani, Mrs. Fatima female NA 0 0 2649 7.2250 NA C
In [49]:
aggr(train_imputed, prop = FALSE, combined = FALSE, numbers = TRUE, sortVars = TRUE, sortCombs = TRUE)
Variables sorted by number of missings:
Variable Count
Cabin 687
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 0
SibSp 0
Parch 0
Ticket 0
Fare 0
Embarked 0
Age_imp 0
Embarked_imp 0
In [50]:
transform_data <- function(data) {
data %>%
mutate(Sex = factor(Sex)) %>%
mutate(Sex = fct_recode(Sex, "Female" = "female", "Male" = "male")) %>%
mutate(LifeStage = factor(ifelse(Age < 18, "Child", ifelse(Age <= 60, "Adult", "OAP")))) %>%
mutate(Fare = ifelse(Fare == 0, 0.001, Fare)) %>%
mutate(LogFare = log(Fare))
}
In [51]:
train_imputed <- train_imputed %>%
mutate(Survived = factor(Survived)) %>%
mutate(Survived = fct_recode(Survived, "No" = "0", "Yes" = "1"))
train_data_imputed <-
transform_data(train_imputed) %>%
select(Survived, Pclass, Sex, LifeStage, LogFare)
In [52]:
test_data_imputed <-
transform_data(impute_data(test, c("Age", "Embarked"))) %>%
select(Pclass, Sex, LifeStage, LogFare)
Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in gowerD(don_dist_var, imp_dist_var, weights = weightsx, numericalX, :
“NAs introduced by coercion”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”Warning message in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]):
“the condition has length > 1 and only the first element will be used”
In [53]:
cf_model <- cforest(Survived ~ Pclass + Sex + LifeStage + LogFare,
data = train_data_imputed,
controls = cforest_unbiased(ntree = 1000, mtry = 3))
In [54]:
cf_predictions <- predict(cf_model, test_data_imputed, OOB=TRUE, type="response")
cf_predictions <- ifelse(cf_predictions == "No", 0, 1)
In [55]:
cf_results <- tibble(PassengerID = test$PassengerId, Survived = cf_predictions)
write_csv(cf_results, "results/crf_imputed.csv")
In [67]:
train_index <- createDataPartition(train_data_imputed$Survived, p=0.8, list=FALSE)
cv_train <- train_data_imputed[train_index, ]
cv_validation <- train_data_imputed[-train_index, ]
In [68]:
cat("total rows:", nrow(train_data_imputed), "train rows:", nrow(cv_train), "validation rows", nrow(cv_validation))
total rows: 891 train rows: 714 validation rows 177
In [69]:
cv_nn_model <- nnet(Survived ~ Pclass + Sex + LifeStage + LogFare,
data = cv_train, size = 2,
linout = FALSE, maxit = 10000)
# weights: 15
initial value 508.353244
iter 10 value 322.103304
iter 20 value 306.206454
iter 30 value 302.414138
iter 40 value 296.958258
iter 50 value 295.519315
iter 60 value 295.046188
iter 70 value 294.988406
iter 80 value 294.971424
iter 90 value 294.940541
final value 294.928958
converged
In [70]:
cv_predictions <- predict(cv_nn_model, cv_validation, type = "class")
In [78]:
caret::confusionMatrix(table(cv_predictions, cv_validation$Survived), positive='Yes')
Confusion Matrix and Statistics
cv_predictions No Yes
No 106 40
Yes 3 28
Accuracy : 0.7571
95% CI : (0.687, 0.8183)
No Information Rate : 0.6158
P-Value [Acc > NIR] : 4.823e-05
Kappa : 0.428
Mcnemar's Test P-Value : 4.021e-08
Sensitivity : 0.4118
Specificity : 0.9725
Pos Pred Value : 0.9032
Neg Pred Value : 0.7260
Prevalence : 0.3842
Detection Rate : 0.1582
Detection Prevalence : 0.1751
Balanced Accuracy : 0.6921
'Positive' Class : Yes
In [77]:
?caret::confusionMatrix
Content source: Data-Science-FMI/introduction-to-data-science-2017
Similar notebooks: