This file contains the code used for implementing the classifer in paper Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression data. This notebook could be pretty slow as a lot of function implemented are not time efficient enough in R (will try to optimize it if possible in the future).
Data Set: Merge the train and test data in the original paper and split the merged data to form train and test during learning.
- Merge Data: 72 samples(47 ALL, 25 AML, 62 bone marrow, 10 peripheral blood samples)
- Predictors: 7129 gene expression levels represent 6817 genes.
Main Purpose: FLDA, DLDA, DQDA, Decision Tree, Adaboost, Bagging(CPD), NN.
In [1]:
# The implementation below is very similar to the one in paper1 notehbook and details chould be checked there.
## The code below is commented out since it is unnecessary and time-consuming to run it everytime. Run it if needed.
#options(repos='http://cran.rstudio.com/')
#source("http://bioconductor.org/biocLite.R")
#biocLite("golubEsets")
#install.packages("tree")
#install.packages("fastAdaboost")
#install.packages("sparsediscrim", dependencies = T)
suppressMessages(library(sparsediscrim))
suppressMessages(library(tree))
suppressMessages(library(golubEsets))
suppressMessages(library(fastAdaboost))
# load data from golubEsets
data(Golub_Merge)
golub_merge_p = t(exprs(Golub_Merge))
golub_merge_r =pData(Golub_Merge)[, "ALL.AML"]
golub_merge_l = ifelse(golub_merge_r == "AML", 1, 0)
#show summary
dim(golub_merge_p)
table(golub_merge_r)
# Thresholding
golub_merge_pp = golub_merge_p
golub_merge_pp[golub_merge_pp<100] = 100
golub_merge_pp[golub_merge_pp>16000] = 16000
# Filtering
golub_filter = function(x, r = 5, d=500){
minval = min(x)
maxval = max(x)
(maxval/minval>r)&&(maxval-minval>d)
}
merge_index = apply(golub_merge_pp, 2, golub_filter)
golub_merge_index = (1:7129)[merge_index]
golub_merge_pp = golub_merge_pp[, golub_merge_index]
# Base 10 logarithmic transformation
golub_merge_p_trans = log10(golub_merge_pp)
#show summary again
dim(golub_merge_p_trans)
table(golub_merge_r)
total3571_predictor = golub_merge_p_trans
total3571_response = golub_merge_r
save(total3571_predictor, total3571_response, file = "../transformed data/golub3571.rda")
# Further standardization to mean 0 variance 1.
scale_golub_merge = scale(golub_merge_p_trans)
set.seed(201703)
As we see in the result above, the dataset is indeed reduce to $72 \times 3571$ and this is also the origin of the $72 \times 3571$ dataset available online.
In [2]:
# Settings as specified in the paper
p = 40 # number of genes for FLDA
B = 50 # Aggregation predictors
N = 200 # repeat classification N times
d = c(0.05, 0.1,0.25, 0.5, 0.75, 1) # CPD parameter
In [3]:
# Split train test as specified in the paper
mysplit = function(n){
sample(1:n, floor(n/3))
}
# implement function for calculating BW as stated in the paper(the ratio of between-group to within group sums of squares)
BW = function(predictor, response){
overall = colMeans(predictor)
ALL_mean = apply(predictor, 2, function(x) mean(x[response == "ALL"]))
AML_mean = apply(predictor, 2, function(x) mean(x[response == "AML"]))
numerator = sum(response == "ALL")*(ALL_mean-overall)^2+sum(response == "AML")*(AML_mean-overall)^2
denumerator = colSums((t(t(predictor[response == "ALL", ])-ALL_mean))^2)+colSums((t(t(predictor[response == "AML", ])-AML_mean))^2)
numerator/denumerator
}
# randomly feature select once for comparison for furthur study
id = mysplit(nrow(scale_golub_merge))
train_p = scale_golub_merge[-id,]
train_r = golub_merge_r[-id]
test_p = scale_golub_merge[id,]
test_r = golub_merge_r[id]
temp_bw = order(BW(train_p, train_r), decreasing = T)[1:50]
train_BW_predictor = train_p[,temp_bw]
test_BW_predictor = test_p[,temp_bw]
save(train_BW_predictor, train_r, test_BW_predictor, test_r, file = "../transformed data/paper9.rda")
As the distance measure defined in the paper are not general distance measure used by R knn function. We reimplement KNN in R and this cell could be a little bit slow.
In [4]:
k = seq(1, 21, 2)
# Distance measure used in the paper
Distance = function(predictor, test){
1- apply(predictor, 1, cor, test)
}
# NN classification process
nn = function(test, pk, learning, response){
distance = Distance(learning, test)
index = order(distance)[1:pk]
cl = ifelse(sum(response[index] == "AML")>sum(response[index]=="ALL"), "AML", "ALL")
cl
}
# leave-one-cross-validation to tune k
mycv= function(pk,learning,response){
error = 0
for(i in 1:nrow(learning)){
cl = nn(learning[i,], pk, learning[-i, ], response[-i])
error = error+(cl == response[i])
}
error
}
error_count = numeric(N)
for(i in 1:N){
# split data
nn_index = mysplit(nrow(scale_golub_merge))
nn_train_p = scale_golub_merge[-nn_index,]
nn_train_r = golub_merge_r[-nn_index]
nn_test_p = scale_golub_merge[nn_index,]
nn_test_r = golub_merge_r[nn_index]
# gene selection/feature selection
temp_bw = order(BW(nn_train_p, nn_train_r), decreasing = T)[1:p]
nn_train_p = nn_train_p[,temp_bw]
nn_test_p = nn_test_p[,temp_bw]
# cross-validation to choose k
choose_k = sapply(k,mycv, learning = nn_train_p, response = nn_train_r)
# nn classification
nn_r = apply(nn_test_p,1, nn, k[which.min(choose_k)], nn_train_p, nn_train_r)
error_count[i] = sum(nn_r != nn_test_r)
}
resultNN = c(Median = median(error_count), Upper_quartile = quantile(error_count, 0.75))
This part is accomplished using tree package in R as used in the paper. Since not much is mentioned in the paper about the set up of the decision tree, the default depth in tree are used. Also 10 fold cross-validation and prune.tree are used as specified in the paper.
In [5]:
cbine_data = data.frame(response = factor(golub_merge_r), scale_golub_merge)
tree_mdl = tree(response~.,data = cbine_data)
cv_error = replicate(N, ceiling(cv.tree(tree_mdl, , prune.tree, K = 10, method = "misclass")$dev[1]/10))
resultDT = c(Median = median(cv_error), Upper_quartile = quantile(cv_error, 0.75))
Implemented without using packages in R.
In [6]:
# initialize the result vector
bg_error = numeric(N)
# implement helper function for each bagging
my_baghelper = function(train, test){
bg = sample(nrow(train), replace = T)
temp_md = tree(response~., data = train[bg, ])
predict(temp_md, test, type = "class")
}
# perform N time classification(bootstrap)
for(i in 1:N){
bg_index = mysplit(nrow(cbine_data))
bg_train = cbine_data[-bg_index,]
bg_test = cbine_data[bg_index,]
# gene selection
temp_bw = order(BW(bg_train[, -1], bg_train$response), decreasing = T)[1:p]
bg_train_p = data.frame(response = bg_train$response, bg_train[,temp_bw+1])
bg_test_p= data.frame(response = bg_test$response, bg_test[,temp_bw+1])
t1 = replicate(B, my_baghelper(bg_train_p, bg_test_p))
pred = apply(t1, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
bg_error[i] = sum(pred != bg_test_p$response)
}
resultBag = c(Median = median(bg_error), Upper_quartile = quantile(bg_error, 0.75))
In [7]:
d = 0.75
# implement CPD
CPD = function(d, x1, x2){
a = runif(nrow(x1), 0, d)
a*x1+(1-a)*x2
}
# helper function for each bagging with CPD
my_cpdhelper = function(train, test){
id1 = sample(nrow(train), replace = T)
id2 = sample(nrow(train), replace = T)
temp = CPD(d, train[id1, -1], train[id2,-1])
temp_md = tree(response~., data = data.frame(temp, response = train$response[id1]))
predict(temp_md, test, type = "class")
}
#initialize the error vector
cpd_error = numeric(N)
# repeat N times
for(i in 1:N){
cpd_index = mysplit(nrow(cbine_data))
cpd_train = cbine_data[-cpd_index,]
cpd_test = cbine_data[cpd_index,]
# gene selection
temp_bw = order(BW(cpd_train[, -1], cpd_train$response), decreasing = T)[1:p]
cpd_train_t = data.frame(response = cpd_train$response, cpd_train[,temp_bw+1])
cpd_test_t= data.frame(response = cpd_test$response, cpd_test[,temp_bw+1])
t1 = replicate(B, my_cpdhelper(cpd_train_t, cpd_test_t))
pred = apply(t1, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
cpd_error[i] = sum(pred != cpd_test_t$response)
}
resultCPD = c(Median = median(cpd_error), Upper_quartile = quantile(cpd_error, 0.75))
Using adaboost function from fastAdaboost package in R.
In [8]:
ada_error = numeric(N)
for(i in 1:N){
ad_index = mysplit(nrow(cbine_data))
ad_train = cbine_data[-ad_index,]
ad_test = cbine_data[ad_index,]
# gene selection
temp_bw = order(BW(ad_train[, -1], ad_train$response), decreasing = T)[1:p]
ad_train_t = data.frame(response = ad_train$response, ad_train[,temp_bw+1])
ad_test_t= data.frame(response = ad_test$response, ad_test[,temp_bw+1])
ada_cl = adaboost(response~., data = ad_train_t, 50)
ada_test_pr = predict(ada_cl, ad_test_t)$class
ada_error[i] = sum(ada_test_pr != ad_test_t$response)
}
resultAda = c(Median = median(ada_error), Upper_quartile = quantile(ada_error, 0.75))
As the current LDA is very similar to the FLDA, lda function in MASS package is used to accomplish this part.
In [9]:
flda_error = numeric(N)
for(i in 1:200){
flda_index = mysplit(nrow(cbine_data))
flda_train = cbine_data[-flda_index,]
flda_test = cbine_data[flda_index,]
# gene selection
temp_bw = order(BW(flda_train[, -1], flda_train$response), decreasing = T)[1:p]
flda_train_t = data.frame(response = flda_train$response, flda_train[,temp_bw+1])
flda_test_t= data.frame(response = flda_test$response, flda_test[,temp_bw+1])
flda_md = MASS::lda(response~., data = flda_train_t)
flda_pred = predict(flda_md, flda_test_t)$class
flda_error[i] = sum(flda_pred != flda_test_t$response)
}
resultFLDA = c(Median = median(flda_error), Upper_quartile = quantile(flda_error, 0.75))
Accomplished using R dlda in sparsediscrim packages.
In [10]:
dlda_error = numeric(N)
for(i in 1:200){
dlda_index = mysplit(nrow(cbine_data))
dlda_train = cbine_data[-dlda_index,]
dlda_test = cbine_data[dlda_index,]
# gene selection
temp_bw = order(BW(dlda_train[, -1], dlda_train$response), decreasing = T)[1:p]
dlda_train_t = data.frame(response = dlda_train$response, dlda_train[,temp_bw+1])
dlda_test_t= data.frame(response = dlda_test$response, dlda_test[,temp_bw+1])
dlda_md = dlda(response~., data = dlda_train_t)
dlda_pred = predict(dlda_md, dlda_test_t[, -1])$class
dlda_error[i] = sum(dlda_pred != dlda_test_t$response)
}
resultDLDA = c(Median = median(dlda_error), Upper_quartile = quantile(dlda_error, 0.75))
In [11]:
dqda_error = numeric(N)
for(i in 1:200){
dqda_index = mysplit(nrow(cbine_data))
dqda_train = cbine_data[-dqda_index,]
dqda_test = cbine_data[dqda_index,]
# gene selection
temp_bw = order(BW(dqda_train[, -1], dqda_train$response), decreasing = T)[1:p]
dqda_train_t = data.frame(response = dqda_train$response, dqda_train[,temp_bw+1])
dqda_test_t= data.frame(response = dqda_test$response, dqda_test[,temp_bw+1])
dqda_md = dlda(response~., data = dqda_train_t)
dqda_pred = predict(dqda_md, dqda_test_t[,-1])$class
dqda_error[i] = sum(dqda_pred != dqda_test_t$response)
}
resultDQDA = c(Median = median(dqda_error), Upper_quartile = quantile(dqda_error, 0.75))
In this notebook, we reproduce paper 9 and have several algorithms reproduce, such as FLDA, DLDA, DQDA etc. We made some implicit assumptions where details are not specified in the paper and when using existing packages in R. However, most of our work follows procedures exactly as described in the paper.
Prediction Comparison: Summary of our reproduce result is shown below. We also include the result in the paper below our result.
In [12]:
rbind(FLDA = resultFLDA, DLDA = resultDLDA, DQDA = resultDQDA, DT = resultDT, Bag = resultBag ,Boost = resultAda, CPD = resultCPD, NN = resultNN)
Classifier | Median quartile | Upper quartile |
---|---|---|
FLDA | 3 | 4 |
DLDA | 0 | 1 |
DQDA | 1 | 2 |
DT | 3 | 4 |
Bag | 2 | 2 |
Boost | 1 | 2 |
CPD | 1 | 2 |
NN | 1 | 1 |
As we can see from the above two tables, we basically have the same result as presented in the paper. Although our result is a slightly worse.
In [5]:
# golub_test_50 golub_test_response golub_train_50 golub_train_response loaded
load("../transformed data/golub50gene.rda")
# cross-validation to choose k
k = seq(1, 21, 2)
choose_k_p1 = sapply(k,mycv, learning = golub_train_50, response= golub_train_response)
# nn classification
nn_train_p1 = apply(golub_train_50 ,1, nn, k[which.min(choose_k_p1)], golub_train_50,golub_train_response)
nn_test_p1 = apply(golub_test_50 ,1, nn, k[which.min(choose_k_p1)], golub_train_50,golub_train_response)
table(Train_Predict = nn_train_p1, Train_Actual = golub_train_response)
table(Test_Predict = nn_test_p1, Test_Actual = golub_test_response)
In [6]:
# test_paper3 test_response train_paper3 train_response loaded
load("../transformed data/paper3.rda")
# cross-validation to choose k
choose_k_p3 = sapply(k,mycv, learning = train_paper3, response= train_response)
# nn classification
nn_train_p3 = apply(train_paper3 ,1, nn, k[which.min(choose_k_p3)], train_paper3, train_response)
nn_test_p3 = apply(test_paper3 ,1, nn, k[which.min(choose_k_p3)], train_paper3, train_response)
table(Train_Predict = nn_train_p3, Train_Actual = train_response)
table(Test_Predict = nn_test_p3, Test_Actual = test_response)
In [37]:
# pca_train, pca_test, pls_train, pls_test loaded
load("../transformed data/paper6.rda")
## PCA
# cross-validation to choose k
pca_tr = as.matrix(pca_train[, -1])
pca_te = as.matrix(pca_test[, -1])
choose_k_p6_pca = sapply(k,mycv, learning = pca_tr, response= pca_train$response)
# nn classification
nn_train_p6 = apply(pca_tr ,1, nn, k[which.min(choose_k_p6_pca)], pca_tr, pca_train$response)
nn_test_p6 = apply(pca_te ,1, nn, k[which.min(choose_k_p6_pca)], pca_tr, pca_train$response)
table(Train_Predict = nn_train_p6, Train_Actual = pca_train$response)
table(Test_Predict = nn_test_p6, Test_Actual = pca_test$response)
In [39]:
## PLS
# cross-validation to choose k
pls_tr = as.matrix(pls_train[, -1])
pls_te = as.matrix(pls_test[, -1])
choose_k_p6_pls = sapply(k,mycv, learning = pls_tr, response= pls_train$response)
# nn classification
nn_train_p6_pls = apply(pls_tr ,1, nn, k[which.min(choose_k_p6_pls)], pls_tr, pls_train$response)
nn_test_p6_pls = apply(pls_te ,1, nn, k[which.min(choose_k_p6_pls)], pls_tr, pls_train$response)
table(Train_Predict = nn_train_p6_pls, Train_Actual = pls_train$response)
table(Test_Predict = nn_test_p6_pls, Test_Actual = pls_test$response)
In [40]:
# test_kmeans train_kmeans golub_test_r golub_train_r loaded
load("../transformed data/paper29.rda")
# cross-validation to choose k
choose_k_p29 = sapply(k,mycv, learning = train_kmeans, response= golub_train_r)
# nn classification
nn_train_p29 = apply(train_kmeans ,1, nn, k[which.min(choose_k_p29)], train_kmeans, golub_train_r)
nn_test_p29 = apply(test_kmeans ,1, nn, k[which.min(choose_k_p29)], train_kmeans, golub_train_r)
table(Train_Predict = nn_train_p29, Train_Actual = golub_train_r)
table(Test_Predict = nn_test_p29, Test_Actual = golub_test_r)
In [53]:
# golub_test_50 golub_test_response golub_train_50 golub_train_response loaded
cbine_data_p1 = data.frame(response = factor(golub_train_response), golub_train_50)
tree_mdl_p1 = tree(response~.,data = cbine_data_p1)
tree_tr_p1 = predict(tree_mdl_p1, data.frame(golub_train_50), type = "class")
tree_te_p1 = predict(tree_mdl_p1, data.frame(golub_test_50), type = "class")
table(Train_predict = tree_tr_p1, Train_Actual = golub_train_response)
table(Test_predict = tree_te_p1, Test_Actual = golub_test_response)
In [54]:
# test_paper3 test_response train_paper3 train_response loaded
cbine_data_p3 = data.frame(response = factor(train_response), train_paper3)
tree_mdl_p3 = tree(response~.,data = cbine_data_p3)
tree_tr_p3 = predict(tree_mdl_p3, data.frame(train_paper3), type = "class")
tree_te_p3 = predict(tree_mdl_p3, data.frame(test_paper3), type = "class")
table(Train_predict = tree_tr_p3, Train_Actual = train_response)
table(Test_predict = tree_te_p3, Test_Actual = test_response)
In [56]:
# pca_train, pca_test
cbine_data_pca = data.frame(response = factor(pca_train$response), pca_train[,-1])
tree_mdl_pca = tree(response~.,data = cbine_data_pca)
tree_tr_pca = predict(tree_mdl_pca, pca_train[,-1], type = "class")
tree_te_pca = predict(tree_mdl_pca, pca_test[,-1], type = "class")
table(Train_predict = tree_tr_pca, Train_Actual = pca_train[, 1])
table(Test_predict = tree_te_pca, Test_Actual = pca_test[, 1])
In [59]:
# pls_train, pls_test loaded
cbine_data_pls = data.frame(response = factor(pls_train$response), pls_train[,-1])
tree_mdl_pls = tree(response~.,data = cbine_data_pls)
tree_tr_pls = predict(tree_mdl_pls, pls_train[,-1], type = "class")
tree_te_pls = predict(tree_mdl_pls, pls_test[,-1], type = "class")
table(Train_predict = tree_tr_pls, Train_Actual = pls_train[, 1])
table(Test_predict = tree_te_pls, Test_Actual = pls_test[, 1])
In [61]:
# test_kmeans train_kmeans golub_test_r golub_train_r loaded
cbine_data_p29 = data.frame(response = factor(golub_train_r), train_kmeans)
tree_mdl_p29 = tree(response~.,data = cbine_data_p29)
tree_tr_p29 = predict(tree_mdl_p29, data.frame(train_kmeans), type = "class")
tree_te_p29 = predict(tree_mdl_p29, data.frame(test_kmeans), type = "class")
table(Train_predict = tree_tr_p29, Train_Actual = golub_train_r)
table(Test_predict = tree_te_p29, Test_Actual = golub_test_r)
In [68]:
t_tr_p1 = replicate(B, my_baghelper(cbine_data_p1, data.frame(golub_train_50)))
pred_tr_p1 = apply(t_tr_p1, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
t_te_p1 = replicate(B, my_baghelper(cbine_data_p1, data.frame(golub_test_50)))
pred_te_p1 = apply(t_te_p1, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = pred_tr_p1, Train_Actual = golub_train_response)
table(Test_predict = pred_te_p1, Test_Actual = golub_test_response)
In [70]:
t_tr_p3 = replicate(B, my_baghelper(cbine_data_p3, data.frame(train_paper3)))
pred_tr_p3 = apply(t_tr_p3, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
t_te_p3 = replicate(B, my_baghelper(cbine_data_p3, data.frame(test_paper3)))
pred_te_p3 = apply(t_te_p3, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = pred_tr_p3, Train_Actual = train_response)
table(Test_predict = pred_te_p3, Test_Actual = test_response)
In [75]:
# pca
t_tr_p6 = replicate(B, my_baghelper(pca_train, pca_train))
pred_tr_p6 = apply(t_tr_p6, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
t_te_p6 = replicate(B, my_baghelper(pca_train, pca_test))
pred_te_p6 = apply(t_te_p6, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = pred_tr_p6, Train_Actual =pca_train[,1])
table(Test_predict = pred_te_p6, Test_Actual = pca_test[, 1])
In [76]:
# pls
t_tr_p6_pls = replicate(B, my_baghelper(pls_train, pls_train))
pred_tr_p6_pls = apply(t_tr_p6_pls, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
t_te_p6_pls = replicate(B, my_baghelper(pls_train, pls_test))
pred_te_p6_pls = apply(t_te_p6_pls, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = pred_tr_p6_pls, Train_Actual =pls_train[,1])
table(Test_predict = pred_te_p6_pls, Test_Actual = pls_test[, 1])
In [79]:
t_tr_p29 = replicate(B, my_baghelper(cbine_data_p29, data.frame(train_kmeans)))
pred_tr_p29 = apply(t_tr_p29, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
t_te_p29 = replicate(B, my_baghelper(cbine_data_p29,data.frame(test_kmeans)))
pred_te_p29 = apply(t_te_p29, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = pred_tr_p29, Train_Actual = golub_train_response)
table(Test_predict = pred_te_p29, Test_Actual = golub_test_response)
In [81]:
c_tr_p1 = replicate(B, my_cpdhelper(cbine_data_p1, data.frame(golub_train_50)))
c_pred_tr_p1 = apply(c_tr_p1, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
c_te_p1 = replicate(B, my_cpdhelper(cbine_data_p1, data.frame(golub_test_50)))
c_pred_te_p1 = apply(c_te_p1, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = c_pred_tr_p1, Train_Actual = golub_train_response)
table(Test_predict = c_pred_te_p1, Test_Actual = golub_test_response)
In [83]:
c_tr_p3 = replicate(B, my_cpdhelper(cbine_data_p3, data.frame(train_paper3)))
c_pred_tr_p3 = apply(c_tr_p3, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
c_te_p3 = replicate(B, my_cpdhelper(cbine_data_p3, data.frame(test_paper3)))
c_pred_te_p3 = apply(c_te_p3, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = c_pred_tr_p3, Train_Actual = train_response)
table(Test_predict = c_pred_te_p3, Test_Actual = test_response)
In [86]:
# pca
c_tr_p6 = replicate(B, my_cpdhelper(pca_train, pca_train))
c_pred_tr_p6 = apply(c_tr_p6, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
c_te_p6 = replicate(B, my_cpdhelper(pca_train, pca_test))
c_pred_te_p6 = apply(c_te_p6, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = c_pred_tr_p6, Train_Actual =pca_train[,1])
table(Test_predict = c_pred_te_p6, Test_Actual = pca_test[, 1])
In [87]:
# pls
c_tr_p6_pls = replicate(B, my_cpdhelper(pls_train, pls_train))
c_pred_tr_p6_pls = apply(c_tr_p6_pls, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
c_te_p6_pls = replicate(B, my_cpdhelper(pls_train, pls_test))
c_pred_te_p6_pls = apply(c_te_p6_pls, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = c_pred_tr_p6_pls, Train_Actual =pls_train[,1])
table(Test_predict = c_pred_te_p6_pls, Test_Actual = pls_test[, 1])
In [89]:
c_tr_p29 = replicate(B, my_cpdhelper(cbine_data_p29, data.frame(train_kmeans)))
c_pred_tr_p29 = apply(c_tr_p29, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
c_te_p29 = replicate(B, my_cpdhelper(cbine_data_p29,data.frame(test_kmeans)))
c_pred_te_p29 = apply(c_te_p29, 1, function(x) ifelse(sum(x == "AML")>sum(x =="ALL"), "AML", "ALL"))
table(Train_predict = c_pred_tr_p29, Train_Actual = golub_train_response)
table(Test_predict = c_pred_te_p29, Test_Actual = golub_test_response)
In [91]:
ada_cl_p1 = adaboost(response~., data = cbine_data_p1, 50)
ada_train_pr_p1 = predict(ada_cl_p1, data.frame(golub_train_50))$class
ada_test_pr_p1 = predict(ada_cl_p1, data.frame(golub_test_50))$class
table(Train_predict = ada_train_pr_p1, Train_Actual = golub_train_response)
table(Test_predict = ada_test_pr_p1, Test_Actual = golub_test_response)
In [93]:
ada_cl_p3 = adaboost(response~., data = cbine_data_p3, 50)
ada_train_pr_p3 = predict(ada_cl_p3, data.frame(train_paper3))$class
ada_test_pr_p3 = predict(ada_cl_p3, data.frame(test_paper3))$class
table(Train_predict =ada_train_pr_p3, Train_Actual = train_response)
table(Test_predict = ada_test_pr_p3, Test_Actual = test_response)
In [94]:
# pca
ada_cl_p6 = adaboost(response~., data = cbine_data_pca, 50)
ada_train_pr_p6 = predict(ada_cl_p6, pca_train)$class
ada_test_pr_p6 = predict(ada_cl_p6, pca_test)$class
table(Train_predict = ada_train_pr_p6, Train_Actual =pca_train[,1])
table(Test_predict =ada_test_pr_p6, Test_Actual = pca_test[, 1])
In [95]:
# pls
ada_cl_p6_pls = adaboost(response~., data = cbine_data_pls, 50)
ada_train_pr_p6_pls = predict(ada_cl_p6_pls, pls_train)$class
ada_test_pr_p6_pls = predict(ada_cl_p6_pls, pls_test)$class
table(Train_predict = ada_train_pr_p6_pls, Train_Actual =pls_train[,1])
table(Test_predict =ada_test_pr_p6_pls, Test_Actual = pls_test[, 1])
In [97]:
ada_cl_p29 = adaboost(response~., data = cbine_data_p29, 50)
ada_train_pr_p29 = predict(ada_cl_p29, data.frame(train_kmeans))$class
ada_test_pr_p29 = predict(ada_cl_p29, data.frame(test_kmeans))$class
table(Train_predict = c_pred_tr_p29, Train_Actual = golub_train_response)
table(Test_predict = c_pred_te_p29, Test_Actual = golub_test_response)
In [99]:
flda_md_p1 = MASS::lda(response~., data = cbine_data_p1)
flda_tr_p1 = predict(flda_md_p1, data.frame(golub_train_50))$class
flda_te_p1 = predict(flda_md_p1, data.frame(golub_test_50))$class
table(Train_predict = flda_tr_p1, Train_Actual = golub_train_response)
table(Test_predict = flda_te_p1, Test_Actual = golub_test_response)
In [100]:
flda_md_p3 = MASS::lda(response~., data = cbine_data_p3)
flda_tr_p3 = predict(flda_md_p3, data.frame(train_paper3))$class
flda_te_p3 = predict(flda_md_p3, data.frame(test_paper3))$class
table(Train_predict = flda_tr_p3, Train_Actual = train_response)
table(Test_predict = flda_te_p3, Test_Actual = test_response)
In [101]:
# pca
flda_md_p6 = MASS::lda(response~., data = cbine_data_pca)
flda_tr_p6 = predict(flda_md_p6, pca_train)$class
flda_te_p6 = predict(flda_md_p6, pca_test)$class
table(Train_predict = flda_tr_p6, Train_Actual =pca_train[,1])
table(Test_predict = flda_te_p6, Test_Actual = pca_test[, 1])
In [102]:
# pls
flda_md_p6_pls = MASS::lda(response~., data = cbine_data_pls)
flda_tr_p6_pls = predict(flda_md_p6, pls_train)$class
flda_te_p6_pls = predict(flda_md_p6, pls_test)$class
table(Train_predict = flda_tr_p6_pls, Train_Actual =pls_train[,1])
table(Test_predict = flda_te_p6_pls, Test_Actual = pls_test[, 1])
In [104]:
flda_md_p29 = MASS::lda(response~., data = cbine_data_p29)
flda_tr_p29 = predict(flda_md_p29, data.frame(train_kmeans))$class
flda_te_p29 = predict(flda_md_p29, data.frame(test_kmeans))$class
table(Train_predict = flda_tr_p29, Train_Actual = golub_train_response)
table(Test_predict = flda_te_p29, Test_Actual = golub_test_response)