Load and transform dataset.

Install Bioconductor biocLite package in order to access the golubEsets library. golubEsets contains the raw data used by Todd Golub in the original paper.

Load the training, testing data from library golubEsets. Also transpose the data to make observations as rows.


In [1]:
## The code below is commented out since it is unnecessary and time-consuming to run it everytime. Run it if needed.
# options(repos='http://cran.rstudio.com/') 
# source("http://bioconductor.org/biocLite.R")
# biocLite("golubEsets")
suppressMessages(library(golubEsets))
#Training data
data(Golub_Train)
golub_train_p = t(exprs(Golub_Train))
golub_train_r =pData(Golub_Train)[, "ALL.AML"]
golub_train_l = ifelse(golub_train_r == "AML", 1, 0)

#Testing data
data(Golub_Test)
golub_test_p = t(exprs(Golub_Test))
golub_test_r = pData(Golub_Test)[, "ALL.AML"]
golub_test_l = ifelse(golub_test_r == "AML", 1, 0)

#Show summary
rbind(Train = dim(golub_train_p), Test = dim(golub_test_p))
cbind(Train = table(golub_train_r),Test = table(golub_test_r))


Train38 7129
Test34 7129
TrainTest
ALL2720
AML1114

In [2]:
save(golub_train_p, golub_train_r, golub_train_l, golub_test_p, golub_test_r, golub_test_l,  file = "DP.rda")