Paper ID | Dataset Description | Feature Selection | Classifier | Validation/Test |
---|---|---|---|---|
1 | $72(Train\ 38, Test\ 34)\times 6817$ $(47 ALL, 25 AML)$ | Informative genes (signal-to-noise ranking; see the weighted-vote sketch after the table) | Golub classifier: weighted vote + prediction strength (difference between votes) | LOOCV on the initial data and evaluation on independent test data |
2 | $72(Train\ 38, Test\ 34)\times 6817$ $(47 ALL, 25 AML)$ | Relative class separation (same as informative genes) | Golub classifier: weighted vote + prediction strength (difference between votes) | LOOCV on the initial data and evaluation on independent test data |
3 | $72\times 7129$ $(47 ALL, 25 AML)$ | The Threshold Number of Misclassifications (TNoM; see the sketch after the table) | Nearest neighbor, SVM (linear kernel, quadratic kernel), boosting (100, 1000, 10000 iterations) | LOOCV on the whole set |
4 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Same as the Golub paper | Linear-kernel SVM (all features; top 25, 250, 500, 1000 features), perceptron (few details given) | Tuned by CV on the train set, tested on the test set; the data set was also reorganized to explore more splits |
5 | $72 \times 7070$ $(47 ALL, 25 AML)$ | MVR (median vote relevance), NBGR (naive Bayes global relevance), MAR (Golub paper relevance); genes ranked using all 72 observations | SVM (linear kernel, radial kernel) | LOOCV; Train/Test splits of 38/34 and 34/38 |
6 | $72(Train\ 38, Test\ 34)\times 6817$ $(47 ALL, 25 AML)$ | Dimension reduction: PCA, PLS (partial least squares) (from p = 50) | Logistic and quadratic discriminant analysis | LOOCV on Train, evaluation on the test set; re-randomization with an equal 36/36 split |
7 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Recursive Feature Elimination (see the RFE sketch after the table) | SVM (linear kernel) | LOOCV/Test: success rate (at zero rejection) and acceptance rate (at zero error) with varying numbers of features |
8 | $72(Train\ 38, Test\ 34)\times 6817$ $(47 ALL, 25 AML)$ | Almost the same as paper 6 | Almost the same as paper 6 | Almost the same as paper 6 |
9 | $72\times 6817$ $(47 ALL, 25 AML)$ -> $72\times 3571$ | BW: ratio of between-group to within-group sums of squares (see the BSS/WSS sketch after the table) | Linear and quadratic discriminant analysis (FLDA, DLDA, DQDA), Golub classification, classification trees (CV, Bag, Boosted, Boosted with CPD), nearest neighbors | 2:1 train/test random split |
10 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Not stated | Single C4.5 (decision tree), bagged C4.5, AdaBoost C4.5 (as implemented in WEKA) | Test accuracy |
11 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | adaptive effective dimension reduction approach (MAVE) of Xia et al. (2002) | MAVE-LD, DLDA, DQDA, MAVE-NPLD | LOOCV/Test accuracy with 50, 100, 200 genes |
12 | $72\times 7129$ $(47 ALL, 25 AML)$ | Classifier feedback approach + disjoint PCA | Soft Independent Modeling of Class Analogy (SIMCA) classification | Test accuracy |
13 | $72\times 7129$ $(47 ALL, 25 AML)$ | This is a clustering paper; no classification was performed. | | |
14 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ -> $72\times 3571$ | Nonparametric scoring method | LogitBoost, AdaBoost, nearest neighbor, classification tree | Tuned by LOOCV on Train; evaluated on Test |
15 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Ratio of between-class to within-class sums of squares for each gene | MSVM (linear and Gaussian kernels) | Misclassification on Test |
16 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Ideal feature construction | Maximal margin linear programming (MAMA) | LOOCV on Train/Test: misclassification |
17 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Univariate ranking (UR), recursive feature elimination (RFE) | Penalized logistic regression, SVM | 10-fold CV on Train/Test: error |
18 | $72\times 7129$ $(47 ALL, 25 AML)$ | Information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, one-dimensional SVM and t-statistics | SVM, KNN, naive Bayes, J4.8 decision tree | Test accuracy |
19 | $72\times 5327$ $(47 ALL, 25 AML)$ | (1) ratio of between-category to within-category sums of squares (BW) (Dudoit et al., 2002); (2–3) signal-to-noise (S2N) scores (Golub et al., 1999) applied in a one-versus-rest (S2N-OVR) and a one-versus-one (S2N-OVO) fashion; (4) Kruskal–Wallis non-parametric one-way ANOVA (KW) (Jones, 1997); (5) no feature selection | MC-SVM, neural network, KNN, PNN | 10-fold CV accuracy |
20 | $72(Train\ 38, Test\ 34)\times 3571$ $(47 ALL, 25 AML)$ | BSS/WSS criterion (Dudoit et al., 2002), Wilcoxon rank-based statistics and the soft-thresholding method (Tibshirani et al., 2002) | Fisher’s linear discriminant analysis (FLDA), diagonal linear and quadratic discriminant analysis (DLDA, DQDA), logistic regression (LOGISTIC), generalized partial least squares (GPLS), k-nearest neighbor (kNN), CART and aggregating classifiers (BAG, BOOST, LogitBoost, RandomForest), single- and multi-layer neural networks (NN-1, NN-3), support vector machines (SVM-linear, SVM-radial), flexible discriminant analysis (FDA-POL, FDA-MARS), penalized discriminant analysis (PDA), mixture discriminant analysis (MDA-Linear, MDA-MARS), shrunken centroids method (or Predictive Analysis of Microarrays, PAM) | Mean test-set error |
21 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | No explicit feature selection | TSP (top scoring pairs), C4.5 decision trees (DT), Naïve Bayes (NB), k-nearest neighbor (k-NN), support vector machines (SVM) and prediction analysis of microarrays (PAM) | LOOCV accuracy on Train; test accuracy reported |
22 | $38\times 3051$ $(27 ALL, 11 AML)$ | CV, F-ratio | Without variable selection: random forest, diagonal linear discriminant analysis (DLDA), k-nearest neighbor (KNN), SVM with linear kernel; with variable selection: shrunken centroids (SC), SC.l and SC.s, nearest neighbor | Bootstrap leave-out error |
23 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | RMIMR (Rough Maximum Interaction-Maximum Relevance) | SVM and naive Bayes | LOOCV error rate |
24 | $72\times 7129$ $(47 ALL, 25 AML)$ | Independent component analysis (ICA) | SVM, PCA+FDA, P-RR, P-PCR, P-ICR, PAM | Train/Test accuracy |
25 | $38\times 3051$ $(27 ALL, 11 AML)$ | Family-wise error rate, BBF (Based Bayes error Filter) | KNN, SVM | LOOCV |
26 | $72(Train\ 38, Test\ 34)\times 3051$ $(47 ALL, 25 AML)$ | The q genes with the most significant expression difference between arrays with y=1 and arrays with y=0, out of all p genes | Parametric bootstrap model | Bootstrap mean prediction error |
27 | $62\times 7129$ | Stepwise regression-based feature selection, ICA-based feature transformation | Naive Bayes | Hold-out accuracy |
28 | $72\times 7129$ $(47 ALL, 25 AML)$ | Information gain attribute evaluator, Relief attribute evaluator and correlation-based feature selection (CFS) | Mixed-integer linear programming based hyper-box enclosure (HBE) approach | Test/LOOCV/10-fold CV accuracy reported |
29 | $72(Train\ 38, Test\ 34)\times 7129$ $(47 ALL, 25 AML)$ | Signal-to-noise ratio, K-means clustering | Bayesian network | Classification accuracy reported |
30 | $34\times 7129$ $(20 ALL, 14 AML)$ | Clustering used for gene selection | K-means clustering | Accuracy, specificity, sensitivity |
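
Several of the papers above (1, 2, 19, 29) rank genes by Golub's signal-to-noise score $P(g) = (\mu_1 - \mu_0)/(\sigma_1 + \sigma_0)$ and classify with the weighted vote. Below is a minimal NumPy sketch of that scheme; the variable names and the 50-gene cut-off are illustrative, and genes are selected by absolute score rather than Golub's balanced 25-per-class choice.

```python
import numpy as np

def s2n_scores(X, y):
    """Signal-to-noise score P(g) = (mu_1 - mu_0) / (sigma_1 + sigma_0) per gene."""
    mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    s1, s0 = X[y == 1].std(axis=0), X[y == 0].std(axis=0)
    return (mu1 - mu0) / (s1 + s0)

def weighted_vote(X_train, y_train, x_new, n_genes=50):
    """Classify one sample; returns (predicted class, prediction strength)."""
    p = s2n_scores(X_train, y_train)
    genes = np.argsort(np.abs(p))[-n_genes:]            # most informative genes
    mid = 0.5 * (X_train[y_train == 1].mean(axis=0) +
                 X_train[y_train == 0].mean(axis=0))    # class-mean midpoint
    votes = p[genes] * (x_new[genes] - mid[genes])      # signed vote per gene
    v_pos, v_neg = votes[votes > 0].sum(), -votes[votes < 0].sum()
    ps = abs(v_pos - v_neg) / (v_pos + v_neg)           # prediction strength
    return (1 if v_pos > v_neg else 0), ps
```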
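Paper 3 scores each gene by the Threshold Number of Misclassifications: the fewest errors any single-gene threshold rule can make on the training labels. A brute-force sketch of that definition (quadratic in the sample count, which is harmless at n = 72):

```python
import numpy as np

def tnom(x, y):
    """TNoM for one gene: x is its expression across samples, y holds 0/1 labels."""
    ys = np.asarray(y)[np.argsort(x)]   # labels ordered by expression level
    n = len(ys)
    best = n
    for cut in range(n + 1):            # threshold after sorted position `cut`
        # errors of the rule "predict 1 below the cut, 0 above"
        err = (ys[:cut] == 0).sum() + (ys[cut:] == 1).sum()
        best = min(best, err, n - err)  # n - err covers the reversed rule
    return best
```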
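Papers 9, 15, 19 and 20 all use variants of the between- to within-group sums of squares ratio of Dudoit et al. (2002). A sketch, assuming X is a samples × genes matrix:

```python
import numpy as np

def bss_wss(X, y):
    """BSS/WSS ratio per gene (columns of X); larger means better class separation."""
    overall = X.mean(axis=0)
    bss = np.zeros(X.shape[1])
    wss = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        bss += len(Xk) * (Xk.mean(axis=0) - overall) ** 2   # between-group term
        wss += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)    # within-group term
    return bss / wss

# e.g. keep the 50 top-ranked genes (the cut-off is arbitrary here):
# top = np.argsort(bss_wss(X, y))[-50:]
```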
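Paper 7's Recursive Feature Elimination trains a linear SVM, discards the genes with the smallest absolute weights, and repeats. scikit-learn's RFE implements this loop; the random placeholder data, the 50-gene target and the halving step below are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(38, 7129))          # placeholder train matrix (38 x 7129)
y = rng.integers(0, 2, size=38)          # placeholder 0/1 labels

# drop half of the features per round until 50 genes remain
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=50, step=0.5)
rfe.fit(X, y)
selected = np.flatnonzero(rfe.support_)  # indices of the retained genes
```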
Notes on the papers above:

1. Algorithm repeated in a different article (papers 2, 4, 5, 7, 8, 17, 20)
2. Encapsulated in software (papers 4, 5, 10, 11, 18, 19, 21, 26)
3. Unfamiliar or impenetrable methods (papers 11, 12, 14, 16, 22, 23, 26, 28)
4. Clustering was carried out rather than classification (paper 13)
5. The Golub data was used for a multiclass rather than a binary problem (papers 15, 18)
6. Started from a processed version of the Golub data (without documentation), e.g. not using the entire dataset or not clearly identifying the input dataset (papers 22, 25, 27, 30)
7. Parameters or settings not clearly specified in the paper, rendering replication difficult (papers 28 and 30)