Version 2
- Split the data into stratified K folds
- Define the sets of model parameter values to evaluate
- For each k-fold resampling iteration DO
- For each parameter set in grid search DO
- Hold out 1/K of the samples (one fold)
- Pre-process data (fit the transformations on the training set, apply the same fitted transformations to the test set)
- Impute data (median)
- Scale features ((x_i - mean) / std)
- Perform any univariate feature selection (remove very low-variance features)
- Model-based feature selection (ExtraTreesClassifier)
- Fit the model on the remaining (K-1)/K training folds
- Predict the hold-out samples/fold
- END
- Calculate the average performance across hold-out predictions
- END
- Determine the optimal parameter set from all K-folds
- Fit the final model to all training data using the optimal parameter set
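
The outline above can be sketched with scikit-learn, under some assumptions. Wrapping the pre-processing steps in a Pipeline means every step is fit on the training folds only and then applied unchanged to the hold-out fold. The synthetic data, the LogisticRegression model, the parameter grid, and the accuracy metric are all placeholders chosen for illustration, not part of the outline. Note also that GridSearchCV averages the hold-out scores per parameter set across the folds, which may differ slightly from the loop nesting written above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 5% missing values (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Pre-processing and model in one pipeline, in the order of the outline.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # median imputation
    ("scale", StandardScaler()),                    # (x_i - mean) / std
    # After standardization only constant features have zero variance,
    # so threshold=0.0 drops just those.
    ("variance", VarianceThreshold(threshold=0.0)),
    ("select", SelectFromModel(
        ExtraTreesClassifier(n_estimators=50, random_state=0))),
    # Placeholder model; the outline does not name one.
    ("model", LogisticRegression(max_iter=1000)),
])

# Hypothetical parameter grid for the placeholder model.
param_grid = {"model__C": [0.1, 1.0, 10.0]}

# Stratified K-fold split (K = 5 assumed here).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# For each parameter set and fold: fit on the training folds, score on the
# hold-out fold, average the scores, pick the best set, then refit on all
# of the data (refit=True).
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy", refit=True)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean hold-out accuracy:", search.best_score_)
```

After fitting, search.best_estimator_ holds the whole pipeline refit on all training data, so search.predict(X_new) applies the same imputation, scaling, and feature selection before predicting.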