Chapter 05: Resampling Methods

1 Cross-Validation

Cross-validation estimates the test error rate by holding out a subset of the training observations from the fitting process, and then applying the statistical learning method to those held-out observations.

1.1 Leave-One-Out Cross-Validation

LOOCV involves splitting the set of observations into two parts. A single observation $(x_1,y_1)$ is used for the validation set, and the remaining observations $\{(x_2,y_2),\ldots,(x_n,y_n)\}$ form the training set. Then $MSE_1=(y_1-\hat{y}_1)^2$ provides an approximately unbiased estimate of the test error. Repeating the procedure with $(x_2,y_2)$ as the validation observation yields $MSE_2$, and so on, giving the LOOCV estimate $$ CV_{(n)}=\frac{1}{n}\sum_{i=1}^{n}MSE_i. $$ For least squares linear or polynomial regression there is a shortcut that makes the cost of LOOCV the same as that of a single model fit: $$ CV_{(n)}=\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{y_i-\hat{y}_i}{1-h_i}\bigg)^2 $$ where $\hat{y}_i$ is the $i$th fitted value from the full least squares fit and $h_i$ is the leverage of the $i$th observation.
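As a concrete illustration, here is a minimal Python/NumPy sketch of the leverage shortcut for least squares regression. The function name `loocv_linear` and the synthetic data are hypothetical, and `X` is assumed to already include an intercept column.

```python
import numpy as np

def loocv_linear(X, y):
    """LOOCV estimate for least squares via the leverage shortcut (a sketch)."""
    # Hat matrix H = X (X^T X)^{-1} X^T; its diagonal gives the leverages h_i.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    y_hat = H @ y                       # fitted values from the full fit
    residuals = (y - y_hat) / (1 - h)   # deflate each residual by 1 - h_i
    return np.mean(residuals ** 2)      # CV_(n)

# Toy usage: simple linear regression on synthetic data (hypothetical example)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
X = np.column_stack([np.ones_like(x), x])   # intercept column plus predictor
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=50)
print(loocv_linear(X, y))
```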

1.2 $k$-fold Cross-Validation

This approach involves randomly dividing the set of observations into $k$ folds of approximately equal size. In each round, one fold is treated as the validation set and the remaining $k-1$ folds form the training set. The mean squared error $MSE_i$ is computed on the observations in the held-out fold. The $k$-fold CV estimate is the average of these values: $$ CV_{(k)}=\frac{1}{k}\sum_{i=1}^k\text{MSE}_i $$ One typically performs $k$-fold CV using $k=5$ or $k=10$; LOOCV is the special case $k=n$.
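A minimal Python/NumPy sketch of this procedure for a least squares model follows. The helper `kfold_cv` and its arguments are hypothetical illustrations, not a standard API.

```python
import numpy as np

def kfold_cv(X, y, k=5, seed=0):
    """k-fold CV estimate of the test MSE for least squares (a sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)           # k folds of approximately equal size
    mses = []
    for i in range(k):
        val = folds[i]                        # held-out validation fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        mses.append(np.mean((y[val] - X[val] @ beta) ** 2))   # MSE_i
    return np.mean(mses)                      # CV_(k)
```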

1.3 Bias-Variance Trade-off for $k$-fold Cross-Validation

Since each training set in $k$-fold CV contains $(k-1)n/k$ observations, fewer than the $n-1$ used by LOOCV, LOOCV is preferred to $k$-fold CV from the perspective of bias reduction. However, the $n$ models fit in LOOCV are trained on nearly identical data sets, so their outputs are highly correlated; as a result, $k$-fold CV with $k<n$ has lower variance.

1.4 CV in Classification Problems

In the classification setting, we instead use the number of misclassified observations, rather than $\text{MSE}$, to quantify test error: $$ CV_{(n)}=\frac{1}{n}\sum_{i=1}^{n}Err_i $$ where $Err_i=I(y_i\ne \hat{y}_i)$. A sketch that makes $Err_i$ concrete appears below.
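Here is a small NumPy sketch computing the LOOCV misclassification rate for a 1-nearest-neighbour classifier, chosen because leaving observation $i$ out amounts to excluding it from the neighbour search; the function name is hypothetical.

```python
import numpy as np

def loocv_1nn_error(X, y):
    """LOOCV misclassification rate for a 1-nearest-neighbour classifier (a sketch)."""
    # Pairwise squared distances; set the diagonal to infinity so an
    # observation can never be its own nearest neighbour (leave-one-out).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    y_hat = y[d2.argmin(axis=1)]   # predicted label: nearest other point
    return np.mean(y != y_hat)     # CV_(n) = (1/n) * sum of Err_i
```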

2 The Bootstrap

An extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.
When we want to quantify the variability of a parameter estimate in a statistical learning method, we usually cannot generate new samples from the original population. Instead, we randomly select $n$ observations from the data set to form a bootstrap data set. The sampling is performed with replacement, which means that the same observation can occur more than once in the bootstrap data set.
Suppose that $Z$ is the original data set and $Z^{*i}$ is the $i$th bootstrap data set. The sampling is repeated $B$ times for some large value of $B$, in order to produce $B$ different bootstrap data sets, $Z^{*1},\ldots,Z^{*B}$, and $B$ corresponding estimates of $\alpha$, $\hat{\alpha}^{*1},\ldots,\hat{\alpha}^{*B}$. We can compute the standard error of these bootstrap estimates using the following formula: $$ \text{SE}_B(\hat{\alpha})=\sqrt{\frac{1}{B-1}\sum_{r=1}^{B}\bigg(\hat{\alpha}^{*r} -\frac{1}{B}\sum_{r'=1}^{B}\hat{\alpha}^{*r'}\bigg)^2} $$
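The following Python/NumPy sketch mirrors this formula: it draws $B$ bootstrap data sets with replacement, applies a user-supplied statistic to each, and returns the sample standard deviation of the $B$ estimates (the `ddof=1` option gives the $1/(B-1)$ factor). The helper name `bootstrap_se` is hypothetical.

```python
import numpy as np

def bootstrap_se(data, statistic, B=1000, seed=0):
    """Bootstrap standard error of a statistic alpha-hat (a sketch)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(B)
    for r in range(B):
        idx = rng.integers(0, n, size=n)     # draw n observations with replacement
        estimates[r] = statistic(data[idx])  # alpha-hat^{*r} on the r-th resample
    # Sample standard deviation with ddof=1 matches the 1/(B-1) factor above.
    return estimates.std(ddof=1)

# Toy usage: standard error of the sample mean (hypothetical example)
rng = np.random.default_rng(1)
z = rng.normal(size=100)
print(bootstrap_se(z, np.mean))
```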