Average squared prediction error (the expected test MSE) at a test point x_0 decomposes into the variance of f hat(x_0), the squared bias, and the irreducible error.
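Written out, this is the standard decomposition, in the same notation as below:

$$E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)$$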
Note that x_0 is a test point, so why is f hat(x_0) a random variable? Here the method is fixed, but the training data can vary according to some probability distribution.
So the bias is defined as follows:

$$E(\hat{f}(x_0)) - f(x_0)$$

A more flexible model has less bias but more variance.
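A quick simulation sketch of this point (my own illustration; the "true" f and the degree-1 fit are arbitrary choices): the method stays fixed, the training set is redrawn each time, and we watch how f hat(x_0) moves around.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # assumed "true" function, for illustration only
    return np.sin(2 * x)

x0 = 1.0                       # fixed test point
fits_at_x0 = []

# The method (a degree-1 polynomial fit) is fixed; the training data vary.
for _ in range(1000):
    x_train = rng.uniform(0, 3, size=30)
    y_train = f(x_train) + rng.normal(scale=0.3, size=30)
    coefs = np.polyfit(x_train, y_train, deg=1)    # fit the fixed method
    fits_at_x0.append(np.polyval(coefs, x0))       # prediction at x0

fits_at_x0 = np.asarray(fits_at_x0)
bias = fits_at_x0.mean() - f(x0)       # E[f_hat(x0)] - f(x0)
variance = fits_at_x0.var()
print(f"bias ~ {bias:.3f}, variance ~ {variance:.3f}")
```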
A regression coefficient beta_j estimates the expected change in Y per unit change in X_j, with all other predictors held fixed. BUT in practice predictors usually change together.
The only way to find out what will happen when a complex system is disturbed is to disturb the system, not merely to observe it passively.
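A small numeric sketch of the "predictors change together" point (made-up data, numpy only): the coefficient on x1 depends heavily on whether the correlated x2 is in the model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)        # x2 is highly correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)    # assumed true model

# Simple regression of y on x1 alone
X_simple = np.column_stack([np.ones(n), x1])
beta_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression of y on x1 and x2 together
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

print("coef on x1, alone:   ", round(beta_simple[1], 2))   # ~1.9, absorbs x2's effect
print("coef on x1, with x2: ", round(beta_full[1], 2))     # ~1.0
```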
The dummy variable x_i can take the value 1 or 0.
If a qualitative variable has more than two levels, we need to create multiple dummy variables: one for whether the observation is Asian or not, one for whether it is Caucasian or not. There is always one fewer dummy variable than the number of levels.
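A quick sketch of that coding with pandas (the toy data are made up; the levels mirror the Credit example):

```python
import pandas as pd

# Toy qualitative predictor with three levels (hypothetical data)
df = pd.DataFrame({"ethnicity": ["Asian", "Caucasian", "African American", "Asian"]})

# One fewer dummy than the number of levels: drop_first=True leaves
# "African American" as the baseline captured by the intercept.
dummies = pd.get_dummies(df["ethnicity"], drop_first=True, dtype=int)
print(dummies)
#    Asian  Caucasian
# 0      1          0
# 1      0          1
# 2      0          0
# 3      1          0
```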
Tree-based methods: stratifying and segmenting the predictor space into a number of simple regions. Simple and easy to explain. Accuracy can be improved by bagging, random forests, and boosting.
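A minimal scikit-learn sketch of the bagging / random-forest relationship (X and y are placeholder data): bagging is just the special case where every split may consider all the predictors.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Placeholder data; substitute any (X, y) regression problem.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Random forest: each split considers only a random subset of the predictors.
rf = RandomForestRegressor(n_estimators=300, max_features="sqrt", random_state=0)

# Bagging: the same procedure, but every split may use all p predictors.
bagging = RandomForestRegressor(n_estimators=300, max_features=None, random_state=0)

for name, model in [("random forest", rf), ("bagging", bagging)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```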
Hitters data:
* Use the number of years and the number of hits to predict (log) salary, as in the sketch after this list.
* Each internal node is a split on one predictor.
* The prediction for a leaf is the mean response of the observations in that leaf node.
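A sketch of that tree with scikit-learn, assuming the Hitters data sit in a local Hitters.csv with Years, Hits, and Salary columns (the file name is my assumption):

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Assumes the ISLR Hitters data are available locally; drop missing salaries.
hitters = pd.read_csv("Hitters.csv").dropna(subset=["Salary"])

X = hitters[["Years", "Hits"]]
y = np.log(hitters["Salary"])          # the book models log salary

# A small tree: each internal node splits on one predictor,
# each leaf predicts the mean response of its observations.
tree = DecisionTreeRegressor(max_leaf_nodes=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["Years", "Hits"]))
```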
Boosting: improve the model slowly by fitting each new small decision tree to the current residuals and adding it into the fitted function.
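A minimal sketch of that loop, in the spirit of the book's boosting algorithm for regression trees (the placeholder data and the choices of B, lambda, and depth are mine):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Placeholder data; any (X, y) regression problem works.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

B, lam, depth = 500, 0.01, 1           # number of trees, shrinkage, tree depth
f_hat = np.zeros_like(y, dtype=float)  # start with f_hat = 0
residuals = y.astype(float)            # so the residuals are just y
trees = []

for b in range(B):
    # Fit a small tree to the current residuals, not to y itself.
    tree = DecisionTreeRegressor(max_depth=depth, random_state=b).fit(X, residuals)
    update = lam * tree.predict(X)
    f_hat += update            # add a shrunken version of the new tree
    residuals -= update        # update the residuals
    trees.append(tree)

print("training MSE after boosting:", round(np.mean((y - f_hat) ** 2), 2))
```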
In the gene expression example, boosting with even a stump (a tree with a single split) outperforms the random forest.
Tuning parameters for boosting: the number of trees B, the shrinkage parameter lambda, and the number of splits d in each tree.
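As a sketch, these map onto scikit-learn's GradientBoostingRegressor arguments; picking B by cross-validation with an arbitrary grid (placeholder data again):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# B = n_estimators, lambda = learning_rate, d = max_depth
grid = GridSearchCV(
    GradientBoostingRegressor(learning_rate=0.01, max_depth=1, random_state=0),
    param_grid={"n_estimators": [100, 500, 1000, 2000]},
    cv=5,
)
grid.fit(X, y)
print("best B:", grid.best_params_["n_estimators"])
```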