We colored the POI points red so we can look for trends. Here we don't see any trend. Katie's intuition is that maybe it is not the raw number of emails that matters; perhaps the fraction of emails exchanged with POIs is what's important.
Here we can see that although the POIs don't form a very tight cluster, they are not that spread out either. There are some regions with no POIs at all, which suggests that people falling in those regions are unlikely to be POIs.
Features and information are not the same thing. We want information; features are merely an attempt to capture it.
There are several go-to methods of automatically selecting your features in sklearn. Many of them fall under the umbrella of univariate feature selection, which treats each feature independently and asks how much power it gives you in classifying or regressing.
There are two big univariate feature selection tools in sklearn: SelectPercentile and SelectKBest. The difference is pretty apparent by the names: SelectPercentile selects the X% of features that are most powerful (where X is a parameter) and SelectKBest selects the K features that are most powerful (where K is a parameter).
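A minimal sketch of both selectors (the dataset and parameter values here are illustrative, not from the course):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic classification data: 20 features, only 5 of which are informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=42)

# SelectKBest: keep the K=5 features with the highest ANOVA F-score.
k_best = SelectKBest(score_func=f_classif, k=5)
X_k = k_best.fit_transform(X, y)

# SelectPercentile: keep the top X=10% of features by the same score.
pct = SelectPercentile(score_func=f_classif, percentile=10)
X_p = pct.fit_transform(X, y)

print(X_k.shape)  # (200, 5)
print(X_p.shape)  # (200, 2) -- 10% of 20 features
```

Both treat each feature independently; they differ only in whether you specify an absolute count (K) or a percentage (X).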
A clear candidate for feature reduction is text learning, since the data has such high dimension. We actually did feature selection in the Sara/Chris email classification problem during the first few mini-projects; you can see it in the code in tools/email_preprocess.py.
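A sketch in the same spirit as that preprocessing step, using a tiny made-up corpus (the documents, labels, and percentile here are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, f_classif

# Toy corpus: tf-idf turns each document into a high-dimensional word vector.
docs = ["free money now", "meeting at noon", "win money fast",
        "project meeting agenda", "cheap money offer", "lunch meeting today"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = spam-like, 0 = normal

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

# Keep only the most discriminating 50% of the word features.
selector = SelectPercentile(f_classif, percentile=50)
X_sel = selector.fit_transform(X, labels)

print(X.shape[1], "->", X_sel.shape[1])
```

With real email data the vocabulary runs to tens of thousands of words, which is exactly why text learning is such a clear candidate for feature reduction.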
High Bias
High Variance
There may be many features you need in order to fully describe your data, but if you use too few of them, the model cannot capture the underlying pattern. This is a typical high bias situation.
If, on the other hand, we tune the algorithm very carefully to squeeze every last bit of information out of the training set, it will fit the noise as well as the signal. This is a typical high variance situation (overfitting).
There is a trade-off between the goodness of the fit and the simplicity of the fit.
We want to use as few features as possible while still getting a large r^2 / low sum of squared errors.
Some algorithms can automatically find the sweet spot between the number of features and quality of model. This process is called regularisation.
Lasso regression minimizes the SSE (sum of squared errors) plus a penalty on the magnitude of the coefficients, which effectively penalizes the number of features used. A feature only survives if the decrease in SSE it provides outweighs the penalty incurred by its coefficient; for features that don't clear that bar, Lasso sets the coefficient exactly to zero.
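A minimal sketch of this zeroing-out behavior on synthetic data (the data, alpha value, and feature layout are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
# Only features 0 and 1 actually drive the target; the other 8 are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)

# alpha controls the strength of the coefficient penalty.
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# The noise features' coefficients are driven to exactly zero.
print(lasso.coef_)
print("non-zero features:", int((lasso.coef_ != 0).sum()))
```

This is what distinguishes Lasso from ordinary least squares: OLS would assign every noise feature a small but non-zero coefficient, whereas Lasso performs feature selection as part of the fit.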