We will review some of the basic methods in machine learning.
The text for this is Introduction to Machine Learning with Python by Andreas Müller and Sarah Guido. It focuses on using tools from Python, and not so much on theoretical aspects.
A good source for theoretical development is the book The Elements of Statistical Learning by Hastie, Tibshirani and Friedman, available online at http://statweb.stanford.edu/~tibs/ElemStatLearn
The software tools come from scikit-learn at http://scikit-learn.org
A video about the tools is available here http://bit.ly/advanced_machine_learning_scikit-learn
Code examples are here: https://github.com/amueller/introduction_to_ml_with_python
Machine learning is about extracting knowledge from data.
Think of presenting a collection of data to a computer that runs some algorithm. The point is to extract useful features from the data (e.g. where are the buildings in this image) or make a decision based on the data (e.g. this email is spam). Machine learning involves an algorithm that learns "on its own" how to identify important features or make classification decisions.
Supervised versus unsupervised learning. Discuss.
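As a starting point for the discussion, here is a minimal sketch of the two settings using scikit-learn. The choice of the iris dataset, a 3-nearest-neighbors classifier, and k-means with 3 clusters are all assumptions made for illustration, not something prescribed by the course.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are available during training,
# and the model is scored against held-out labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
supervised_accuracy = knn.score(X_test, y_test)

# Unsupervised: only X is given; the algorithm groups the points on its own.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_

print("test accuracy:", supervised_accuracy)
print("cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```

Note that the cluster numbers returned by k-means are arbitrary and need not line up with the species labels in `y`.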
Questions to ask when attempting to use machine learning:
In [3]:
%%bash
git clone https://github.com/amueller/introduction_to_ml_with_python.git
Now go back to your Jupyter Hub file list, to access the code examples.
In [7]:
# scipy.misc.imread was removed in SciPy 1.2; this assumes the imageio package is installed.
from imageio import imread
Supervised versus unsupervised.
Classification, Regression.
Regression for classification.
Types of linear regression.
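Start with data $x_1, x_2, x_3, \ldots, x_n$ and use this to predict some $y$, linearly:
$$y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$
Find values for the parameters $a_1, \ldots, a_n$ to get the best fit.
With many observations, stack them and write the model in matrix form as $$y = Xa$$ where $y$ is a column vector, $X$ is a matrix with one row per observation, and $a$ is the column vector of parameters.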
Minimize the squared error $$|| y - Xa ||_2^2$$ over all choices of the vector $a$.
Linear regression.
Least squares.
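The least-squares fit can be sketched in a few lines. The synthetic data, the parameter vector `a_true`, and the noise level below are made up for illustration; `fit_intercept=False` is set so that scikit-learn fits $y = Xa$ with no constant term, matching the formula above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 observations, 3 features
a_true = np.array([2.0, -1.0, 0.5])    # made-up parameters for illustration
y = X @ a_true + 0.1 * rng.normal(size=100)

# Least squares directly: find the a that minimizes the 2-norm of y - Xa.
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same fit through scikit-learn, without an intercept term.
model = LinearRegression(fit_intercept=False).fit(X, y)

print("numpy lstsq: ", a_lstsq)
print("scikit-learn:", model.coef_)
```

Both approaches should recover coefficients close to `a_true`, since the noise is small relative to the signal.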
Ridge regression
Minimize $$|| y - Xa ||_2^2 + \alpha || a ||_2^2$$
Tikhonov regularization, L2 regularization
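A small sketch of ridge regression follows, assuming the squared form of the objective, $|| y - Xa ||_2^2 + \alpha || a ||_2^2$, which is the form scikit-learn's `Ridge` minimizes. In that form the minimizer has the closed-form solution $a = (X^T X + \alpha I)^{-1} X^T y$. The data and the value `alpha = 10.0` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
a_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])   # made-up parameters
y = X @ a_true + 0.1 * rng.normal(size=50)

alpha = 10.0  # regularization strength, chosen arbitrarily here

# Closed-form minimizer of ||y - Xa||_2^2 + alpha * ||a||_2^2.
a_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# The same problem solved by scikit-learn (no intercept, to match y = Xa).
ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

# Ordinary least squares, for comparison of coefficient size.
a_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("OLS norm:  ", np.linalg.norm(a_ols))
print("ridge norm:", np.linalg.norm(a_closed))
```

The penalty term pulls the coefficients toward zero, so the ridge solution has a smaller norm than the unregularized least-squares solution.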
Lasso regression: minimize $$|| y - Xa ||_2^2 + \alpha || a ||_1$$
L1 regularization
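The practical effect of the L1 penalty is that many coefficients become exactly zero. Here is a minimal sketch with synthetic data in which only the first three of ten features matter; the data and `alpha=0.1` are assumptions for illustration. One caveat: scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, where $n$ is the number of samples, so its `alpha` is not directly comparable to the $\alpha$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
a_true = np.zeros(10)
a_true[:3] = [3.0, -2.0, 1.5]   # only the first three features carry signal
y = X @ a_true + 0.1 * rng.normal(size=100)

# Fit without an intercept, to match the model y = Xa.
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)

# Count how many coefficients the L1 penalty left nonzero.
n_nonzero = int(np.sum(lasso.coef_ != 0))

print("coefficients:", lasso.coef_)
print("nonzero coefficients:", n_nonzero)
```

Increasing `alpha` drives more coefficients to zero, which is why lasso is often used as a form of automatic feature selection.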