# What is machine learning, and how does it work?

From the video series: Introduction to machine learning with scikit-learn

## Agenda

• What is machine learning?
• What are the two main categories of machine learning?
• What are some examples of machine learning?
• How does machine learning "work"?

## What is machine learning?

One definition: "Machine learning is the semi-automated extraction of knowledge from data"

• Knowledge from data: Starts with a question that might be answerable using data
• Automated extraction: A computer provides the insight
• Semi-automated: Requires many smart decisions by a human

## What are the two main categories of machine learning?

Supervised learning: Making predictions using data

• Example: Is a given email "spam" or "ham"?
• There is an outcome we are trying to predict

Unsupervised learning: Extracting structure from data

• Example: Segment grocery store shoppers into clusters that exhibit similar behaviors
• There is no "right answer"
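As a rough illustration of "extracting structure" when no outcome is given, here is a minimal sketch of clustering in plain Python. The shopper data and the `two_means` helper are hypothetical, invented for this example; the helper is a stripped-down k-means with two clusters, not a library routine.

```python
# Hypothetical data: weekly spend (in dollars) for eight shoppers. No labels.
weekly_spend = [12.0, 15.0, 14.0, 95.0, 102.0, 11.0, 99.0, 98.0]


def two_means(values, iterations=10):
    """Split 1-D values into two clusters with a minimal k-means (k=2)."""
    centers = [min(values), max(values)]  # simple deterministic starting points
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


labels, centers = two_means(weekly_spend)
print(labels)   # cluster index per shopper
print(centers)  # average spend within each cluster
```

Nothing in the input says which shoppers belong together. The grouping comes only from the data, which is why there is no "right answer" to check it against.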

## How does machine learning "work"?

High-level steps of supervised learning:

1. First, train a machine learning model using labeled data
    • "Labeled data" has been labeled with the outcome
    • "Machine learning model" learns the relationship between the attributes of the data and its outcome
2. Then, make predictions on new data for which the label is unknown
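The two steps above can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the `NearestNeighborClassifier` class and the email features (counts of exclamation marks and links) are hypothetical, and the `fit`/`predict` method names are chosen only to mirror the convention scikit-learn estimators follow.

```python
class NearestNeighborClassifier:
    """Toy 1-nearest-neighbor model: predicts the label of the closest training example."""

    def fit(self, X, y):
        # "Training" here just memorizes the labeled examples.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Squared Euclidean distance from the new row to every training row.
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.X_
            ]
            nearest = distances.index(min(distances))
            predictions.append(self.y_[nearest])
        return predictions


# Hypothetical labeled data: [exclamation marks, links] per email, with its outcome.
X_train = [[0, 0], [1, 0], [0, 1], [7, 4], [9, 6], [8, 5]]
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]

# Step 1: train the model using labeled data.
model = NearestNeighborClassifier().fit(X_train, y_train)

# Step 2: make predictions on new data for which the label is unknown.
X_new = [[1, 1], [6, 5]]
predictions = model.predict(X_new)
print(predictions)
```

The attributes (`X_train`) and the outcome (`y_train`) are kept separate during training, and only attributes are supplied at prediction time.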

The primary goal of supervised learning is to build a model that "generalizes": it accurately predicts the future rather than the past!

Building such a model raises several practical questions:

• How do I choose which attributes of my data to include in the model?
• How do I choose which model to use?
• How do I optimize this model for best performance?
• How do I ensure that I'm building a model that will generalize to unseen data?
• Can I estimate how well my model is likely to perform on unseen data?
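One common way to approach the last question is to hold out some labeled data, train on the rest, and score the model only on the held-out rows. The sketch below assumes a hypothetical email dataset and a hypothetical `predict_nearest` helper (a 1-nearest-neighbor rule); it illustrates the idea and is not a recommended evaluation procedure.

```python
def predict_nearest(train_rows, train_labels, row):
    """Return the label of the training row closest to `row` (1-nearest neighbor)."""
    distances = [
        sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in train_rows
    ]
    return train_labels[distances.index(min(distances))]


# Hypothetical labeled data: [exclamation marks, links] per email.
X = [[0, 0], [7, 4], [1, 0], [9, 6], [0, 1], [8, 5], [2, 1], [6, 5]]
y = ["ham", "spam", "ham", "spam", "ham", "spam", "ham", "spam"]

# Hold out the last quarter of the rows; the model never sees them during training.
split = int(len(X) * 0.75)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Score predictions against the held-out labels only.
predicted = [predict_nearest(X_train, y_train, row) for row in X_test]
correct = sum(p == t for p, t in zip(predicted, y_test))
accuracy = correct / len(y_test)
print(accuracy)
```

With only two held-out rows the resulting number says very little. In practice the data would be shuffled before splitting and the held-out set would be much larger.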

## Resources


