Machine Learning Begins

Definition

Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed.

Arthur Samuel (1959)

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Tom Mitchell (1998)

Source: [Coursera's "What is Machine Learning?" lecture](https://pt.coursera.org/learn/machine-learning/lecture/Ujm7v/what-is-machine-learning)
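Mitchell's definition can be made concrete with a toy program. The sketch below is only an illustration under assumed data: the task, the five labeled examples, and the nearest-neighbour learner are all hypothetical choices, not something taken from the lecture.

```python
# Task T: label an integer 0..99 as 1 if it is >= 50, else 0.
# Experience E: labeled examples, revealed one at a time.
# Performance P: accuracy on every integer from 0 to 99.

EXPERIENCE = [(0, 0), (60, 1), (20, 0), (90, 1), (40, 0)]  # hypothetical (x, label) pairs
TEST_SET = [(x, int(x >= 50)) for x in range(100)]


def predict(examples, x):
    """1-nearest-neighbour: copy the label of the closest example seen so far."""
    nearest = min(examples, key=lambda ex: abs(ex[0] - x))
    return nearest[1]


def accuracy(examples):
    """Fraction of the test set that is labeled correctly."""
    hits = sum(predict(examples, x) == label for x, label in TEST_SET)
    return hits / len(TEST_SET)


# Measure P after each additional piece of experience.
scores = [accuracy(EXPERIENCE[:n]) for n in range(1, len(EXPERIENCE) + 1)]
for n, score in enumerate(scores, start=1):
    print(f"{n} example(s) seen -> accuracy {score:.2f}")
```

With this particular data the accuracy climbs from 0.50 with one example to 1.00 with all five, which is what "performance on T, as measured by P, improves with experience E" means in practice.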

Hype

Skill set

Data Science Metromap

Typical kinds of resources we find on the web (and our reactions)

How neural network layers internally work

Regularized linear regression formula
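For reference, one common way to write the regularized linear regression cost function is shown below. This is a sketch in the notation of the Coursera course cited above, not necessarily the exact formula pictured on the slide: $m$ is the number of training examples, $n$ the number of features, $h_\theta$ the linear hypothesis, and $\lambda$ the regularization strength.

$$
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
$$

The first sum measures how badly the line fits the data. The second penalizes large coefficients, and $\theta_0$ (the intercept) is left out of the penalty by convention.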

Typical concerns

But I won't manage to master everything on the Data Science Metromap!

But I don't understand those mathematical formulas! I'm not a statistician!

But I can't understand exactly how the internals work!

But I don't have Big Data!

Machine Learning Pyramid

(from the Computer Science point of view)

Baby steps

We can start by just understanding how different algorithms work, and when and how to use them to solve problems!

Model

Simple workflow (generic example)

Supervised learning workflow (example with an image as input)
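A supervised workflow with image input can be sketched in a few lines. Everything here is an assumption made for illustration: the 2x2 "images", the `bright`/`dark` labels, and the nearest-centroid model stand in for whatever data and algorithm a real project would use.

```python
# Hypothetical 2x2 grayscale "images", flattened to 4 pixel intensities (0 = black, 1 = white).
LABELED_IMAGES = [
    ([0.9, 0.8, 0.9, 1.0], "bright"),
    ([0.1, 0.0, 0.2, 0.1], "dark"),
    ([0.8, 0.9, 0.7, 0.9], "bright"),
    ([0.2, 0.1, 0.0, 0.3], "dark"),
    ([1.0, 0.9, 0.8, 0.8], "bright"),
    ([0.0, 0.2, 0.1, 0.1], "dark"),
]

# 1. Split the labeled data into a training set and a held-out test set.
train, test = LABELED_IMAGES[:4], LABELED_IMAGES[4:]


def fit(examples):
    """2. Train: average the pixel vectors of each label (one centroid per label)."""
    grouped = {}
    for pixels, label in examples:
        grouped.setdefault(label, []).append(pixels)
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(model, pixels):
    """3. Predict: return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = fit(train)

# 4. Evaluate on images the model has never seen.
predictions = [predict(model, pixels) for pixels, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

The key design point is step 1: the model is scored only on images it was not trained on, so the accuracy says something about new inputs rather than about memorized ones.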

Steps to build a great model

Some extra concerns

Bias
Underfitting
Overfitting
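Underfitting and overfitting show up as different patterns of training error versus test error. The sketch below uses made-up data (a straight line with alternating noise) and three deliberately simple models chosen only for illustration; it is not a recipe for diagnosing real models.

```python
# Hypothetical data: y = 2x plus alternating noise of +1 / -1.
train = [(0, 1), (2, 3), (4, 9), (6, 11), (8, 17)]
test = [(1, 1), (3, 7), (5, 9), (7, 15), (9, 17)]


def fit_constant(data):
    """Too simple: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_line(data):
    """Matches the true trend: ordinary least-squares straight line."""
    mean_x = sum(x for x, _ in data) / len(data)
    mean_y = sum(y for _, y in data) / len(data)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_memorizer(data):
    """Too flexible: repeat the target of the nearest training point, noise included."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


errors = {}
for name, fit in [("constant", fit_constant), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    errors[name] = (mse(model, train), mse(model, test))
    print(f"{name:9s} train MSE = {errors[name][0]:6.2f}   test MSE = {errors[name][1]:6.2f}")
```

On this data the constant model is poor on both sets (underfitting), the memorizer is perfect on the training set but several times worse than the line on the test set (overfitting), and the line has low error on both.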

Types of Algorithms

Machine Learning Types

Supervised Learning (predictive modeling, labeled data)
- Classification (diagnose Alzheimer's as positive or negative from an image)
- Regression (predict population growth)
Unsupervised Learning (descriptive modeling, unlabeled data)
- Clustering (customer segmentation)
- Association (product recommendation)
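To see what "unlabeled data" means in the customer segmentation case, here is a minimal clustering sketch. The spend figures are invented, and the two-cluster k-means seeded with the minimum and maximum values is a simplification chosen to keep the example deterministic.

```python
# Hypothetical monthly spend per customer; no labels are provided.
spend = [10, 12, 11, 95, 100, 98]


def kmeans_1d(values, iterations=10):
    """Tiny 2-cluster k-means on one feature, seeded with the min and max values."""
    centroids = [min(values), max(values)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min((0, 1), key=lambda k: abs(v - centroids[k])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids


segments, centers = kmeans_1d(spend)
print(segments)  # cluster index per customer
print(centers)   # average spend of each discovered segment
```

Nobody told the algorithm which customers are low or high spenders; it discovers the two groups from the values alone. A supervised version of the same problem would start from customers that already carry a segment label and learn to predict that label for new ones.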

Exercise!

Given a Stack Overflow post, find other similar Stack Overflow posts
Categorize a new candidate's interview data into Jr/Mid/Sr
Discover groups of users with similar interests on GitHub
Rank the best NYSE options to invest in
Crawl a real estate website and then estimate the pricing of your own house

Some technologies

Programming languages

Python
R

Code share and visualization

Jupyter Notebook

Data structure packages

NumPy (N-dimensional array package)
Pandas (Data frame and data analysis package)

Data visualization

matplotlib (plotting package)
seaborn (plotting package)
D3 (data-driven DOM manipulation)
Tableau (data analysis)

ML algorithms & ecosystem

scikit-learn
PyML
Keras/TensorFlow (neural network/deep learning algorithms)

Distributed data processing

PySpark/Spark

AIaaS/MLaaS

But what algorithm should I use?
