HackerMath for ML

Introduction

Intro to Stats & Maths for Machine Learning


Amit Kapoor @amitkaps
Bargava Subramanian @bargava


What I cannot create, I do not understand -- Richard Feynman


Philosophy of HackerMath

Hacker literally means developing mastery over something. -- Paul Graham


Here we aim to learn the math essential for data science, the hacker way.


Three Key Questions

  • Why do you need to understand the math?
  • What math knowledge do you need?
  • Why approach it the hacker's way?

Approach

  • Understand the Math.
  • Code it to learn it.
  • Play with code.

Module 1: Linear Algebra

Supervised ML - Regression, Classification

  • Solve $Ax = b$ for a square $n \times n$ system
  • Solve $Ax = b$ for an overdetermined $n \times (p + 1)$ system (see the sketch after this list)
  • Linear Regression
  • Ridge Regularization (L2)
  • Bootstrapping
  • Logistic Regression (Classification)
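A minimal NumPy sketch of the first two items, with made-up matrices: `np.linalg.solve` handles the square case exactly, while `np.linalg.lstsq` minimises $\|Ax - b\|^2$ when the system is overdetermined.

```python
import numpy as np

# Square n x n system: a unique solution exists when A is invertible
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
print(np.linalg.solve(A, b))             # -> [2. 3.]

# Overdetermined n x (p + 1) system: no exact solution in general,
# so find x minimising ||Ax - b||^2 (least squares)
A_tall = np.array([[1.0, 1.0],
                   [1.0, 2.0],
                   [1.0, 3.0]])          # column of ones plays the intercept
b_tall = np.array([1.1, 1.9, 3.2])
x_ls, *_ = np.linalg.lstsq(A_tall, b_tall, rcond=None)
print(x_ls)
```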

Module 2: Statistics

Hypothesis Testing: A/B Testing

  • Basic Statistics
  • Distributions
  • Shuffling
  • Bootstrapping & Simulation
  • A/B Testing (see the shuffling sketch after this list)
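The shuffling idea in miniature, applied to A/B testing: if the two variants are really the same, the group labels carry no information, so we can simulate the null hypothesis by reshuffling them. The samples and iteration count below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical conversion outcomes for variants A and B (1 = converted)
a = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])
b = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0])
observed = b.mean() - a.mean()

# Under the null, labels don't matter: shuffle them and re-measure
pooled = np.concatenate([a, b])
diffs = []
for _ in range(10_000):
    rng.shuffle(pooled)
    diffs.append(pooled[len(a):].mean() - pooled[:len(a)].mean())

# p-value: how often a shuffled difference is at least as extreme
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(observed, p_value)
```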

Module 3: Linear Algebra contd.

Unsupervised ML: Dimensionality Reduction

  • Solve $Ax = \lambda x$ for an $n \times n$ matrix
  • Eigenvectors & Eigenvalues
  • Principal Component Analysis (see the sketch after this list)
  • Cluster Analysis (K-Means)
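A short sketch of the eigen-problem behind PCA, on synthetic data: eigendecompose the covariance matrix and project onto the leading eigenvector. `np.linalg.eigh` is used because covariance matrices are symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with correlated features
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X = X - X.mean(axis=0)                     # centre the data

# Solve A x = lambda x for the covariance matrix
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices

# Principal components: eigenvectors ordered by eigenvalue (descending)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]
explained = eigvals[order] / eigvals.sum()

# Project onto the first principal component (dimensionality reduction)
X_reduced = X @ components[:, :1]
print(explained, X_reduced.shape)
```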

Schedule

  • 0900 - 1000: Breakfast
  • 1000 - 1130: Session 1
  • 1130 - 1145: Tea Break
  • 1145 - 1315: Session 2
  • 1315 - 1400: Lunch
  • 1400 - 1530: Session 3
  • 1530 - 1545: Tea Break
  • 1545 - 1700: Session 4

It’s tough to make predictions, especially about the future. -- Yogi Berra

What is Machine Learning (ML)?

[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed. -- Arthur Samuel

Machine learning is the study of computer algorithms that improve automatically through experience. -- Tom Mitchell

ML Problems

  • “Is this cancer?”
  • “What is the market value of this house?”
  • “Which of these people are friends?”
  • “Will this person like this movie?”
  • “Who is this?”
  • “What did you say?”
  • “How do you fly this thing?”

ML in use Everyday

  • Search
  • Photo Tagging
  • Spam Filtering
  • Recommendation
  • ...

Broad ML Application

  • Database mining, e.g. clickstream and business data
  • Automating, e.g. handwriting recognition, natural language processing, computer vision
  • Self-customising programs, e.g. recommendations

ML Thought Process

Learning Paradigm

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Online Learning

Supervised Learning

  • Regression
  • Classification

Unsupervised Learning

  • Clustering
  • Dimensionality Reduction

ML Pipeline

  • Frame: Problem definition
  • Acquire: Data ingestion
  • Refine: Data wrangling
  • Transform: Feature creation
  • Explore: Feature selection
  • Model: Model creation & assessment
  • Insight: Communication

Linear Regression


Linear Relationship

$$ y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots $$

Objective Function

$$ \epsilon = \sum_{i=1}^n (y_i - \hat{y}_i)^2 $$

Interactive Example: http://setosa.io/ev/
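A minimal NumPy sketch connecting the two formulas above: fit $\alpha$ and $\beta$ by least squares, then evaluate the squared-error objective. The data is simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake data generated from a known line, plus noise
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=50)

# Design matrix [1, x]: the column of ones estimates the intercept alpha
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha, beta = coef

# Objective: sum of squared residuals
y_hat = A @ coef
epsilon = np.sum((y - y_hat) ** 2)
print(alpha, beta, epsilon)
```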

Logit Function

$$ \sigma (t)={\frac {e^{t}}{e^{t}+1}}={\frac {1}{1+e^{-t}}}$$

Logistic Regression

Logistic Relationship

Find the $\beta$ parameters that best fit: $y = 1$ if $\beta_0 + \beta_1 x + \epsilon > 0$, and $y = 0$ otherwise.

Follows:

$$ P(x)={\frac {1}{1+e^{-(\beta _{0}+\beta _{1}x)}}} $$
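A brief sketch of both formulas on simulated data, using scikit-learn for the fit (a convenience assumption; the fit could equally be coded by hand):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(t):
    """Logit/sigmoid function: maps any real t into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)

# Fake 1-D data: class 1 becomes likelier as x grows
x = rng.uniform(-3, 3, size=200)
p = sigmoid(-0.5 + 2.0 * x)                 # true beta_0, beta_1
y = (rng.uniform(size=200) < p).astype(int)

model = LogisticRegression()
model.fit(x.reshape(-1, 1), y)
print(model.intercept_, model.coef_)        # estimates of beta_0, beta_1
print(model.predict_proba([[1.0]]))         # P(y | x = 1.0)
```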

Fitting a Model

Bias-Variance Tradeoff

Train and Test Datasets

Split the Data - 80% / 20%

Train and Test Datasets

Measure the error on the test data
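A sketch of the whole recipe with scikit-learn (the library choice and synthetic data are assumptions): split 80/20, fit on the training set, and measure error only on the held-out test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=2.0, size=100)

# 80% train / 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Measure error on the held-out test data only
test_mse = np.mean((y_test - model.predict(X_test)) ** 2)
print(test_mse)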

Model Complexity

Cross Validation
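Cross validation generalises the single split: rotate which fold is held out so every observation is tested exactly once. A brief scikit-learn sketch, again on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=2.0, size=100)

# 5-fold cross validation: each point is held out exactly once
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(-scores.mean())   # average MSE across the 5 folds
```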

Regularization

Attempts to impose Occam's razor on the solution
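Ridge (L2) regularization from Module 1 is one concrete form: penalising $\sum_j \beta_j^2$ shrinks the coefficients toward zero. A small sketch with scikit-learn's `Ridge` on deliberately correlated, synthetic features:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)

# Two nearly identical features make plain least squares unstable
X = rng.normal(size=(50, 2))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=50)
y = X[:, 0] + rng.normal(scale=0.5, size=50)

# Larger alpha = stronger L2 penalty = smaller ("simpler") coefficients
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, model.coef_)
```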

Model Evaluation

Mean Squared Error

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 $$

Model Evaluation

Confusion Matrix

Model Evaluation

Classification Metrics

  • Recall (TPR) = TP / (TP + FN)
  • Precision = TP / (TP + FP)
  • Specificity (TNR) = TN / (TN + FP)
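These three formulas computed from a confusion matrix, with made-up labels; scikit-learn's `confusion_matrix` returns counts in `[[TN, FP], [FN, TP]]` order:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

recall = tp / (tp + fn)          # TPR: how many positives we caught
precision = tp / (tp + fp)       # how many flagged positives were real
specificity = tn / (tn + fp)     # TNR: how many negatives we cleared
print(recall, precision, specificity)
```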

Model Evaluation

Receiver Operating Characteristic Curve

Plot of TPR vs FPR at different discrimination thresholds
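A small sketch with scikit-learn's `roc_curve`, which sweeps the threshold and returns one (FPR, TPR) pair per threshold; the labels and scores below are invented:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical labels and predicted probabilities from some classifier
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

# Each threshold yields one (FPR, TPR) point on the curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(np.column_stack([thresholds, fpr, tpr]))
print("AUC:", auc(fpr, tpr))
```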


Decision Tree

Example: Survivor on Titanic

Decision Tree

  • Easy to interpret
  • Little data preparation
  • Scales well with data
  • White-box model
  • Instability: small changes in the data or the variable ordering can produce a very different tree
  • Overfitting
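A toy sketch of the white-box property; the Titanic data is not bundled here, so the features and labels below are stand-ins:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for the Titanic example: [age, fare]
X = [[22, 7.25], [38, 71.3], [26, 7.9], [35, 53.1], [54, 51.9], [2, 21.1]]
y = [0, 1, 1, 1, 0, 1]   # survived?

# max_depth limits complexity, one guard against overfitting
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# White-box: the fitted rules can be printed and read directly
print(export_text(tree, feature_names=["age", "fare"]))
```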

Bagging

  • Also called bootstrap aggregation; reduces variance
  • Fits many decision trees on bootstrap samples and averages their predictions

Random Forest

  • Combines the bagging idea with random selection of features
  • Trees are constructed as in bagging, but at each split only a random subset of features is considered (see the sketch after this list)
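A brief scikit-learn sketch of this combination: each tree in the forest is fit on a bootstrap sample (bagging), and each split considers only a random subset of features. The dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Each tree sees a bootstrap sample; each split, a random feature subset
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```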

Challenges

If you torture the data enough, it will confess. -- Ronald Coase

  • Data Snooping
  • Selection Bias
  • Survivor Bias
  • Omitted Variable Bias
  • Black-box vs white-box models
  • Adherence to regulations
