Copyright (c) 2015, 2016 Sebastian Raschka, Li-Yi Wei
Problem
Algorithm
Analysis
In [1]:
# a simple Python program to find the minimum of a list of numbers
# note that it remains the same regardless of the input data
import math

# function
def findmin(numbers):
    answer = math.inf
    for value in numbers:
        if value < answer:
            answer = value
    return answer

# main
test = [3.14, 2.2, 8, -9.2, 100000, 0]
print(findmin(test))
Problem
Algorithm
In [2]:
# code is left as an exercise
The program can learn from data and change its structure/behavior, as sketched below
The programmer still writes (part of) the program, but it is not fixed
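As a minimal sketch (a toy threshold model, not from these notes), here is a program whose decision rule comes from data rather than being hard-coded by the programmer:

# learn a decision threshold as the midpoint between class means
def train(positives, negatives):
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2

def predict(threshold, value):
    return value >= threshold

# the behavior (threshold) comes from the data, not from the programmer
threshold = train(positives=[7, 8, 9], negatives=[1, 2, 3])
print(predict(threshold, 6.5))  # True, because 6.5 >= (8 + 2) / 2 = 5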
Problem
Traditional programming?
Machine learning
Sometimes it is much easier to say what (example data) than how (algorithm)
Given sequences of actions of an agent and feedback from an environment, learn to select action sequences in a way that maximizes the expected reward, e.g.:
playing games
self-driving cars
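As a minimal sketch of this agent-environment loop (a toy two-armed bandit with hypothetical reward probabilities, not from these notes):

import random

true_reward = {"left": 0.3, "right": 0.7}   # hidden from the agent
estimates = {"left": 0.0, "right": 0.0}     # the agent's learned values
counts = {"left": 0, "right": 0}

for step in range(1000):
    # epsilon-greedy: mostly pick the action with the best estimate so far
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(estimates, key=estimates.get)
    # the environment returns a reward (the feedback)
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    counts[action] += 1
    # incremental average of the observed rewards for this action
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # approaches the hidden reward probabilities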
Types of learning
Supervised learning
Unsupervised learning
Reinforcement learning
Types of data
Discrete/continuous $\times$ input/output
Discrete output (classification)
Continuous output (regression)
We can represent the model as a function $f$, with a set of parameters $\Theta$.
Given a set of inputs $\mathbf{X}$, the model computes outcomes $\mathbf{Y}$.
$$\mathbf{Y} = f(\mathbf{X}, \Theta)$$
For example, in digit recognition, $\mathbf{X}$ and $\mathbf{Y}$ are the digit images and digit labels (0 to 9), respectively.
The parameters $\Theta$ consist of those optimized automatically and those manually picked by humans. The latter are called hyper-parameters.
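As a minimal sketch of this distinction (an illustrative polynomial fit with numpy, not from these notes):

import numpy as np

degree = 2                          # hyper-parameter: picked by a human
X = np.array([0.0, 1.0, 2.0, 3.0])
T = np.array([1.0, 2.0, 5.0, 10.0])
coeffs = np.polyfit(X, T, degree)   # parameters: optimized from the data
print(coeffs)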
Every machine learning task has a goal, which can be formalized as a loss function: $$L(\mathbf{X}, \mathbf{T}, \mathbf{Y}),$$ where $\mathbf{T}$ is some form of target or auxiliary information, such as ground-truth labels in supervised learning.
In addition to the objective, we often care about the simplicity of the model, for better efficiency and generalization (avoiding over-fitting). The complexity of the model can be measured by another penalty function: $$P(\Theta)$$ Common penalty functions measure the number and/or magnitude of the parameters.
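For example, one common magnitude-based penalty (one possible choice, not prescribed here) is the L2 penalty: $$P(\Theta) = \lambda \sum_k \theta_k^2,$$ where the weight $\lambda$ is itself a hyper-parameter.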
We can sum up both the loss and regularization terms as the total objective: $$\Phi(\mathbf{X}, \mathbf{T}, \Theta) = L\left(\mathbf{X}, \mathbf{T}, \mathbf{Y}=f(\mathbf{X}, \Theta)\right) + P(\Theta)$$
During training, the goal is to optimize the parameters $\Theta$ with respect to the given training data $\mathbf{X}$ and $\mathbf{T}$: $$\operatorname{argmin}_\Theta \; \Phi(\mathbf{X}, \mathbf{T}, \Theta)$$ We then hope the trained model will generalize well to future data.
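As a minimal sketch of this optimization (gradient descent on a toy one-parameter objective; the notes do not prescribe a particular optimizer):

# toy objective Phi(theta) = (theta - 3)^2, minimized at theta = 3
def grad_phi(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0
learning_rate = 0.1
for step in range(100):
    theta -= learning_rate * grad_phi(theta)  # step against the gradient

print(theta)  # close to argmin_theta Phi = 3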
Given a set of data points $\left(\mathbf{X}, \mathbf{T}\right)$, fit a model curve to describe their relationship.
This is actually a regression problem, but since we have all seen it in prior math/coding classes, it serves as a good introductory example of machine learning.
Recall $\mathbf{Y} = f(\mathbf{X}, \Theta)$ is our model.
For 2D linear curve fitting, the model is a straight line: $y = w_1 x + w_0$, so the parameters $\Theta = \{w_0, w_1\}$.
The loss function is $L\left(\mathbf{X}, \mathbf{T}, \mathbf{Y}\right) = \sum_i \left( T^{(i)} - Y^{(i)}\right)^2 = \sum_i \left( T^{(i)} - w_1 X^{(i)} - w_0 \right)^2$.
($\mathbf{X}$ is a matrix/tensor, and each data sample is a row. We denote the ith sample/row as $\mathbf{X}^{(i)}$.)
For this simple example we don't care about regularization, thus $P(\Theta) = 0$.
The goal is to optimize $\Theta = \{w_0, w_1 \}$ given $\left(\mathbf{X}, \mathbf{T}\right)$ to minimize $L$. For simple cases like this, we can optimize directly via calculus: $$ \begin{align} \frac{\partial L}{\partial w_0} & = 0 \\ \frac{\partial L}{\partial w_1} & = 0 \end{align} $$
The math and coding are left as an exercise.
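As a sanity check for that exercise, here is a minimal sketch of the resulting closed-form (normal-equation) solution, assuming numpy and synthetic data:

import numpy as np

# synthetic data (hypothetical): T = 2x + 1 plus a little noise
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
T = 2.0 * X + 1.0 + np.random.normal(scale=0.1, size=X.shape)

# setting dL/dw0 = dL/dw1 = 0 yields the normal equations:
n = len(X)
w1 = (n * np.sum(X * T) - np.sum(X) * np.sum(T)) / (n * np.sum(X ** 2) - np.sum(X) ** 2)
w0 = (np.sum(T) - w1 * np.sum(X)) / n
print(w0, w1)  # close to 1 and 2
# np.polyfit(X, T, 1) should agree, up to numerical error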
Artificial intelligence
Cognitive science
In [3]:
from IPython.display import YouTubeVideo
YouTubeVideo("-O01G3tSYpU")
Out[3]: