Supervised

instances (input)
concept (function/mapper)
target (answer)
hypothesis (class)
sample (training set)
candidate (concept $\stackrel{?}{=}$ target)
testing set

generalization, not memorization, is the whole point of machine learning
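A minimal sketch of that idea (scikit-learn, the iris data, and the decision tree below are not from these notes, just a convenient way to show why a held-out testing set measures generalization rather than memorization):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# hold out a testing set so we measure generalization, not memorization
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
print("training accuracy:", clf.score(X_train, y_train))  # can be near-perfect (memorization)
print("testing accuracy:", clf.score(X_test, y_test))     # what we actually care about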

Unsupervised


In [1]:
import numpy as np
from scipy import optimize as op

In [79]:
X = np.arange(1,101)
def f(x):
    # objective to maximize over the integers 1..100
    return (x % 6)**2 % 7 - np.sin(x)

y = f(X)
max_y = max(y)
for x in X:
    if f(x) == max_y:
        print(x)


11

In [82]:
print([x % 6 for x in X])
print(25 % 7)


[1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4]
4

random restart hill climbing: rerun hill climbing from random starting points and keep the best local optimum found, to avoid getting stuck at a bad one (sketched below)
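A minimal sketch of random restart hill climbing on the same f defined above; the ±1 neighborhood, number of restarts, and seed are arbitrary choices for illustration:

import numpy as np

def f(x):
    return (x % 6)**2 % 7 - np.sin(x)

def hill_climb(start, domain=(1, 100), max_steps=100):
    # greedy local search over the integers: move to the better neighbor until stuck
    x = start
    for _ in range(max_steps):
        neighbors = [n for n in (x - 1, x + 1) if domain[0] <= n <= domain[1]]
        best = max(neighbors, key=f)
        if f(best) <= f(x):
            return x  # local optimum
        x = best
    return x

def random_restart_hill_climb(restarts=20, seed=0):
    # run hill climbing from several random starting points, keep the best result
    rng = np.random.default_rng(seed)
    starts = rng.integers(1, 101, size=restarts)
    return max((hill_climb(int(s)) for s in starts), key=f)

best_x = random_restart_hill_climb()
print(best_x, f(best_x))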


In [6]:
p = [0.14, 0.36, 0.21, 0.29]
from matplotlib import pyplot as plt
%matplotlib inline
plt.hist(p)  # histogram of the four probability values
plt.show()


Reinforcement

Markov Decision Processes

definition of problem:
States: $S$ (set; things that describe the world)
Model: $T(s, a, s') \sim Pr(s'|s,a)$ (describes rules/physics of world/game; function of state, action, and another state; produces probability; transition function)
Actions: $A(s), A$ (things that can be done in a state; a function of state)
Reward: $R(s), R(s,a), R(s,a,s')$ (scalar value for being in a state; encompasses the domain knowledge; different reward types)

these four things define an MDP (a toy example is sketched below)
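One way to jot the four pieces down in code for a toy two-state MDP; the state names, actions, transition probabilities, and rewards are invented purely for illustration:

# states S
S = ["cool", "overheated"]

# actions A(s): actions available in each state ("overheated" is terminal, no actions)
A = {"cool": ["fast", "slow"], "overheated": []}

# model T(s, a, s') ~ Pr(s'|s,a): transition probabilities
T = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.7, "overheated": 0.3},
}

# reward R(s): scalar value for being in a state
R = {"cool": 1.0, "overheated": -10.0}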

Markovian property: only the present matters; memoryless; only depends on current state $s$, rules don't change (i.e., stationary)

definition of solution:
Policy: $\pi(s) \rightarrow a$ (function that takes in a state, returns action)
$\pi^*$: the optimal policy, the one that maximizes long-term reward
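Continuing the toy MDP sketch above (it reuses A, T, R from there), one rough way to see what "long-term reward" means for a given policy: sample rollouts and average the discounted return. The discount factor gamma and the horizon are assumptions added for illustration, not part of the notes:

import random

# a policy maps each state to an action
policy = {"cool": "fast"}

def rollout(policy, s="cool", gamma=0.9, horizon=50, seed=None):
    # sample one trajectory under the policy and accumulate discounted reward
    rng = random.Random(seed)
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * R[s]
        if not A[s]:              # terminal state: no actions available
            break
        a = policy[s]
        next_states, probs = zip(*T[(s, a)].items())
        s = rng.choices(next_states, weights=probs)[0]
        discount *= gamma
    return total

# averaging many rollouts approximates the policy's long-term (discounted) reward
print(sum(rollout(policy, seed=i) for i in range(1000)) / 1000)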

a policy can stand in for a plan, but technically it isn't one: it only specifies the action for the current state, not a whole sequence of actions

delayed reward (don't know how each step affects reward until the end)
minor changes matter
The (Temporal) Credit Assignment Problem