Supervised

instances (input)
concept (function/mapper)
target (answer)
hypothesis (class)
sample (training set)
candidate (concept $\stackrel{?}{=}$ target)
testing set

generalization, not memorization, is the whole point of machine learning
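A minimal sketch of that idea (scikit-learn, the iris data, and the decision tree below are not from these notes, just a convenient way to show why a held-out testing set measures generalization rather than memorization):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# hold out a testing set so we measure generalization, not memorization
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)
print("training accuracy:", clf.score(X_train, y_train))  # can be near-perfect (memorization)
print("testing accuracy:", clf.score(X_test, y_test))     # what we actually care about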

Unsupervised


In [1]:
import numpy as np
from scipy import optimize as op

In [79]:
X = np.arange(1,101)
def f(x):
    # objective to maximize over the integers 1..100
    return (x % 6)**2 % 7 - np.sin(x)

y = f(X)
max_y = max(y)
for x in X:
    if f(x) == max_y:
        print(x)


11

In [82]:
print([x % 6 for x in X])
print(25 % 7)


[1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4]
4

random restart hill climbing: rerun hill climbing from random starting points and keep the best local optimum found, to avoid getting stuck at a bad one (sketched below)
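A minimal sketch of random restart hill climbing on the same f defined above; the ±1 neighborhood, number of restarts, and seed are arbitrary choices for illustration:

import numpy as np

def f(x):
    return (x % 6)**2 % 7 - np.sin(x)

def hill_climb(start, domain=(1, 100), max_steps=100):
    # greedy local search over the integers: move to the better neighbor until stuck
    x = start
    for _ in range(max_steps):
        neighbors = [n for n in (x - 1, x + 1) if domain[0] <= n <= domain[1]]
        best = max(neighbors, key=f)
        if f(best) <= f(x):
            return x  # local optimum
        x = best
    return x

def random_restart_hill_climb(restarts=20, seed=0):
    # run hill climbing from several random starting points, keep the best result
    rng = np.random.default_rng(seed)
    starts = rng.integers(1, 101, size=restarts)
    return max((hill_climb(int(s)) for s in starts), key=f)

best_x = random_restart_hill_climb()
print(best_x, f(best_x))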


In [6]:
p = [0.14, 0.36, 0.21, 0.29]
from matplotlib import pyplot as plt
%matplotlib inline
plt.hist(p)  # histogram of the four probability values
plt.show()


Reinforcement

Markov Decision Processes

definition of problem:
States: $S$ (set; things that describe the world)
Model: $T(s, a, s') \sim Pr(s'|s,a)$ (describes rules/physics of world/game; function of state, action, and another state; produces probability; transition function)
Actions: $A(s), A$ (things that can be done in a state; a function of state)
Reward: $R(s), R(s,a), R(s,a,s')$ (scalar value for being in a state; encompasses the domain knowledge; different reward types)

these four things define an MDP (a toy example is sketched below)
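One way to jot the four pieces down in code for a toy two-state MDP; the state names, actions, transition probabilities, and rewards are invented purely for illustration:

# states S
S = ["cool", "overheated"]

# actions A(s): actions available in each state ("overheated" is terminal, no actions)
A = {"cool": ["fast", "slow"], "overheated": []}

# model T(s, a, s') ~ Pr(s'|s,a): transition probabilities
T = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.7, "overheated": 0.3},
}

# reward R(s): scalar value for being in a state
R = {"cool": 1.0, "overheated": -10.0}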

Markovian property: only the present matters; memoryless; only depends on current state $s$, rules don't change (i.e., stationary)

definition of solution:
Policy: $\pi(s) \rightarrow a$ (function that takes in a state, returns action)
$\pi^*$: the optimal policy, the one that maximizes long-term reward
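Continuing the toy MDP sketch above (it reuses A, T, R from there), one rough way to see what "long-term reward" means for a given policy: sample rollouts and average the discounted return. The discount factor gamma and the horizon are assumptions added for illustration, not part of the notes:

import random

# a policy maps each state to an action
policy = {"cool": "fast"}

def rollout(policy, s="cool", gamma=0.9, horizon=50, seed=None):
    # sample one trajectory under the policy and accumulate discounted reward
    rng = random.Random(seed)
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * R[s]
        if not A[s]:              # terminal state: no actions available
            break
        a = policy[s]
        next_states, probs = zip(*T[(s, a)].items())
        s = rng.choices(next_states, weights=probs)[0]
        discount *= gamma
    return total

# averaging many rollouts approximates the policy's long-term (discounted) reward
print(sum(rollout(policy, seed=i) for i in range(1000)) / 1000)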

a policy can stand in for a plan, but technically it isn't one: it only specifies the action for the current state, not a whole sequence of actions

delayed reward (don't know how each step affects reward until the end)
minor changes matter
The (Temporal) Credit Assignment Problem