Y.-W. FANG at Kyoto University, April 28th - April ～th, 2018

机器学习，无论单纯从理论，或者应用的角度来看，都可能会只看到冰山一角。现在，我们是从基础开始的，即fundation Oriented Machine Learning. 在学习的过程中，穿插基础在理论和实践中的实际应用，讨论各种应用场景背后的基础。

The goal of this course is make students learn 'future/untaught' techniques or study deeper theory easily.

Learning to Answer Yes/No

回顾下course01--the learning problem： $\mathcal{A}$ (algorithm) takes $\mathcal{D}$ (database) and $\mathcal{H}$ (hypothesis set) to get $g$ (final hypothesis)

在当前的course02中，我们则要让机器学习帮助我们判断 yes or no。

Perceptron Hypothesis Set ($\mathcal{H}$)

假设我们是一家银行，当一位客户（用户信息见下表）对我行提出信用卡申请时，我们需要判断核发或者拒发。

age	23 years
annual salary	NTD 1,000,000
year in job	0.5 year
current debt	200,000

For $\vec{x}$ = ($x_1$, $x_2$, $x_3$, ..., $x_d$) -- 'features of customer', compute a weighted 'score' and
- approve credit if ${\sum_{i=1}^d w_ix_i > {\rm threshold}}$
- deny credit if ${\sum_{i=1}^d w_ix_i < {\rm threshold}}$
$y$: {+1(good), -1(bad)}, 0 ignored -- linear formula $h$ ${\in}$ $\mathcal{H}$ are

${h({\mathbf{x}}) = {\rm sign}((\sum_{i=1}^d w_ix_i) - {\rm{threshold}})}$

Note:

'perceptron': 感知器，来源于类神经网络中的专业名词。

我们可以对perceptron hypothesis ${h({\mathbf{x}})}$ 矢量化，是指具有更简单对形式：

${h({\mathbf{x}}) = {\rm sign}((\sum_{i=1}^d w_ix_i) - {\rm{threshold}})}$

= sign${((\sum_{i=1}^d w_ix_i) + (-{\rm{threshold}})\cdot(+1))}$

这里，我们可以把 -threshold 这一项取为 ${w_{\rm 0}}$, +1 取作 ${x_0}$，那么我们就可以把第二项放到第一项里面，上面的公式就变为

= sign(${\sum_{i=0}^d w_ix_i}$)

我们可以把 ${\sum_{i=0}^d x_i}$ 和 ${\sum_{i=0}^d w_i}$ 都写成vector form，then we have

= sign(${\mathbf{w}^T \mathbf{x}}$)

each 'tall' $\mathbf{w}$ represents a hypothesis $h$ $\&$ is multiplied with 'tall' $\mathbf{x}$ -- will use tall versions to simplify notation

what do perceptrons $h$ 'look linke'? -- Perceptrons in ${\mathbb{R}^2}$

${h({\mathbf{x}})}$ = sign(${w_0 + w_1x_1 + w_2x_2}$)

这里 $x_1$ 和 ${x_2}$ 分别代表客户features中的第一个和第二个维度。

示意图如下，

customer features ${\mathbf{x}}$: points on the plane (or points in ${\mathbb{R}^d}$, 在上面的式子中，$d$ = 2)
labels $y$: $\circ$ represents (+1), $\times$ represents (-1)
hypothesis $h$: $\textbf{lines}$ (or hyperplanes in ${\mathbb{R}^d}$)

--positive on one side of a line, negative on the other side.

因此在二维平面内，这个 perceptron 其实就是一条线，因此perceptron 也被称作 linear (binary) classifiers. 在三维中，这个 perceptron 就是一个面。

Perceptron Learning Algorithm (PLA)

$\mathcal{H}$ = all possible perceptrons, but how can we select $g$ from $\mathcal{H}$?

我们的出发点如下

want: $g$ $\approx$ $f$ (hard when $f$ unknown)
almost necessary: $g$ $\approx$ $f$ on $\mathcal{D}$, ideally ${g(\mathbf{x}_n)}$ = ${f(\mathbf{x}_n)}$ = ${y_n}$

但是我们发现，一个一个去尝试 $\mathcal{H}$ 中的 $h$ 在实际情况中是十分困难的，因为$\mathcal{H}$ 是无限的。即使是在上一个图中的二维问题，这个集合就是无限的，因为一个二维平面中有无数条直线。显然，穷尽所有，几乎是不可能的。

我们不妨先取其中的一个，例如 ${g_0}$, 也许这条线并不是很好（指代它可能离$f$仍然相差较大），然后通过数据 $\mathcal{D}$, 不断地correct它。为了方便，我们用weight vector ${w_0}$ 来代表 ${g_0}$.



In [ ]:

    
Cyclic PLA

For $t$ = 0, 1, ...



In [ ]:



In [ ]:

Some remaining issures of PLA？

它是否会停下来？假设对于 $\mathcal{D}$，我们已经找到了一个 $g$ $\approx$ $f$，那么对于 $\mathcal{D}$ 之外的数据， $g$ 是否还能保持接近于我们希望的那个 $f$呢？这个问题留待之后的学习中解决。



In [ ]: