Is learning feasible?

Lecture

Hoeffding's inequality (valid for a single experiment) bounds the probability that the sample fails to be representative of the population (checked numerically below):

$ P ( |\nu - \mu| > \epsilon) \leq 2 e^{-2 \epsilon^2 N} $ with:

  • $\mu$ probability of an event $E$
  • $\nu$ sample frequency of the event $E$
  • $\epsilon$ tolerance
  • $N$ sample size

$\mu = \nu$ is PAC (probably approximately correct)
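
As a quick numerical check, a minimal sketch (the values of $\mu$, $N$, and $\epsilon$ are assumed for illustration): sample repeatedly and compare the observed frequency of $|\nu - \mu| > \epsilon$ against the Hoeffding bound.

```python
# Minimal sketch: estimate how often |nu - mu| > epsilon across repeated
# samples and compare with the Hoeffding bound (mu, N, epsilon are assumed).
import numpy as np

rng = np.random.default_rng(0)

mu = 0.6        # true probability of the event E (assumed value)
N = 100         # sample size
epsilon = 0.1   # tolerance
trials = 100_000

# Draw `trials` samples of size N; nu is the in-sample frequency of E.
nu = rng.binomial(N, mu, size=trials) / N
empirical = np.mean(np.abs(nu - mu) > epsilon)
bound = 2 * np.exp(-2 * epsilon**2 * N)

print(f"P(|nu - mu| > {epsilon}) ~ {empirical:.4f} <= bound {bound:.4f}")
```

Typically the empirical frequency comes out well under the bound: Hoeffding makes no assumption about the distribution, so it is loose but always valid.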

Relating this to learning: with high probability, the hypothesis function $h$ is close to the target function $f$ ($h = f$ is PAC).

Notations:

  • $\nu$ is the in-sample quantity, denoted $E_{in}$ (i.e. the in-sample error)
  • $\mu$ is the out-of-sample quantity, denoted $E_{out}$ (i.e. the out-of-sample error)

If we perform well out of sample (i.e. $E_{out}$ is small), then we must have really learned.
But performance is measured through our hypothesis $h$, so these errors depend on $h$, leading to new notation:

  • $E_{in}(h)$ (was $\nu$)
  • $E_{out}(h)$ (was $\mu$)
  • $P ( |E_{in}(h) - E_{out}(h)| > \epsilon) \leq 2 e^{-2 \epsilon^2 N} $ (valid for one $h$)
  • $P ( |E_{in}(h) - E_{out}(h)| > \epsilon) \leq 2 M e^{-2 \epsilon^2 N} $ (valid for $M$ hypotheses: since the learning algorithm may pick any of them, the single-$h$ bound is inflated by the union bound factor $M$; see the simulation below)
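
The need for the factor $M$ can be seen with a coin-flip analogy: each coin is a "hypothesis" with $E_{out} = 0.5$, and the probability that *some* coin among $M$ shows a large in-sample deviation is far higher than for a single fixed coin. A minimal sketch (the numbers of coins, flips, and the tolerance are assumed for illustration):

```python
# Minimal sketch: M "hypotheses" (fair coins) flipped N times each.
# The single-h Hoeffding bound fails for the worst coin over the M,
# which is exactly the event the union bound (factor M) must control.
import numpy as np

rng = np.random.default_rng(1)

M = 1000        # number of hypotheses (coins)
N = 10          # sample size (flips per coin)
mu = 0.5        # E_out of every coin (fair)
epsilon = 0.4
trials = 10_000

# nu[t, m] is the in-sample head frequency of coin m in trial t.
nu = rng.binomial(N, mu, size=(trials, M)) / N
worst_dev = np.abs(nu - mu).max(axis=1)  # largest gap over the M coins

print("one fixed coin :", np.mean(np.abs(nu[:, 0] - mu) > epsilon))
print("worst of M     :", np.mean(worst_dev > epsilon))
print("single-h bound :", 2 * np.exp(-2 * epsilon**2 * N))
print("M-h bound      :", min(1.0, 2 * M * np.exp(-2 * epsilon**2 * N)))
```

Here the fixed coin respects the single-$h$ bound, while the worst coin over the $M$ violates it badly; only the $M$-inflated bound still holds.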

Q & A

Q. What if Hoeffding's inequality gives something trivial like $p < 2$?
A. Either the data size is insufficient or the tolerance is too stringent; see the computation below.
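
A worked instance (the values of $\epsilon$, $N$, and $\delta$ are assumed for illustration): with $\epsilon = 0.05$ and $N = 100$ the bound is $2e^{-0.5} \approx 1.21$, which says nothing since probabilities never exceed 1. Solving $2 e^{-2 \epsilon^2 N} \leq \delta$ for $N$ gives $N \geq \ln(2/\delta) / (2 \epsilon^2)$.

```python
# Minimal sketch (epsilon, N, delta are assumed values): check when the
# Hoeffding bound is trivial and how large N must be for it to bite.
import math

epsilon = 0.05

# With N = 100 the right-hand side exceeds 1, so the inequality is vacuous.
print(2 * math.exp(-2 * epsilon**2 * 100))   # ~1.21

# Solving 2 exp(-2 eps^2 N) <= delta for N: N >= ln(2/delta) / (2 eps^2).
delta = 0.05
N_min = math.ceil(math.log(2 / delta) / (2 * epsilon**2))
print(N_min)   # 738: samples needed so |nu - mu| <= 0.05 w.p. >= 95%
```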

