The mean value is given by: $$ \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_{i} $$
This way of calculating the mean is not scalable, since we would need to store all the results from the previous episodes.
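One common way around the storage problem is to update the mean incrementally, using $\bar{X}_{N} = \bar{X}_{N-1} + \frac{1}{N}\left(X_{N} - \bar{X}_{N-1}\right)$, so that only the current mean and the sample count are kept. The sketch below is a minimal illustration of that idea; the class name `RunningMean`, its `update` method, and the sample `rewards` list are hypothetical and do not come from the notebook.

```python
class RunningMean:
    """Tracks a mean without storing past samples (illustrative helper)."""

    def __init__(self):
        self.count = 0   # N, number of samples seen so far
        self.mean = 0.0  # current estimate of the mean

    def update(self, x):
        # X_bar_N = X_bar_{N-1} + (x_N - X_bar_{N-1}) / N
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


# Made-up rewards, used only to exercise the update rule.
rewards = [1.0, 0.0, 1.0, 1.0]
running = RunningMean()
for r in rewards:
    running.update(r)
```

After the loop, `running.mean` matches `sum(rewards) / len(rewards)` up to floating-point error, while the memory used stays constant regardless of the number of episodes.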
The Chernoff-Hoeffding bound states that the probability of the sample mean $\bar{X}$ deviating from the true mean $\mu$ by at least $\varepsilon$ decays exponentially with the number of samples $N$ we collect: $$ P\left\{ \left| \bar{X} - \mu \right| \geq \varepsilon \right\} \leq 2\exp\left\{ -2\varepsilon^{2}N \right\} $$
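To get a feel for how quickly this bound tightens, it can be evaluated for a fixed $\varepsilon$ and a growing number of samples. The following is a rough sketch, assuming the samples are bounded in $[0, 1]$ (the setting in which this form of the bound holds). The function name `hoeffding_bound` is hypothetical, and clipping the result at 1 is an addition made here, since a probability cannot exceed 1.

```python
import math


def hoeffding_bound(epsilon, n):
    """Upper bound on P(|X_bar - mu| >= epsilon) after n samples in [0, 1]."""
    # The raw bound 2 * exp(-2 * eps^2 * n) can exceed 1 for small n,
    # so it is clipped to 1 (an assumption added for this illustration).
    return min(1.0, 2.0 * math.exp(-2.0 * epsilon ** 2 * n))


# Evaluate the bound for epsilon = 0.1 at increasing sample counts.
for n in (10, 100, 1000):
    print(n, hoeffding_bound(0.1, n))
```

For small $N$ the bound is uninformative (it is clipped to 1), but it then drops off rapidly as $N$ grows.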
That leads to another, simpler equation: $$ X_{UCB-j} = \bar{X}_{j} + \sqrt{2\frac{\ln N}{N_{j}}} $$ where: