***** Easy Bandit *****
***** 3-armed bandit *****
Actions: N=3 Outcomes: M=8
Loss Matrix (with actions as row indices and outcomes as column indices):
|       | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| arm 0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| arm 1 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| arm 2 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
Feedback Matrix (symbolic form):
|       | 000  | 001  | 010  | 011  | 100  | 101  | 110  | 111  |
| arm 0 | loss | loss | loss | loss | win  | win  | win  | win  |
| arm 1 | loss | loss | win  | win  | loss | loss | win  | win  |
| arm 2 | loss | win  | loss | win  | loss | win  | loss | win  |
Outcomes distribution (for stochastic games):
P(000)=0.045 P(001)=0.005 P(010)=0.045 P(011)=0.005 P(100)=0.405 P(101)=0.045 P(110)=0.405 P(111)=0.045
======> This game is EASY, because all neighbouring pairs are observable.
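The loss matrix and outcome distribution above can be cross-checked with a short sketch. It assumes, based only on the table entries, that an arm's loss is 0.0 when its bit in the outcome string is 1 and 1.0 otherwise; the helper names (`bandit_loss`, `expected_losses`, `product_prob`) are illustrative and not taken from the program that printed this listing.

```python
# Outcome distribution copied from the "Easy Bandit" listing.
P = {
    "000": 0.045, "001": 0.005, "010": 0.045, "011": 0.005,
    "100": 0.405, "101": 0.045, "110": 0.405, "111": 0.045,
}

def bandit_loss(arm, outcome):
    """Assumed rule: loss 0.0 if the arm's bit is '1' (win), else 1.0."""
    return 0.0 if outcome[arm] == "1" else 1.0

def expected_losses(dist, n_arms):
    """Expected loss of each arm under the outcome distribution."""
    return [sum(p * bandit_loss(a, o) for o, p in dist.items())
            for a in range(n_arms)]

def product_prob(outcome, win_probs):
    """Probability of an outcome if the arms were independent Bernoullis."""
    q = 1.0
    for bit, w in zip(outcome, win_probs):
        q *= w if bit == "1" else 1.0 - w
    return q

losses = expected_losses(P, 3)
win_probs = [1.0 - l for l in losses]

# True when the joint distribution factorises into independent arms.
is_product = all(abs(p - product_prob(o, win_probs)) < 1e-9
                 for o, p in P.items())

print(losses, is_product)
```

Under these assumptions the listed distribution behaves like three independent arms, with arm 0 having the lowest expected loss.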
***** Hard Bandit *****
***** 4-armed bandit *****
Actions: N=4 Outcomes: M=16
Loss Matrix (with actions as row indices and outcomes as column indices):
|       | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |
| arm 0 | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  |
| arm 1 | 1.0  | 1.0  | 1.0  | 1.0  | 0.0  | 0.0  | 0.0  | 0.0  | 1.0  | 1.0  | 1.0  | 1.0  | 0.0  | 0.0  | 0.0  | 0.0  |
| arm 2 | 1.0  | 1.0  | 0.0  | 0.0  | 1.0  | 1.0  | 0.0  | 0.0  | 1.0  | 1.0  | 0.0  | 0.0  | 1.0  | 1.0  | 0.0  | 0.0  |
| arm 3 | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  |
Feedback Matrix (symbolic form):
|       | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |
| arm 0 | loss | loss | loss | loss | loss | loss | loss | loss | win  | win  | win  | win  | win  | win  | win  | win  |
| arm 1 | loss | loss | loss | loss | win  | win  | win  | win  | loss | loss | loss | loss | win  | win  | win  | win  |
| arm 2 | loss | loss | win  | win  | loss | loss | win  | win  | loss | loss | win  | win  | loss | loss | win  | win  |
| arm 3 | loss | win  | loss | win  | loss | win  | loss | win  | loss | win  | loss | win  | loss | win  | loss | win  |
Outcomes distribution (for stochastic games):
P(0000)=0.05 P(0001)=0.05 P(0010)=0.05 P(0011)=0.05 P(0100)=0.05 P(0101)=0.05 P(0110)=0.05 P(0111)=0.05 P(1000)=0.075 P(1001)=0.075 P(1010)=0.075 P(1011)=0.075 P(1100)=0.075 P(1101)=0.075 P(1110)=0.075 P(1111)=0.075
======> This game is EASY, because all neighbouring pairs are observable.
***** Four levels easy Dynamic Pricing (c=2) *****
***** 4-levels dynamic pricing *****
Actions: N=4 Outcomes: M=4
Loss Matrix (with actions as row indices and outcomes as column indices):
|    | 0$  | 1$  | 2$  | 3$  |
| 0$ | 0.0 | 1.0 | 2.0 | 3.0 |
| 1$ | 2.0 | 0.0 | 1.0 | 2.0 |
| 2$ | 2.0 | 2.0 | 0.0 | 1.0 |
| 3$ | 2.0 | 2.0 | 2.0 | 0.0 |
Feedback Matrix (symbolic form):
|    | 0$       | 1$       | 2$       | 3$   |
| 0$ | sold     | sold     | sold     | sold |
| 1$ | not-sold | sold     | sold     | sold |
| 2$ | not-sold | not-sold | sold     | sold |
| 3$ | not-sold | not-sold | not-sold | sold |
Outcomes distribution (for stochastic games):
P(0$)=0.1 P(1$)=0.1 P(2$)=0.7 P(3$)=0.1
======> This game is HARD, because [0$,2$] pair is not locally observable.
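The two dynamic-pricing matrices above follow a simple pattern that can be sketched in code. The rule is inferred from the table entries and is an assumption, not the generating program's source: when the asked price is at most the outcome value the item is sold and the loss is the difference between the two, otherwise the item is not sold and the loss is the constant c from the game title. The function name `dynamic_pricing` is hypothetical.

```python
def dynamic_pricing(levels, c):
    """Build (loss, feedback) for prices and outcome values 0 .. levels-1.

    Assumed rule, inferred from the listing:
      price <= value  -> sold, loss = value - price
      price >  value  -> not-sold, loss = c
    """
    loss, feedback = [], []
    for price in range(levels):
        loss.append([float(value - price) if price <= value else float(c)
                     for value in range(levels)])
        feedback.append(["sold" if price <= value else "not-sold"
                         for value in range(levels)])
    return loss, feedback

loss4, feedback4 = dynamic_pricing(4, c=2)
for row in loss4:
    print(row)
```

Calling `dynamic_pricing(6, c=2)` gives the larger game under the same assumed rule.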
***** Six levels hard Dynamic Pricing (c=2) *****
***** 6-levels dynamic pricing *****
Actions: N=6 Outcomes: M=6
Loss Matrix (with actions as row indices and outcomes as column indices):
|    | 0$  | 1$  | 2$  | 3$  | 4$  | 5$  |
| 0$ | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| 1$ | 2.0 | 0.0 | 1.0 | 2.0 | 3.0 | 4.0 |
| 2$ | 2.0 | 2.0 | 0.0 | 1.0 | 2.0 | 3.0 |
| 3$ | 2.0 | 2.0 | 2.0 | 0.0 | 1.0 | 2.0 |
| 4$ | 2.0 | 2.0 | 2.0 | 2.0 | 0.0 | 1.0 |
| 5$ | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 0.0 |
Feedback Matrix (symbolic form):
|    | 0$       | 1$       | 2$       | 3$       | 4$       | 5$   |
| 0$ | sold     | sold     | sold     | sold     | sold     | sold |
| 1$ | not-sold | sold     | sold     | sold     | sold     | sold |
| 2$ | not-sold | not-sold | sold     | sold     | sold     | sold |
| 3$ | not-sold | not-sold | not-sold | sold     | sold     | sold |
| 4$ | not-sold | not-sold | not-sold | not-sold | sold     | sold |
| 5$ | not-sold | not-sold | not-sold | not-sold | not-sold | sold |
Outcomes distribution (for stochastic games):
P(0$)=0.3 P(1$)=0.1 P(2$)=0.1 P(3$)=0.1 P(4$)=0.1 P(5$)=0.3
======> This game is HARD, because [0$,2$] pair is not locally observable.
***** G. Bartok's thesis game *****
***** Bartok game *****
Actions: N=3 Outcomes: M=3
Loss Matrix (with actions as row indices and outcomes as column indices):
|   | 0   | 1   | 2   |
| 0 | 1.0 | 1.0 | 0.0 |
| 1 | 0.0 | 1.0 | 1.0 |
| 2 | 1.0 | 0.0 | 1.0 |
Feedback Matrix (symbolic form):
|   | 0 | 1 | 2 |
| 0 | a | b | b |
| 1 | b | a | b |
| 2 | b | b | a |
Outcomes distribution (for stochastic games):
P(0)=0.333333333333 P(1)=0.333333333333 P(2)=0.333333333333
======> This game is EASY, because all neighbouring pairs are observable.
***** Apple tasting (organic food) *****
***** Apple tasting game *****
Actions: N=2 Outcomes: M=2
Loss Matrix (with actions as row indices and outcomes as column indices):
|             | rotten | good |
| sell apple  | 1.0    | 0.0  |
| taste apple | 0.0    | 1.0  |
Feedback Matrix (symbolic form):
|             | rotten | good  |
| sell apple  | blind  | blind |
| taste apple | rotten | good  |
Outcomes distribution (for stochastic games):
P(rotten)=0.05 P(good)=0.95
======> This game is EASY, because all neighbouring pairs are observable.
***** Apple tasting (supermarket) *****
***** Apple tasting game *****
Actions: N=2 Outcomes: M=2
Loss Matrix (with actions as row indices and outcomes as column indices):
|             | rotten | good |
| sell apple  | 1.0    | 0.0  |
| taste apple | 0.0    | 1.0  |
Feedback Matrix (symbolic form):
|             | rotten | good  |
| sell apple  | blind  | blind |
| taste apple | rotten | good  |
Outcomes distribution (for stochastic games):
P(rotten)=0.5 P(good)=0.5
======> This game is EASY, because all neighbouring pairs are observable.
***** Horse race *****
***** Full-information (horse race) *****
Actions: N=4 Outcomes: M=4
Loss Matrix (with actions as row indices and outcomes as column indices):
|                | 0   | 1   | 2   | 3   |
| bet on horse 0 | 0.0 | 1.0 | 1.0 | 1.0 |
| bet on horse 1 | 1.0 | 0.0 | 1.0 | 1.0 |
| bet on horse 2 | 1.0 | 1.0 | 0.0 | 1.0 |
| bet on horse 3 | 1.0 | 1.0 | 1.0 | 0.0 |
Feedback Matrix (symbolic form):
|                | 0 | 1 | 2 | 3 |
| bet on horse 0 | 0 | 1 | 2 | 3 |
| bet on horse 1 | 0 | 1 | 2 | 3 |
| bet on horse 2 | 0 | 1 | 2 | 3 |
| bet on horse 3 | 0 | 1 | 2 | 3 |
Outcomes distribution (for stochastic games):
P(0)=0.1 P(1)=0.6 P(2)=0.1 P(3)=0.2
======> This game is EASY, because all neighbouring pairs are observable.
***** Intractable *****
***** Intractable *****
Actions: N=2 Outcomes: M=2
Loss Matrix (with actions as row indices and outcomes as column indices):
|         | no  | yes |
| ask     | 1.0 | 0.0 |
| not-ask | 0.0 | 1.0 |
Feedback Matrix (symbolic form):
|         | no        | yes       |
| ask     | maybe     | maybe     |
| not-ask | who-knows | who-knows |
Outcomes distribution (for stochastic games):
P(no)=0.75 P(yes)=0.25
======> This game is INTRACTABLE, because [ask,not-ask] pair is not globally observable.
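The verdict above can be reproduced with a small check, assuming the usual partial-monitoring definition: a pair of actions is globally observable when the difference of their loss rows lies in the row space of the stacked signal matrices of all actions. This is a sketch under that assumption, not the code that produced the listing; the function names are illustrative, and only global (not local) observability is tested.

```python
from fractions import Fraction

def signal_rows(feedback_row):
    """One 0/1 indicator row per distinct symbol in an action's feedback."""
    symbols = sorted(set(feedback_row))
    return [[1 if s == sym else 0 for s in feedback_row] for sym in symbols]

def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def globally_observable(loss, feedback, i, j):
    """True if loss[i] - loss[j] is in the span of all signal rows."""
    stacked = [row for fb in feedback for row in signal_rows(fb)]
    diff = [a - b for a, b in zip(loss[i], loss[j])]
    return rank(stacked) == rank(stacked + [diff])

# "Intractable" game: both actions return a constant symbol.
intractable = globally_observable(
    [[1.0, 0.0], [0.0, 1.0]],
    [["maybe", "maybe"], ["who-knows", "who-knows"]], 0, 1)

# Apple tasting: "taste apple" reveals the outcome.
apple = globally_observable(
    [[1.0, 0.0], [0.0, 1.0]],
    [["blind", "blind"], ["rotten", "good"]], 0, 1)

print(intractable, apple)
```

In the "Intractable" game every signal row is all ones, so no combination of feedback counts can separate the two loss rows; in apple tasting the "taste apple" action alone supplies the missing information.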
***** Label efficient prediction *****
***** Label-efficient prediction *****
Actions: N=3 Outcomes: M=2
Loss Matrix (with actions as row indices and outcomes as column indices):
|                | ham | spam |
| ask user       | 1.0 | 1.0  |
| transfer email | 0.0 | 1.0  |
| drop email     | 2.0 | 0.0  |
Feedback Matrix (symbolic form):
|                | ham   | spam  |
| ask user       | ham   | spam  |
| transfer email | blind | blind |
| drop email     | blind | blind |
Outcomes distribution (for stochastic games):
P(ham)=0.75 P(spam)=0.25
======> This game is HARD, because [transfer email,drop email] pair is not locally observable.
***** Easy Dueling Bandit *****
***** 3-armed utility-based dueling bandit *****
Actions: N=6 Outcomes: M=8
Loss Matrix (with actions as row indices and outcomes as column indices):
|       | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| (0,0) | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| (0,1) | 1.0 | 1.0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.0 | 0.0 |
| (0,2) | 1.0 | 0.5 | 1.0 | 0.5 | 0.5 | 0.0 | 0.5 | 0.0 |
| (1,1) | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| (1,2) | 1.0 | 0.5 | 0.5 | 0.0 | 1.0 | 0.5 | 0.5 | 0.0 |
| (2,2) | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
Feedback Matrix (symbolic form):
|       | 000  | 001  | 010  | 011  | 100  | 101  | 110  | 111  |
| (0,0) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
| (0,1) | tie  | tie  | loss | loss | win  | win  | tie  | tie  |
| (0,2) | tie  | loss | tie  | loss | win  | tie  | win  | tie  |
| (1,1) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
| (1,2) | tie  | loss | win  | tie  | tie  | loss | win  | tie  |
| (2,2) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
Outcomes distribution (for stochastic games):
P(000)=0.045 P(001)=0.005 P(010)=0.045 P(011)=0.005 P(100)=0.405 P(101)=0.045 P(110)=0.405 P(111)=0.045
======> This game is EASY, because all neighbouring pairs are observable.
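The dueling-bandit matrices above appear to be derived from the per-arm bits of each outcome. The sketch below rebuilds them under two assumptions read off the tables: the loss of a pair (i, j) is the average of the two arms' 0/1 losses, and the feedback reports whether arm i's bit is greater than, less than, or equal to arm j's bit. The function name `dueling_bandit` is hypothetical.

```python
from itertools import combinations_with_replacement, product

def dueling_bandit(n_arms):
    """Build actions, outcomes, loss and feedback for a dueling bandit.

    Assumed rules, inferred from the listing:
      loss(i, j)     = 1 - (bit_i + bit_j) / 2
      feedback(i, j) = win / loss / tie as bit_i >, <, == bit_j
    """
    outcomes = ["".join(bits) for bits in product("01", repeat=n_arms)]
    actions = list(combinations_with_replacement(range(n_arms), 2))
    loss, feedback = [], []
    for i, j in actions:
        loss_row, fb_row = [], []
        for o in outcomes:
            gi, gj = int(o[i]), int(o[j])
            loss_row.append(1.0 - (gi + gj) / 2)
            fb_row.append("win" if gi > gj else "loss" if gi < gj else "tie")
        loss.append(loss_row)
        feedback.append(fb_row)
    return actions, outcomes, loss, feedback

actions, outcomes, loss, feedback = dueling_bandit(3)
for a, row in zip(actions, loss):
    print(a, row)
```

With `n_arms=4` the same construction yields 10 actions and 16 outcomes, matching the sizes reported for the 4-armed game.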
***** Hard Dueling Bandit *****
***** 4-armed utility-based dueling bandit *****
Actions: N=10 Outcomes: M=16
Loss Matrix (with actions as row indices and outcomes as column indices):
|       | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |
| (0,0) | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 1.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  | 0.0  |
| (0,1) | 1.0  | 1.0  | 1.0  | 1.0  | 0.5  | 0.5  | 0.5  | 0.5  | 0.5  | 0.5  | 0.5  | 0.5  | 0.0  | 0.0  | 0.0  | 0.0  |
| (0,2) | 1.0  | 1.0  | 0.5  | 0.5  | 1.0  | 1.0  | 0.5  | 0.5  | 0.5  | 0.5  | 0.0  | 0.0  | 0.5  | 0.5  | 0.0  | 0.0  |
| (0,3) | 1.0  | 0.5  | 1.0  | 0.5  | 1.0  | 0.5  | 1.0  | 0.5  | 0.5  | 0.0  | 0.5  | 0.0  | 0.5  | 0.0  | 0.5  | 0.0  |
| (1,1) | 1.0  | 1.0  | 1.0  | 1.0  | 0.0  | 0.0  | 0.0  | 0.0  | 1.0  | 1.0  | 1.0  | 1.0  | 0.0  | 0.0  | 0.0  | 0.0  |
| (1,2) | 1.0  | 1.0  | 0.5  | 0.5  | 0.5  | 0.5  | 0.0  | 0.0  | 1.0  | 1.0  | 0.5  | 0.5  | 0.5  | 0.5  | 0.0  | 0.0  |
| (1,3) | 1.0  | 0.5  | 1.0  | 0.5  | 0.5  | 0.0  | 0.5  | 0.0  | 1.0  | 0.5  | 1.0  | 0.5  | 0.5  | 0.0  | 0.5  | 0.0  |
| (2,2) | 1.0  | 1.0  | 0.0  | 0.0  | 1.0  | 1.0  | 0.0  | 0.0  | 1.0  | 1.0  | 0.0  | 0.0  | 1.0  | 1.0  | 0.0  | 0.0  |
| (2,3) | 1.0  | 0.5  | 0.5  | 0.0  | 1.0  | 0.5  | 0.5  | 0.0  | 1.0  | 0.5  | 0.5  | 0.0  | 1.0  | 0.5  | 0.5  | 0.0  |
| (3,3) | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  | 1.0  | 0.0  |
Feedback Matrix (symbolic form):
|       | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |
| (0,0) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
| (0,1) | tie  | tie  | tie  | tie  | loss | loss | loss | loss | win  | win  | win  | win  | tie  | tie  | tie  | tie  |
| (0,2) | tie  | tie  | loss | loss | tie  | tie  | loss | loss | win  | win  | tie  | tie  | win  | win  | tie  | tie  |
| (0,3) | tie  | loss | tie  | loss | tie  | loss | tie  | loss | win  | tie  | win  | tie  | win  | tie  | win  | tie  |
| (1,1) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
| (1,2) | tie  | tie  | loss | loss | win  | win  | tie  | tie  | tie  | tie  | loss | loss | win  | win  | tie  | tie  |
| (1,3) | tie  | loss | tie  | loss | win  | tie  | win  | tie  | tie  | loss | tie  | loss | win  | tie  | win  | tie  |
| (2,2) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
| (2,3) | tie  | loss | win  | tie  | tie  | loss | win  | tie  | tie  | loss | win  | tie  | tie  | loss | win  | tie  |
| (3,3) | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  | tie  |
Outcomes distribution (for stochastic games):
P(0000)=0.05 P(0001)=0.05 P(0010)=0.05 P(0011)=0.05 P(0100)=0.05 P(0101)=0.05 P(0110)=0.05 P(0111)=0.05 P(1000)=0.075 P(1001)=0.075 P(1010)=0.075 P(1011)=0.075 P(1100)=0.075 P(1101)=0.075 P(1110)=0.075 P(1111)=0.075
======> This game is EASY, because all neighbouring pairs are observable.