Simple Reinforcement Learning with Tensorflow: Part 0 - Q-Tables

In this IPython notebook we implement a Q-Table learning algorithm that solves the FrozenLake problem. To learn more, read the accompanying post: https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

For more reinforcement learning tutorials, see: https://github.com/awjuliani/DeepRL-Agents



In [1]:

    
import gym
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline

Load the environment



In [2]:

    
env = gym.make('FrozenLake-v0')

[2017-03-08 19:31:58,969] Making new env: FrozenLake-v0
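
The environment reports each state as a single integer. As a minimal sketch of how those integers relate to the grid, the block below assumes the default 4x4 FrozenLake layout and row-major numbering (state = row * 4 + column). The layout is written out by hand here rather than read from `env`, and `LAYOUT`, `state_to_cell` and `tile_at` are illustrative names, not part of gym.

```python
# Assumed default 4x4 layout: S = start, F = frozen, H = hole, G = goal
LAYOUT = ["SFFF",
          "FHFH",
          "FFFH",
          "HFFG"]

def state_to_cell(state, ncols=4):
    """Convert a state index into (row, col), assuming row-major numbering."""
    return divmod(state, ncols)

def tile_at(state):
    """Return the tile letter for a state index."""
    row, col = state_to_cell(state)
    return LAYOUT[row][col]

# Holes and the goal end an episode, so no action is ever taken from them
terminal_states = [s for s in range(16) if tile_at(s) in "HG"]
print(terminal_states)
```

Under these assumptions the terminal states are 5, 7, 11, 12 and 15. Since the agent never acts from a terminal state, the corresponding rows of a learned Q-table are never updated.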

Implement the Q-Table learning algorithm



In [3]:

    
# Initialize the Q-table with all zeros (one row per state, one column per action)
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set learning parameters
lr = .8   # learning rate
y = .95   # discount factor
num_episodes = 2000
# Create a list to contain the total reward per episode
#jList = []
rList = []
for i in range(num_episodes):
    # Reset environment and get first new observation
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    # The Q-Table learning algorithm (at most 99 steps per episode)
    while j < 99:
        j += 1
        # Choose an action greedily from the Q-table, with noise that shrinks as episodes progress
        a = np.argmax(Q[s,:] + np.random.randn(1, env.action_space.n)*(1./(i+1)))
        # Get new state and reward from environment
        s1, r, d, _ = env.step(a)
        # Update Q-Table with new knowledge
        Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])
        rAll += r
        s = s1
        if d:
            break
    #jList.append(j)
    rList.append(rAll)
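
To make the update rule concrete, the sketch below applies it by hand to a fresh, all-zero table, using the same `lr` and `y` values as the cell above. The two transitions are made-up values chosen for illustration, not recorded from a FrozenLake run, and `q_update` is a hypothetical helper name.

```python
import numpy as np

def q_update(Q, s, a, r, s1, lr=0.8, y=0.95):
    """Apply one tabular Q-learning update in place and return the new Q[s, a]."""
    # Target: immediate reward plus discounted value of the best next action
    target = r + y * np.max(Q[s1, :])
    Q[s, a] = Q[s, a] + lr * (target - Q[s, a])
    return Q[s, a]

Q_demo = np.zeros([16, 4])

# Hypothetical transition 1: state 14, action 2 reaches state 15 with reward 1
first = q_update(Q_demo, s=14, a=2, r=1.0, s1=15)

# Hypothetical transition 2: state 13, action 2 reaches state 14 with no reward
second = q_update(Q_demo, s=13, a=2, r=0.0, s1=14)

print(first, second)
```

The first update moves `Q_demo[14, 2]` from 0 to 0.8. The second gives `Q_demo[13, 2]` a value of roughly 0.608 even though its own reward was 0, because the value learned for state 14 is propagated backwards through the `y*np.max(Q[s1,:])` term.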



In [4]:

    
print("Score over time: " +  str(sum(rList)/num_episodes))

Score over time: 0.5345
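
The score above is the mean reward per episode over all training episodes, so the early, mostly exploratory episodes pull it down. One way to see progress over time is to average rewards over consecutive blocks of episodes. The sketch below does this on a short made-up reward list standing in for `rList`; `windowed_success_rate` and `toy_rList` are hypothetical names used only for this illustration.

```python
import numpy as np

def windowed_success_rate(rewards, window):
    """Mean reward over consecutive, non-overlapping windows of episodes."""
    rewards = np.asarray(rewards, dtype=float)
    n_full = len(rewards) // window  # drop any incomplete final window
    return rewards[:n_full * window].reshape(n_full, window).mean(axis=1)

# Made-up per-episode rewards (1 = reached the goal, 0 = did not)
toy_rList = [0, 0, 0, 1, 0, 1, 1, 1]

overall = sum(toy_rList) / len(toy_rList)
rates = windowed_success_rate(toy_rList, window=4)
print(overall, rates)
```

For this toy list the overall mean is 0.5, while the per-window means rise from 0.25 to 0.75. With the real `rList`, a larger window such as 100 episodes would be a more reasonable choice.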



In [5]:

    
print("Final Q-Table Values")
print(Q)

Final Q-Table Values
[[  5.65665336e-01   1.32702822e-02   1.84476143e-04   2.78297656e-02]
 [  1.11953138e-03   1.33937995e-04   0.00000000e+00   5.89997642e-01]
 [  2.73597828e-02   0.00000000e+00   5.28510052e-03   5.24379779e-01]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   4.26526621e-01]
 [  6.53693362e-01   1.68814092e-03   2.63829516e-03   1.91123462e-03]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  8.09674056e-02   4.48724195e-05   2.20163475e-09   2.58590894e-06]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   3.07917957e-03   5.60094060e-01]
 [  3.31391432e-04   4.99781611e-01   0.00000000e+00   0.00000000e+00]
 [  8.47572076e-01   3.13583798e-04   8.98214216e-04   2.51734273e-04]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  1.32122644e-05   5.31944387e-05   8.97199769e-01   4.44037973e-04]
 [  0.00000000e+00   0.00000000e+00   9.96736292e-01   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]]
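
Each row of the table holds the learned values for one state, and the greedy policy simply picks the column with the largest value. The sketch below reads that policy off four rows copied from the printed table (states 0, 9, 13 and 14). The action names assume gym's FrozenLake encoding of 0 = Left, 1 = Down, 2 = Right, 3 = Up, and `greedy_policy` is a hypothetical helper name.

```python
import numpy as np

# Assumed FrozenLake action encoding
ACTION_NAMES = ["Left", "Down", "Right", "Up"]

def greedy_policy(Q):
    """Return the index of the highest-valued action for each state (row)."""
    return np.argmax(Q, axis=1)

# Rows copied from the printed Q-table for states 0, 9, 13 and 14
Q_rows = np.array([
    [5.65665336e-01, 1.32702822e-02, 1.84476143e-04, 2.78297656e-02],
    [3.31391432e-04, 4.99781611e-01, 0.00000000e+00, 0.00000000e+00],
    [1.32122644e-05, 5.31944387e-05, 8.97199769e-01, 4.44037973e-04],
    [0.00000000e+00, 0.00000000e+00, 9.96736292e-01, 0.00000000e+00],
])

policy = greedy_policy(Q_rows)
names = [ACTION_NAMES[a] for a in policy]
print(names)
```

Applied to the full table, `np.argmax` returns 0 for the all-zero rows. Those rows were never updated, so the value 0 there carries no meaning as a policy choice.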


