In [1]:
# Add simple_rl to system path.
import os
import sys
parent_dir = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
sys.path.insert(0, parent_dir)
from simple_rl.agents import QLearningAgent, RandomAgent
from simple_rl.tasks import GridWorldMDP
from simple_rl.run_experiments import run_agents_on_mdp
Next, we make an MDP and a few agents:
In [2]:
# Setup MDP.
mdp = GridWorldMDP(width=6, height=6, init_loc=(1,1), goal_locs=[(6,6)])
# Setup Agents.
ql_agent = QLearningAgent(actions=mdp.get_actions())
rand_agent = RandomAgent(actions=mdp.get_actions())
The real meat of simple_rl is its functions for running experiments. The first, run_agents_on_mdp, takes a list of agents and an MDP and simulates their interaction:
In [3]:
# Run experiment and make plot.
run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=5, episodes=100, steps=40, reset_at_terminal=True, verbose=False)
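Conceptually, the simulation is a nested loop over episodes and steps, with the state reset at terminal states when reset_at_terminal=True. Here's a pure-Python sketch of that loop (this is an illustration, not simple_rl's actual implementation; the corridor MDP below is a made-up toy):

```python
import random

def run_agent_on_mdp(agent_act, step_fn, init_state, episodes=100, steps=40):
    """Sketch of the simulate-and-record loop: for each episode, roll the
    agent forward `steps` times, resetting at terminal states."""
    returns = []
    for _ in range(episodes):
        state, total = init_state, 0.0
        for _ in range(steps):
            action = agent_act(state)
            state, reward, terminal = step_fn(state, action)
            total += reward
            if terminal:                  # mirrors reset_at_terminal=True
                state = init_state
        returns.append(total)
    return returns

# A toy 1-D corridor: moving "right" from 0 toward a goal at 3 pays 1.
def step(state, action):
    nxt = state + 1 if action == "right" else max(state - 1, 0)
    return (0, 1.0, True) if nxt == 3 else (nxt, 0.0, False)

random.seed(0)
rand_returns = run_agent_on_mdp(lambda s: random.choice(["left", "right"]), step, 0)
print(sum(rand_returns) / len(rand_returns))
```

The per-episode returns are what end up averaged over instances and plotted.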
We can throw R-Max, introduced by [Brafman and Tennenholtz, 2002], into the mix, too:
In [43]:
from simple_rl.agents import RMaxAgent
rmax_agent = RMaxAgent(actions=mdp.get_actions(), horizon=3, s_a_threshold=1)
# Run experiment and make plot.
run_agents_on_mdp([rmax_agent, ql_agent, rand_agent], mdp, instances=5, episodes=100, steps=20, reset_at_terminal=True, verbose=False)
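R-Max's core idea is count-based optimism: any (state, action) pair visited fewer than s_a_threshold times is treated as "unknown" and assumed to pay the maximum reward, which drives systematic exploration. A minimal sketch of that bookkeeping (not simple_rl's RMaxAgent):

```python
from collections import defaultdict

class RMaxSketch:
    """Unknown (s, a) pairs are optimistically assumed to pay r_max."""
    def __init__(self, actions, r_max=1.0, s_a_threshold=1):
        self.actions = actions
        self.r_max = r_max
        self.threshold = s_a_threshold
        self.counts = defaultdict(int)        # visits per (s, a)
        self.reward_sums = defaultdict(float)

    def estimated_reward(self, state, action):
        # Optimistic until (s, a) has been seen enough; empirical mean after.
        if self.counts[(state, action)] < self.threshold:
            return self.r_max
        return self.reward_sums[(state, action)] / self.counts[(state, action)]

    def update(self, state, action, reward):
        self.counts[(state, action)] += 1
        self.reward_sums[(state, action)] += reward

agent = RMaxSketch(actions=["up", "down"], s_a_threshold=2)
print(agent.estimated_reward("s0", "up"))   # unknown -> optimistic r_max = 1.0
agent.update("s0", "up", 0.0)
agent.update("s0", "up", 0.0)
print(agent.estimated_reward("s0", "up"))   # known -> empirical mean = 0.0
```

The real agent also builds optimistic transition models and plans over a horizon, but the threshold-then-trust pattern above is the heart of it.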
Each experiment we run generates an Experiment object, which handles recording results, writing the relevant files, and plotting. When the run_agents... function is called, a results directory is created containing the experiment data. Inside it is a subdirectory named after the MDP you ran experiments on -- this is where the plot, the per-agent results, and the parameters.txt file are stored.
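As a sketch of that layout (the MDP directory name and the per-agent file name below are hypothetical — check your own results folder for the exact names):

```python
from pathlib import Path

mdp_name = "gridworld_h-6_w-6"        # hypothetical; the dir is named after the MDP
results_dir = Path("results") / mdp_name

# After run_agents_on_mdp, you'd expect files along these lines:
expected = [
    results_dir / "parameters.txt",   # the experiment's parameters
    results_dir / "Q-learning.csv",   # per-agent reward data (name is illustrative)
]
for p in expected:
    print(p)
```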
All of the above code is contained in the simple_example.py file.
First, let's make a FourRoomMDP from [Sutton, Precup, and Singh, 1999], which is more visually interesting than a plain grid world.
In [ ]:
from simple_rl.tasks import FourRoomMDP
four_room_mdp = FourRoomMDP(9, 9, goal_locs=[(9, 9)], gamma=0.95)
# Visualize the value function.
four_room_mdp.visualize_value()
Or we can visualize a policy:
Both of these are in examples/viz_example.py. If you need pygame under Anaconda, give this a shot:
> conda install -c cogsci pygame
If you get an sdl font related error on Mac/Linux, try:
> brew install sdl sdl_ttf
We can also make grid worlds with a text file. For instance, we can construct the grid problem from [Barto and Pickett 2002] by making a text file:
--w-----w---w----g
--------w---------
--w-----w---w-----
--w-----w---w-----
wwwww-wwwwwwwww-ww
---w----w----w----
---w---------w----
--------w---------
wwwwwwwww---------
w-------wwwwwww-ww
--w-----w---w-----
--------w---------
--w---------w-----
--w-----w---w-----
wwwww-wwwwwwwww-ww
---w-----w---w----
---w-----w---w----
a--------w--------
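A file like this is just a character grid: 'w' marks a wall, 'a' the agent's start location, 'g' a goal, and '-' an empty cell. A rough sketch of how such a file could be parsed into locations (this is not simple_rl's actual make_grid_world_from_file; it assumes 1-indexed (x, y) coordinates with y = 1 at the bottom row, matching the (1,1)/(6,6) convention used earlier):

```python
def parse_grid_file(lines):
    """Parse a character grid: 'w' = wall, 'a' = agent start, 'g' = goal.
    Returns 1-indexed (x, y) coordinates with y = 1 as the bottom row."""
    height = len(lines)
    walls, goals, init = [], [], None
    for row, line in enumerate(lines):
        y = height - row                    # bottom row of the file is y = 1
        for col, ch in enumerate(line):
            x = col + 1
            if ch == "w":
                walls.append((x, y))
            elif ch == "g":
                goals.append((x, y))
            elif ch == "a":
                init = (x, y)
    return init, goals, walls

grid = ["--g",
        "-w-",
        "a--"]
init, goals, walls = parse_grid_file(grid)
print(init, goals, walls)   # (1, 1) [(3, 3)] [(2, 2)]
```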
Then, we make a grid world out of it:
In [30]:
from simple_rl.tasks.grid_world import GridWorldMDPClass
pblocks_mdp = GridWorldMDPClass.make_grid_world_from_file("pblocks_grid.txt", randomize=False)
pblocks_mdp.visualize_value()
which produces a visualization of the value function over the grid.
There's also a Taxi MDP, which is built on top of an Object-Oriented MDP (OOMDP) abstract class from [Diuk, Cohen, and Littman, 2008].
In [4]:
from simple_rl.tasks import TaxiOOMDP
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.agents import QLearningAgent, RandomAgent
# Taxi initial state attributes.
agent = {"x":1, "y":1, "has_passenger":0}
passengers = [{"x":3, "y":2, "dest_x":2, "dest_y":3, "in_taxi":0}]
taxi_mdp = TaxiOOMDP(width=4, height=4, agent=agent, walls=[], passengers=passengers)
# Make agents.
ql_agent = QLearningAgent(actions=taxi_mdp.get_actions())
rand_agent = RandomAgent(actions=taxi_mdp.get_actions())
Above, we specify the objects of the OOMDP and their attributes. Now, just as before, we can let some agents interact with the MDP:
In [5]:
# Run experiment and make plot.
run_agents_on_mdp([ql_agent, rand_agent], taxi_mdp, instances=5, episodes=100, steps=150, reset_at_terminal=True)
More on OOMDPs can be found in examples/oomdp_example.py.
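In an OOMDP, a state is a collection of objects, each with attribute values — exactly what the agent and passengers dicts above encode. A minimal sketch of that state representation (not simple_rl's OOMDP classes, just the idea):

```python
class OOMDPObject:
    """An object is a class name plus a dict of attribute values."""
    def __init__(self, obj_class, attributes):
        self.obj_class = obj_class
        self.attributes = dict(attributes)

    def get(self, attr):
        return self.attributes[attr]

# The taxi state from above: one agent object and one passenger object.
agent_obj = OOMDPObject("agent", {"x": 1, "y": 1, "has_passenger": 0})
passenger = OOMDPObject("passenger",
                        {"x": 3, "y": 2, "dest_x": 2, "dest_y": 3, "in_taxi": 0})
state = {"agent": [agent_obj], "passenger": [passenger]}

# A natural terminal check in this representation: every passenger has
# been delivered to its destination.
def is_terminal(state):
    return all(p.get("x") == p.get("dest_x") and p.get("y") == p.get("dest_y")
               for p in state["passenger"])

print(is_terminal(state))   # False: the passenger isn't at (2, 3) yet
```

The payoff of this representation is that transition dynamics can be written against object classes and attributes rather than enumerated flat states.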
I've added a few Markov games, including Rock, Paper, Scissors, grid games, and the Prisoner's Dilemma. Just as before, there's a method that simulates learning and makes a plot:
In [75]:
from simple_rl.run_experiments import play_markov_game
from simple_rl.agents import QLearningAgent, FixedPolicyAgent
from simple_rl.tasks import RockPaperScissorsMDP
import random
# Setup MDP, Agents.
markov_game = RockPaperScissorsMDP()
ql_agent = QLearningAgent(actions=markov_game.get_actions(), epsilon=0.2)
fixed_action = random.choice(markov_game.get_actions())
fixed_agent = FixedPolicyAgent(policy=lambda s:fixed_action)
# Run experiment and make plot.
play_markov_game([ql_agent, fixed_agent], markov_game, instances=10, episodes=1, steps=10)
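Against a fixed opponent, Rock, Paper, Scissors is effectively stateless, so the Q-learner only needs per-action value estimates and should settle on the counter-move. A self-contained sketch of that dynamic (a bandit-style simplification, not simple_rl's agents):

```python
import random

BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[b] == a else -1

random.seed(0)
fixed_action = "rock"                  # the FixedPolicyAgent's choice
q = {a: 0.0 for a in BEATS}            # one value estimate per action
alpha, epsilon = 0.1, 0.2
for _ in range(500):
    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    r = payoff(action, fixed_action)
    q[action] += alpha * (r - q[action])

print(max(q, key=q.get))   # the learner settles on the counter-move: paper
```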
Recently I added support for making OpenAI Gym MDPs. It again takes only a few lines of code:
In [53]:
from simple_rl.tasks import GymMDP
from simple_rl.agents import LinearQLearningAgent, RandomAgent
from simple_rl.run_experiments import run_agents_on_mdp
# Gym MDP.
gym_mdp = GymMDP(env_name='CartPole-v0', render=False) # If render is true, visualizes interactions.
num_feats = gym_mdp.get_num_state_feats()
# Setup agents and run.
lin_agent = LinearQLearningAgent(gym_mdp.get_actions(), num_features=num_feats, alpha=0.2, epsilon=0.4, rbf=True)
run_agents_on_mdp([lin_agent], gym_mdp, instances=3, episodes=1, steps=50)
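The linear agent approximates Q(s, a) as a dot product between a per-action weight vector and a feature vector of the state (optionally passed through RBFs, as the rbf=True flag suggests). A minimal sketch of that update rule (an illustration, not simple_rl's implementation):

```python
def linear_q(weights, features):
    """Q(s, a) ~= w_a . phi(s)."""
    return sum(w * f for w, f in zip(weights, features))

def q_update(weights, features, reward, next_best_q, alpha=0.2, gamma=0.99):
    """One step along the TD error in the feature direction:
    w_a <- w_a + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s)."""
    td_error = reward + gamma * next_best_q - linear_q(weights, features)
    return [w + alpha * td_error * f for w, f in zip(weights, features)]

# Two-feature state, weights start at zero; a reward of 1 pushes Q upward.
w = [0.0, 0.0]
phi = [1.0, 0.5]
w = q_update(w, phi, reward=1.0, next_best_q=0.0)
print(linear_q(w, phi))   # 0.25
```

Function approximation like this is what lets the agent cope with CartPole's continuous state features, where a tabular Q-learner would never revisit a state.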