// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
This environment lets you create and learn to solve grid mazes. Solving mazes is a nice problem to start with when learning the basics of reinforcement learning.
In the grid maze environment, a robot is located on a starting cell in a grid world. By moving Up, Down, Left, or Right, the robot should figure out the best strategy for moving from the start cell to the goal cell of the maze. The robot receives a reward (positive value) or a cost (negative value) when moving to a new cell. The purpose of the game is for the robot to figure out how to move from the start to the goal position while collecting as much reward (or incurring as little cost) as possible.
Each cell in the grid world has a different function:

- `SPACE`: The robot is free to enter this cell, but there is a certain cost for doing so (e.g. -1).
- `START`: This is where the robot starts. Like `SPACE`, it can also be entered (returned to) at a cost. A maze must have exactly one `START` cell.
- `GOAL`: The robot successfully solves the maze, typically getting a reward, by walking to this cell. A maze can have more than one goal cell.
- `WALL`: This cell cannot be entered.
- `BOUNCE`: The robot can attempt to enter this cell, but doing so will make it bounce back to the cell it came from, receiving the reward/cost associated with that cell.
- `HOLE`: The robot enters this cell at a typically big cost, and the game is over. In other words, it fails to reach the `GOAL` cell.

Let's look at an example:
Describing each location in the maze by `[row, column]`:

- `[1, 1]`: The robot starts here. From this cell it can only move right or down, since a wall cell cannot be entered. Going right or down costs -1.
- `[4, 4]`: Entering this cell gives the robot a reward (1.0 in the code below), and the robot has successfully solved the maze (the "T" means "terminal cell").
- `[2, 2]`: This hole costs the robot -100 and is a terminal state (game over). Since the robot did not reach a goal cell, it lost. Likewise for the hole at `[3, 4]`.
- `[2, 3]` and `[3, 2]`: Each has a cost of -2.
- `[4, 0]`: This cell is a bit special. Since the cell at `[4, 5]` is also space, the robot can go left from `[4, 0]` and end up in `[4, 5]`. Had `[4, 5]` been a wall, going left from `[4, 0]` would not have been a legal action.
- `[4, 2]`: This is a bounce cell. Attempting to enter it from `[3, 2]`, `[4, 1]`, or `[4, 3]` results in the robot bouncing back to those cells. The cost is -2 from `[3, 2]` and -1 from `[4, 1]` and `[4, 3]`.

Now what about the white arrows with 50%? They indicate that cell `[2, 4]` has random behavior: if the robot attempts to enter cell `[2, 4]`, it will end up in that cell with 50% probability, but it might equally well end up in cell `[3, 4]` with 50% probability.

That's it for an overview of the environment. Let's look at how to interact with the environment through code!
In [0]:
%install '.package(url: "https://github.com/deepmind/open_spiel", .branch("master"))' OpenSpiel
import TensorFlow
import OpenSpiel
In [0]:
var maze = GridMaze(rowCount: 6, columnCount: 6)  // border defaults to wall, interior to space with cost -1
maze[1, 1] = GridCell.start(reward: -1.0)
maze[2, 2] = GridCell.hole(reward: -100)
maze[2, 3] = GridCell.space(reward: -2.0)
maze[3, 2] = GridCell.space(reward: -2.0)
maze[3, 4] = GridCell.hole(reward: -100)
maze[4, 0] = GridCell.space(reward: -1.0)
maze[4, 2] = GridCell.bounce
maze[4, 4] = GridCell.goal(reward: 1.0)
maze[4, 5] = GridCell.space(reward: -1.0)
// Entering [2, 4] succeeds (.Welcome) with 50% probability; with the remaining
// 50% the robot jumps one row down (.Relative(1, 0)), into the hole at [3, 4].
maze[2, 4].entryJumpProbabilities = [(.Relative(1, 0), 0.5), (.Welcome, 0.5)]
`GridMaze` defines a `subscript` for accessing cells in the environment: use `maze[row, column]`.
By default, when you create an environment, it will be surrounded by `GridCell.wall` cells, and the free space inside will be `GridCell.space` with cost -1 (see below for more information on changing this default behavior).
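The subscript supports both writing and reading. As a minimal sketch (not part of the maze setup above; it just assumes the `maze` value from the previous cell):
// Overwrite a cell through the subscript, then read one back.
maze[3, 3] = GridCell.space(reward: -1.0)
let startCell = maze[1, 1]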
The setup code above is pretty straightforward, except for the last statement: the `entryJumpProbabilities` assignment needs some more explanation.
For any grid cell, you can specify the jump probabilities as follows:
maze[row,col].entryJumpProbabilities = [
// This is a list of jump probabilities
// Each element is a pair, the first element specifies the jump behavior and the second the probability of it happening
(.Welcome, 0.25), // With 0.25 probability, welcome the robot to this cell
(.BounceBack, 0.25), // With 0.25 probability, make the robot bounce back to cell it was standing on
(.Relative(1,0), 0.25), // With 0.25 probability, jump to the next row (1), same column (0) in the matrix
(.Absolute(1,1), 0.25), // With 0.25 probability, jump to the cell at row 1, column 1
]
You can also omit `entryJumpProbabilities`, in which case the default for the cell type applies:

- `GridCell.space`, `.start`, `.goal`, and `.hole` default to `[(.Welcome, 1.0)]`.
- `GridCell.bounce` defaults to `[(.BounceBack, 1.0)]`.
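So, for example, leaving a space cell's entry behavior unset is equivalent to this explicit assignment (a redundant but legal sketch against the maze above):
// Space cells already default to 100% .Welcome, so this changes nothing.
maze[3, 2].entryJumpProbabilities = [(.Welcome, 1.0)]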
A couple of rules related to `entryJumpProbabilities`:

- The list can be empty (equivalent to 100% `.Welcome`), or the probabilities in the list must sum to 1.0.
- If jumping to another cell by `.Relative` or `.Absolute`, the target cell must be 100% `.Welcome`. The environment currently does not support sequenced jumps.
- There is also currently limited environment validation (contributions are welcome). In other words, make sure you index cells that exist, and if you're using `entryJumpProbabilities`, that any `.Relative` or `.Absolute` target cell exists (and does not in turn have `entryJumpProbabilities`).
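To make these rules concrete, here is a sketch against the maze above (the cells chosen are purely illustrative, not part of the maze setup):
// Valid: the probabilities sum to 1.0, and the .Absolute target [4, 5] is a
// plain space cell whose entry behavior is the default 100% .Welcome.
maze[2, 3].entryJumpProbabilities = [(.BounceBack, 0.5), (.Absolute(4, 5), 0.5)]

// Invalid (a sequenced jump): [2, 4] has its own entryJumpProbabilities, so it
// must not be the target of a .Relative/.Absolute jump.
// maze[2, 3].entryJumpProbabilities = [(.Absolute(2, 4), 1.0)]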
The `init` function of `GridMaze` looks like this:
public init(
  rowCount: Int,
  columnCount: Int,
  borderCell: GridCell = GridCell.wall,
  innerCell: GridCell = GridCell.space(reward: -1.0)
)
As you can see, the size of the environment must be specified through the `rowCount` and `columnCount` parameters.
With the default values for the remaining parameters, the environment will be surrounded by `GridCell.wall` cells, and the other cells will be `GridCell.space` with a cost of -1.0.
You can change this default by specifying these parameters, for example surrounding the maze with bounce cells by providing `GridCell.bounce` instead of the default `GridCell.wall`.
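For example, a smaller maze with a bouncing border and a more expensive interior could be created like this (a sketch; the sizes and rewards are arbitrary):
// 4x4 maze whose border bounces the robot back and whose interior costs -2 per step.
var bouncyMaze = GridMaze(
  rowCount: 4,
  columnCount: 4,
  borderCell: GridCell.bounce,
  innerCell: GridCell.space(reward: -2.0))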
After a maze has been created, it can be printed by calling `printMazeAndTable`:
In [0]:
maze.printMazeAndTable(header: "Maze Environment")
This function prints the maze with the type and reward of each cell (`WALL` and `BOUNCE` do not have rewards, since they cannot be entered).
It is also possible to print a v-/q-/policy table together with the maze (if you are not familiar with these concepts, please see the OpenSpiel Swift tutorials). In the code below, the tables are created but left empty:
In [0]:
var vtable = [String: Double]()
var qtable = [String: [(action: GridMaze.Action, qvalue: Double)]]()
var policyTable = [String: [GridMaze.Action]]()
maze.printMazeAndTable(header: "--- Printing Everything", vtable: vtable, qtable: qtable, ptable: policyTable)
Since these tables do not contain any data for the cells, only the maze itself will be printed.
The cell types we've looked at so far (`GridCell.start`, `.goal`, `.space`, `.hole`, `.bounce`) are all just functions and properties that create `GridCell` objects with certain parameters. You can create your own cell behaviors by calling `GridCell.init` with the desired behavior. See the `GridMaze.swift` source code to understand how this works.
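Even without calling `GridCell.init` directly, custom behaviors can be composed from the pieces shown above. As an illustrative sketch (the cell placement is hypothetical), a "teleporter" can be approximated by a space cell that always jumps the robot to an absolute location:
// Hypothetical "teleporter": entering [2, 1] always sends the robot to [4, 5],
// a plain space cell, so the 100% .Welcome rule for jump targets is satisfied.
maze[2, 1] = GridCell.space(reward: -1.0)
maze[2, 1].entryJumpProbabilities = [(.Absolute(4, 5), 1.0)]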
The file `main.swift` is a complete example of a robot walking randomly through the maze, trying to reach the goal cell. You can also open it as a Colab.
The OpenSpiel Swift tutorials directory is also continuously populated with more tutorials.
If you have any questions about Swift for TensorFlow, Swift in OpenSpiel, or would like to share
your work or research with the community, please join our mailing list
swift@tensorflow.org
.