// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

Introduction

This Colab uses the maze below and has the Robot walk randomly until it either falls into a hole or reaches the goal.

For a detailed description of this maze, and of how the GridMaze environment works in general, please see the Colab.

Installing OpenSpiel and imports


In [0]:
%install '.package(url: "https://github.com/deepmind/open_spiel", .branch("master"))' OpenSpiel
import TensorFlow
import OpenSpiel

Creating the GridMaze Environment

The following code creates the maze environment depicted above.


In [0]:
var maze = GridMaze(rowCount: 6, columnCount: 6)
maze[1, 1] = GridCell.start(reward: -1.0)
maze[2, 2] = GridCell.hole(reward: -100)
maze[2, 3] = GridCell.space(reward: -2.0)
maze[3, 2] = GridCell.space(reward: -2.0)
maze[3, 4] = GridCell.hole(reward: -100)
maze[4, 0] = GridCell.space(reward: -1.0)
maze[4, 2] = GridCell.bounce
maze[4, 4] = GridCell.goal(reward: 1.0)
maze[4, 5] = GridCell.space(reward: -1.0)
maze[2, 4].entryJumpProbabilities = [(.Relative(1, 0), 0.5), (.Welcome, 0.5)]

Printing the Maze

Now that the maze environment has been created, it can be displayed by calling printMazeAndTable:


In [0]:
maze.printMazeAndTable(header: "Maze Environment")

Random Walk Loop

The code below makes the Robot walk around randomly in the maze. In this process, it will eventually reach one of two terminal states:

  • Hole
  • Goal

The main iteration loop continues until the Robot reaches the goal. In other words, if the Robot falls into a hole, a new game starts, and this repeats until it reaches the goal state.


In [0]:
var gamesPlayed = 0
var continuePlaying = true
while continuePlaying {
  gamesPlayed += 1
  var gameState = maze.initialState
  var actionSequence = [String]()
  
  // Let the Robot take actions (walk between cells) until it either reaches GOAL
  // or falls into a HOLE (those are the only two terminal cells)
  while !gameState.isTerminal {
    
    // The current informationState, denoting the cell position the Robot is
    // currently at. If you create a learning algorithm, this will be the key
    // to v-/q-/policy-tables: 
    //    let currentInformationState = gameState.informationStateString()
    // This will be extensively used in learning algorithms in later tutorials.
    
    // Select a random action from the legal ones in this state.
    // The GridMaze.Action enum has members: .LEFT, .UP, .DOWN, .RIGHT
    // Since some cells cannot even be attempted to be entered (e.g. WALL),
    // not all gameStates return all four members
    let actionIndex = Int.random(in: 0..<gameState.legalActions.count)
    let actionToTake = gameState.legalActions[actionIndex]
    
    // We now have an actionToTake from current informationState
    // Let's have Robot move in that direction!
    gameState.apply(actionToTake)
    
    // Store the action taken to print that later
    actionSequence.append(actionToTake.description)
    
    // If Robot made it to GOAL then we're done
    if gameState.isGoal {
      print("*** AWESOME. Robot made it to Goal at game: \(gamesPlayed) with reward/cost: \(gameState.utility(for: .player(0)))")
      print("    Sequence of actions used to solve the maze: \(actionSequence)")
      continuePlaying = false
    } else if gameState.isTerminal {
      print("Robot failed to solve the maze at game: \(gamesPlayed) with reward/cost: \(gameState.utility(for: .player(0)))")
    }
  }
}
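
The comments above note that informationStateString() is the key used for v-/q-/policy-tables in later tutorials. As a preview, the sketch below tallies how often the Robot visits each state during one random walk, using a dictionary keyed on that string. It reuses the maze and the GridMaze API calls from the cells above (initialState, isTerminal, informationStateString(), legalActions, apply); the visitCounts bookkeeping itself is purely illustrative and not part of OpenSpiel.


In [0]:
// Illustrative sketch only: count visits per informationState during one
// random walk. A q-table or policy table would be keyed the same way.
var visitCounts = [String: Int]()
var state = maze.initialState
while !state.isTerminal {
  // The informationState string identifies the cell the Robot is in
  let key = state.informationStateString()
  visitCounts[key, default: 0] += 1
  // Pick a uniformly random legal action, as in the loop above
  let action = state.legalActions.randomElement()!
  state.apply(action)
}
// visitCounts now maps each visited informationState to its visit count
for (key, count) in visitCounts.sorted(by: { $0.key < $1.key }) {
  print("\(key): visited \(count) time(s)")
}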

Summary

This tutorial demonstrated the Robot taking random actions to eventually reach the goal state. Later tutorials will cover more advanced examples, for example how to create generic algorithms the Robot can use to optimally find a solution to any maze problem (or other environment problems, for that matter).

Join the community!

If you have any questions about Swift for TensorFlow, Swift in OpenSpiel, or would like to share your work or research with the community, please join our mailing list swift@tensorflow.org.