Parallel Value Iteration

We assume you've read the documentation on Serial Value Iteration. If you haven't, read that first before continuing here.

Quick and dirty example

To reap the benefits of Julia’s parallel computing framework for value iteration, we need a few more steps. The main issue we have to work around is making our code available to the worker processes we add. We’ll skip an in-depth explanation and go straight to what needs to be done.
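To see why this matters, here is a minimal illustration in plain Julia (not PLite-specific): a function defined only on the master process is invisible to the workers. Wrapping the problem definition in a module that every process loads, as we do below, solves the same problem for an entire codebase.

addprocs(2)                  # add two worker processes
f(x) = x^2                   # defined only on the master process
# fetch(@spawnat 2 f(3.0))   # would fail: worker 2 has no definition of f
@everywhere g(x) = x^2       # defined on every process
fetch(@spawnat 2 g(3.0))     # works: returns 9.0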

As a quick and dirty example, we run the exact same code as in the MDP with a T(s, a) type transition, but on PLite’s parallel value iteration solver.

First, we wrap our existing code in a module named ExampleModule (you can name it whatever you want) and save it in a file named ExampleModule.jl.

As our naming scheme suggests, the module and the file should share the same name. The following cell shows what should be saved to the file.


In [1]:
#=
module ExampleModule

export
  mdp,
  solver,
  solve,
  getpolicy

using PLite

<constants, mdp definition, state and action space, transition, reward>

# solver options
solver = ParallelValueIteration()
discretize_statevariable!(solver, "x", StepX)

end
=#

In our definition of the parallel value iteration solver, we have an additional keyword argument nthreads indicating how many parallel processes we want to run. The default value is CPU_CORES / 2.
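For instance, a solver that overrides the default and requests four threads (an arbitrary number chosen for illustration), leaving every other option at its default, would look like this:

solver = ParallelValueIteration(nthreads=4)  # request 4 parallel processes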

As in the serial solver, PLite.jl needs a definition of the discretization scheme.


In [2]:
#=
solver = ParallelValueIteration()
discretize_statevariable!(solver, "x", StepX)
=#

CPU_CORES is a Julia standard library constant that gives the number of CPU cores on your system. Because this count usually includes virtual cores (e.g., on Intel processors with hyper-threading), we divide by two to approximate the number of physical cores.
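As a quick sanity check, you can inspect both numbers in the REPL. The values in the comments are what you might see on a quad-core machine with hyper-threading; yours will differ.

CPU_CORES                  # logical cores, e.g., 8
round(Int, CPU_CORES / 2)  # approximate physical cores, e.g., 4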

There is nothing stopping you from requesting more threads. But since the same physical cores end up doing the same amount of work, there is no gain in efficiency; in fact, a greater number of threads may just add overhead from extra runtime processes. We therefore recommend using as many threads as there are physical cores on the machine. For the parallel solver, we can define


In [3]:
#=
solver = ParallelValueIteration(
  tol=1e-6,
  maxiter=10000,
  discount=0.999,
  verbose=false,
  nthreads=10)
=#

Notice also that we need to export a few functions and variables for the module to work with the code below. Don't worry too much about it; just make sure these exports are in your module.


In [4]:
#=
export
  mdp,
  solver,
  solve,
  getpolicy
=#

Solution and policy extraction

Finally, in the console or a Jupyter notebook, we input the following. The resulting policy is no different from the one obtained using serial value iteration.


In [5]:
const NThreads = round(Int, CPU_CORES / 2)
addprocs(NThreads - 1)  # -1 to account for existing process

push!(LOAD_PATH, ".")
push!(LOAD_PATH, "../src")

using ExampleModule

# generate results
solution = solve(mdp, solver)
policy = getpolicy(mdp, solution)


INFO: mdp and value iteration solver passed basic checks
INFO: value iteration solution generated
cputime [s] = 2.459400272000004
number of iterations = 460
residual = 9.842572909793015e-5
Out[5]:
policy (generic function with 1 method)
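The returned policy is a callable function of the state variable values. Assuming the single state variable "x" from ExampleModule and an arbitrary query point of 0.5 (hypothetical here; see the serial value iteration documentation for the exact call convention), you can query it directly:

policy(0.5)  # hypothetical query: approximately optimal action at x = 0.5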

Warning

Note that for a small problem such as ours, the gain from parallelization will not be apparent. In fact, the parallel solver may be slower due to the additional overhead.

Consider using the parallel solver for problems with large state and action spaces, or problems with complex transition or reward functions.
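If you are unsure whether parallelization pays off for your problem, one option is to solve the same MDP with both solvers and compare the reported cputime. Below is a sketch that assumes the serial solver is constructed with SerialValueIteration() as in the serial documentation, and that reaches the unexported StepX constant by qualifying it with the module name:

serial = SerialValueIteration()                              # assumed constructor from the serial docs
discretize_statevariable!(serial, "x", ExampleModule.StepX)  # StepX is not exported, so qualify it

solve(mdp, serial)  # compare the cputime printed here...
solve(mdp, solver)  # ...with the parallel solver's cputime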