DiscreteDP Example: Mine Management

Masashi Takahashi, Iori Saito

Department of Economics, University of Tokyo

From Miranda and Fackler, Applied Computational Economics and Finance, 2002, Section 7.6.3

Julia translation of the Python version

The model is formulated with finite horizon in Section 7.2.1, but solved with infinite horizon in Section 7.6.1. Here we follow the latter.
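
With ore price $p = 1$, stock $s \in \{0, 1, \ldots, \bar{s}\}$, extraction $x \in \{0, 1, \ldots, s\}$, extraction cost $c(s, x) = x^2/(1+s)$, and discount factor $\beta = 0.9$, the Bellman equation of the infinite-horizon problem set up in the code below is

$$
V(s) = \max_{0 \le x \le s} \left\{ p x - \frac{x^2}{1+s} + \beta V(s - x) \right\}.
$$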


In [1]:
using QuantEcon
using Plots
plotlyjs();




In [2]:
price = 1     # Market price of ore
sbar = 100    # Upper bound of ore stock
beta = 0.9    # Discount factor
n = sbar + 1  # Number of states
m = sbar + 1  # Number of actions

# Cost function
c(s, x) = (x^2) / (1+s);

Product formulation

This approach sets up the reward array R and the transition probability array Q as a 2-dimensional array of shape (n, m) and a 3-dimensional array of shape (n, m, n), respectively, where the reward is set to $-\infty$ for infeasible state-action pairs (and the transition probability distribution is arbitrary for those pairs).

Reward array:


In [3]:
R = Matrix{Float64}(undef, n, m)
for j in 1:m
    for i in 1:n
        if j <= i
            # Feasible pair: extraction x = j-1 does not exceed the stock s = i-1
            R[i, j] = price*(j-1) - c(i-1, j-1)
        else
            # Infeasible pair
            R[i, j] = -Inf
        end
    end
end
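
The same reward matrix can also be built in one line with a comprehension (an equivalent sketch, not part of the original notebook):

R_alt = [j <= i ? price*(j-1) - c(i-1, j-1) : -Inf for i in 1:n, j in 1:m]  # R_alt == R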

(Degenerate) transition probability array:


In [4]:
Q = zeros(n, m, n)
for j in 1:m
    for i in 1:n
        if j <= i
            # Deterministic transition to next period's stock s' = s - x
            Q[i, j, i-j+1] = 1.0
        else
            # Arbitrary distribution for infeasible pairs
            Q[i, j, 1] = 1.0
        end
    end
end

Set up the dynamic program as a DiscreteDP instance:


In [5]:
ddp = DiscreteDP(R, Q, beta);

Solve the optimization problem with the solve method, which by default uses the policy iteration algorithm:


In [6]:
# Solve the dynamic optimization problem (by policy iteration)
res = solve(ddp, PFI);

The number of iterations:


In [7]:
res.num_iter


Out[7]:
9
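
The same problem can also be solved with value iteration or modified policy iteration (a sketch using the VFI and MPFI algorithm types exported by QuantEcon; all methods should return the same optimal policy):

res_vfi = solve(ddp, VFI)    # value function iteration
res_mpfi = solve(ddp, MPFI)  # modified policy iteration
res_vfi.sigma == res.sigma && res_mpfi.sigma == res.sigma  # expected to be true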

The controlled Markov chain is stored in res.mc. To simulate it, first attach the ore stock levels as state values:


In [8]:
res.mc.state_values = 0:100


Out[8]:
0:100

In [9]:
# Simulate the controlled Markov chain
nyrs = 15
spath = simulate(res.mc, nyrs+1, init = sbar + 1)


Out[9]:
16-element Array{Int64,1}:
 100
  76
  58
  44
  33
  25
  19
  14
  11
   8
   6
   4
   3
   2
   1
   0
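
Since the transition is deterministic ($s' = s - x$), the simulated path can be reproduced by applying the optimal policy by hand; a quick sanity check (not part of the original notebook, assuming simulate returns the state values attached above):

spath_check = zeros(Int, nyrs+1)
spath_check[1] = sbar
for t in 1:nyrs
    s = spath_check[t]
    x = res.sigma[s+1] - 1    # optimal extraction at stock s (sigma holds 1-based action indices)
    spath_check[t+1] = s - x  # next period's stock
end
spath_check == spath          # expected to be true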

In [10]:
p1 = plot(0:sbar, res.v, xlabel="Stock", ylabel="Value", title="Optimal Value Function")
p2 = plot(0:sbar, res.sigma, xlabel="Stock", ylabel="Extraction", title="Optimal Extraction Policy")
p3 = plot(0:nyrs+1, spath, xlabel="Year", ylabel="Stock", title="Optimal State Path")
plot(p1, p2, p3, legend = false)


Out[10]:
[Figure: optimal value function, optimal extraction policy, and optimal state path]

State-action pairs formulation

This approach assigns the rewards and transition probabilities only to feasible state-action pairs, setting up R and Q as a 1-dimensional array of length L and a 2-dimensional array of shape (L, n), respectively.

We need the arrays of feasible state and action indices:


In [11]:
# Arrays of feasible state and action indices
s_indices = []
a_indices = []
for j in 1:m
    for i in 1:n
        if i - j >= 0
            push!(s_indices, i)
            push!(a_indices, j)
        end
    end
end

L = length(s_indices)


Out[11]:
5151
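
This agrees with a direct count of the feasible state-action pairs: at stock $s$ there are $s+1$ feasible extraction levels, so $L = \sum_{s=0}^{100} (s+1) = \frac{101 \cdot 102}{2} = 5151$.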

In [12]:
s_indices


Out[12]:
5151-element Array{Any,1}:
   1
   2
   3
   4
   5
   6
   7
   8
   9
  10
  11
  12
  13
   ⋮
 100
 101
  98
  99
 100
 101
  99
 100
 101
 100
 101
 101

In [13]:
a_indices


Out[13]:
5151-element Array{Any,1}:
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   ⋮
  97
  97
  98
  98
  98
  98
  99
  99
  99
 100
 100
 101

Reward vector:


In [14]:
R_sa = zeros(L)
for (i, (s, x)) in enumerate(zip(s_indices .- 1, a_indices .- 1))
    R_sa[i] = price*x - c(s, x)
end

(Degenerate) transition probability array:


In [15]:
Q_sa = zeros(L, n)
for (i, s) in enumerate(s_indices)
    x = a_indices[i]
    # Deterministic transition to next period's stock s' = s - x
    Q_sa[i, s-x+1] = 1.0
end
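
Since each row of Q_sa has a single nonzero entry, a sparse matrix can save memory for larger problems; a sketch, assuming DiscreteDP also accepts a SparseMatrixCSC for Q:

using SparseArrays  # standard library in Julia 1.x
Q_sa_sparse = spzeros(L, n)
for (i, s) in enumerate(s_indices)
    x = a_indices[i]
    Q_sa_sparse[i, s-x+1] = 1.0
end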

Set up the dynamic program:


In [16]:
ddp_sa = DiscreteDP(R_sa, Q_sa, beta, s_indices, a_indices);

Solve the optimization problem with the solve method, which by default uses the policy iteration algorithm:


In [17]:
res_sa = solve(ddp_sa, PFI);

Number of iterations:


In [18]:
res_sa.num_iter


Out[18]:
9
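
The two formulations should produce identical results; a quick check (not part of the original notebook):

res_sa.v ≈ res.v && res_sa.sigma == res.sigma  # expected to be true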

Simulate the controlled Markov chain:


In [19]:
res_sa.mc.state_values = 0:100
nyrs = 15
spath_sa = simulate(res_sa.mc, nyrs+1, init = sbar + 1)


Out[19]:
16-element Array{Int64,1}:
 100
  76
  58
  44
  33
  25
  19
  14
  11
   8
   6
   4
   3
   2
   1
   0

Draw the graphs:


In [20]:
p1_sa = plot(0:sbar, res_sa.v, xlabel="Stock", ylabel="Value", title="Optimal Value Function")
p2_sa = plot(0:sbar, res_sa.sigma, xlabel="Stock", ylabel="Extraction", title="Optimal Extraction Policy")
p3_sa = plot(0:nyrs+1, spath_sa, xlabel="Year", ylabel="Stock", title="Optimal State Path")
plot(p1_sa, p2_sa, p3_sa, legend=false)


Out[20]:
[Figure: optimal value function, optimal extraction policy, and optimal state path (state-action pairs formulation)]