Department of Economics, University of Tokyo
From Miranda and Fackler, Applied Computational Economics and Finance, 2002, Section 7.6.3
Julia translation of the Python version
The model is formulated with finite horizon in Section 7.2.1, but solved with infinite horizon in Section 7.6.1. Here we follow the latter.
In [1]:
using QuantEcon
using Plots
plotlyjs();
In [2]:
price = 1 # Market price of ore
sbar = 100 # Upper bound of ore stock
beta = 0.9 # Discount factor
n = sbar + 1 # Number of states
m = sbar + 1 # Number of actions
# Cost function
c(s, x) = (x^2) / (1+s);
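For reference, the Bellman equation being solved with these primitives is

$$V(s) = \max_{0 \le x \le s} \left\{ p\,x - \frac{x^2}{1+s} + \beta V(s-x) \right\}, \qquad s = 0, 1, \dots, \bar{s},$$

where $p = 1$ is the ore price, $\bar{s} = 100$ is the initial stock, and the stock transitions deterministically from $s$ to $s - x$.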
This approach sets up the reward array R and the transition probability array Q as a 2-dimensional array of shape (n, m) and a 3-dimensional array of shape (n, m, n), respectively, where the reward is set to $-\infty$ for infeasible state-action pairs (and the transition probability distribution is arbitrary for those pairs).
Reward array:
In [3]:
R = Matrix{Float64}(undef, n, m)
for j in 1:m
    for i in 1:n
        if j <= i
            R[i, j] = price*(j-1) - c(i-1, j-1)
        else
            R[i, j] = -Inf
        end
    end
end
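As a quick sanity check (not part of the original notebook): in the zero-stock state only zero extraction is feasible, so the first row of R should be $0$ followed by $-\infty$'s.

# In state s = 0 only x = 0 is feasible
@assert R[1, 1] == 0.0
@assert all(R[1, 2:end] .== -Inf)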
(Degenerate) transition probability array:
In [4]:
Q = zeros(n, m, n)
for j in 1:m
    for i in 1:n
        if j <= i
            Q[i, j, i-j+1] = 1.0
        else
            Q[i, j, 1] = 1.0
        end
    end
end
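Each state-action pair maps to exactly one successor state, so every conditional distribution in Q should sum to one; a small check added here, not in the original:

# Every Q[i, j, :] is a (degenerate) probability distribution
@assert all(sum(Q, dims=3) .≈ 1.0)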
Set up the dynamic program as a DiscreteDP instance:
In [5]:
ddp = DiscreteDP(R, Q, beta);
Solve the optimization problem with the solve method, which by default uses the policy iteration algorithm:
In [6]:
# Solve the dynamic optimization problem (by policy iteration)
res = solve(ddp, PFI);
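The solve method also accepts other solution algorithms exported by QuantEcon.jl, such as value iteration (VFI) and modified policy iteration (MPFI); a minimal sketch, assuming those exported types (we continue with the policy iteration result res below):

# Alternative algorithms: value iteration and modified policy iteration
res_vfi = solve(ddp, VFI)
res_mpfi = solve(ddp, MPFI)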
The number of iterations:
In [7]:
res.num_iter
Out[7]:
The controlled Markov chain is stored in res.mc. To simulate:
In [8]:
res.mc.state_values = 0:100
Out[8]:
In [9]:
# Simulate the controlled Markov chain
nyrs = 15
spath = simulate(res.mc, nyrs+1, init = sbar + 1)
Out[9]:
In [10]:
p1 = plot(0:sbar, res.v, xlabel="Stock", ylabel="Value", title="Optimal Value Function")
p2 = plot(0:sbar, res.sigma, xlabel="Stock", ylabel="Extraction", title="Optimal Extraction Policy")
p3 = plot(0:nyrs, spath, xlabel="Year", ylabel="Stock", title="Optimal State Path")
plot(p1, p2, p3, legend = false)
Out[10]:
This approach assigns the rewards and transition probabilities only to feasible state-action pairs, setting up R and Q as a 1-dimensional array of length L and a 2-dimensional array of shape (L, n), respectively.
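Since state $s$ admits the $s + 1$ feasible extraction levels $x = 0, \dots, s$, the number of feasible state-action pairs is $L = \sum_{s=0}^{\bar{s}} (s+1) = n(n+1)/2 = 5151$; a one-line check, added here for illustration:

# Expected number of feasible state-action pairs
n * (n + 1) ÷ 2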
We need the arrays of feasible state and action indices:
In [11]:
# Arrays of feasible state and action indices
s_indices = Int[]
a_indices = Int[]
for j in 1:m
    for i in 1:n
        if i - j >= 0
            push!(s_indices, i)
            push!(a_indices, j)
        end
    end
end
L = length(s_indices)
Out[11]:
In [12]:
s_indices
Out[12]:
In [13]:
a_indices
Out[13]:
Reward vector:
In [14]:
R_sa = zeros(L)
for (i, (s, x)) in enumerate(zip(s_indices .- 1, a_indices .- 1))
    R_sa[i] = price*x - c(s, x)
end
(Degenerate) transition probability array:
In [15]:
Q_sa = zeros(L, n)
for (i, s) in enumerate(s_indices)
    x = a_indices[i]
    Q_sa[i, s-x+1] = 1.0
end
Set up the dynamic program:
In [16]:
ddp_sa = DiscreteDP(R_sa, Q_sa, beta, s_indices, a_indices);
Solve the optimization problem with the solve method, which by default uses the policy iteration algorithm:
In [17]:
res_sa = solve(ddp_sa, PFI);
Number of iterations:
In [18]:
res_sa.num_iter
Out[18]:
Simulate the controlled Markov chain:
In [19]:
res_sa.mc.state_values = 0:100
nyrs = 15
spath_sa = simulate(res_sa.mc, nyrs+1, init = sbar + 1)
Out[19]:
Draw the graphs:
In [20]:
p1_sa = plot(0:sbar, res_sa.v, xlabel="Stock", ylabel="Value", title="Optimal Value Function")
p2_sa = plot(0:sbar, res_sa.sigma, xlabel="Stock", ylabel="Extraction", title="Optimal Extraction Policy")
p3_sa = plot(0:nyrs, spath_sa, xlabel="Year", ylabel="Stock", title="Optimal State Path")
plot(p1_sa, p2_sa, p3_sa, legend=false)
Out[20]:
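As a final consistency check (not part of the original notebook), the two formulations should deliver the same solution:

# Both formulations solve the same problem, so the value functions should agree
@assert res.v ≈ res_sa.v
# The optimal policies should also coincide (barring ties in the maximization)
res.sigma == res_sa.sigma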