PyShop Session 3

Advanced Topics in Numerical Methods and Array Manipulation.

This session delves more deeply into the NumPy and SciPy libraries. Numpy itself is mainly the ndarray object, but this has many features and we will cover its indexing and methods. SciPy builds on NumPy by adding a host of numerical algorithms. In order to make the session more fun, we will use NumPy and SciPy to solve an applied problem. We will construct a program for solving the finite element solution of a simple RBC model. If time permits, we will then try to vectorize this problem to get some speed up.

A Quick Introduction to Finite Elements

In economics we are often faced with large, non-linear problems which we would like to solve numerically. We do this in one of two ways: locally or globally. A local approximation is the simplest method and is taught in most first year graduate macro courses, in the form of linearizations. This can also be accomplished through perturbations. However, we sometimes are interested in the solution far from the steady state. In this case, we look for global solutions. Most graduate students are familiar with value- or policy- function iteration, which solve for a policy function over the entire state space. Another global method is called "Finite Elements", or "Projection Methods".

Projection methods get their name from the assumption of a parameterized policy function. This reduces the dimensionality of the problem from an infinite dimensional one to finite dimensions. The name Finite Elements comes from the idea that, given this approximation the solution will be inexact. However, a Galerkin weighted residual approximation will be exact, and thus we can write an integral equation. We then break the state space into "elements", over which we integrate. In this way, we break our problem down to a sequence of smaller problems.

We'll work with the simplest example from McGratten, 1993, the first economics paper on Finite Elements. Consider an economy with a representative agent who must choose a capital stock each period. Productivity follows a finite state markov chain, capital depreciates at a rate $\delta$, and preferences are CRRA with risk aversion parameter $\gamma$. Their problem can be written as

$$ \begin{equation*} \begin{aligned} & \underset{\{ c_t,k_{t+1} \}_{t=0}^\infty}{\text{max}} & & \mathbb{E} {\sum_{t=0}^{\infty}} \beta \frac{c_t^{1-\gamma}}{1-\gamma} \\ & \text{s.t.} & & y_t = a_tk_t^\alpha \\ & & & k_{t+1} = (1 - \delta)k_t + i_t \\ & & & a_t \in \{a_1, ... , a_I\} \\ & & & \mathbb{P}[a_{t+1} = a_j | a_t = a_i] = \Pi_{i,j} \end{aligned} \end{equation*} $$

We can solve this problem to arrive at the euler equation and law of motion for capital (the steps are left to the reader):

$$ \begin{align*} c_t^{-\gamma} &= \beta \sum_{i=1}^{I} \Pi_{ij} \left[ c_{t+1}^{-\gamma} \left(1 - \delta + \alpha a_ik_{t+1}^{\alpha - 1} \right) \right]\\ k_{t+1} &= a_ik_t^\alpha +(1 - \delta)k_t - c_t \end{align*} $$

for all $i, j$.

At this point we cannot solve this problem exactly, as it is highly non-linear. In order to make it more simple, we can assume that the capital decision rule can be approximated by a parameterized function:

$$ \begin{align*} \hat{k}_{t+1} &= \sum_{l = 1}^{L} N_l(k_t)\kappa_l \end{align*} $$

where the $\kappa_l$ are constants

We don't need to make much assumptions about the functions $N_l(\cdot)$, other than that they form a complete basis for the state space. The easiest and most natural method is to use a polynomial basis, eg chebyshev polynomials. However, these functions are not very clear to the first time reader, so we will use a linear basis function:

\begin{align*} N_l(k) &= \left\{ \begin{array}{lll} \frac{k - k_{l-1}}{k_l - k_{l-1}} & k_{l-1}\leq k \leq k_l \\ \frac{k_{l+1} - k}{k_{l-1} - k_l} & k_{l}\leq k \leq k_{l + 1} \\ 0 & else \end{array} \right. \end{align*}

This implies that capital can be approximated as a linear interpolant between two constants:

\begin{align*} \hat{k}(k) &= \frac{k_{l+1} - k}{k_{l-1} - k_l}\kappa_l^i + \frac{k - k_{l}}{k_{l + 1} - k_{l}}\kappa_{l+1}^i & k\in[k_l, k_{l+1}] \end{align*}

Given this, we can calculate $c_t$, $c_{t+1}$, and $\hat{k_{t+1}}$ conditional on a value for $\kappa$. However, our euler equation is no longer exact. But, we can write the Galerkin weak form integral equation (where $R$ is the euler residual):

\begin{align*} \sum_{i = 1}^{I}\int_{k_0}^{k^L}w(k,i)R(k,i;\kappa)dk =0 \end{align We define the weighting function to be the same linear interpolator: \begin{align*} \sum_{i = 1}^{I} \sum_{a = 1}^{L} w_a^i \left \{\sum_{l=1}^{L - 1} \int_{k_l}^{k_{l+1}}N_a(k)R(k,i;\kappa)dk\right \} =0 \end{align*}

If this holds for all values of $w$, then everything inside the braces must be zero. Finally, we arrive at the system of equations we need to solve:

\begin{align*} \sum_{l=1}^{L - 1} \int_{k_l}^{k_{l+1}}N_a(k)R(k,i;\kappa)dk & & \forall i, a \end{align*}

There are a few things to notice:

The integration is done accross elements.
Since the linear interpolator is zero off of the element and its upper neighbor, the terms of the sum are mainly zeros.
We can calculate the integrals individually and then reconstruct the residual functions.

We'll approximate the integral by gaussian quadrature and use a built in solver to find the solution.

Pre-Programming

Before we set out to solve our problem numerically, we are going to write down EXACTLY what we need to solve. This will allow us to avoid a lot of head scratching along the way. Then we'll try to speed up our code by using NumPy's vectorized functions.

First of all, notice that for a value of $a$, $N_a(\cdot)$ is zero for almost all of the terms. Thus, we have the following system of equations

\begin{align*} \sum_{l=a}^{a + 1} \int_{k_l}^{k_{l+1}}N_a(k)R(k,i;\kappa)dk & & \forall i, a \end{align*}

Next, recall that using gaussian quadrature (which we can generate using a NumPy function) for $M$ quadrature points, we have

\begin{align*} \sum_{l=a}^{a + 1} \sum_{m=1}^{M} N_a(k_l^m)R(k_l^m,i;\kappa)\omega^m & & \forall i, a \end{align*}

This is a much simplified expression of what we need to solve. Our first shot at the solution will follow the following steps. We'll loop over all $i\in \{1,...,I\}$ and all $l\in \{1, ...,L-1\}$ and do the following:

Scale the abcissas to the state space.
Calculate policy based on a coefficient input.
Calculate consumption and output based on that input.
For all possible outcomes of the state next period, calculate the next period's policy based on the same coefficient input.
Calculate next period's consumption and output based on that calculation.
Calculate the conditional expectation using step 5.
Generate the residual over the current element and add it to the main vector.

If that doesn't seem super clear, it will once we get to programming!

The Looped Application

Here is the looped function that I've used. I copied it from McGratten's original fortran code.



In [1]:

    
#To avoid problems in the future, import everything now
import numpy as np
import scipy.optimize
import scipy.linalg
import time



In [2]:

    
print("hello")









    



hello



In [3]:

    
def mcgratten_weighted_residual(coeff):
    """
    A looped function of the McGratten problem.

    Variables:
        coeff   :   ndarray; a vector of coefficients for the parameterized
                    policy function
    Returns:
        ndarray containing the residuals.

    """
    #Unpack the arguments
    alpha, delta, I, L, k, abcissas, M, a, PIE = args
    coeff = coeff.reshape(I, L)

    #Initialize the residuals
    RES = np.zeros((I*L))

    #Construct the vector of residual funcitons
    for i in range(0, I):
        for l in range(0, L - 1):
            #Compute f at all of the quadrature points on the element
            for m in range(0, M):
                x = abcissas[m]
                kt = scalup(x, k[l+1], k[l])
                u = weights[m]
                bs1 = 0.5*(1 - x)
                bs2 = 0.5*(1 + x)

                #Compute capital next period by projection
                kp = coeff[i, l]*bs1 + coeff[i, l+1]*bs2

                #Compute consumption
                y = a[i]*kt**alpha
                c = y + (1 - delta)*kt - kp

                #Find the element for next periods capital stock
                ii = 0
                for h in range(0, L - 1):
                    if kp >= k[h]:
                        ii = h
                k1p = k[ii]
                k2p = k[ii + 1]

                #Calculate next period capital policy
                x = scaldwn(kp, k2p, k1p)
                bs1p = 0.5*(1 - x)
                bs2p = 0.5*(1 + x)

                #Initialize summation Variables
                sum1 = 0.0

                #Compute the summation and derivative simultaneously
                for h in range(0, I):
                    kpp = coeff[h, ii]*bs1p + coeff[h, ii+1]*bs2p
                    yp = a[h]*kp**alpha
                    cp = yp + (1 - delta)*kp - kpp
                    dudcp = cp**(-gamma)
                    dydkp = alpha*a[h]*kp**(alpha - 1)
                    sum1 += PIE[i, h]*dudcp*(dydkp + (1 - delta))

                ind = l + i*L

                #Generate the full residual entry by mulying qua*res*weight
                res = (beta*sum1 - c**(-gamma))*u
                RES[ind] += res*bs1
                RES[ind+1] += res*bs2

    return RES

There are several things you should immediately notice about the structure of this function.

First, it's a function! I defined the residual function so that I could pass it to a numerical optimizer.

Second, you'll notice that it is only a function of the coefficients. This is in order to simplify the passing to the numerical optimizer. Often times, we will have functions with additional arguments or keyword arguments. Many SciPy optimizers can handle this, but some cannot. To avoid problems, I like to define a tuple object containing those values that do not change during optimization, such as preference parameters, discount rates, etc.

Third, this is not very Pythonic. There is a looped structure that can be very slow. Our goal later will be to use vectors to speed this up, but for now we'll talk about what the function is doing.

Let's break the function into sections that we can study individually.

In the next few sections the code I present will not be stand alone, but just repition of parts of the above function to discuss.

Parameter Definition



In [ ]:

    
#Unpack the arguments
alpha, delta, I, L, k, abcissas, M, a, PIE = args
coeff = coeff.reshape(I, L)

#Initialize the residuals
RES = np.zeros((I*L))

The first thing we do is "unpack" the arguments. We need to define an args tuple in the global namespace that contains all of these parameters. alpha is the capital share, delta depreciation rate, I number of discrete states in the markov chain, L the number of points in the partition of capital, k the partition, abcissas a set of gaussian abcissas for estimating the integral, a the different values the markov chain can take, and PIE the markov transition matrix.

Next we initialize a vector that will contain our solution. Here there are $IL$ equations in $IL$ unknowns.

The Loops



In [ ]:

    
#Construct the vector of residual funcitons
for i in range(0, I):
    for l in range(0, L - 1):

Next, we loop over all elements, calculating the residuals. There are $I(L-1)$ elements in our problem.

An Unexpected Loop



In [1]:

    
#Compute f at all of the quadrature points on the element
for m in range(0, M):









    



  File "<ipython-input-1-953a74e804fa>", line 2
    for m in range(0, M):
                         ^
SyntaxError: unexpected EOF while parsing

This isn't as unexpected as it seems. In order to approximate the integral over each element, we use a gaussian quadrature. This uses a polynomial approximation rule. When you tell SciPy you would like quadrature points and weights, you specify the number of points and it returns to you a set of points on $[-1, 1]$ and corresponding weights. It is then your job to scale these to your state space and calculate the function to be integrated at each point.

To this end, we need to define some functions that will map our quadrature points into the corresponding state space and back. Here are those functions:



In [4]:

    
def scalup(points, upper, lower):
    """
    A linear transformation for state variables: [-1,1]-> [k_low, k_high].

    Variables:
        points  :   ndarray; a vector of points on [-1,1]
        upper   :   ndarray; a vector of upper bounds for the element in which
                    each point is to be projected.
        lower   :   ndarray; a vector of lower bounds for the element in which
                    each point is to be projected.

    Returns:
        ndarray cotaining the transformed points
    """
    return (points + 1.)*(upper - lower)/2 + lower


def scaldwn(points, upper, lower):
    """
    A linear transformation for state variables: [k_low, k_high] -> [-1,1].

    Variables:
        points  :   ndarray; a vector of points on [k_low, k_high]
        upper   :   ndarray; a vector of upper bounds for the element in which
                    each point is to be projected.
        lower   :   ndarray; a vector of lower bounds for the element in which
                    each point is to be projected.
    Returns:
        ndarray cotaining the transformed points

    """
    return 2*(points - lower)/(upper - lower) - 1

Now, we can loop through the $M$ quadrature points on each element to calculate the residual function.

Finding the State



In [2]:

    
x = abcissas[m]
kt = scalup(x, k[l+1], k[l])
u = weights[m]
bs1 = 0.5*(1 - x)
bs2 = 0.5*(1 + x)









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-257048b36c48> in <module>()
----> 1 x = abcissas[m]
      2 kt = scalup(x, k[l+1], k[l])
      3 u = weights[m]
      4 bs1 = 0.5*(1 - x)
      5 bs2 = 0.5*(1 + x)

NameError: name 'abcissas' is not defined

Given that we are on element $(i, l)$, studying point $m$ in the list of quadrature points, we need to find our value of the state and calculate the linear interpolators.

These lines do just that, taking x from the list of abcissas, scaling it to the state space on the current element, defining the weight u, and calculating the interpolator.

Calculating Inter- and Intra- temporal



In [ ]:

    
#Compute capital next period by projection
kp = coeff[i, l]*bs1 + coeff[i, l+1]*bs2

#Compute consumption
y = a[i]*kt**alpha
c = y + (1 - delta)*kt - kp

Now, given the state and the current solution for the coefficients, we can calculate the policy function for next period capital. Given this, we use the law of motion for capital and the production function to calculate consumption.

Next Period's Element



In [ ]:

    
#Find the element for next periods capital stock
ii = 0
for h in range(0, L - 1):
    if kp >= k[h]:
        ii = h
k1p = k[ii]
k2p = k[ii + 1]

In order to calculate the conditional expectation, we need to calculate the policy function for capital next period as well. That is, in order to find $c_{t+1}$, we need to calculate $k_{t+2}$. Given our assumption of linear interpolators, we then need to find the element on which to calculate next period's capital.

To do this, we loop through the lower bounds of each element and ask the question: "Is $k_{t+1}$ greater than this lower bound?" If so, we consider that to be the lower bound of the element for next period's policy function.

Once we are done, we then define the bounds for next period's element.

NOTE: The assumption of linear interpolators is what makes this necessary. There are other sets of bases, eg Chebyshev polynomials, which are not functions of the bounds of the element. Because of this you do not need to do this look up operation. In particular, were you doing this on several dimensions, this could become quite a pain. However, linear interpolators are the most conceptually approachable, so they are the easiest to start with.

Calculating Conditional Expectations



In [ ]:

    
#Calculate next period capital policy
x = scaldwn(kp, k2p, k1p)
bs1p = 0.5*(1 - x)
bs2p = 0.5*(1 + x)

#Initialize summation Variables
sum1 = 0.0

#Compute the summation and derivative simultaneously
for h in range(0, I):
    kpp = coeff[h, ii]*bs1p + coeff[h, ii+1]*bs2p
    yp = a[h]*kp**alpha
    cp = yp + (1 - delta)*kp - kpp
    dudcp = cp**(-gamma)
    dydkp = alpha*a[h]*kp**(alpha - 1)
    sum1 += PIE[i, h]*dudcp*(dydkp + (1 - delta))

Given the value of $k_{t+1}$ and the bounds of the element, you can calculate the linear interpolator functions bs1p and bs2p.

Now, you need to calculate the capital policy, production, consumption, marginal utility, and marginal productivity of capital for all possible realizations of the state in $t+1$. To do this, loop over the indices $i = 0, ... , I-1$ and calculate all of these values. Then multiply by the probability of this outcome and add it to the sum.

After the loop you will have calculated the conditional expectation:

$$ \sum_{i=1}^{I} \Pi_{ij} \left[ c_{t+1}^{-\gamma} \left(1 - \delta + \alpha a_ik_{t+1}^{\alpha - 1} \right) \right] $$

Calculate the Residual



In [ ]:

    
ind = l + i*L

#Generate the full residual entry by mulying qua*res*weight
res = (beta*sum1 - c**(-gamma))*u
RES[ind] += res*bs1
RES[ind+1] += res*bs2

Now that you have calculate the functional value at the quadrature point $m$ on element $(i, l)$, you can multiply by the weight and the interpolater. Finally, add it to the residual vector. You'll notice that you are adding it to severl points in the residual vector simultaneously. This is because of the linear intorpolater, which is non-zero on two elements simultaneously.

Running the Code

Now that we've gone through how this code works, let's see it in action. First, we're going to define some functions to generate a steady state and a transition matrix (a seriously sloppy function, but be nice), and define some parameters. Then we'll calculate a partition over the state and some quadrature points. Finally, we'll make an initial guess to the value of capital at the nodes (this is the same as the coefficient values) as half of output. Finally, we package all of our arguments into a tuple and run a jacobian free newton kryolov solver (I like this one both for its speed and its verbosity).



In [8]:

    
def steady(alpha, beta, delta):
    """
    A function to compute the steady state of the simple RBC model.

    Variables:
        alpha   :   float; capital share
        beta    :   float; time discount
        delta   :   float; depreciation rate

    """
    k_star = ((1 - (1 - delta)*beta)/(beta * alpha))**(1/(alpha - 1))
    return k_star, k_star**alpha - delta*k_star


def explicit_transition(states):
    """
    A function to generate an explicit transition matrix.

    Variables:
        states  :   int; number of finite states in the chain

    """
    if states == 5:
        PIE = np.array([[0.1, 0.1, 0.6, 0.1, 0.1],
                        [0.1, 0.1, 0.6, 0.1, 0.1],
                        [0.1, 0.1, 0.6, 0.1, 0.1],
                        [0.1, 0.1, 0.6, 0.1, 0.1],
                        [0.1, 0.1, 0.6, 0.1, 0.1]])
    return PIE

# Step 1: Calibrate Parameters
gamma = 2.0
beta = 0.98
alpha = 0.3
delta = 0.9


# Step 2: Compute steady state
k_star, c_star = steady(alpha, beta, delta)

# Step 3: Generate the state space
#Define upper and lower bound of the region of interest around the steady stat
kmin = 0.5*k_star
kmax = 1.5*k_star

#Define the finite states for the markov chain
I = 5
PIE = explicit_transition(I)
a = np.array([0.95, 0.975, 1.0, 1.025, 1.05])

#Generate a state space vector over k
L = 11
k = np.linspace(kmin, kmax, num=L)

# Step 4: Calculate quadrature weights and abcissas
#Specify number of points in the integral on each element [k_j, k_j+1]
M = 10

#Generate absissas and weights
abcissas, weights = np.polynomial.legendre.leggauss(M)

# Step 5: Guess and solve
#Generate intial guess as half of production at all nodes
coeff0 = np.outer(a, k**alpha)*0.1

#Pack up all of the arguments to pass to the solver
args = (alpha, delta, I, L, k, abcissas, M, a, PIE)

time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)









    



0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.31136; step 1; tol 0.0218159
3:  |F(x)| = 0.119142; step 1; tol 0.00116509
4:  |F(x)| = 0.000200299; step 1; tol 2.54373e-06
5:  |F(x)| = 7.06869e-10; step 1; tol 1.12089e-11
0.6425514221191406



In [9]:

    
print(res.shape)



In [34]:

    
def policy(a_p, k_p, coeff, k, a):
    """
    A function to calculate the capital policy at points.

    Variables:
        points  :   ndarray; a vector of points on
                    [k_low, k_high]x{a1...aI}
        coeff   :   ndarray; a vector of coefficients
        k       :   ndarray; a vector containing the
                    partition of the capital state.
        a       :   ndarray; a vector containing the
                    partition of the productivity state.
    Returns:
        ndarray evaluated funciton

    """
    #Unpack the points
    #k_p = points[:, 0]
    #a_p = points[:, 1]

    #Find the elements for the points
    ll = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
                   for point in k_p][0])
    ll[ll > 0] -= 1

    ii = np.array([np.where(point == a)[0] for point in a_p][0])
    #ii[ll > 0] -= 1

    #Define element bounds for next period
    k1p = k[ll]
    k2p = k[ll + 1]

    #Calculate the interpolators
    x = scaldwn(k_p, k2p, k1p)
    bs1 = 0.5*(1 - x)
    bs2 = 0.5*(1 + x)

    #Return the polciy function of the given points
    return coeff[ii, ll]*bs1 + coeff[ii, ll+1]*bs2



In [46]:

    
#Plot the policy function over the state space
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.gca(projection="3d")

a_points, k_points = np.meshgrid(a, k)

kp = policy(np.reshape(a_points, (I*L)), np.reshape(k_points, (I*L)),
            np.array(res), k, a)

kp = np.reshape(kp, (L, I))

ax.plot_surface(a_points, k_points, kp, alpha=0.3)
#cset = ax.contour(k_points, a_points, kp, zdir='z', offset=-2.0)
#cset = ax.contour(C1, C2, utils, zdir='y', offset=10.0)
#cset = ax.contour(C1, C2, utils, zdir='x', offset=1.0)

plt.show()









    



[ 0.13749235]

We can see that our policy function is essentially linear. We knew that, by definition, it would be piecewise linear. We could also expand the state space and increase the number of points to see whether the shape changes:



In [42]:

    
#Increase the number of points
L = 50
k = np.linspace(kmin, kmax, num=L)
coeff0 = np.outer(a, k**alpha)*0.1

#Pack up all of the arguments to pass to the solver
args = (alpha, delta, I, L, k, abcissas, M, a, PIE)

time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)









    



0:  |F(x)| = 147.288; step 1; tol 0.1636
1:  |F(x)| = 47.8171; step 1; tol 0.0948574
2:  |F(x)| = 7.43522; step 1; tol 0.0217603
3:  |F(x)| = 0.218998; step 1; tol 0.000780794
4:  |F(x)| = 0.00115691; step 1; tol 2.51167e-05
5:  |F(x)| = 6.47416e-08; step 1; tol 2.81843e-09
3.883786678314209

We can see that we don't get much change. Re-run the earlier cell to get back the old paramteters and we'll move on to vectorizing this function.

Vectorizing

Given our function, we now want to try to speed it up, as you can see from the previous runs, it was quite slow. Vectorizing it will be a good chance for us to see how much NumPy can increase the speed of your code.

The easiest way (for me) to vectorize something is to work around the loops. Each loop can be thought of as a set of stacked operations, which we can try to do in a single pass, avoiding the loop. If you work in this way, it is easiest to start from the inner-most loop and work out. That is, we have loops over i, l, m, and h. We'll start with the loop over h and work our way back.

Here is the loop we would like to vectorize:



In [50]:

    
#Compute the summation and derivative simultaneously
for h in range(0, I):
    kpp = coeff[h, ii]*bs1p + coeff[h, ii+1]*bs2p
    yp = a[h]*kp**alpha
    cp = yp + (1 - delta)*kp - kpp
    dudcp = cp**(-gamma)
    dydkp = alpha*a[h]*kp**(alpha - 1)
    sum1 += PIE[i, h]*dudcp*(dydkp + (1 - delta))









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-50-54612f3f40f0> in <module>()
      1 #Compute the summation and derivative simultaneously
      2 for h in range(0, I):
----> 3     kpp = coeff[h, ii]*bs1p + coeff[h, ii+1]*bs2p
      4     yp = a[h]*kp**alpha
      5     cp = yp + (1 - delta)*kp - kpp

NameError: name 'coeff' is not defined

First, we should ask the question, "What is this loop doing?" It is calculating the conditional expectation of returns on capital next period, discounted by marginal utility.

Given the markov chain randomness in this model, this lends itself very easily to the dot product. We can use NumPy's automatic array operations (which are natively element-wise) to calculate a vector of next period returns ($1 - \delta + \alpha a_ik_{t+1}^{\alpha - 1}$) and then take the dot product with the corresponding row from $\Pi$.

Here is the code, take a look and try to figure out what it is doing and then we will discuss it:



In [52]:

    
kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
yp = a*kp**alpha
cp = yp + (1 - delta)*kp*np.ones(I) - kpp
dudcp = cp**(-gamma)
dydkp = alpha*a*kp**(alpha-1)
sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-52-7df618e75ad1> in <module>()
----> 1 kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
      2 yp = a*kp**alpha
      3 cp = yp + (1 - delta)*kp*np.ones(I) - kpp
      4 dudcp = cp**(-gamma)
      5 dydkp = alpha*a*kp**(alpha-1)

NameError: name 'coeff' is not defined

Got it? Here's how I think of what this is doing. First, notice the parts of the loop that do not change. These are bs1p, bs2p, and kp. You can treat these as scalars. So, you can look at the policy function calculation, kpp, as taking the linear combination of the vectors of coefficients weighted by bs1p and bs2p.

Then you calculate a vector of outputs, each entry corresponding to a different realization of $a$.

Next, calculate consumption, but notice that kp does not vary, so multiply by a vector of ones. Note: NumPy's broadcasting methods could easily handle this, but I find it clearing to include the vector so those reading your code understand.



In [55]:

    
temp = 1.
temp2 = np.array([1, 2, 3])
print(temp2 + temp)









    



[ 2.  3.  4.]

Finally, you simply calculate $\partial_c u(c_{t+1})$ and $\partial_k y(k_{t+1})$ as vectors and calculate the expectation by a straight forward dot product.

Here's the full code for the modified function, which I'll call mcgratten_weighted_residual_vec_1:



In [57]:

    
def mcgratten_weighted_residual_vec_1(coeff):
    """
    A partially vectorized function of the McGratten problem.

    Variables:
        coeff   :   ndarray; a vector of coefficients for the parameterized
                    policy function
    Returns:
        ndarray containing the residuals.

    """
    #Unpack the arguments
    alpha, delta, I, L, k, abcissas, M, a, PIE = args
    coeff = coeff.reshape(I, L)

    #Initialize the residuals
    RES = np.zeros((I*L))

    #Construct the vector of residual funcitons
    for i in range(0, I):
        for l in range(0, L - 1):
            #Compute f at all of the quadrature points on the element
            for m in range(0, M):
                x = abcissas[m]
                kt = scalup(x, k[l+1], k[l])
                u = weights[m]
                bs1 = 0.5*(1 - x)
                bs2 = 0.5*(1 + x)

                #Compute capital next period by projection
                kp = coeff[i, l]*bs1 + coeff[i, l+1]*bs2

                #Compute consumption
                y = a[i]*kt**alpha
                c = y + (1 - delta)*kt - kp

                #Find the element for next periods capital stock
                ii = 0
                for h in range(0, L - 1):
                    if kp >= k[h]:
                        ii = h
                k1p = k[ii]
                k2p = k[ii + 1]

                #Calculate next period capital policy
                x = scaldwn(kp, k2p, k1p)
                bs1p = 0.5*(1 - x)
                bs2p = 0.5*(1 + x)

                #Initialize summation Variables
                sum1 = 0.0

                #Compute the summation and derivative simultaneously
                kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
                yp = a*kp**alpha
                cp = yp + (1 - delta)*kp*np.ones(I) - kpp
                dudcp = cp**(-gamma)
                dydkp = alpha*a*kp**(alpha-1)
                sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

                ind = l + i*L

                #Generate the full residual entry by mulying qua*res*weight
                res = (beta*sum1 - c**(-gamma))*u
                RES[ind] += res*bs1
                RES[ind+1] += res*bs2

    return RES

Let's run the two optimizations side by side and see the difference in speed, as well as take a norm of the difference in the answers to test for accuracy:



In [64]:

    
time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)

time0 = time.time()
res1 = scipy.optimize.newton_krylov(mcgratten_weighted_residual_vec_1, coeff0,
                                    method='lgmres', verbose=True,
                                    maxiter=1000, line_search='armijo')
print(time.time() - time0)
print(np.linalg.norm(res - res1))









    



0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.31136; step 1; tol 0.0218159
3:  |F(x)| = 0.119142; step 1; tol 0.00116509
4:  |F(x)| = 0.000200299; step 1; tol 2.54373e-06
5:  |F(x)| = 7.06869e-10; step 1; tol 1.12089e-11
0.6594507694244385
0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.2904; step 1; tol 0.0215406
3:  |F(x)| = 0.108799; step 1; tol 0.000983992
4:  |F(x)| = 0.000182078; step 1; tol 2.52065e-06
5:  |F(x)| = 6.04243e-10; step 1; tol 9.91172e-12
0.9424023628234863
9.60094062534e-12

It's actually slower by $50%$! Why would we ever want to do this!? What a world...?!

Ok, don't panic. The reason you get a slow down here is because the vector dot product and broadcasting can be slower for small arrays. Where you'll get a speed up is when the arrays become large. The only way in this case to see if this vectorization is effective is to increase the number of discrete states, which is heavy. We'll have an easier time on furth loops and you'll see the benefits then.

Overall, that's the process! We'll now go through all the other loops, vectorizing the operations.

The Abcissas Loop

What we'll do is take our previously vectorized function and try to vectorize it further. Here is the loop we want to vectorize:



In [ ]:

    
for m in range(0, M):
    x = abcissas[m]
    kt = scalup(x, k[l+1], k[l])
    u = weights[m]
    bs1 = 0.5*(1 - x)
    bs2 = 0.5*(1 + x)

    #Compute capital next period by projection
    kp = coeff[i, l]*bs1 + coeff[i, l+1]*bs2

    #Compute consumption
    y = a[i]*kt**alpha
    c = y + (1 - delta)*kt - kp

    #Find the element for next periods capital stock
    ii = 0
    for h in range(0, L - 1):
        if kp >= k[h]:
            ii = h
    k1p = k[ii]
    k2p = k[ii + 1]

    #Calculate next period capital policy
    x = scaldwn(kp, k2p, k1p)
    bs1p = 0.5*(1 - x)
    bs2p = 0.5*(1 + x)

    #Initialize summation Variables
    sum1 = 0.0

    #Compute the summation and derivative simultaneously
    kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
    yp = a*kp**alpha
    cp = yp + (1 - delta)*kp*np.ones(I) - kpp
    dudcp = cp**(-gamma)
    dydkp = alpha*a*kp**(alpha-1)
    sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

    ind = l + i*L

    #Generate the full residual entry by mulying qua*res*weight
    res = (beta*sum1 - c**(-gamma))*u
    RES[ind] += res*bs1
    RES[ind+1] += res*bs2

You'll notice here a pesky sub-routine that is trying to find indices for the next period's element:



In [ ]:

    
#Find the element for next periods capital stock
ii = 0
for h in range(0, L - 1):
    if kp >= k[h]:
        ii = h
k1p = k[ii]
k2p = k[ii + 1]

This is going to be tough to vectorize, but we can do it! Let's see what it is doing. It takes some value, kp, and returns the index for the next lowest value in another object, k. To do this vectorially we'll use the numpy.where function. Let's look at an example.

Given a value of kp, we would like to return the index of the next lowest value in k. If kp$=0.2$, that index is $4$:



In [67]:

    
kp = 0.2
print(np.where(kp >= k))
k









    



(array([0, 1, 2, 3, 4]),)






    Out[67]:





array([ 0.10079915,  0.12095897,  0.1411188 ,  0.16127863,  0.18143846,
        0.20159829,  0.22175812,  0.24191795,  0.26207778,  0.28223761,
        0.30239744])

Notice though that np.where gives a vector of indices to all of the values of k where the condition is not true. It also is returning a tuple. Thus, we need to index to the value we want, the interior of the tuple and the last entry in the vector:



In [68]:

    
print(np.where(kp >= k)[0][-1])

However, what if our algorithm steps out of the state space? For example, what if kp is zero:



In [70]:

    
kp = 0.0
print(np.where(kp >= k)[0][-1])









    



---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-70-537ea786b81d> in <module>()
      1 kp = 0.0
----> 2 print(np.where(kp >= k)[0][-1])

IndexError: index -1 is out of bounds for axis 0 with size 0

Oh no! This is happening because there are no values in k for which the condition is not true. What we can do is add a zero to the array and cut off the upper point (this will avoid a similar problem on the upper bound). This gives an answer that is correct for zero, but incorrect for something else:



In [72]:

    
kp = np.array(0.2)
print(np.where(kp >= np.append([0], k[:-1]))[0][-1])
kp = np.array(0.0)
print(np.where(kp >= np.append([0], k[:-1]))[0][-1])

5
0

So, we can just subtract one from all values that are greater than zero. In order to foresee the needing to make this work for a set of points in kp we can also vectorize the entire calculation using a list comprehension:



In [78]:

    
kp = np.array([0, 0.2])
ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1] 
               for point in kp])
ii[ii > 0] -= 1
print(ii)

Now that we've gon over how to use np.where to find the element, we can vectorize the loop in pretty much the same way as before. Here is the vectorized form of the loop:



In [82]:

    
#For demonstration purposes
l, i = 0, 0
coeff = coeff0
RES = np.zeros((I*L))

#Compute f at all of the quadrature points on the element
x = abcissas
kt = scalup(x, k[l+1], k[l])
u = weights
bs1 = 0.5*(1 - x)
bs2 = 0.5*(1 + x)

#Compute capital next period by projection
kp = coeff[i, l]*bs1 + coeff[i, l + 1]*bs2
#print bs1
#Compute consumption
y = a[i]*kt**alpha
c = y + (1 - delta)*kt - kp

#Find indices for next period's capital stock for each point
ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
               for point in kp])
ii[ii > 0] -= 1

#Find elements for next period's capital stock
#NOTE: Here I'm using an array as an index
k1p = k[ii]
k2p = k[ii + 1]

#Calculate next period's policy
x = scaldwn(kp, k2p, k1p)
bs1p = 0.5*(1 - x)
bs2p = 0.5*(1 + x)

#Initialize the summation vector
sum1 = np.zeros((M))

#Compute all of the sums simultaneously
#NOTE: You can slice AND index by an array at the same time!
#NOTE: This is a good time to talk about broadcasting
#The resulting matrix contains columns corresponding to each
#abcissa, and rows corresponding to possible states in period
#t+1
kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
yp = np.outer(a, kp**alpha)
cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
dudcp = cp**(-gamma)
dydkp = np.outer(a, kp**(alpha - 1))*alpha
sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

#Calculate the residual functions and fill the vector
ind = l + i*L
res = (beta*sum1 - c**(-gamma))*u
RES[ind] += np.dot(res, bs1)
RES[ind + 1] += np.dot(res, bs2)

The first section is fairly straight forward, but the second section deserves some discussion. Instead of building a single vector, we want to build M vectors simultaneously. To this end, kpp is a matrix whose columns correspond to the different quadrature points:



In [86]:

    
kpp









    Out[86]:





array([[ 0.04066755,  0.04068698,  0.04072013,  0.04076406,  0.04081486,
         0.04086803,  0.04091883,  0.04096276,  0.04099591,  0.04101535],
       [ 0.04173774,  0.04175769,  0.04179171,  0.0418368 ,  0.04188894,
         0.0419435 ,  0.04199565,  0.04204073,  0.04207475,  0.0420947 ],
       [ 0.04280794,  0.0428284 ,  0.0428633 ,  0.04290954,  0.04296301,
         0.04301898,  0.04307246,  0.0431187 ,  0.04315359,  0.04317405],
       [ 0.04387814,  0.04389911,  0.04393488,  0.04398228,  0.04403709,
         0.04409445,  0.04414927,  0.04419667,  0.04423243,  0.0442534 ],
       [ 0.04494834,  0.04496982,  0.04500646,  0.04505501,  0.04511117,
         0.04516993,  0.04522608,  0.04527463,  0.04531127,  0.04533275]])

Similarly, yp is a matrix of the same size, as is cp. Additinally, the sum1 variable now is a vector:



In [101]:

    
sum1









    Out[101]:





array([ 19.85700398,  19.77682054,  19.6413732 ,  19.46441258,
        19.26326171,  19.05668646,  18.86294257,  18.6982269 ,
        18.57561132,  18.50438922])

In the end, we simply use this vector to calculate the integral for each element using a dot product and then add it to the residual:



In [102]:

    
#Calculate the residual functions and fill the vector
ind = l + i*L
res = (beta*sum1 - c**(-gamma))*u
RES[ind] += np.dot(res, bs1)
RES[ind + 1] += np.dot(res, bs2)

Here is the whole function in one cell, as well as running and comparing our results for the looped and vectorized function:



In [104]:

    
def mcgratten_weighted_residual_vec_2(coeff):
    """
    A partially vectorized function of the McGratten problem.

    Variables:
        coeff   :   ndarray; a vector of coefficients for the parameterized
                    policy function
    Returns:
        ndarray containing the residuals.

    """
    #Unpack the arguments
    alpha, delta, I, L, k, abcissas, M, a, PIE = args
    coeff = coeff.reshape(I, L)

    #Initialize the residuals
    RES = np.zeros((I*L))

    #Construct the vector of residual funcitons
    for i in range(0, I):
        for l in range(0, L - 1):
            #Compute f at all of the quadrature points on the element
            x = abcissas
            kt = scalup(x, k[l+1], k[l])
            u = weights
            bs1 = 0.5*(1 - x)
            bs2 = 0.5*(1 + x)

            #Compute capital next period by projection
            kp = coeff[i, l]*bs1 + coeff[i, l + 1]*bs2
            #print bs1
            #Compute consumption
            y = a[i]*kt**alpha
            c = y + (1 - delta)*kt - kp

            #Find indices for next period's capital stock for each point
            ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
                           for point in kp])
            ii[ii > 0] -= 1

            #Find elements for next period's capital stock
            #NOTE: Here I'm using an array as an index
            k1p = k[ii]
            k2p = k[ii + 1]

            #Calculate next period's policy
            x = scaldwn(kp, k2p, k1p)
            bs1p = 0.5*(1 - x)
            bs2p = 0.5*(1 + x)

            #Initialize the summation vector
            sum1 = np.zeros((M))

            #Compute all of the sums simultaneously
            #NOTE: You can slice AND index by an array at the same time!
            #NOTE: This is a good time to talk about broadcasting
            #The resulting matrix contains columns corresponding to each
            #abcissa, and rows corresponding to possible states in period
            #t+1
            kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
            yp = np.outer(a, kp**alpha)
            cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
            dudcp = cp**(-gamma)
            dydkp = np.outer(a, kp**(alpha - 1))*alpha
            sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

            #Calculate the residual functions and fill the vector
            ind = l + i*L
            res = (beta*sum1 - c**(-gamma))*u
            RES[ind] += np.dot(res, bs1)
            RES[ind + 1] += np.dot(res, bs2)
    return RES



In [105]:

    
time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)

time0 = time.time()
res2 = scipy.optimize.newton_krylov(mcgratten_weighted_residual_vec_2, coeff0,
                                    method='lgmres', verbose=True,
                                    maxiter=1000, line_search='armijo')
print(time.time() - time0)
print(np.linalg.norm(res - res2))









    



0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.31136; step 1; tol 0.0218159
3:  |F(x)| = 0.119142; step 1; tol 0.00116509
4:  |F(x)| = 0.000200299; step 1; tol 2.54373e-06
5:  |F(x)| = 7.06869e-10; step 1; tol 1.12089e-11
0.6652843952178955
0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.32867; step 1; tol 0.0220446
3:  |F(x)| = 0.119987; step 1; tol 0.00116942
4:  |F(x)| = 0.000199263; step 1; tol 2.48214e-06
5:  |F(x)| = 6.95892e-10; step 1; tol 1.09767e-11
0.624577522277832
4.95180023046e-12

Huzzah! We got a speed up! Now, we can look at how this change affects speed as we vary the number of quadrature points, M. When we do that we also need to regenerate weights and abcissas, as well as redefine the args tuple:



In [106]:

    
M = 100

#Generate absissas and weights
abcissas, weights = np.polynomial.legendre.leggauss(M)

#Pack up all of the arguments to pass to the solver
args = (alpha, delta, I, L, k, abcissas, M, a, PIE)

time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)

time0 = time.time()
res2 = scipy.optimize.newton_krylov(mcgratten_weighted_residual_vec_2, coeff0,
                                    method='lgmres', verbose=True,
                                    maxiter=1000, line_search='armijo')
print(time.time() - time0)
print(np.linalg.norm(res - res2))









    



0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2683; step 1; tol 0.0954467
2:  |F(x)| = 3.33141; step 1; tol 0.0220818
3:  |F(x)| = 0.121392; step 1; tol 0.00119499
4:  |F(x)| = 0.000196477; step 1; tol 2.35769e-06
5:  |F(x)| = 5.80679e-10; step 1; tol 7.86127e-12
6.156726360321045
0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2683; step 1; tol 0.0954467
2:  |F(x)| = 3.33609; step 1; tol 0.0221438
3:  |F(x)| = 0.122501; step 1; tol 0.00121353
4:  |F(x)| = 0.000194777; step 1; tol 2.27528e-06
5:  |F(x)| = 4.71143e-10; step 1; tol 5.26589e-12
3.3143067359924316
5.25787098956e-12

The solution is now achieved in half the time using 100 quadrature points. We should see similar gains as we vectorize the other loops. Let's reset the arguments so we don't have problems later, then we'll work on the next loop.



In [117]:

    
M = 10

#Generate absissas and weights
abcissas, weights = np.polynomial.legendre.leggauss(M)

#Pack up all of the arguments to pass to the solver
args = (alpha, delta, I, L, k, abcissas, M, a, PIE)

The Capital Elements Loop

The next loop we would like to vectorize is the one which loops over capital. Here it is from the last function:



In [110]:

    
for l in range(0, L - 1):
    #Compute f at all of the quadrature points on the element
    x = abcissas
    kt = scalup(x, k[l+1], k[l])
    u = weights
    bs1 = 0.5*(1 - x)
    bs2 = 0.5*(1 + x)

    #Compute capital next period by projection
    kp = coeff[i, l]*bs1 + coeff[i, l + 1]*bs2
    #print bs1
    #Compute consumption
    y = a[i]*kt**alpha
    c = y + (1 - delta)*kt - kp

    #Find indices for next period's capital stock for each point
    ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
                   for point in kp])
    ii[ii > 0] -= 1

    #Find elements for next period's capital stock
    #NOTE: Here I'm using an array as an index
    k1p = k[ii]
    k2p = k[ii + 1]

    #Calculate next period's policy
    x = scaldwn(kp, k2p, k1p)
    bs1p = 0.5*(1 - x)
    bs2p = 0.5*(1 + x)

    #Initialize the summation vector
    sum1 = np.zeros((M))

    #Compute all of the sums simultaneously
    #NOTE: You can slice AND index by an array at the same time!
    #NOTE: This is a good time to talk about broadcasting
    #The resulting matrix contains columns corresponding to each
    #abcissa, and rows corresponding to possible states in period
    #t+1
    kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
    yp = np.outer(a, kp**alpha)
    cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
    dudcp = cp**(-gamma)
    dydkp = np.outer(a, kp**(alpha - 1))*alpha
    sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

    #Calculate the residual functions and fill the vector
    ind = l + i*L
    res = (beta*sum1 - c**(-gamma))*u
    RES[ind] += np.dot(res, bs1)
    RES[ind + 1] += np.dot(res, bs2)

Now we are going to have to build a series of vectors. This is going to be slightly more complex, as we'll need to use kronecker products to repeat vectors, but once you get the hang of this it is fairly simple. Here's the vectorized loop:



In [127]:

    
#Vectorize element loop along k axis
#Scale and calculate capital
x = np.kron(np.ones(L - 1), abcissas)
k1 = np.kron(k[:-1], np.ones(M))
k2 = np.kron(k[1:], np.ones(M))
kt = scalup(x, k2, k1)
u = np.kron(np.ones(L - 1), weights)
bs1 = 0.5*(1 - x)
bs2 = 0.5*(1 + x)
Bs1 = 0.5*(1 - abcissas)
Bs2 = 0.5*(1 + abcissas)

#Compute capital next period by projection
kp = np.kron(coeff[i, :-1], np.ones(M))*bs1\
    + np.kron(coeff[i, 1:], np.ones(M))*bs2

#Compute consumption
y = a[i]*kt**alpha
c = y + (1 - delta)*kt - kp

#Find indices for next period's element bounds
ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
               for point in kp])
ii[ii > 0] -= 1

#Define element bounds for next period
k1p = k[ii]
k2p = k[ii + 1]

#Calculate policy next period
x = scaldwn(kp, k2p, k1p)
bs1p = 0.5*(1 - x)
bs2p = 0.5*(1 + x)

#Initialize the summation
sum1 = np.zeros(((L-1)*M))

#Compute all of the sums simultaneously using linear algebra
kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
yp = np.outer(a, kp**alpha)
cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
dudcp = cp**(-gamma)
dydkp = np.outer(a, kp**(alpha - 1))*alpha
sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

#Calculate the residual functions
res = (beta*sum1 - c**(-gamma))*u
res = res.reshape(L-1, M)
bs1 = bs1.reshape(L-1, M)
bs2 = bs2.reshape(L-1, M)
RES[i*(L-1): (i+1)*(L-1)] += np.dot(res, Bs1)
RES[i*(L-1) + 1: (i+1)*(L-1) + 1] += np.dot(res, Bs2)

You'll immediately notice the use kroneckers everywhere. This is simply staking up all of the loops performed before into one big vector of length $M(L-1)$, so that there is one point for every abcissa in every element on $k$.

Let's look at the first few lines to get a sense of this:



In [131]:

    
x = np.kron(np.ones(L - 1), abcissas)
k1 = np.kron(k[:-1], np.ones(M))
k2 = np.kron(k[1:], np.ones(M))
kt = scalup(x, k2, k1)

print(x[:20])
print(k1[:20])
print(k2[:20])
print(kt[:20])

u = np.kron(np.ones(L - 1), weights)
bs1 = 0.5*(1 - x)
bs2 = 0.5*(1 + x)
Bs1 = 0.5*(1 - abcissas)
Bs2 = 0.5*(1 + abcissas)









    



[-0.97390653 -0.86506337 -0.67940957 -0.43339539 -0.14887434  0.14887434
  0.43339539  0.67940957  0.86506337  0.97390653 -0.97390653 -0.86506337
 -0.67940957 -0.43339539 -0.14887434  0.14887434  0.43339539  0.67940957
  0.86506337  0.97390653]
[ 0.10079915  0.10079915  0.10079915  0.10079915  0.10079915  0.10079915
  0.10079915  0.10079915  0.10079915  0.10079915  0.12095897  0.12095897
  0.12095897  0.12095897  0.12095897  0.12095897  0.12095897  0.12095897
  0.12095897  0.12095897]
[ 0.12095897  0.12095897  0.12095897  0.12095897  0.12095897  0.12095897
  0.12095897  0.12095897  0.12095897  0.12095897  0.1411188   0.1411188
  0.1411188   0.1411188   0.1411188   0.1411188   0.1411188   0.1411188
  0.1411188   0.1411188 ]
[ 0.10106217  0.1021593   0.10403067  0.10651047  0.10937842  0.1123797
  0.11524765  0.11772745  0.11959883  0.12069595  0.12122199  0.12231912
  0.1241905   0.1266703   0.12953825  0.13253953  0.13540748  0.13788728
  0.13975865  0.14085578]

So x is just a repetition of the abcissas vector over and over. Then, k1 and k2 define lower and upper bounds, respectively, for the element. Finally, kt are the abcissa points. Notice that these do not repeat, as each point corresponds to a different point on different elements.

We can continue constructing our solution in a similar way throughout.

Next, the computation of kp deserves some attention:



In [134]:

    
kp = np.kron(coeff[i, :-1], np.ones(M))*bs1\
    + np.kron(coeff[i, 1:], np.ones(M))*bs2

print(bs1)
print(np.kron(coeff[i, :-1], np.ones(M)))
print(kp)









    



[ 0.98695326  0.93253168  0.83970478  0.7166977   0.57443717  0.42556283
  0.2833023   0.16029522  0.06746832  0.01304674  0.98695326  0.93253168
  0.83970478  0.7166977   0.57443717  0.42556283  0.2833023   0.16029522
  0.06746832  0.01304674  0.98695326  0.93253168  0.83970478  0.7166977
  0.57443717  0.42556283  0.2833023   0.16029522  0.06746832  0.01304674
  0.98695326  0.93253168  0.83970478  0.7166977   0.57443717  0.42556283
  0.2833023   0.16029522  0.06746832  0.01304674  0.98695326  0.93253168
  0.83970478  0.7166977   0.57443717  0.42556283  0.2833023   0.16029522
  0.06746832  0.01304674  0.98695326  0.93253168  0.83970478  0.7166977
  0.57443717  0.42556283  0.2833023   0.16029522  0.06746832  0.01304674
  0.98695326  0.93253168  0.83970478  0.7166977   0.57443717  0.42556283
  0.2833023   0.16029522  0.06746832  0.01304674  0.98695326  0.93253168
  0.83970478  0.7166977   0.57443717  0.42556283  0.2833023   0.16029522
  0.06746832  0.01304674  0.98695326  0.93253168  0.83970478  0.7166977
  0.57443717  0.42556283  0.2833023   0.16029522  0.06746832  0.01304674
  0.98695326  0.93253168  0.83970478  0.7166977   0.57443717  0.42556283
  0.2833023   0.16029522  0.06746832  0.01304674]
[ 0.04772662  0.04772662  0.04772662  0.04772662  0.04772662  0.04772662
  0.04772662  0.04772662  0.04772662  0.04772662  0.05040981  0.05040981
  0.05040981  0.05040981  0.05040981  0.05040981  0.05040981  0.05040981
  0.05040981  0.05040981  0.05279576  0.05279576  0.05279576  0.05279576
  0.05279576  0.05279576  0.05279576  0.05279576  0.05279576  0.05279576
  0.05495366  0.05495366  0.05495366  0.05495366  0.05495366  0.05495366
  0.05495366  0.05495366  0.05495366  0.05495366  0.05693016  0.05693016
  0.05693016  0.05693016  0.05693016  0.05693016  0.05693016  0.05693016
  0.05693016  0.05693016  0.05875836  0.05875836  0.05875836  0.05875836
  0.05875836  0.05875836  0.05875836  0.05875836  0.05875836  0.05875836
  0.06046269  0.06046269  0.06046269  0.06046269  0.06046269  0.06046269
  0.06046269  0.06046269  0.06046269  0.06046269  0.06206175  0.06206175
  0.06206175  0.06206175  0.06206175  0.06206175  0.06206175  0.06206175
  0.06206175  0.06206175  0.06357007  0.06357007  0.06357007  0.06357007
  0.06357007  0.06357007  0.06357007  0.06357007  0.06357007  0.06357007
  0.06499921  0.06499921  0.06499921  0.06499921  0.06499921  0.06499921
  0.06499921  0.06499921  0.06499921  0.06499921]
[ 0.04776162  0.04790765  0.04815672  0.04848677  0.04886848  0.04926794
  0.04964965  0.0499797   0.05022878  0.0503748   0.05044094  0.05057078
  0.05079226  0.05108575  0.05142518  0.05178039  0.05211982  0.05241331
  0.05263479  0.05276463  0.05282392  0.05294135  0.05314166  0.0534071
  0.05371409  0.05403534  0.05434233  0.05460776  0.05480807  0.05492551
  0.05497945  0.05508702  0.05527049  0.05551361  0.05579479  0.05608904
  0.05637021  0.05661334  0.05679681  0.05690437  0.05695401  0.05705351
  0.05722321  0.05744809  0.05770817  0.05798035  0.05824043  0.05846531
  0.05863501  0.05873451  0.0587806   0.05887335  0.05903156  0.0592412
  0.05948366  0.05973739  0.05997985  0.06018949  0.0603477   0.06044045
  0.06048355  0.06057058  0.06071901  0.06091571  0.06114319  0.06138125
  0.06160873  0.06180543  0.06195387  0.06204089  0.06208143  0.06216352
  0.06230353  0.06248906  0.06270363  0.06292818  0.06314276  0.06332829
  0.0634683   0.06355039  0.06358871  0.06366649  0.06379915  0.06397495
  0.06417826  0.06439102  0.06459433  0.06477012  0.06490279  0.06498056
  0.06501694  0.06509092  0.06521711  0.06538432  0.0655777   0.06578008
  0.06597346  0.06614067  0.06626686  0.06634084]

This operation is just repeating the coefficients $M$ times in order to carry out an element-wise multiplication with the interpolator. This is the same calculation as before, but carried out $L-1$ times simultaneously.

Next let's see how our index finder is doing:



In [133]:

    
#Find indices for next period's element bounds
ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
               for point in kp])
ii[ii > 0] -= 1

print(ii)









    



[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

It seems to be working just fine. You'll notice that all of the values in kp are below the lower bound on the state space, so we're simply forcing it back into the lower element.

Finally, let's take a look at the conditional expectation:



In [135]:

    
#Define element bounds for next period
k1p = k[ii]
k2p = k[ii + 1]

#Calculate policy next period
x = scaldwn(kp, k2p, k1p)
bs1p = 0.5*(1 - x)
bs2p = 0.5*(1 + x)

#Initialize the summation
sum1 = np.zeros(((L-1)*M))

#Compute all of the sums simultaneously using linear algebra
kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
yp = np.outer(a, kp**alpha)
cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
dudcp = cp**(-gamma)
dydkp = np.outer(a, kp**(alpha - 1))*alpha
sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))



In [139]:

    
print(kpp.shape)
print(yp.shape)
print(cp.shape)
print(sum1.shape)









    



(5, 100)
(5, 100)
(5, 100)
(100,)

You'll notice that we just repeated the same code as before. Instead of $M$ columns, the kpp matrix contains $M(L-1)$ columns, one for every quadrature point and element. Similarly, sum1 now contains all of the conditional expectations at all of the quadrature points on all of the elements in the capital direction.

Finally, we want to calculate the residuals. First of all, notice that we are going to have some trouble entering adding to the vector of residual, since the loop was only over $0...L-2$, while the length of our RES vector is a function of $L$. Why is this? On the lowest and highest elements, we only add once to the residual vector, since there is no overlap with another residual. In order to deal with this we have to do some fancy slicing.

As a test case, let's create a vector of size $IL$ and try to recove only the entries that we would like to fill. Given a value for i in $\{0, ..., I\}$, we would like to fill all of the rows corresponding to $l + i*L$, for all $l$ in $\{0...L-2\}$, and the same plus one. Thus, we simply slice the test vector accordingly:



In [150]:

    
i = 0
test = np.arange(I*L)
print(test)
print(test[i*L:(i+1)*(L-1)])

#Will this work as we loop through i?
print([test[i*L:(i+1)*(L-1)] for i in range(0,I)])

#NO! Why?  The upper bound is advancing at a rate L-1 while the lower bound moves at L
print([test[i*L:(i+1)*L-1] for i in range(0,I)])

#That gives us the result we wanted!  What about plus one?
print([test[i*L+1:(i+1)*L] for i in range(0,I)])









    



[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
 50 51 52 53 54]
[0 1 2 3 4 5 6 7 8 9]
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([11, 12, 13, 14, 15, 16, 17, 18, 19]), array([22, 23, 24, 25, 26, 27, 28, 29]), array([33, 34, 35, 36, 37, 38, 39]), array([44, 45, 46, 47, 48, 49])]
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20]), array([22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), array([33, 34, 35, 36, 37, 38, 39, 40, 41, 42]), array([44, 45, 46, 47, 48, 49, 50, 51, 52, 53])]
[array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]), array([12, 13, 14, 15, 16, 17, 18, 19, 20, 21]), array([23, 24, 25, 26, 27, 28, 29, 30, 31, 32]), array([34, 35, 36, 37, 38, 39, 40, 41, 42, 43]), array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54])]

So, on each pass, given a value for i we know how to fill RES, but how do we calculate the integrals? First, notice that bs1 just repeats over and over. So, if we reshape the res vector into a matrix, we can carry out a dot product with a single interpolator. Here's my implimentation:



In [153]:

    
res = (beta*sum1 - c**(-gamma))*u
res = res.reshape(L-1, M)
RES[i*L: (i+1)*L-1] += np.dot(res, Bs1)
RES[i*L + 1: (i+1)*L] += np.dot(res, Bs2)

Ok, here's the full function and a comparison with the older version:



In [156]:

    
def mcgratten_weighted_residual_vec_3(coeff):
    """
    A partially vectorized function of the McGratten problem.

    Variables:
        coeff   :   ndarray; a vector of coefficients for the parameterized
                    policy function
    Returns:
        ndarray containing the residuals.

    """
    #Unpack the arguments
    alpha, delta, I, L, k, abcissas, M, a, PIE = args
    coeff = coeff.reshape(I, L)

    #Initialize the residuals
    RES = np.zeros((I*L))

    #Construct the vector of residual funcitons
    for i in range(0, I):
        #Vectorize element loop along k axis
        #Scale and calculate capital
        x = np.kron(np.ones(L - 1), abcissas)
        kt = scalup(x, np.kron(k[1:], np.ones(M)), np.kron(k[:-1], np.ones(M)))
        u = np.kron(np.ones(L - 1), weights)
        bs1 = 0.5*(1 - x)
        bs2 = 0.5*(1 + x)
        Bs1 = 0.5*(1 - abcissas)
        Bs2 = 0.5*(1 + abcissas)

        #Compute capital next period by projection
        kp = np.kron(coeff[i, :-1], np.ones(M))*bs1\
            + np.kron(coeff[i, 1:], np.ones(M))*bs2

        #Compute consumption
        y = a[i]*kt**alpha
        c = y + (1 - delta)*kt - kp

        #Find indices for next period's element bounds
        ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
                       for point in kp])
        ii[ii > 0] -= 1

        #Define element bounds for next period
        k1p = k[ii]
        k2p = k[ii + 1]

        #Calculate policy next period
        x = scaldwn(kp, k2p, k1p)
        bs1p = 0.5*(1 - x)
        bs2p = 0.5*(1 + x)

        #Initialize the summation
        sum1 = np.zeros(((L-1)*M))

        #Compute all of the sums simultaneously using linear algebra
        kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
        yp = np.outer(a, kp**alpha)
        cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
        dudcp = cp**(-gamma)
        dydkp = np.outer(a, kp**(alpha - 1))*alpha
        sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

        #Calculate the residual functions
        res = (beta*sum1 - c**(-gamma))*u
        res = res.reshape(L-1, M)
        RES[i*L: (i+1)*L-1] += np.dot(res, Bs1)
        RES[i*L + 1: (i+1)*L] += np.dot(res, Bs2)
    return RES



In [158]:

    
time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)

time0 = time.time()
res3 = scipy.optimize.newton_krylov(mcgratten_weighted_residual_vec_3, coeff0,
                                    method='lgmres', verbose=True,
                                    maxiter=1000, line_search='armijo')
print(time.time() - time0)
print(np.linalg.norm(res - res3))









    



0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.31136; step 1; tol 0.0218159
3:  |F(x)| = 0.119142; step 1; tol 0.00116509
4:  |F(x)| = 0.000200299; step 1; tol 2.54373e-06
5:  |F(x)| = 7.06869e-10; step 1; tol 1.12089e-11
0.7266361713409424
0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.32729; step 1; tol 0.0220263
3:  |F(x)| = 0.125515; step 1; tol 0.00128071
4:  |F(x)| = 0.000285917; step 1; tol 4.67017e-06
5:  |F(x)| = 1.87484e-09; step 1; tol 3.8698e-11
0.4355125427246094
2.46345452403e-11

This is clearly faster, even for lower values of $M$ and $L$, but let's see how it performs as these increase:



In [160]:

    
#Generate a state space vector over k
L = 101
k = np.linspace(kmin, kmax, num=L)

#Specify number of points in the integral on each element [k_j, k_j+1]
M = 10

#Generate absissas and weights
abcissas, weights = np.polynomial.legendre.leggauss(M)

#Generate intial guess as half of production at all nodes
coeff0 = np.outer(a, k**alpha)*0.1

#Pack up all of the arguments to pass to the solver
args = (alpha, delta, I, L, k, abcissas, M, a, PIE)

time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)

time0 = time.time()
res3 = scipy.optimize.newton_krylov(mcgratten_weighted_residual_vec_3, coeff0,
                                    method='lgmres', verbose=True,
                                    maxiter=1000, line_search='armijo')
print(time.time() - time0)
print(np.linalg.norm(res - res3))









    



0:  |F(x)| = 213.312; step 1; tol 0.167273
1:  |F(x)| = 69.0602; step 1; tol 0.0943335
2:  |F(x)| = 10.6125; step 1; tol 0.0212532
3:  |F(x)| = 0.297359; step 1; tol 0.000706587
4:  |F(x)| = 0.00148477; step 1; tol 2.24386e-05
5:  |F(x)| = 1.47206e-07; step 1; tol 8.84669e-09
10.364726305007935
0:  |F(x)| = 213.312; step 1; tol 0.167273
1:  |F(x)| = 69.0602; step 1; tol 0.0943335
2:  |F(x)| = 10.6115; step 1; tol 0.0212492
3:  |F(x)| = 0.297485; step 1; tol 0.000707322
4:  |F(x)| = 0.00146601; step 1; tol 2.18569e-05
5:  |F(x)| = 1.44792e-07; step 1; tol 8.77923e-09
3.316906690597534
4.09607965422e-10

On my home computer with $L = 101$ and $M = 100$, I got a four times speed up! Huzzah!

Let's reset our parameters:



In [163]:

    
#Generate a state space vector over k
L = 11
k = np.linspace(kmin, kmax, num=L)

#Specify number of points in the integral on each element [k_j, k_j+1]
M = 10

#Generate absissas and weights
abcissas, weights = np.polynomial.legendre.leggauss(M)

#Generate intial guess as half of production at all nodes
coeff0 = np.outer(a, k**alpha)*0.1

#Pack up all of the arguments to pass to the solver
args = (alpha, delta, I, L, k, abcissas, M, a, PIE)

Finally, let's turn to the last loop.

The Productivity Loop

As mentioned before, you only get a speed up when you vectorize a loop that makes many passes. That means that this probably won't change much, but we'll give it a try anyways.

One last time, here's the loop we are trying to vectorize:



In [164]:

    
for i in range(0, I):
    #Vectorize element loop along k axis
    #Scale and calculate capital
    x = np.kron(np.ones(L - 1), abcissas)
    kt = scalup(x, np.kron(k[1:], np.ones(M)), np.kron(k[:-1], np.ones(M)))
    u = np.kron(np.ones(L - 1), weights)
    bs1 = 0.5*(1 - x)
    bs2 = 0.5*(1 + x)
    Bs1 = 0.5*(1 - abcissas)
    Bs2 = 0.5*(1 + abcissas)

    #Compute capital next period by projection
    kp = np.kron(coeff[i, :-1], np.ones(M))*bs1\
        + np.kron(coeff[i, 1:], np.ones(M))*bs2

    #Compute consumption
    y = a[i]*kt**alpha
    c = y + (1 - delta)*kt - kp

    #Find indices for next period's element bounds
    ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
                   for point in kp])
    ii[ii > 0] -= 1

    #Define element bounds for next period
    k1p = k[ii]
    k2p = k[ii + 1]

    #Calculate policy next period
    x = scaldwn(kp, k2p, k1p)
    bs1p = 0.5*(1 - x)
    bs2p = 0.5*(1 + x)

    #Initialize the summation
    sum1 = np.zeros(((L-1)*M))

    #Compute all of the sums simultaneously using linear algebra
    kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
    yp = np.outer(a, kp**alpha)
    cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp
    dudcp = cp**(-gamma)
    dydkp = np.outer(a, kp**(alpha - 1))*alpha
    sum1 = np.dot(PIE[i, :], dudcp*(dydkp + 1 - delta))

    #Calculate the residual functions
    res = (beta*sum1 - c**(-gamma))*u
    res = res.reshape(L-1, M)
    RES[i*(L-1): (i+1)*(L-1)] += np.dot(res, Bs1)
    RES[i*(L-1) + 1: (i+1)*(L-1) + 1] += np.dot(res, Bs2)

And just to move things along, here is the vectorized version:



In [10]:

    
H = I*(L - 1)
#Vectorize all loops
#Scale and calculate capital
x = np.kron(np.ones(H), abcissas)
kt = scalup(x, np.kron(k[1:], np.ones(I*M)), np.kron(k[:-1], np.ones(I*M)))
u = np.kron(np.ones(H), weights)
bs1 = 0.5*(1 - x)
bs2 = 0.5*(1 + x)
#NOTE: bs is constant across elements, so simplify calculations later
Bs1 = 0.5*(1 - abcissas)
Bs2 = 0.5*(1 + abcissas)

#Compute capital next period by projection
kp = np.kron(coeff[:, :-1].reshape(H,), np.ones((M)))*bs1\
    + np.kron(coeff[:, 1:].reshape(H,), np.ones((M,)))*bs2

#Compute consumption
y = np.kron(a, np.ones((L - 1)*M))*kt**alpha
c = y + (1 - delta)*kt - kp

#Find indices for next periods element bounds
ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
               for point in kp])
ii[ii > 0] -= 1

#Define element bounds for next period
k1p = k[ii]
k2p = k[ii + 1]

#Calculate policy next period
x = scaldwn(kp, k2p, k1p)
bs1p = 0.5*(1 - x)
bs2p = 0.5*(1 + x)

#Initialize the summation
sum1 = np.zeros((H*M))

#Compute the sums
#Calculate next period's policy, output, and consumption
kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
yp = np.outer(a, kp**alpha)
cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp

#Calculate marginal product of capital and marginal utility
dudcp = cp**(-gamma)
dydkp = np.outer(a, kp**(alpha - 1))*alpha
rhs = dudcp*(dydkp + 1 - delta)

#Generate a block diagonal matrix for the expectation
mats = [rhs[:, i*M*(L - 1): (i+1)*M*(L - 1)] for i in range(0, I)]
temp1 = scipy.linalg.block_diag(*mats)
temp2 = PIE.reshape(I*I)
sum1 = np.dot(temp2, temp1)

#Calculate the residual functions
res = (beta*sum1 - c**(-gamma))*u
res = res.reshape((H, M))
RES[[i for i in range(0, I*L) if i not in range((L - 1),I*L,L)]] += np.dot(res, Bs1)
RES[[i for i in range(0, I*L) if i not in range(0, I*L - 1, L)]] += np.dot(res, Bs2)









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-22e12707bd77> in <module>()
     12 
     13 #Compute capital next period by projection
---> 14 kp = np.kron(coeff[:, :-1].reshape(H,), np.ones((M)))*bs1    + np.kron(coeff[:, 1:].reshape(H,), np.ones((M,)))*bs2
     15 
     16 #Compute consumption

NameError: name 'coeff' is not defined

This code looks almost exactly the same, and this is one of the benefits of vectorization. An increase in dimensions from one to two can be tricky, but from two to three or three to four is usually trivial. Let's look at exactly what's going on.

First, we build a vector kp the same way as befor, except now it is longer. One point is the reshpae operation:



In [170]:

    
kp = np.kron(coeff[:, :-1].reshape(H,), np.ones((M)))*bs1\
    + np.kron(coeff[:, 1:].reshape(H,), np.ones((M,)))*bs2
print(kp.shape)









    



(500,)

In order to stack all of the loops as we did befor, we reshape the coefficient matrix. It is important to think about how this reshape operation works. The default method is to allow the last (or rightmost) index to vary the fastest. Let's see an example:



In [175]:

    
test = np.arange(9)
print(test)
print(test.reshape(3,3))
print(test.reshape(3,3).reshape(9,))









    



[0 1 2 3 4 5 6 7 8]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[0 1 2 3 4 5 6 7 8]

So as you can see, the reshape method (or corresponding function) will either take items from the first row of a matrix, then the second, and so on. Or, conversely, if reshaping a vector to a matrix it will fill the first row first, the second row second, and etc.

Looking back to kp, the reshape is just taking the rows of the coeff matrix and stacking them end-to-end to make a long vector. This is exactly what we want.

The next important point here is the calculation of the integrals. You'll notice that we construct the rhs matrix, which is $I \times (L-1)I$. We need to take the dot product of the first row of $\Pi$ with the first $L-1$ columns of rhs, the second with the second, and so forth. This presents a very odd computational challenge, but here is my solution:

We can transform the rhs matrix up into a block diagonal matrix, where each block corresponds to $L-1$ columns from the original matrix. This matrix will we $I^2\times (L-1)I$. We can then reshape the $\Pi$ matrix into a vector. If we use the reshape method, this will stack the rows end to end. Finally, taking the dot product of these two should give us the desired result!

That's carried out in these lines, where we first create a list of arrays, then pass them to scipy.linalg.block_diag as a star-args argument (allowing you to pass a dynamic number of matrices), and finally reshaping and taking the dot product:



In [180]:

    
#Generate a block diagonal matrix for the expectation
mats = [rhs[:, i*M*(L - 1): (i+1)*M*(L - 1)] for i in range(0, I)]
temp1 = scipy.linalg.block_diag(*mats)
temp2 = PIE.reshape(I*I)
sum1 = np.dot(temp2, temp1)

The last point to make is that we need to fill only a subset of the rows of RES. In two lines we would like to fill first the rows that aren't on the upper bound of capital, and on the next those that aren't on the lower bound of capital. This present a need for some truly fancy slicing.

First, notice that in the first case we want the numbers zero through $L-2$, $L$ through $2L-1$, and so on. Put a different way, we want all of the indices, less those corresponding to $l = L - 1$. We can do this with a fancy slice and using range. Check it out:



In [208]:

    
#Here are all the indices
print([i for i in range(0, I*L)])

#Here are all the indices we dont want
#NOTE: The third argument is the step size
print([L-1, I*L, L])

#Here are all the indices we do want!
print([i for i in range(0, I*L) if i not in range((L - 1),I*L,L)])

#And, on the second line, we want all of these indices
print([i for i in range(0, I*L) if i not in range(0, I*L - 1, L)])









    



[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54]
[10, 55, 11]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54]

Thus, we can reshape the res object and fill the corresponding rows of RES using a dot product:



In [210]:

    
res = (beta*sum1 - c**(-gamma))*u
res = res.reshape((H, M))
RES[[i for i in range(0, I*L) if i not in range((L - 1),I*L,L)]] += np.dot(res, Bs1)
RES[[i for i in range(0, I*L) if i not in range(0, I*L - 1, L)]] += np.dot(res, Bs2)

Here is the full function with a comparison to the original:



In [1]:

    
def mcgratten_weighted_residual_vec(coeff):
    """
    A vectorized function of the McGratten problem.

    Variables:
        coeff   :   ndarray; a vector of coefficients for the parameterized
                    policy function
    Returns:
        ndarray containing the residuals.

    """
    #Unpack the arguments
    alpha, delta, I, L, k, abcissas, M, a, PIE = args

    #Reshape coefficients in case they are vector
    coeff = coeff.reshape(I, L)

    #Initialize the residuals
    RES = np.zeros((I*L))

    #For readability define the number of rows
    H = I*(L - 1)

    #Construct the vector of residual funcitons
    #Vectorize all loops
    #Scale and calculate capital
    x = np.kron(np.ones(H), abcissas)
    kt = scalup(x, np.kron(k[1:], np.ones(I*M)), np.kron(k[:-1], np.ones(I*M)))
    u = np.kron(np.ones(H), weights)
    bs1 = 0.5*(1 - x)
    bs2 = 0.5*(1 + x)
    #NOTE: bs is constant across elements, so simplify calculations later
    Bs1 = 0.5*(1 - abcissas)
    Bs2 = 0.5*(1 + abcissas)

    #Compute capital next period by projection
    kp = np.kron(coeff[:, :-1].reshape(H,), np.ones((M)))*bs1\
        + np.kron(coeff[:, 1:].reshape(H,), np.ones((M,)))*bs2

    #Compute consumption
    y = np.kron(a, np.ones((L - 1)*M))*kt**alpha
    c = y + (1 - delta)*kt - kp

    #Find indices for next periods element bounds
    ii = np.array([np.where(point >= np.append([0], k[:-1]))[0][-1]
                   for point in kp])
    ii[ii > 0] -= 1

    #Define element bounds for next period
    k1p = k[ii]
    k2p = k[ii + 1]

    #Calculate policy next period
    x = scaldwn(kp, k2p, k1p)
    bs1p = 0.5*(1 - x)
    bs2p = 0.5*(1 + x)

    #Initialize the summation
    sum1 = np.zeros((H*M))

    #Compute the sums
    #Calculate next period's policy, output, and consumption
    kpp = coeff[:, ii]*bs1p + coeff[:, ii+1]*bs2p
    yp = np.outer(a, kp**alpha)
    cp = yp + (1 - delta)*np.outer(np.ones(I), kp) - kpp

    #Calculate marginal product of capital and marginal utility
    dudcp = cp**(-gamma)
    dydkp = np.outer(a, kp**(alpha - 1))*alpha
    rhs = dudcp*(dydkp + 1 - delta)

    #Generate a block diagonal matrix for the expectation
    mats = [rhs[:, i*M*(L - 1): (i+1)*M*(L - 1)] for i in range(0, I)]
    temp1 = scipy.linalg.block_diag(*mats)
    temp2 = PIE.reshape(I*I)
    sum1 = np.dot(temp2, temp1)

    #Calculate the residual functions
    res = (beta*sum1 - c**(-gamma))*u
    res = res.reshape((H, M))
    RES[[i for i in range(0, I*L) if i not in
        range((L - 1), I*L, L)]] += np.dot(res, Bs1)
    RES[[i for i in range(0, I*L) if i not in
        range(0, I*L - 1, L)]] += np.dot(res, Bs2)

    return RES



In [9]:

    
time0 = time.time()
res = scipy.optimize.newton_krylov(mcgratten_weighted_residual, coeff0,
                                   method='lgmres', verbose=True, maxiter=1000,
                                   line_search='armijo')
print(time.time() - time0)

time0 = time.time()
res_vec = scipy.optimize.newton_krylov(mcgratten_weighted_residual_vec, coeff0,
                                       method='lgmres', verbose=True,
                                       maxiter=1000, line_search='armijo')
print(time.time() - time0)
print(np.linalg.norm(res - res_vec))









    



0:  |F(x)| = 65.3091; step 1; tol 0.164013
1:  |F(x)| = 21.2687; step 1; tol 0.0954504
2:  |F(x)| = 3.31136; step 1; tol 0.0218159
3:  |F(x)| = 0.119142; step 1; tol 0.00116509
4:  |F(x)| = 0.000200299; step 1; tol 2.54373e-06
5:  |F(x)| = 7.06869e-10; step 1; tol 1.12089e-11
0.6985089778900146
0:  |F(x)| = 65.8238; step 1; tol 0.167192
1:  |F(x)| = 21.1904; step 1; tol 0.0932732
2:  |F(x)| = 2.56984; step 1; tol 0.0132365
3:  |F(x)| = 0.0737766; step 1; tol 0.00074177
4:  |F(x)| = 4.05189e-05; step 1; tol 2.71468e-07
5:  |F(x)| = 3.6113e-11; step 1; tol 7.14917e-13
0.43918347358703613
0.250583369082

You'll notice that the norm is non-zero... ahhhhh!!!! That is your homework: figure out where the mistake is and fix it!