By Tyler Ransom, Duke University Social Science Research Institute
tyler.ransom@duke.edu
Let's examine how to use a new Julia package called JuMP to illustrate some of Julia's powerful resources for optimizing objective functions of all kinds. For an overview of the JuMP package, see here. Essentially, JuMP makes use of Julia's macros to greatly simplify the construction and optimization of various problems; some of its main features are discussed in the pros and cons at the end of this post.
In this illustration, we will use JuMP for nonlinear programming problems (i.e. maximum likelihood estimation). We will be using the open-source optimizer Ipopt (pronounced eye-PEE-opt).
So let's load the packages we'll need and get started!
In [ ]:
using JuMP
using Ipopt
Now let's create some data inside of a function:
In [1]:
function datagen(N,T)
    ## Generate data for a linear model to test optimization
    srand(1234)
    N = convert(Int64,N) # inputs to functions such as -ones- need to be integers!
    T = convert(Int64,T)
    n = convert(Int64,N*T) # total number of observations
    # generate the Xs
    X = cat(2,ones(N*T,1),5+3*randn(N*T,1),rand(N*T,1),
            2.5+2*randn(N*T,1),15+3*randn(N*T,1),
            .7-.1*randn(N*T,1),5+3*randn(N*T,1),
            rand(N*T,1),2.5+2*randn(N*T,1),
            15+3*randn(N*T,1),.7-.1*randn(N*T,1),
            5+3*randn(N*T,1),rand(N*T,1),2.5+2*randn(N*T,1),
            15+3*randn(N*T,1),.7-.1*randn(N*T,1));
    # generate the betas (X coefficients)
    bAns = [ 2.15; 0.10; 0.50; 0.10; 0.75; 1.2; 0.10; 0.50; 0.10; 0.75; 1.2; 0.10; 0.50; 0.10; 0.75; 1.2 ]
    # generate the std dev of the errors
    sigAns = 0.3
    # generate the Ys from the Xs, betas, and error draws
    draw = 0 + sigAns*randn(N*T,1)
    y = X*bAns+draw
    # return the generated data so that the code below has access to it
    return X,y,bAns,sigAns,n
end
Out[1]:
Now let's evaluate the function so that the function outputs are in the current scope:
In [3]:
X,y,bAns,sigAns,n = datagen(1e4,5);
We can estimate the bAns parameter vector using OLS:
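For reference, the estimator and the (degrees-of-freedom corrected) estimate of the error standard deviation implemented in the next cell are: $$ \hat{\beta} = \left(X'X\right)^{-1}X'y, \qquad \hat{\sigma} = \sqrt{\frac{\left(y-X\hat{\beta}\right)'\left(y-X\hat{\beta}\right)}{n-K}} $$where $K$ is the number of columns of $X$.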
In [4]:
bHat = (X'*X)\(X'*y);
sigHat = sqrt((y-X*bHat)'*(y-X*bHat)/(n-size(X,2)));
And now we can compare the estimates with the solution:
In [5]:
[round([bHat;sigHat],3) [bAns;sigAns]]
Out[5]:
The estimator performs well.
Now let's estimate this using maximum likelihood under the (valid) assumption that the errors in our datagen() function are independently drawn from a normal distribution with mean 0 and a standard deviation to be estimated.
Recall that the likelihood function for this scenario is: $$ L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \left( -\frac{\left(y_{i}-x_{i}\beta\right)^{2}}{2\sigma^{2}}\right) $$
or, more simply,
$$ L = \prod_{i=1}^{N} \frac{1}{\sigma} \phi \left(\frac{y_{i}-x_{i}\beta}{\sigma}\right) $$where $\phi\left(\cdot\right)$ is the probability density function of the standard normal distribution.
In maximum likelihood estimation, we search for the parameters that maximize the log likelihood function. Our function above becomes: $$ \log L = \frac{N}{2}\log\left(\frac{1}{2\pi\sigma^{2}}\right) - \sum_{i=1}^{N}\frac{\left(y_{i}-x_{i}\beta\right)^{2}}{2\sigma^{2}} $$
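As a quick sanity check, the same log likelihood can also be written as an ordinary Julia function and evaluated at the true parameters. This helper (loglike) is just an illustrative sketch and is not part of the JuMP workflow below:
In [ ]:
# Sketch only (not in the original notebook): the log likelihood as a plain
# Julia function, handy for checking values by hand
function loglike(b,s,X,y,n)
    resid = y-X*b
    return (n/2)*log(1/(2*pi*s^2)) - sum(resid.^2)/(2*s^2)
end
loglike(bAns,sigAns,X,y,n)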
JuMP requires the following syntax elements:
- a model declaration that specifies the solver to use
- @defVar statements declaring the variables to optimize over
- a @setNLObjective statement declaring the objective function
- a solve() call
We will construct these line-by-line below.
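Before building the full likelihood, here is a minimal, self-contained toy sketch (not part of the original notebook) showing how these pieces fit together, minimizing $(x-1)^2$:
In [ ]:
# Toy sketch of the JuMP workflow used in this post
toy = Model(solver=IpoptSolver())      # model declaration with a solver
@defVar(toy, x)                        # variable to search over
@setNLObjective(toy, Min, (x-1)^2)     # nonlinear objective
status = solve(toy)                    # perform the optimization
getValue(x)                            # retrieve the optimum (should be near 1)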
First, let's declare the model name and solver:
In [6]:
myMLE = Model(solver=IpoptSolver(tol=1e-6));
Now let's tell JuMP which variables to search over. For the example above, this is the vector $\beta$ and the scalar $\sigma$.
In [7]:
@defVar(myMLE, b[i=1:size(X,2)]);
@defVar(myMLE, s>=0.0);
Now we will write the objective function, which is the log likelihood function from above. We will also tell JuMP that we are maximizing.
In [8]:
@setNLObjective(myMLE, Max, (n/2)*log(1/(2*pi*s^2))-sum{(y[i]-sum{X[i,k]*b[k],
k=1:size(X,2)})^2, i=1:size(X,1)}/(2s^2));
Note that we specify the summation term over $i$ using sum{ ,i=1:size(X,1)}. We also have to specify the linear algebra implied by X*b as another sum, this time over $k$. This is because JuMP doesn't understand matrix multiplication. This last fact adds a little bit of syntax to the objective function, but overall it is very readable.
Now that we have all of our elements set up, we can issue a solve() call and ask JuMP to perform the optimization:
In [9]:
status = solve(myMLE)
Out[9]:
We see that the solver finished at a solution that is a local maximum. To store the estimated values of the parameters, we can use the getValue() function, applied to each parameter separately:
In [10]:
bOpt = getValue(b[:]);
sOpt = getValue(s);
And if we compare them with the solution, we see that we have indeed obtained the correct parameter estimates:
In [11]:
[round([bOpt;sOpt],3) [bAns;sigAns]]
Out[11]:
We can also return other helpful values such as the objective function value at the optimum, which is the log likelihood of the estimated model.
In [12]:
getObjectiveValue(myMLE)
Out[12]:
Additionally, we can obtain the Hessian matrix of the model, which serves as a key component of the variance-covariance matrix that is used for statistical inference. The syntax for this is a bit messy, but I will put it below for reference:
In [13]:
using MathProgBase   # (added for clarity) makes the MathProgBase calls below available
# parameter values at the optimum
this_par = myMLE.colVal
# set up an evaluator for the model's derivatives
m_eval = JuMP.JuMPNLPEvaluator(myMLE);
MathProgBase.initialize(m_eval, [:ExprGraph, :Grad, :Hess])
# sparsity structure of the Hessian of the Lagrangian
hess_struct = MathProgBase.hesslag_structure(m_eval)
hess_vec = zeros(length(hess_struct[1]))
numconstr = length(m_eval.m.linconstr) + length(m_eval.m.quadconstr) + length(m_eval.m.nlpdata.nlconstr)
dimension = length(myMLE.colVal)
# evaluate the Hessian at the optimum (objective weight 1.0; no constraint rows in this model)
MathProgBase.eval_hesslag(m_eval, hess_vec, this_par, 1.0, zeros(numconstr))
# the structure covers only one triangle, so symmetrize to get the full Hessian
this_hess_ld = sparse(hess_struct[1], hess_struct[2], hess_vec, dimension, dimension)
hOpt = this_hess_ld + this_hess_ld' - sparse(diagm(diag(this_hess_ld)));
hOpt = -full(hOpt); # negative of the Hessian, since we are maximizing
# standard errors: square roots of the diagonal of the inverse negative Hessian
seOpt = sqrt(diag(full(hOpt)\eye(size(hOpt,1))));
The above code obtained the estimated Hessian matrix, took the negative of it (because we are maximizing), and then obtained standard errors of our parameter estimates as the square roots of the diagonal elements of the inverse of the negative Hessian.
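In other words, the standard errors come from the usual maximum likelihood result: $$ \widehat{Var}\left(\hat{\theta}\right) = \left[-H\left(\hat{\theta}\right)\right]^{-1}, \qquad se\left(\hat{\theta}_{j}\right) = \sqrt{\left[-H\left(\hat{\theta}\right)\right]^{-1}_{jj}} $$where $H\left(\hat{\theta}\right)$ is the Hessian of the log likelihood evaluated at the estimates $\hat{\theta}=\left(\hat{\beta},\hat{\sigma}\right)$.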
In [14]:
[round([bOpt;sOpt],3) round(seOpt,3)]
Out[14]:
We can compare these with the OLS standard errors and confirm that they are the same:
In [15]:
seHat = sqrt((sigHat^2).*diag((X'*X)\eye(size(X,2))));
[round(bHat,3) round(seHat,3)]
Out[15]:
Constraints can easily be added to the model. There are two ways of doing this:
1. Within the @defVar statement: for example, @defVar(myMLE, s>=0.0); restricts s to be weakly positive.
2. With a separate @addConstraint statement, placed after the @defVar statements but before the @setNLObjective statement. The syntax for this is, e.g., @addConstraint(myMLE, b[15] == 0), which would restrict the 15th element of the b vector to equal 0 (see the sketch below for how this fits into the full model).
JuMP can handle literally thousands of constraints, and its simple syntax makes coding much easier.
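For example, a restricted version of the model estimated above could be set up as in the following sketch (not run in the original notebook; the names myMLEr, bR, and sR are placeholders introduced here):
In [ ]:
# Sketch: the same MLE as above, but with the 15th element of the coefficient vector restricted to 0
myMLEr = Model(solver=IpoptSolver(tol=1e-6))
@defVar(myMLEr, bR[i=1:size(X,2)])
@defVar(myMLEr, sR>=0.0)
@addConstraint(myMLEr, bR[15] == 0)   # the restriction
@setNLObjective(myMLEr, Max, (n/2)*log(1/(2*pi*sR^2))-sum{(y[i]-sum{X[i,k]*bR[k],
                k=1:size(X,2)})^2, i=1:size(X,1)}/(2sR^2))
status = solve(myMLEr)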
Pros of JuMP:
- Takes advantage of Julia's features.
- Compares favorably with Matlab's fminunc for a simple large-scale problem.
Cons of JuMP: