MAT 258A: Homework 3

Exercise 1

In this exercise, I test the code for the 'pure' Newton method (newtmin), Newton's method with Armijo backtracking (newtmin1), Newton's method with Armijo backtracking and two different ways of modifying the Hessian (newtmin2 and newtmin3), and the BFGS method with a modified initial Hessian (mybfgs).

$ \begin{array}{cccccc} &\textrm{pure Newton}&\textrm{newtmin1}&\textrm{newtmin2}&\textrm{newtmin3}&\textrm{BFGS}\\ 1&\surd &&&&\\ 2&\times &\times&\times&\times&\surd\\ 3&\surd &&&&\\ 4&\surd &&&&\\ 5&\surd &&&&\\ 6&\surd &&&&\\ 7&\times &\times&\times&\times&\surd\\ 8&\surd &&&&\\ 9&\surd &&&&\\ 10&\times &\times&\surd&&\\ 11&\times &\times&\times&\surd&\\ 12&\times &\times&\surd&&\\ 13&\surd &&&&\\ 14&\times &\times&\times&\times&\times\\ 15&\times &\times&\times&\times&\surd\\ 16&\surd &&&&\\ 17&\surd &&&&\\ 18&\times &\times&\times&\times&\times\\ \end{array} $

The table above shows the outcome of each method. A run counts as a 'success' if the final gradient is small relative to the initial one, i.e., $\frac{\|g(x^*)\|}{\|g(x_0)\|}\leqslant10^{-5}$. This is a fairly strict requirement; in fact, one can see that the cases marked as 'failed' for BFGS, newtmin2, and newtmin3 still end with a relatively small gradient norm. The technical details of the exercise follow.
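
A minimal sketch of how each table entry is produced (assuming Toms566 is loaded and one of the solvers defined later in this notebook, here newtmin, is in scope):

    p = Problem(1)                                   # any of the 18 test problems
    obj(x) = (p.obj(x), p.grd(x), p.hes(x))          # returns (f, g, H) at x
    (x, its) = newtmin(obj, p.x0; maxIts=100, optTol=1e-6)
    norm(p.grd(x))/norm(p.grd(p.x0)) <= 1e-5         # true gives a check mark, false a cross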


In [1]:
using Toms566
using PyPlot

In [2]:
Pkg.test("Toms566")


INFO: Testing Toms566
No.  Name                             n        f(x0)     |∇f(x0)| cond(∇²f(x0))
  1  Hellical valley                  3     2.50e+03     1.88e+03     9.82e+00
  2  Bigg's EXP6                      6     7.79e-01     2.55e+00     1.30e+01
  3  Gaussian                         3     3.89e-06     7.45e-03     5.10e+01
  4  Powell                           2     1.14e+00     2.00e+04     1.37e+08
  5  Box 3-dim                        3     1.03e+03     1.49e+02     1.19e+02
  6  Variably dimensioned            40     9.39e+10     1.01e+11     1.65e+04
  7  Watson                           9     3.00e+01     1.78e+02     1.63e+02
  8  Penalty I                       60     5.45e+09     8.02e+07     4.24e+00
  9  Penalty II                      65     2.87e+05     3.28e+05     9.84e+01
 10  Brown badly scaled               2     1.00e+12     2.00e+06     1.00e+00
 11  Brown and Denis                  4     7.93e+06     2.14e+06     6.93e+02
 12  Gulf research and development    3     1.21e+01     3.97e+01     4.65e+04
 13  Trigonometric                   40     2.01e-03     5.30e-02     1.57e+02
 14  Extended rosenbrock             40     4.84e+02     1.04e+03     9.90e+01
 15  Extended Powell singular        60     3.23e+03     1.78e+03     2.27e+01
 16  Beale                            2     1.42e+01     2.78e+01     7.97e+00
 17  Wood                             4     1.92e+04     1.64e+04     1.69e+02
 18  Chebyquad                       50     1.39e-02     2.65e+00     6.30e+02
INFO: Toms566 tests passed

In [3]:
function newtmin( obj, x0; maxIts=100, optTol=1e-6)
     #"Pure" Newton method
     # Minimize a function f using Newton’s method.
     # obj:  a function that evaluates the objective value,
     # gradient, and Hessian at a point x, i.e.,
     # (f, g, H) = obj(x)
     # x0: starting point.
     # maxIts (optional): maximum number of iterations.
     # optTol (optional): optimality tolerance based on
     #                    ||grad(x)|| <= optTol*||grad(x0)||
    x=x0
    status = 0
    its = 0
    (f0,g0,H0)=obj(x0)
    (f,g,H)=obj(x)
    while status != 1
        x = x-inv(H)*g
        (f,g,H)=obj(x)
        its = its+1
        if norm(g)<= optTol*norm(g0)
            status = 1
        end
        if its>maxIts
            status = 1
        end
    end
return (x, its)
end


Out[3]:
newtmin (generic function with 1 method)
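
For reference, newtmin takes undamped Newton steps
$ x_{k+1} = x_k - H(x_k)^{-1}\, g(x_k), $
stopping once $\|g(x_k)\|\leqslant \textrm{optTol}\cdot\|g(x_0)\|$ or the iteration limit is reached. It is first checked below on a simple quadratic, where a single Newton step must land exactly on the minimizer.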

In [4]:
v=randn(10,1)
A=randn(10,10)
inv(A)
eye(10)


Out[4]:
10x10 Array{Float64,2}:
 1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0

In [5]:
function obj(x)
    f=1/2*norm(x-v)^2
    g=x-v
    H=eye(10)
    return (f,g,H)
end


Out[5]:
obj (generic function with 1 method)

In [6]:
x0=zeros(10,1)
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(x-v)


Out[6]:
0.0

In [7]:
p = Problem(1) # Hellical Valley


Out[7]:
Toms566.Problem:
 name: Hellical valley
 n: 3

In [10]:
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end


Out[10]:
obj (generic function with 1 method)

In [9]:
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[9]:
1.0002073324421774e-8

In [10]:
p = Problem(2)
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[10]:
11735.764558124118

In [6]:
p = Problem(3) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[6]:
1.3020078124065462e-8

In [12]:
p = Problem(4) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[12]:
1.0155390280377853e-9

In [13]:
p = Problem(5) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[13]:
8.68652479136446e-7

In [14]:
p = Problem(6) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[14]:
6.720802512276261e-7

In [15]:
p = Problem(7) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[15]:
9.060968995677416e18

In [16]:
p = Problem(8) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[16]:
9.223518288696248e-7

In [17]:
p = Problem(9) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[17]:
7.262191147715546e-7

In [18]:
p = Problem(10) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[18]:
7.083640796727696e6

In [19]:
p = Problem(11) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[19]:
1.9734314374508443e8

In [20]:
p = Problem(12) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[20]:
2.6556527256401906e239

In [21]:
p = Problem(13) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[21]:
5.50382562199281e-8

In [22]:
p = Problem(14) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[22]:
8.050634401598488e9

In [23]:
p = Problem(15) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[23]:
3.242292834706322e23

In [24]:
p = Problem(16) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[24]:
0.0

In [25]:
p = Problem(17) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[25]:
9.282369975556412e-7

In [ ]:
p = Problem(18) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))

In [4]:
function newtmin1( obj, x0; maxIts=100, optTol=1e-6)
     # Newton method with Armijo backtracking
     # Minimize a function f using Newton’s method.
     # obj:  a function that evaluates the objective value,
     # gradient, and Hessian at a point x, i.e.,
     # (f, g, H) = obj(x)
     # x0: starting point.
     # maxIts (optional): maximum number of iterations.
     # optTol (optional): optimality tolerance based on
     #                    ||grad(x)|| <= optTol*||grad(x0)||
    mu=1e-4
    x=x0
    status = 0
    its = 0
    (f0,g0,H0)=obj(x0)
    (f,g,H)=obj(x)
    while status != 1
        alpha = 1
        xnew = x-alpha*inv(H)*g
        (fnew,gnew,Hnew)=obj(xnew)
        sts =-fnew+f-alpha*mu*g'*inv(H)*g
        while sts[1]<0
            alpha=alpha/2
            xnew = x-alpha*inv(H)*g
            (fnew,gnew,Hnew)=obj(xnew)
            sts=-fnew+f-alpha*mu*g'*inv(H)*g
        end
        x = x-alpha*inv(H)*g
        (f,g,H)=obj(x)
        its = its+1
        if norm(g)<= optTol*norm(g0)
            status = 1
        end
        if its>maxIts
            status = 1
        end
    end
return (x, its)
end


Out[4]:
newtmin1 (generic function with 1 method)
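
newtmin1 damps the Newton step with Armijo backtracking: starting from $\alpha=1$ and halving, the first $\alpha$ is accepted for which
$ f\big(x_k - \alpha H_k^{-1} g_k\big) \leqslant f(x_k) - \alpha\,\mu\, g_k^{T} H_k^{-1} g_k, \qquad \mu = 10^{-4}, $
which is exactly the sign test on sts in the code. It is run below on the problems that pure Newton failed.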

In [8]:
p = Problem(2) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[8]:
1.0000000000000004

In [13]:
p = Problem(7) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[13]:
0.863325933874392

In [14]:
p = Problem(10) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[14]:
250000.25000099998

In [15]:
p = Problem(11) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[15]:
0.005490328480657573

In [16]:
p = Problem(12) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[16]:
0.16957718015199827

In [19]:
p = Problem(14) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[19]:
0.5779385738800985

In [20]:
p = Problem(15) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[20]:
0.45464058777161137

In [21]:
p = Problem(18) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin1( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[21]:
0.933985346920556

In [5]:
function newtmin2( obj, x0; maxIts=100, optTol=1e-6)
    # Newton method with Armijo backtracking and a modified Hessian (treatment 1: eigenvalues below epsilon are raised to epsilon)
     # Minimize a function f using Newton’s method.
     # obj:  a function that evaluates the objective value,
     # gradient, and Hessian at a point x, i.e.,
     # (f, g, H) = obj(x)
     # x0: starting point.
     # maxIts (optional): maximum number of iterations.
     # optTol (optional): optimality tolerance based on
     #                    ||grad(x)|| <= optTol*||grad(x0)||
    mu=1e-4
    epsilon=0.01
    x=x0
    status = 0
    its = 0
    (f0,g0,H0)=obj(x0)
    (f,g,H)=obj(x)
    (V,S)=eig(H)
    bar=ones(size(H,1))*epsilon
    bar=Diagonal(bar)
    bH=S*max(bar,Diagonal(V))*S'
    while status != 1
        alpha = 1
        xnew = x-alpha*inv(bH)*g
        (fnew,gnew,Hnew)=obj(xnew)
        sts =-fnew+f-alpha*mu*g'*inv(bH)*g
        while sts[1]<0
            alpha=alpha/2
            xnew = x-alpha*inv(bH)*g
            (fnew,gnew,Hnew)=obj(xnew)
            sts=-fnew+f-alpha*mu*g'*inv(bH)*g
        end
        x = x-alpha*inv(bH)*g
        (f,g,H)=obj(x)
        (V,S)=eig(H)
        bar=ones(size(H,1))*epsilon
        bar=Diagonal(bar)
        bH=S*max(bar,Diagonal(V))*S'
        its = its+1
        if norm(g)<= optTol*norm(g0)
            status = 1
        end
        if its>maxIts
            status = 1
        end
    end
return (x, its)
end


Out[5]:
newtmin2 (generic function with 1 method)
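
The modification used in newtmin2 (treatment 1) works on the eigendecomposition $H = S\,\Lambda\,S^{T}$ (in the code, eig returns the eigenvalues V and the eigenvector matrix S) and replaces the Hessian in the Newton system by
$ \bar H = S\,\max(\Lambda,\, \epsilon I)\, S^{T}, \qquad \epsilon = 0.01, $
so every eigenvalue below $\epsilon$, in particular every negative one, is raised to $\epsilon$ and $\bar H$ is positive definite.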

In [72]:
p = Problem(2) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[72]:
0.15069684447784829

In [73]:
p = Problem(7) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[73]:
0.0698683411045317

In [74]:
p = Problem(10) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[74]:
8.404388296420837e-13

In [77]:
p = Problem(11) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[77]:
0.22947981066293166

In [80]:
p = Problem(12) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[80]:
1.5231878310266746e-5

In [83]:
p = Problem(14) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[83]:
0.9826622874404953

In [86]:
p = Problem(15) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[86]:
0.834929873113459

In [89]:
p = Problem(18) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin2( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[89]:
0.32465019085415453

In [6]:
function newtmin3( obj, x0; maxIts=100, optTol=1e-6)
    # Newton method with Armijo backtracking and a modified Hessian (treatment 2: absolute values of the eigenvalues, thresholded at epsilon)
     # Minimize a function f using Newton’s method.
     # obj:  a function that evaluates the objective value,
     # gradient, and Hessian at a point x, i.e.,
     # (f, g, H) = obj(x)
     # x0: starting point.
     # maxIts (optional): maximum number of iterations.
     # optTol (optional): optimality tolerance based on
     #                    ||grad(x)|| <= optTol*||grad(x0)||
    mu=1e-4
    epsilon=0.01
    x=x0
    status = 0
    its = 0
    (f0,g0,H0)=obj(x0)
    (f,g,H)=obj(x)
    (V,S)=eig(H)
    bar=ones(size(H,1))*epsilon
    bar=Diagonal(bar)
    bH=S*max(bar,abs(Diagonal(V)))*S'
    while status != 1
        alpha = 1
        xnew = x-alpha*inv(bH)*g
        (fnew,gnew,Hnew)=obj(xnew)
        sts =-fnew+f-alpha*mu*g'*inv(bH)*g
        while sts[1]<0
            alpha=alpha/2
            xnew = x-alpha*inv(bH)*g
            (fnew,gnew,Hnew)=obj(xnew)
            sts=-fnew+f-alpha*mu*g'*inv(bH)*g
        end
        x = x-alpha*inv(bH)*g
        (f,g,H)=obj(x)
        (V,S)=eig(H)
        bar=ones(size(H,1))*epsilon
        bar=Diagonal(bar)
        bH=S*max(bar,abs(Diagonal(V)))*S'
        its = its+1
        if norm(g)<= optTol*norm(g0)
            status = 1
        end
        if its>maxIts
            status = 1
        end
    end
return (x, its)
end


Out[6]:
newtmin3 (generic function with 1 method)
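
Treatment 2, used in newtmin3, differs only in taking absolute values before thresholding:
$ \bar H = S\,\max(|\Lambda|,\, \epsilon I)\, S^{T}, \qquad \epsilon = 0.01, $
so a large negative eigenvalue keeps its magnitude instead of being flattened to $\epsilon$.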

In [104]:
p = Problem(2) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[104]:
0.04974099571706037

In [96]:
p = Problem(7) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[96]:
0.003400009283027661

In [97]:
p = Problem(10) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[97]:
1.789679515697544e-13

In [98]:
p = Problem(11) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[98]:
3.3822921362972896e-7

In [99]:
p = Problem(14) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[99]:
0.011037420345164398

In [100]:
p = Problem(15) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[100]:
0.001195675544522462

In [101]:
p = Problem(18) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=newtmin3( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[101]:
0.0407760590874033

In [7]:
function mybfgs( obj, x0; maxIts=100, optTol=1e-6)
    # BFGS method with Armijo backtracking; the initial Hessian is modified as in treatment 2
     # Minimize a function f using the BFGS method.
     # obj:  a function that evaluates the objective value,
     # gradient, and Hessian at a point x, i.e.,
     # (f, g, H) = obj(x)
     # x0: starting point.
     # maxIts (optional): maximum number of iterations.
     # optTol (optional): optimality tolerance based on
     #                    ||grad(x)|| <= optTol*||grad(x0)||
    mu=1e-4
    epsilon=0.01
    x=x0
    status = 0
    its = 0
    (f0,g0,H0)=obj(x0)
    (f,g,H)=obj(x)
    (V,S)=eig(H)
    bar=ones(size(H,1))*epsilon
    bar=Diagonal(bar)
    bH=S*max(bar,abs(Diagonal(V)))*S'
     Hk=bH
    while status != 1
        mp=-Hk*g
        #////////////////////////////////////////////////////
        #line search
        alpha = 1
        xnew = x+alpha*mp
        (fnew,gnew,Hnew)=obj(xnew)
        sts =-fnew+f+alpha*mu*g'*mp
        while sts[1]<0
            alpha=alpha/2
            xnew = x+alpha*mp
            (fnew,gnew,Hnew)=obj(xnew)
            sts=-fnew+f+alpha*mu*g'*mp
        end
        #////////////////////////////////////////////////////
        sk=alpha*mp
        yk=gnew-g
        tt1=yk'*sk
        tt1=tt1[1]
        rho=1/tt1
        Hk=(eye(size(Hk,1))-rho*sk*yk')*Hk*(eye(size(Hk,1))-rho*yk*sk')+rho*sk*sk'
        x = x+alpha*mp
        (f,g,H)=obj(x)
        its = its+1
        if norm(g)<= optTol*norm(g0)
            status = 1
        end
        if its>maxIts
            status = 1
        end
    end
return (x, its)
end


Out[7]:
mybfgs (generic function with 1 method)
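
mybfgs carries an approximation $H_k$ of the inverse Hessian, initialized with the treatment-2 matrix $\bar H$ at $x_0$, and after each accepted step applies the standard BFGS inverse update
$ H_{k+1} = \big(I - \rho_k s_k y_k^{T}\big)\, H_k\, \big(I - \rho_k y_k s_k^{T}\big) + \rho_k s_k s_k^{T}, \qquad \rho_k = \frac{1}{y_k^{T} s_k}, $
where $s_k = x_{k+1}-x_k$ and $y_k = g_{k+1}-g_k$; the search direction is $-H_k g_k$, with the same Armijo backtracking as above.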

In [8]:
p = Problem(18) 
function obj(x)
    f = p.obj(x)   # objective value at x
    g = p.grd(x)   # gradient at x
    H = p.hes(x)   # Hessian at x
    return (f,g,H)
end
x0=p.x0 
(x,it)=mybfgs( obj, x0; maxIts=100, optTol=1e-6)
norm(p.grd(x))/norm(p.grd(x0))


Out[8]:
0.004652492176771119

Exercise 2

Below is the work for Exercise 2. One can see that, to reach the same level of accuracy, the steepest descent method I used in Homework 2 requires 262 steps, whereas mybfgs needs only 8 steps; it is much faster than the method from Homework 2.
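
The objective minimized here is the negative log-likelihood of a logistic regression model: with $u_i$ the feature vector (normalized gre and gpa) and $y_i\in\{0,1\}$ the admission label,
$ f(\alpha,\beta) = -\sum_i \Big[\, y_i\,(u_i^{T}\alpha + \beta) - \log\big(1 + e^{u_i^{T}\alpha + \beta}\big) \Big], $
which is what logf computes in vectorized form below.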


In [9]:
data = readcsv("binary.csv");
admit=data[2:401,1];
gre=data[2:401,2];
gpa=data[2:401,3];
admit=convert(Array{Float64,1}, admit);
gre=convert(Array{Float64,1}, gre)/800;
gpa=convert(Array{Float64,1}, gpa);
A=[gre gpa];
u=A;
y=admit;

In [10]:
function logf(x,y,u)
    alpha=x[1:length(x)-1]
    beta=x[length(x)]
    zz=ones(size(y))
    -(u*alpha+beta)'*y+(log(1+exp(u*alpha+beta)))'*zz
    #xx=0
    #for i =1:length(y)
     #   xx=xx+y[i]*(alpha'*u[i,:]'+beta)-log(1+exp(alpha'*u[i,:]'+beta))
    #end
    #return -xx
end


Out[10]:
logf (generic function with 1 method)

In [11]:
function logg(x,y,u) 
    alpha=x[1:length(x)-1]
    beta=x[length(x)]
    zz=ones(size(y))
    upp=u'*y-u'*(exp(u*alpha+beta)./(1+exp(u*alpha+beta)))
    lpp=zz'*y-zz'*(exp(u*alpha+beta)./(1+exp(u*alpha+beta)))
        #upp1=0
        #upp2=0
        #lpp=0
    #for i =1:length(y)
     #   upp1=upp1+y[i]*u[i,1]-exp(alpha[1]*u[i,1]+alpha[2]*u[i,2]+beta)/(1+exp(alpha[1]*u[i,1]+alpha[2]*u[i,2]+beta))*u[i,1]
     #   upp2=upp2+y[i]*u[i,2]-exp(alpha[1]*u[i,1]+alpha[2]*u[i,2]+beta)/(1+exp(alpha[1]*u[i,1]+alpha[2]*u[i,2]+beta))*u[i,2]
     #   lpp=lpp+y[i]-exp(alpha'*u[i,:]'+beta)/(1+exp(alpha'*u[i,:]'+beta))
    #end
    return [-upp; -lpp]
end


Out[11]:
logg (generic function with 1 method)
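
With $\sigma(z) = e^{z}/(1+e^{z})$ applied componentwise, the gradient implemented in logg is
$ \nabla f(\alpha,\beta) = -\begin{pmatrix} U^{T}\big(y - \sigma(U\alpha + \beta\mathbf{1})\big) \\ \mathbf{1}^{T}\big(y - \sigma(U\alpha + \beta\mathbf{1})\big) \end{pmatrix}, $
where $U$ is the $400\times 2$ matrix of features.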

In [20]:
function logH(x,y,u)
    alpha=x[1:length(x)-1]
    beta=x[length(x)]
    zz=ones(size(y))
    myexp=exp(u*alpha+beta)
    lr=-zz'*(exp(u*alpha+beta)./(1+exp(u*alpha+beta)).^2)
    ur=-u'*(exp(u*alpha+beta)./(1+exp(u*alpha+beta)).^2)
    ll=ur'
    ul=zeros(2,2)
    for i=1:length(y)
        ul=ul-u[i,:]'*u[i,:]*myexp[i]/(1+myexp[i])^2
    end
    return [-ul ur;ll lr]
end


Out[20]:
logH (generic function with 1 method)

In [21]:
function obj(x)
    f = logf(x,y,u)   # objective value at x
    g = logg(x,y,u)   # gradient at x
    H = logH(x,y,u)   # Hessian at x
    return (f,g,H)
end
x0=[0.0,0.0,0.0] 
(x,it)=mybfgs( obj, x0; maxIts=100, optTol=1e-6)


Out[21]:
([2.152545051902534,0.7546868120233584,-4.9493772373034925],8)

In [24]:
alpha=x[1:2];
beta=x[3];
plot(gre[admit.==1]*800,gpa[admit.==1],"+")
plot(gre[admit.==0]*800,gpa[admit.==0],"o")
rrr=0.6*800:1/100:800
ry=-alpha[1]/alpha[2]*rrr/800+(-1/2-beta)/alpha[2]
plot(rrr,ry)


Out[24]:
1-element Array{Any,1}:
 PyObject <matplotlib.lines.Line2D object at 0x7efe56953390>
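
The line added to the scatter plot appears to be a level set of the fitted linear predictor: reading off the expression for ry, it is the set of points with $ \alpha_1\,\mathrm{gre} + \alpha_2\,\mathrm{gpa} + \beta = -\tfrac{1}{2} $ (with gre in its normalized units), i.e. the points whose predicted admission probability is $\sigma(-1/2)\approx 0.38$; the horizontal axis rescales gre back to its original 0–800 range.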
