Trust region methods

In this notebook we'll discuss what a general trust region method is, along with algorithms for approximately solving the trust region subproblem based on the Cauchy point. We'll also discuss a conjugate-gradient (Steihaug) trust region algorithm.

Code for the algorithms and the test functions is available in the following files:


In [5]:
include("../optimizers/trustregion.jl")
include("../utils/functions.jl")
using Gadfly

Notation

Let $f_k := f(x_k)$ be the value of the function at iterate $x_k$, and let $g_k := \nabla f(x_k)$ be the value of the gradient at iterate $x_k$.

Trust region overview

Recall the quadratic model: $$m_k(p) = f_k + g_k^\top p + \frac{1}{2} p^\top B_k p,$$ which comes from the second-order Taylor expansion of $f$ around $x_k$.

Here $B_k$ is some symmetric matrix. When $B_k$ is the exact Hessian, we get the trust region Newton method.

We first solve the trust region subproblem: $$\min_p m_k(p) \text{ s.t. } ||p|| \leq \Delta_k,$$ where $\Delta_k$ is the trust region radius. In practice, we solve the subproblem approximately -- see the next section for more details.

We define the following ratio at each iteration $k$: $$ \rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)},$$ where the numerator is the actual reduction, i.e., the difference between the function values at the current iterate and the trial point, and the denominator is the predicted reduction, i.e., the reduction in $f$ predicted by the model function.

The predicted reduction $m_k(0) - m_k(p_k)$ is always nonnegative, since $p = 0$ is feasible for the subproblem. So the sign of the actual reduction tells us whether to accept the step: if $$f(x_k) - f(x_k + p_k) < 0,$$ then the new objective value $f(x_k + p_k)$ is greater than the current value $f(x_k)$, and the step should be rejected.

The ratio then tells us how to adjust the trust region radius. If $\rho_k$ is close to 1, the model agrees well with the function over the current region, so we can expand the trust region. If $\rho_k$ is close to zero or negative, the model matches the function poorly in that region, and we want to decrease the trust region radius. For intermediate values we keep the radius the same. This gives rise to the trust region algorithm.

Trust region adjustment pseudocode

To summarize, consider the following sketch of an algorithm (see the trust region implementation code for further details):

  1. Obtain $p_k$ by solving the trust region subproblem approximately (see below)
  2. Adjust the trust region radius

    1. Evaluate the ratio $$ \rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$
    2. If $\rho_k < \frac{1}{4}$:
      1. Decrease the trust region radius: $\Delta_{k+1} = \frac{1}{4}\Delta_k$
    3. Else:
      1. If $\rho_k > \frac{3}{4}$ and $||p_k|| = \Delta_k$:
        1. Increase the trust region radius: $\Delta_{k+1} = \min(2\Delta_k, \hat{\Delta})$
      2. Else:
        1. Keep the trust region radius the same: $\Delta_{k+1} = \Delta_{k}$
  3. Update the iterate

    1. If $\rho_k > \eta$, accept the step:
      1. $x_{k+1} = x_k + p_k$
    2. Else:
      1. Reject the step and stay at the same value $x_{k+1} = x_k$

Here $\eta$ is a parameter in $[0, \frac{1}{4})$, and $\hat{\Delta} > 0$ is an upper bound on how big the trust region radius can get. We choose some initial trust region radius $\Delta_0 \in (0,\hat{\Delta})$. Note that the radius is only increased if $||p_k||$ reaches the boundary of the trust region.
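The radius-update rule above fits in a few lines. The following is a minimal Python illustration; the notebook's actual implementation is the Julia code in trustregion.jl, so the function name and the boundary tolerance here are assumptions:

```python
def adjust_radius(rho, p_norm, delta, delta_max):
    """Trust-region radius update from the pseudocode above.

    rho: ratio of actual to predicted reduction
    p_norm: norm of the accepted trial step p_k
    delta: current radius Delta_k; delta_max: upper bound Delta-hat
    """
    if rho < 0.25:
        # model fit is poor: shrink the region
        return 0.25 * delta
    if rho > 0.75 and abs(p_norm - delta) < 1e-12:
        # good fit and the step reached the boundary: grow, capped at delta_max
        return min(2.0 * delta, delta_max)
    # otherwise keep the radius unchanged
    return delta
```

Note that, as in the pseudocode, the radius only grows when the step actually reaches the boundary; a good ratio on an interior step is no evidence that a larger region would be trustworthy.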

Approximately solving the trust-region subproblem

Although we'd ideally solve the subproblem for the step $p$ exactly, for global convergence it suffices to solve it approximately, as long as the step achieves a sufficient reduction in the model.

The Cauchy point

One approximation to the trust region subproblem is to minimize the easier linear model $$l_k(p) := f_k + g_k^\top p,$$ instead of the quadratic model $m_k(p)$, which has a second-order term.

We can quantify this in terms of the Cauchy point, denoted as $$p_k^c := \tau_k p_k^s.$$

Here the Cauchy point is the product of a scalar and a vector:

  1. $p_k^s$ is the vector that solves the linear version of the subproblem $$p_k^s = \arg\min_p f_k + g_k^\top p \text{ s.t. } ||p|| \leq \Delta_k$$
  2. $\tau_k >0$ minimizes the model $m_k(\tau p_k^s)$ such that it satisfies the trust-region bound $$\tau_k = \arg\min_{\tau > 0} m_k(\tau p_k^s) \text{ s.t. } ||\tau p_k^s|| \leq \Delta_k.$$

The closed-form expression of the Cauchy point is given by $$p_k^c = -\tau_k \frac{\Delta_k}{||g_k||} g_k,$$ where $$\tau_k = \begin{cases} 1 & \text{if } g_k^\top B_k g_k \leq 0 \\ \min\left(\frac{||g_k||^3}{\Delta_k\, g_k^\top B_k g_k}, 1\right) & \text{otherwise} \end{cases} $$

So, in the algorithm, whenever we accept the Cauchy point, we'd update our iterate as follows $$ x_{k+1} = x_k + p_k^c.$$
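The closed-form expression can be computed directly from $g_k$, $B_k$, and $\Delta_k$. Here is a Python sketch for illustration (the notebook's own code is the Julia implementation in trustregion.jl):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point for the trust-region subproblem, using the closed form above."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        # model is non-convex along -g: go all the way to the boundary
        tau = 1.0
    else:
        # unconstrained minimizer along -g, clipped to the boundary
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g
```

For example, with $B = I$ and $g = (1, 0)$, a large radius gives the unconstrained steepest-descent minimizer $(-1, 0)$, while a small radius returns a step of length exactly $\Delta_k$.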

While the Cauchy step is inexpensive to compute and leads to a globally convergent trust-region method, it can perform poorly even if an optimal step length is used at each iteration, since it is essentially the steepest descent method with a particular step length.

The Cauchy point doesn't rely heavily on the matrix $B_k$ (it's only used in computing the length of the step) -- an improvement on the Cauchy point will make use of $B_k$.

The dogleg method

Many trust region approximation algorithms are designed to improve upon the Cauchy point, and the dogleg method is one such method. If the Cauchy point is on the boundary, it already gives a large decrease, so we accept that step. Otherwise, the Cauchy point is in the interior and we want to instead move toward the full "Newton" step $p_B = -B^{-1} g$, where the matrix $B$ only needs to be nonsingular (instead of positive definite).

The dogleg method combines two steps: \begin{align*} p^B &= -B^{-1} g \\ p^U &= -\frac{g^\top g}{g^\top B g} g \end{align*}

Note that $p^U$ is the unconstrained minimizer of $m_k$ along the steepest descent direction $-g_k$.

The dogleg path has the following trajectory: \begin{align*} \tilde{p}(\tau) = \begin{cases} \tau p^U & 0 \leq \tau \leq 1 \\ p^U + (\tau-1)(p^B - p^U) &1\leq\tau\leq 2 \end{cases} \end{align*}

We can compute $\tau$ by solving the following scalar quadratic equation: $$||p^U + (\tau-1) (p^B - p^U) ||^2 = \Delta^2,$$ which we can solve using the quadratic formula.
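Writing $d := p^B - p^U$ and expanding the norm makes the quadratic explicit:

$$\|d\|^2 (\tau - 1)^2 + 2\,(p^U)^\top d\,(\tau - 1) + \|p^U\|^2 - \Delta^2 = 0.$$

Since the dogleg path is only followed when $\|p^U\| < \Delta$, the constant term is negative, so the quadratic has exactly one nonnegative root:

$$\tau - 1 = \frac{-(p^U)^\top d + \sqrt{\left((p^U)^\top d\right)^2 - \|d\|^2\left(\|p^U\|^2 - \Delta^2\right)}}{\|d\|^2}.$$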

A quick summary:

  1. If the Cauchy point lies on the boundary (in particular when $g_k^\top B_k g_k \leq 0$, so the model decreases monotonically along $-g_k$):
    1. Use the Cauchy point $p_k^c$
  2. If the Cauchy point lies in the interior (i.e., $g_k^\top B_k g_k > 0$ and the minimizer along $-g_k$ is inside the region):
    1. Follow the dogleg path $\tilde{p}$.

TL;DR: the dogleg step achieves at least the decrease of the Cauchy point, while allowing a full Newton-like step once we're close to the solution and the trust region is large enough.
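Putting the pieces together, here is a hedged Python sketch of a dogleg step. It assumes $B$ is symmetric positive definite, so $p^B$ minimizes the model; the notebook's Julia implementation may differ in details:

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg approximation to the trust-region step (sketch; assumes B SPD)."""
    p_b = -np.linalg.solve(B, g)            # full (Newton-like) step
    if np.linalg.norm(p_b) <= delta:
        return p_b                           # model minimizer is inside the region
    p_u = -(g @ g) / (g @ B @ g) * g         # unconstrained minimizer along -g
    if np.linalg.norm(p_u) >= delta:
        return -delta / np.linalg.norm(g) * g  # scaled steepest-descent step
    # solve ||p_u + (tau-1)(p_b - p_u)||^2 = delta^2 for tau-1 in [0, 1]
    d = p_b - p_u
    a = d @ d
    b = 2 * (p_u @ d)
    c = p_u @ p_u - delta**2
    t = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return p_u + t * d
```

When the full step fits inside the region it is returned unchanged; otherwise the step lands exactly on the boundary, either along $-g$ or along the second leg of the path.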

Conjugate-gradient (Steihaug)
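A minimal sketch of the Steihaug truncated-CG inner loop (the method behind the `cg_steihaug` option used below) might look as follows in Python. This illustrates the standard algorithm, not the notebook's implementation: it runs conjugate gradients on the quadratic model and stops early on negative curvature or when the iterate reaches the trust-region boundary.

```python
import numpy as np

def steihaug_cg(g, B, delta, tol=1e-10, max_iter=100):
    """Steihaug-Toint truncated CG for min m(p) s.t. ||p|| <= delta (sketch)."""
    p = np.zeros_like(g)
    r = g.copy()                 # gradient of the model at p
    d = -r
    if np.linalg.norm(r) < tol:
        return p
    for _ in range(max_iter):
        Bd = B @ d
        dBd = d @ Bd
        if dBd <= 0:
            # negative curvature: follow d to the boundary
            return _to_boundary(p, d, delta)
        alpha = (r @ r) / dBd
        p_next = p + alpha * d
        if np.linalg.norm(p_next) >= delta:
            # step would leave the region: stop on the boundary
            return _to_boundary(p, d, delta)
        r_next = r + alpha * Bd
        if np.linalg.norm(r_next) < tol:
            return p_next
        beta = (r_next @ r_next) / (r @ r)
        d = -r_next + beta * d
        p, r = p_next, r_next
    return p

def _to_boundary(p, d, delta):
    """Return p + tau*d with ||p + tau*d|| = delta and tau >= 0."""
    a = d @ d
    b = 2 * (p @ d)
    c = p @ p - delta**2
    tau = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return p + tau * d
```

A nice property of this scheme is that each CG iterate is at least as long as the previous one, so the first boundary crossing is the right place to truncate; the first iterate is exactly the Cauchy-point direction, which preserves global convergence.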

Global convergence

Testing the optimization methods

Fenton's function


In [13]:
xvals = trust_region([3.;4.], 6, 1, 0.1, fenton, fenton_g, fenton_h, 
    2000, "dogleg");
cvals = trust_region([3.;4.], 6, 1, 0.1, fenton, fenton_g, fenton_h, 
    2000, "cg_steihaug");

println(xvals[end])


Number of indefinite fixes 0
Number of iterations: 6
Number of indefinite fixes 0
Number of iterations: 7
[1.7434520869871946,2.0296947100002467]

In [14]:
nsamps = length(xvals)
nsamps2 = length(cvals)

fx = [fenton(xvals[i]) for i in 1:nsamps]
fx2 = [fenton(cvals[i]) for i in 1:nsamps2]


Gadfly.plot(layer(x=1:nsamps, y=fx, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=fx2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=fx3, Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("f(x)"), Guide.title("Value of function"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[14]:
[Gadfly plot: "Value of function" — f(x) vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

In [17]:
nsamps = length(xvals)

grads = [norm(fenton_g(xvals[i]), 2) for i in 1:nsamps]
grads2 = [norm(fenton_g(cvals[i]), 2) for i in 1:nsamps2]
#grads3 = [norm(fenton_g(cvals[i]), 2) for i in 1:nsamps3]

Gadfly.plot(
layer(x=1:nsamps-2, y=grads[2:nsamps-1,:]./grads[1:nsamps-2,:], Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2-2, y=grads2[2:nsamps2-1,:]./grads2[1:nsamps2-2,:], Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3-1, y=grads3[2:nsamps3,:]./grads3[1:nsamps3-1,:], Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("gradient norm ratios"), Guide.title("gradient norm ratios"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[17]:
[Gadfly plot: "gradient norm ratios" — successive gradient norm ratios vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

In [16]:
Gadfly.plot(layer(x=1:nsamps, y=grads, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=grads2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=grads3, Geom.line, Theme(default_color=color("orange"))),
    Guide.xlabel("iteration"), Guide.ylabel("gradient norm"), Guide.title("gradient norms"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
    Scale.x_log10, Scale.y_log10)


Out[16]:
[Gadfly plot: "gradient norms" — gradient norm vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

Rosenbrock function

The Rosenbrock function (in its pairwise-coupled form) is defined by: $$f(x) = \sum_{i=1}^{n/2} [ (1-x_{2i-1})^2 + 10(x_{2i} - x_{2i-1}^2)^2 ]$$
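For reference, the formula can be written out directly. The following Python sketch (using the coefficient 10 from the formula above, and 0-based indexing) is an illustration, not the functions.jl implementation:

```python
def rosenbrock_pairwise(x, a=10.0):
    """Pairwise-coupled Rosenbrock variant: terms couple x_{2i-1} and x_{2i}."""
    n = len(x)
    assert n % 2 == 0, "this variant pairs up consecutive coordinates"
    return sum((1 - x[2 * i])**2 + a * (x[2 * i + 1] - x[2 * i]**2)**2
               for i in range(n // 2))
```

The global minimum is at the all-ones vector, where every term vanishes.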


In [18]:
xvals = trust_region(randn(100), 6, 3, 0.1, rosenbrock, 
    rosenbrock_g, rosenbrock_h, 1000, "dogleg");
cvals = trust_region(randn(100), 6, 3, 0.1, rosenbrock, 
    rosenbrock_g, rosenbrock_h, 1000, "cg_steihaug");


Number of indefinite fixes 8
Number of iterations: 8
Number of indefinite fixes 9
Number of iterations: 12

In [19]:
nsamps = length(xvals)
nsamps2 = length(cvals)

func = rosenbrock
func_g = rosenbrock_g

fx = [func(xvals[i]) for i in 1:nsamps]
fx2 = [func(cvals[i]) for i in 1:nsamps2]


Gadfly.plot(layer(x=1:nsamps, y=fx, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=fx2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=fx3, Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("f(x)"), Guide.title("Value of function"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], 
    ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[19]:
[Gadfly plot: "Value of function" — f(x) vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

In [26]:
nsamps = length(xvals)
nsamps2 = length(cvals)

grads = [norm(func_g(xvals[i]), 2) for i in 1:nsamps]
grads2 = [norm(func_g(cvals[i]), 2) for i in 1:nsamps2]
#grads3 = [norm(func_g(cvals[i]), 2) for i in 1:nsamps3]


Gadfly.plot(
layer(x=1:nsamps-2, y=grads[2:nsamps-1,:]./grads[1:nsamps-2,:], Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2-2, y=grads2[2:nsamps2-1,:]./grads2[1:nsamps2-2,:], Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3-1, y=grads3[2:nsamps3,:]./grads3[1:nsamps3-1,:], Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("gradient norm ratios"), Guide.title("gradient norm ratios"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[26]:
[Gadfly plot: "gradient norm ratios" — successive gradient norm ratios vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

In [21]:
Gadfly.plot(layer(x=1:nsamps, y=grads, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=grads2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=grads3, Geom.line, Theme(default_color=color("orange"))),
    Guide.xlabel("iteration"), Guide.ylabel("gradient norm"), Guide.title("gradient norms"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
    Scale.x_log10, Scale.y_log10)


Out[21]:
[Gadfly plot: "gradient norms" — gradient norm vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

Cute function

The cute function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is given by $$ f(x) = \sum_{i=1}^{n-4} (-4x_i+3)^2 + (x_i^2 + 2x_{i+1}^2 + 3x_{i+2}^2 + 4 x_{i+3}^2 + 5x_n^2)^2 $$
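Since the indexing is easy to get wrong, here is a direct Python transcription of the formula (0-based indices; an illustration, not the functions.jl implementation):

```python
def cute(x):
    """Transcription of the cute function above; note the last term uses x_n."""
    n = len(x)
    return sum((-4 * x[i] + 3)**2 +
               (x[i]**2 + 2 * x[i + 1]**2 + 3 * x[i + 2]**2
                + 4 * x[i + 3]**2 + 5 * x[n - 1]**2)**2
               for i in range(n - 4))   # i = 1..n-4 in the 1-based formula
```

At the zero vector each of the $n-4$ terms contributes $3^2 = 9$, which is a quick sanity check.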


In [22]:
xvals = trust_region(ones(50)*10, 6, 3, 0.1, cute, cute_g, cute_h, 2000, "dogleg");
cvals = trust_region(ones(50)*10, 6, 3, 0.1, cute, cute_g, cute_h, 2000, "cg_steihaug");


Number of indefinite fixes 0
Number of iterations: 30
Number of indefinite fixes 0

In [23]:
nsamps = length(xvals)
nsamps2 = length(cvals)

func = cute
func_g = cute_g

fx = [func(xvals[i]) for i in 1:nsamps]
fx2 = [func(cvals[i]) for i in 1:nsamps2]


Gadfly.plot(layer(x=1:nsamps, y=fx, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=fx2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=fx3, Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("f(x)"), Guide.title("Value of function"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[23]:
[Gadfly plot: "Value of function" — f(x) vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]
Number of iterations: 32

In [24]:
nsamps = length(xvals)

grads = [norm(func_g(xvals[i]), 2) for i in 1:nsamps]
grads2 = [norm(func_g(cvals[i]), 2) for i in 1:nsamps2]
#grads3 = [norm(func_g(cvals[i]), 2) for i in 1:nsamps3]


Gadfly.plot(
layer(x=1:nsamps-1, y=grads[2:nsamps,:]./grads[1:nsamps-1,:], Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2-1, y=grads2[2:nsamps2,:]./grads2[1:nsamps2-1,:], Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3-1, y=grads3[2:nsamps3,:]./grads3[1:nsamps3-1,:], Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("gradient norm ratios"), Guide.title("gradient norm ratios"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[24]:
[Gadfly plot: "gradient norm ratios" — successive gradient norm ratios vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

In [25]:
Gadfly.plot(layer(x=1:nsamps, y=grads, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=grads2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=grads3, Geom.line, Theme(default_color=color("orange"))),
    Guide.xlabel("iteration"), Guide.ylabel("gradient norm"), Guide.title("gradient norms"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
    Scale.x_log10, Scale.y_log10)


Out[25]:
[Gadfly plot: "gradient norms" — gradient norm vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

Cute2 function


In [27]:
xvals = trust_region(ones(50)*10, 6, 3, 0.1, cute2, cute2_g, cute2_h, 2000, "dogleg");
cvals = trust_region(ones(50)*10, 6, 3, 0.1, cute2, cute2_g, cute2_h, 2000, "cg_steihaug");


Number of indefinite fixes 0
Number of iterations: 27
Number of indefinite fixes 0

In [28]:
nsamps = length(xvals)
nsamps2 = length(cvals)

func = cute2
func_g = cute2_g

fx = [func(xvals[i]) for i in 1:nsamps]
fx2 = [func(cvals[i]) for i in 1:nsamps2]


Gadfly.plot(layer(x=1:nsamps, y=fx, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=fx2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=fx3, Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("f(x)"), Guide.title("Value of function"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)


Out[28]:
[Gadfly plot: "Value of function" — f(x) vs. iteration on log-log axes; Newton dogleg (blue) and Steihaug-CG (red)]

In [29]:
nsamps = length(xvals)

grads = [norm(func_g(xvals[i]), 2) for i in 1:nsamps]
grads2 = [norm(func_g(cvals[i]), 2) for i in 1:nsamps2]
#grads3 = [norm(func_g(cvals[i]), 2) for i in 1:nsamps3]


Gadfly.plot(
layer(x=1:nsamps-1, y=grads[2:nsamps]./grads[1:nsamps-1], Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2-1, y=grads2[2:nsamps2]./grads2[1:nsamps2-1], Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3-1, y=grads3[2:nsamps3]./grads3[1:nsamps3-1], Geom.line, Theme(default_color=color("orange"))),
Guide.xlabel("iteration"), Guide.ylabel("gradient norm ratios"), Guide.title("gradient norm ratios"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
Scale.x_log10, Scale.y_log10)
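The successive gradient-norm ratios plotted above are a simple diagnostic of the convergence order: if $\|g_{k+1}\| / \|g_k\|$ settles at a constant in $(0, 1)$, convergence is linear; if the ratio tends to $0$, convergence is superlinear. A minimal sketch of this diagnostic on synthetic sequences (the sequences `lin` and `sup` below are illustrative, not taken from the runs above):

```julia
# Hypothetical illustration of the ratio diagnostic.
# A linearly convergent sequence has a constant ratio ||g_{k+1}||/||g_k||;
# a quadratically convergent one has ratios that shrink toward zero.
lin = [0.5^k for k in 1:20]        # linear: each term is half the previous
sup = [0.5^(2.0^k) for k in 1:5]   # quadratic: exponent doubles each step

# Successive ratios, matching grads[2:nsamps] ./ grads[1:nsamps-1] above
ratio(v) = v[2:end] ./ v[1:end-1]

println(ratio(lin)[end])   # stays at 0.5: linear convergence
println(ratio(sup)[end])   # tiny: superlinear convergence
```

For the trust region Newton methods above, we expect the ratios to collapse toward zero near the solution, while a method with a poor Hessian approximation would show ratios hovering near a constant.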


Number of iterations: 28
Out[29]:
[Plot: "gradient norm ratios"; successive ratios of gradient norms vs. iteration on log-log axes; legend: Newton dogleg (blue), Steihaug-CG (red), quasi-Newton SR1 (orange)]

In [30]:
Gadfly.plot(layer(x=1:nsamps, y=grads, Geom.line, Theme(default_color=color("blue"))),
layer(x=1:nsamps2, y=grads2, Geom.line, Theme(default_color=color("red"))),
#layer(x=1:nsamps3, y=grads3, Geom.line, Theme(default_color=color("orange"))),
    Guide.xlabel("iteration"), Guide.ylabel("gradient norm"), Guide.title("gradient norms"),
Guide.manual_color_key("Legend", ["Newton dogleg", "Steihaug-CG", "quasi-Newton SR1"], ["blue", "red", "orange"]),
    Scale.x_log10, Scale.y_log10)


Out[30]:
[Plot: "gradient norms"; gradient norm vs. iteration on log-log axes; legend: Newton dogleg (blue), Steihaug-CG (red), quasi-Newton SR1 (orange)]
