The Roots package

The Roots package contains simple routines for finding roots of continuous scalar functions of a single real variable. The basic interface is through the function fzero, which through multiple dispatch can handle many different cases.

We will use these packages:


In [ ]:
using Plots
backend(:gadfly)
using Roots

Bracketing

For a function $f: R \rightarrow R$ a bracket is a pair $a < b$ for which $f(a) \cdot f(b) < 0$. That is, $f(a)$ and $f(b)$ have different signs. If $f$ is continuous, this forces there to be a zero in the interval $[a,b]$; if $f$ is only piecewise continuous, there must be a point $c$ in $[a,b]$ with the left limit and right limit at $c$ having different signs. Such values can be found, up to floating point roundoff.

That is, a value $a < c < b$ can be found with either f(c) == 0.0 or f(prevfloat(c)) * f(nextfloat(c)) <= 0.
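
Here prevfloat and nextfloat return the adjacent floating point values, so this criterion detects a sign change across neighboring floats. For instance, $\sin$ changes sign between the floating point neighbors of its zero at $\pi$:

c = float(pi)                                # the Float64 value closest to pi
sin(prevfloat(c)) * sin(nextfloat(c)) <= 0   # true: a sign change straddles c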

To illustrate, consider the function $f(x) = \cos(x) - x$. From the graph we see readily that $[0,1]$ is a bracket:


In [ ]:
f(x) = cos(x) - x
plot(f, -2,2)


Out[ ]:

The basic function call specifies a bracket using vector notation:


In [ ]:
x = fzero(f, [0, 1])
x, f(x)


Out[ ]:
(0.7390851332151607,0.0)

For that function, f(x) == 0.0. Next consider $f(x) = \sin(x)$. A known root is $\pi$. Basic trigonometry tells us that $[\pi/2, 3\pi/2]$ will be a bracket:


In [ ]:
f(x) = sin(x)
x = fzero(f, [pi/2, 3pi/2])
x, f(x)


Out[ ]:
(3.141592653589793,1.2246467991473532e-16)

This value of x does not produce f(x) == 0.0; however, it is as close as floating point arithmetic allows:


In [ ]:
f(prevfloat(x)) * f(x) < 0.0 || f(x) * f(nextfloat(x)) < 0.0


Out[ ]:
true

That is, at x the function changes sign.

The basic algorithm used for bracketing, when the values are simple floating point numbers, is the bisection method. Though there are algorithms that mathematically converge faster (and one of them is used when BigFloat numbers are given), by exploiting floating point computations this algorithm uses fewer function calls and runs faster.
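
To give the flavor, here is a minimal sketch of bisection (not the package's optimized floating point implementation), with a hypothetical simple_bisection name used for illustration:

function simple_bisection(f, a, b)
    fa = f(a)
    fa * f(b) <= 0 || error("[a, b] is not a bracket")
    while true
        c = (a + b) / 2                   # midpoint
        (c == a || c == b) && return c    # no floats remain between a and b
        fc = f(c)
        if fa * fc <= 0
            b = c                         # the sign change lies in [a, c]
        else
            a, fa = c, fc                 # the sign change lies in [c, b]
        end
    end
end

simple_bisection(x -> cos(x) - x, 0.0, 1.0)    # ≈ 0.7390851332151607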

Using an initial guess

If a bracket is not known, but a good initial guess is, the fzero function provides an interface to some different algorithms. The basic algorithm is modeled after one used by HP-34 calculators. It is much more robust to the quality of the initial guess and does not rely on tolerances for a stopping rule. In many cases it satisfies the criteria for a bracketing solution.

For example, we have:


In [ ]:
f(x) = cos(x) - x
x = fzero(f , 1)
x, f(x)


Out[ ]:
(0.7390851332151607,0.0)

And


In [ ]:
f(x) = x^3 - 2x - 5
x = fzero(f, 2)
x, f(x), f(prevfloat(x)) * f(nextfloat(x))


Out[ ]:
(2.0945514815423265,-8.881784197001252e-16,-2.524354896707238e-29)

For even more precision, BigFloat numbers can be used:


In [ ]:
x = fzero(sin, big(3))
x, x - pi


Out[ ]:
(3.141592653589793238462643383279503027458928160728751989361348208457812241803072,1.432617587613536461683864036161499958355168739140227720938615263585174704284396e-34)

Higher order methods

The default call to fzero uses, at best, a second order method followed by bracketing, which can involve many more function calls. For some functions, a higher-order method might be better suited. There are algorithms of order 1 (the secant method), 2 (Steffensen's method), 5, 8, and 16. The order 2 method is generally more efficient, but it is more sensitive to the initial guess than, say, the order 8 method. These algorithms are accessed by specifying a value for the order argument:


In [ ]:
f(x) = 2x - exp(-x)
x = fzero(f, 1, order=2)
x, f(x)


Out[ ]:
(0.35173371124919584,0.0)

In [ ]:
f(x) = (x + 3) * (x - 1)^2
x = fzero(f, -2, order=5)
x, f(x)


Out[ ]:
(-3.0,0.0)

In [ ]:
x = fzero(f, 2, order=8)
x, f(x)


Out[ ]:
(1.0000000195707177,1.5320519747160834e-15)

The latter shows that zeros need not be simple zeros (a zero is non-simple when $f'(x) = 0$, if defined) to be found. For the higher-order methods, there is a tolerance that can be specified so that a value is returned as a zero if abs(f(x)) < tol. The default method for fzero uses a very strict tolerance for this check; when the check fails an error is raised, even though the last value at times might be very close to the actual zero. For this problem the default method finds a value meeting the strict tolerance:


In [ ]:
x = fzero(f, 2)
x, f(x)


Out[ ]:
(1.0000000320628983,4.11211783186984e-15)

But not for a similar problem:


In [ ]:
fzero(x -> x^6, 1)


Out[ ]:
0.003397791170761169

(Though the answer is basically on track, the algorithm takes many steps trying to improve it to meet the very stringent tolerance. For problems where a bracket is found, this dithering won't happen.)

The higher-order methods are basically various derivative-free versions of Newton's method, which has update step $x - f(x)/f'(x)$. For example, Steffensen's method essentially replaces $f'(x)$ with $(f(x + f(x)) - f(x))/f(x)$. This is just a forward-difference approximation to the derivative with "$h$" being $f(x)$, which presumably is close to $0$ already. The methods with higher order combine this with different secant line approaches that minimize the number of function calls. The default method uses a combination of Steffensen's method with modifications, a quadratic fit, and, if possible, a bracketing approach, and so may need many more function calls than the higher-order methods. The higher-order methods can be susceptible to the usual issues found with Newton's method: a poor initial guess, a small first derivative, or a large second derivative near the zero.
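
For concreteness, a bare-bones version of Steffensen's update as just described might be written as follows (a sketch only, with a hypothetical steffensen_sketch name; the package adds safeguards and modifications):

function steffensen_sketch(f, x; maxsteps = 50)
    for i in 1:maxsteps
        fx = f(x)
        fx == 0.0 && return x
        fp = (f(x + fx) - fx) / fx    # forward difference with h = f(x)
        x = x - fx / fp               # Newton-style update
    end
    x
end

steffensen_sketch(x -> cos(x) - x, 1.0)    # ≈ 0.7390851332151607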

For a classic example where the large second derivative near the zero is the issue, we have $f(x) = x^{1/3}$:


In [ ]:
f(x) = cbrt(x)
x = fzero(f, 1, order=8)	# all of 2, 5, 8, and 16 fail


Out[ ]:
Roots.ConvergenceFailed("Failed to converge in 13 steps. Attempted division by 0. Last value was 3.8743071591115165e8.")

However, the default method finds the root here:


In [ ]:
x = fzero(f, 1)
x,  f(x)


Out[ ]:
(0.0,0.0)

Finally, we show another example illustrating that the default fzero call is more forgiving of the initial guess. The devilish function defined below comes from a test suite of difficult functions. The default method finds the zero:


In [ ]:
f(x) = cos(100*x)-4*erf(30*x-10)
plot(f, -2, 2)


Out[ ]:

In [ ]:
fzero(f, 1)


Out[ ]:
0.33186603357456257

Whereas the order=n methods fail. For example,


In [ ]:
fzero(f, 1, order=8)


Out[ ]:
Roots.ConvergenceFailed("Failed to converge in 30 steps. Last value was -2816.8439033521086.")

Basically, the high-frequency oscillation can send the proxy tangent line off in nearly random directions.

Polynomials

The Polynomials package provides a type for working with polynomial functions that allows many typical polynomial operations to be defined. In this context, the roots function is used to find the roots of a polynomial.

For example,


In [ ]:
using Polynomials
x = poly([0.0])			# (x - 0.0)
roots((x-1)*(x-2)*(x-3))


Out[ ]:
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

As a convenience, this package adds a function interface to roots:


In [ ]:
f(x) = (x-1)*(x-2)*(x^2 + x + 1)
roots(f)


Out[ ]:
4-element Array{Complex{Float64},1}:
 -0.5+0.866025im
 -0.5-0.866025im
  1.0+0.0im     
  2.0+0.0im     

The fzeros function will find the real roots of a univariate polynomial:


In [ ]:
fzeros( (x-1)*(x-2)*(x^2 + x + 1))


Out[ ]:
2-element Array{Any,1}:
 1//1
  2.0

As with roots, this function can be called with a function:


In [ ]:
f(x) = x*(x-1)*(x^2 + 1)^4
fzeros(f)


Out[ ]:
2-element Array{Any,1}:
  0  
 1//1

The algorithm can have numeric issues when the polynomial degree gets too large, or the roots are too close together.

The multroot function will also find the roots. The algorithm does a better job when there are multiple roots, as it implements an algorithm that first identifies the multiplicity structure of the roots, and then tries to improve these values.


In [ ]:
multroot((x-1)*(x-2)*(x-3))	# roots, multiplicity


Out[ ]:
([0.9999999999999976,2.0000000000000067,2.9999999999999964],[1,1,1])

The factor function provides a more pleasant output.

The roots function degrades when there are multiplicities:


In [ ]:
p = (x-1)^2*(x-2)^3*(x-3)^4
roots(p)


Out[ ]:
9-element Array{Complex{Float64},1}:
 0.999999+0.0im        
      1.0+0.0im        
  1.99954+0.0im        
  2.00023+0.000398985im
  2.00023-0.000398985im
  2.99787+0.00213481im 
  2.99787-0.00213481im 
  3.00213+0.00212188im 
  3.00213-0.00212188im 

Whereas multroot, here called through the factor function, gets it right:


In [ ]:
factor(p)


Out[ ]:
Dict{Number,Int64} with 3 entries:
  2.9999999999999996 => 4
  2.000000000000001  => 3
  0.9999999999999997 => 2

The difference becomes dramatic when the multiplicities get quite large.

Classical methods

The package provides some classical methods for root finding: newton, halley, and secant_method. We can see how each works on a problem studied by Newton himself. Newton's method uses the function and its derivative:


In [ ]:
f(x) = x^3 - 2x - 5
fp(x) = 3x^2 - 2
x = newton(f, fp, 2)
x, f(x), f(prevfloat(x)) * f(nextfloat(x))


Out[ ]:
(2.0945514815423265,-8.881784197001252e-16,-2.524354896707238e-29)

To see the algorithm in progress, the argument verbose=true may be specified.

The secant method needs two starting points; here we start with 2 and 3:


In [ ]:
x = secant_method(f, 2,3)
x, f(x), f(prevfloat(x)) * f(nextfloat(x))


Out[ ]:
(2.094551481542327,3.552713678800501e-15,-8.67746995743113e-30)

Halley's method has cubic convergence, as compared to Newton's quadratic convergence, and uses the second derivative as well.
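
Its update step, in a standard form, is $x - 2f(x)f'(x) / (2f'(x)^2 - f(x)f''(x))$: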


In [ ]:
fpp(x) = 6x
x = halley(f, fp, fpp, 2)
x, f(x), f(prevfloat(x)) * f(nextfloat(x))


Out[ ]:
(2.0945514815423265,-8.881784197001252e-16,-2.524354896707238e-29)

For many functions, the derivatives can be computed automatically. The ForwardDiff package provides a means; this package wraps the process into an operator, D, which returns the derivative of a function f (for simple-enough functions):


In [ ]:
newton(f, D(f), 2)


Out[ ]:
2.0945514815423265

Or for Halley's method


In [ ]:
halley(f, D(f), D(f,2), 2)


Out[ ]:
2.0945514815423265

(The operator D2(f) is a convenience for D(f,2).) Specifying the derivative(s) can be skipped; the functions will default to the above calls.
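
For example, given those defaults, the following call should be equivalent to newton(f, D(f), 2):

newton(f, 2)    # expected to agree with the value above, 2.0945514815423265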

Finding critical points

The D function makes it straightforward to find critical points (where the derivative is $0$ or undefined). For example, the critical point of the function $f(x) = 1/x^2 + x^3, x > 0$ can be found with:


In [ ]:
f(x) = 1/x^2 + x^3
fzero(D(f), 1)


Out[ ]:
0.9221079114817278

For more complicated expressions, D will not work. In this example, we have a function $f(x, \theta)$ that models the flight of an arrow on a windy day:


In [ ]:
function flight(x, theta)
    k = 1/2
    a = 200*cosd(theta)
    b = 32/k
    tand(theta)*x + (b/a)*x - b*log(a/(a-x))
end


Out[ ]:
flight (generic function with 1 method)

The total distance flown is the value of x > 0 where flight(x, theta) == 0.0. This can be solved for different values of theta with fzero. In the following, we note that the term log(a/(a-x)) has a vertical asymptote at x = a, so we start our search nearby at a - 1:


In [ ]:
function howfar(theta)
    a = 200*cosd(theta)
    fzero(x -> flight(x, theta), a-1)
end


Out[ ]:
howfar (generic function with 1 method)

To see the trajectory if shot at 30 degrees, we have:


In [ ]:
theta = 30
plot(x -> flight(x,  theta), 0, howfar(theta))


Out[ ]:

To maximize the range we solve for the lone critical point of howfar within the range of angles. The derivative cannot be taken automatically with D. So, here we use a central-difference approximation and start the search at 45 degrees, the angle which maximizes the range on a non-windy day:


In [ ]:
h = 1e-5
howfarp(theta) = (howfar(theta+h) - howfar(theta-h)) / (2h)
fzero(howfarp, 45)


Out[ ]:
26.262308924144275