Quadrature is the term used for numerical evaluation of a definite integral, or put more simply, finding the area under a curve. There are a number of methods of quadrature which we will learn today. If we have a function $f(x)$ defined between $a$ and $b$, the integral is defined as
\begin{equation} F\left ( x \right ) = \int_{a}^{b} f\left ( x \right )dx \end{equation}and its result is the area under the curve. As you know, this operation is an essential part of calculus, being the inverse of differentiation. Knowing the value of the area under a curve is imporant to all kinds of applications. Many expressions are difficult to integrate, or the function which governs their shape is unknown, and we only have data. Therefore, no course on numerical methods is complete without discussing the different methods of numerical integration. A fundamental property of a definite integral is that
\begin{equation} \int_{a}^{b} f\left ( x \right )dx = \int_{c}^{a} f\left ( x \right )dx + \int_{b}^{c} f\left ( x \right )dx \end{equation}where $c$ is a point between $a$ and $b$. Therefore, we can perform an integration by splitting the function up into sections and summing the result of each integration. If the function is complicated or unknown, we can approximate its distribution between subdivisions and choose the size of subdivisions to get an accurate result. This is known as adaptive quadrature. In this lecture, five different quadrature methods will be covered in the context of a simple function: the Midpoint rule, Trapezoid rule, Simpson's rule, Composite Simpson's rule and Weddle's rule. Then, there is an excersise to apply numerical integration to a geophysical data set.
First, let's begin with a simple function to demonstrate the different methods of numerical integration. Let
\begin{equation} f\left ( x \right ) = \sin \left ( x \right ) \end{equation}We want to know what the area under the sin function is between 0 and $pi$. The $\sin \left ( x \right )$ function is known to integrate to $-\cos \left ( x \right )$, so we can integrate it ourselves by hand to find the true area under the curve:
\begin{equation} F\left ( x \right ) = \int_{0}^{\pi} \sin \left ( x \right ) = \left [ -\cos\left ( x \right )+ C \right ]_{0}^{\pi} =-\cos\left ( \pi \right ) + \cos\left ( 0 \right ) = 2 \end{equation}Let's start by plotting the function between these points.
In [1]:
%pylab inline
import os
import numpy as np
from scipy import integrate # This gives us the fucntions for different numerical integration methods!
import matplotlib.pyplot as plt
x = linspace(0,np.pi,20) #Get the value of pi from numpy
y = sin(x)
plt.plot(x,y)
plt.xlim([0,np.pi]) #Set x axis limits between 0 and pi
Out[1]:
That's what our function looks like. Let's investigate the different quadrature rules with reference to it, finding the area under the curve and seeing how it differs from the true area which we know is 2.
The midpoint rule is the simplest quadrature rule. Let $h = b - a$ be the length of the interval. The midpoint rule, $M$, approximates the integral by the area of a rectangle, with base of length $h$ and height of the value of $f(x)$ at the midpoint,
\begin{equation} M = hf \left ( \frac {a+b} {2} \right ) \end{equation}The result of the integration is the combined area of all the rectanges. A complex example looks like this:
The SciPy module features many different integration functions, and you can find thorough documentation for these functions (including methods not covered in this course) here. This library does not contain a function for the midpoint rule, so let's make our own.
In [2]:
def midpoint_rule(start_point, end_point, function, number_of_bins=10):
#First, check how big each bin needs to be. Bin is just another word for rectangle.
bin_size = float(end_point - start_point)/number_of_bins
#To make sure the function works as expected and doesn't output garbage accidentally,
#let's assert that there are more than 0 bins, and that the number of bins is a whole number
assert bin_size > 0
assert type(number_of_bins) == int
#By doing this, the function will give an error if either of these are not true,
#avoiding problems later. This is a neat alternative to a try-except loop.
#In programming it is always good to know more than one way of doing things!
#Create the variable to contain the sum of all the areas
running_total = 0.0
#Find the first midpoint
mid = start_point + bin_size/2
#Loop to create each rectangle
while (mid < end_point):
#Find the area of the next rectangle and add it to the running total
running_total += bin_size * function(mid)
#Move the midpoint up to the next centre of the bin
mid += bin_size
#Return our running total result
return running_total
Now let's test the midpoint function.
In [3]:
print "The area found by direct integration = 2"
print "The area found with the midpoint rule with 1 rectangle = ", midpoint_rule(0,np.pi,sin,1)
print "The area found with the midpoint rule with 2 rectangles = ", midpoint_rule(0,np.pi,sin,2)
print "The area found with the midpoint rule with 10 rectangles = ", midpoint_rule(0,np.pi,sin,10)
print "The area found with the midpoint rule with 100 rectangles = ", midpoint_rule(0,np.pi,sin,100)
print "The area found with the midpoint rule with 1000 rectangle = ", midpoint_rule(0,np.pi,sin,1000)
With one rectangle, we are simply finding the area of a box of shape $\pi \times$ 1, so of course the result is $\pi$. As we increase the number of rectangles we split the function up by, we increase the accuracy of our area. The simplicity of this method is its weakness, as rectangles are rarely a good approximation for the shape of a smooth function. We should want to use as few shapes as possible to approximate our function, because each additional rectangle is one extra time round the loop, increasing processing time. Instead, let's try another shape...
If we swap the shape of the rectangle for a trapezoid, we arrive at the trapezoid rule. The trapezoid rule, $T$, approximates the integral by the area of a trapezoid with base $h$ and sides equal to the vaules of the integral at the two end points,
\begin{equation} T = h \frac {f\left ( a\right ) + f \left (b \right )} {2} \end{equation}An example looks more accurate than the midpoint rule:
Numpy has a function for the trapezoid rule, numpy.trapz, but we'll make our own that works in a similar way to our midpoint rule function.
In [4]:
def trapezoid_rule(start_point, end_point, function, number_of_bins=10):
bin_size = float(end_point - start_point)/number_of_bins
assert bin_size > 0
assert type(number_of_bins) == int
running_total = 0.0
#Loop to create each trapezoid
for i in range(number_of_bins):
#Set the start of this bin
this_bin_start = start_point + bin_size * (i)
#Find the area of the next rectangle and add it to the running total
running_total += bin_size*float(function(this_bin_start)+function(this_bin_start+bin_size))/2
#Move the midpoint up to the start of the next trapezoid
#Return our running total result
return running_total
Testing the function in a similar way:
In [5]:
print "The area found by direct integration = 2"
print "The area found with the trapezoid rule with 1 trapezoid = ", trapezoid_rule(0,np.pi,sin,1)
print "The area found with the trapezoid rule with 2 trapezoids = ", trapezoid_rule(0,np.pi,sin,2)
print "The area found with the trapezoid rule with 10 trapezoids = ", trapezoid_rule(0,np.pi,sin,10)
print "The area found with the trapezoid rule with 100 trapezoids = ", trapezoid_rule(0,np.pi,sin,100)
print "The area found with the trapezoid rule with 1000 trapezoids = ", trapezoid_rule(0,np.pi,sin,1000)
We can see a few differences compared to the midpoint rule. First, the trapezoid rule always under-estimates the area, whereas the midpoint rule over-estimates. The result for finding the integral of $\sin$ with only one triangle is very small, because that trapezoid has almost no height. Perhaps most suprisingly, the midpoint rule is more accurate than the trapezoid rule.
The accuracy of a quadrature rule is predicted by examining its behaviour in practice with polynomials. The order of a rule is the regree of the lowest degree polynomial that the rule does not integrate exactly. If a rule of order $p$ is used to integrate a smooth function of length $h$, then Taylor series analysis shows the error is proportional to $h^p$. Both the midpoint and trapezoid rules are exact for constant and linear functions, but are not exact for quadratics. Therefore, they have order two. Let's use a different example (which integrates to whole numbers) to quantify how accurate the midpoint and trapezoid rules are in comparison to one another. If we are calculating
\begin{equation} \int_{0}^{1} x^{2}dx = \frac {1}{3} \end{equation}the midpoint rule gives
\begin{equation} M = 1 \frac {1}{3}^{2} = \frac {1}{4} \end{equation}and the trapezoid rule gives
\begin{equation} T = 1 \frac {0+1^{2}}{2} = \frac {1}{2} \end{equation}The error for $M$ is 1/12, while the error for $T$ is -1/6. Therefore, the midpoint rule is twice as accurate as the trapezoid rule. This is true more generally for smooth functions over small intervals.
Knowing the error estimates from the two rules explored so far allows us to combine them to create a new rule, more accurate than either one seperately, with a little help from the mathematician Thomas Simpson - luckily not related to Homer.
If the error in $T$ is exactly -2 times the error in $M$, then solving
\begin{equation} S-T = -2 \left ( S-M\right ) \end{equation}for $S$ gives us the exact value of the integral. Therefore, the solution
\begin{equation} S = \frac{2}{3}M + \frac{1}{3}T \end{equation}is usually a more accurate approximation that either $M$ or $T$ alone. This is known as Simpson's rule. It can also be found by integrating the quadratic function that interpolates the integral at two end points, $a$ and $b$, and the midpoint, $c = \left ( a+b\right )/2$:
\begin{equation} S = \frac{h}{6}\left ( f \left ( a\right ) + 4f \left ( c\right ) + f\left ( b\right )\right ) \end{equation}This expression now integrates up to cubics exactly, so it is order 4. It looks like a much closer fit to the function:
Let's make a function to test it out...
In [6]:
def simpsons_rule(start_point, end_point, function, number_of_bins=10):
bin_size = float(end_point - start_point)/number_of_bins
assert bin_size > 0
assert type(number_of_bins) == int
running_total = 0.0
#Loop to create each shape
for i in range(number_of_bins):
#Find a, c, and b
this_bin_start = start_point + bin_size * (i)
this_bin_mid = this_bin_start + bin_size/2
this_bin_end = this_bin_start + bin_size
#Calculate the rule and add to running total.
running_total += (bin_size/6)*float(function(this_bin_start)+4*function(this_bin_mid)+function(this_bin_end))
#Return our running total result
return running_total
In [7]:
print "The area found by direct integration = 2"
print "The area found with Simpson's rule with 1 bin = ", simpsons_rule(0,np.pi,sin,1)
print "The area found with Simpson's rule with 2 bins = ", simpsons_rule(0,np.pi,sin,2)
print "The area found with Simpson's rule with 10 bins = ", simpsons_rule(0,np.pi,sin,10)
print "The area found with Simpson's rule with 100 bins = ", simpsons_rule(0,np.pi,sin,100)
print "The area found with Simpson's rule with 1000 bins = ", simpsons_rule(0,np.pi,sin,1000)
As you can see, Simpson's rule is very accurate. But we can get it even more accurate...
Let's take our calculations another step further. Previously, we have considered $a$ at the start of the interval, and $b$ at the end, with $c$ at the mid point. Instead, let's split the domain up into smaller subintervals, by making $d$ and $e$ the midpoints of these subintervals. $d = \left ( a+c\right )/2$, and $e = \left( c+b\right )/2$. If we apply Simpson's rule to each subinterval to obtain a rule over $\left[a,b\right]$:
\begin{equation} S_2 = \frac{h}{12}\left ( f \left ( a\right ) + 4f \left ( d\right ) + 2f\left ( c\right ) + 4f \left ( e\right ) + f \left ( b\right )\right ) \end{equation}This is known as the Composite Simpson's rule, an example of a composite quadrature rule as it is a method that subdivides each interval. Composite methods produce much better results for oscillatory functions. Another composite rule we have used in previous lectures was cumtrapz, which uses the composite trapezoidal rule (you don't need to know any details about this rule).
We finally don't need to write our own function, as scipy includes a function for the Composite Simpson's rule, simps. Since we have written functions above, let's make another one so we know it works in the same way as the others, so we can be sure our results are comparable.
In [8]:
def simpsons_composite_rule(start_point, end_point, function, number_of_bins=10):
bin_size = float(end_point - start_point)/number_of_bins
assert bin_size > 0
assert type(number_of_bins) == int
running_total = 0.0
#Loop to create each shape
for i in range(number_of_bins):
#Find a, d, c, e, and b
this_bin_start = start_point + bin_size * (i)
this_bin_quarter = this_bin_start + bin_size*1./4
this_bin_mid = this_bin_start + bin_size*2./4
this_bin_three_quarter = this_bin_start + bin_size*3./4
this_bin_end = this_bin_start + bin_size
#Calculate the rule and add to running total. Sorry about the extremely long line...
running_total += (bin_size/12)*float(function(this_bin_start)+4*function(this_bin_quarter)+2*function(this_bin_mid)+4*function(this_bin_three_quarter)+function(this_bin_end))
#Return our running total result
return running_total
In [9]:
print "The area found by direct integration = 2"
print "The area found with Composite Simpson's rule with 1 bin = ", simpsons_composite_rule(0,np.pi,sin,1)
print "The area found with Composite Simpson's rule with 2 bins = ", simpsons_composite_rule(0,np.pi,sin,2)
print "The area found with Composite Simpson's rule with 10 bins = ", simpsons_composite_rule(0,np.pi,sin,10)
print "The area found with Composite Simpson's rule with 100 bins = ", simpsons_composite_rule(0,np.pi,sin,100)
print "The area found with Composite Simpson's rule with 1000 bins = ", simpsons_composite_rule(0,np.pi,sin,1000)
This is a slight improvement for a simple function like $\sin$, but will be much more of an improvement for functions which oscillate more with respect to the size of our bins.
Now we are on our final rule... Given that $S$ and $S_2$ approximate the same integral, their difference can be used to estimate the error:
\begin{equation} E = \left ( S_2 - S \right) \end{equation}Therefore, we can combine these rules to make an even more accurate approximation. Both of the rules are of order four, but the $S_2$ step size is half the S step size, so $S_2$ is roughly $2^4$ times as accurate. Let's call this super accurate rule $Q$, and we can find it by solving:
\begin{equation} Q - S = 16 \left ( Q - S_2 \right ) \end{equation}resulting in:
\begin{equation} Q = S_2 + \frac {\left (S_2 - S \right )}{15} \end{equation}This is known as Weddle's rule or the extrapolated Simposon's rule, because it uses two different values of $h$ and extrapolates towards $h = 0$. Making a function of this is easy as we just call our other two Simpson functions.
In [10]:
def weddles_rule(start_point, end_point, function, number_of_bins=10):
result_simpson = simpsons_rule(start_point, end_point, function, number_of_bins)
result_composite = simpsons_composite_rule(start_point, end_point, function, number_of_bins)
return result_composite + float(result_composite - result_simpson)/15
In [11]:
print "The area found by direct integration = 2"
print "The area found with Weddle's rule with 1 bin = ", weddles_rule(0,np.pi,sin,1)
print "The area found with Weddle's rule with 2 bins = ", weddles_rule(0,np.pi,sin,2)
print "The area found with Weddle's rule with 10 bins = ", weddles_rule(0,np.pi,sin,10)
print "The area found with Weddle's rule with 100 bins = ", weddles_rule(0,np.pi,sin,100)
print "The area found with Weddle's rule with 1000 bins = ", weddles_rule(0,np.pi,sin,1000)
You can see our final rule is much more accurate for fewer required bins. This covers everything you need to know about quadrature. You will find the class problems in a different notebook.
In [11]: