This lesson was developed using materials from the Computational Physics book by Mark Newman at University of Michigan and material adapted from the numerical methods course by John Matthews at CSU Fullerton.
Instructions: Create a new directory called MCIntegration with a notebook called MCIntegrationTour. Give it a heading 1 cell title Monte Carlo Integration. Read this page, typing in the code in the code cells and executing them as you go. Do not copy/paste. Type the commands yourself to get the practice of doing it. This will also slow you down so you can think about the commands and what they are doing as you type them.
Save your notebook when you are done, then try the accompanying exercises.
In [ ]:
%pylab inline
import numpy as np
import matplotlib.pyplot as plt
In a previous lesson we explored numerical integration by choosing points along the function and calculating/summing areas. In this lesson we will explore an entirely different technique, one that is very useful because it easily extends to integrating functions in any number of dimensions. There are many areas of physics where we need to evaluate multi-dimensional integrals, for example finding the electric field due to a surface or volume of charge.
Monte Carlo methods are a class of algorithms that use random number sampling to obtain numerical results. They were pioneered by scientists working on the Manhattan Project to simulate the interactions of neutrons with matter during the development of the first atomic weapons. "Monte Carlo" was the code name used to disguise the work of Stanislaw Ulam and John von Neumann as they developed the first algorithms using random number sampling. The choice of that particular codename comes from the fact that the technique resembles the games of chance played in the casinos of Monte Carlo in the principality of Monaco. (Las Vegas was not yet as well known as a center of gambling.)
The first application of Monte Carlo methods that we will explore is for integrating functions. In another lesson we will look at using random number sampling to simulate physical phenomena.
At its essence, Monte Carlo (MC) integration uses a random sampling of points within a domain where our function is defined to estimate the area bounded by the function. It is actually easier to explain with an analogy in 2 dimensions, even though it will work in any number (1, 2, 3, ..., $n$) of dimensions. That's one of the reasons it is so powerful. It is also easily converted to an algorithm that a computer can execute. Another bonus.
Consider the example shown here:
I want to know the area of the circular dartboard. If I throw $N$ darts randomly at the board (I'm not a very good dart player), the area of the circle can be approximated as the ratio of the number of darts within the circle, $N_{in}$ to the total number of darts thrown, $N = N_{in} + N_{out}$, times the area of the rectangular board:
$$ A_{circ} \simeq \frac{N_{in}}{N}A_{board} $$
In this case, count all the dots and you find that there are 40 of them, with 15 inside the dartboard and 25 in the white area of the rectangular board. The board has sides of length 4, and the radius of the circle is 1.375. Thus we can compare the exact value for the area with our approximate value:
$A_{board} = l \times w$
$A_{exact} = \pi r^2$
In [ ]:
l = w = 4.
r = 1.375
Nin = 15
N = 40
Aboard = l*w
Aexact = np.pi*r**2
Acirc = (Nin/float(N))*Aboard #since we calc a ratio, make sure they are floats!
print "Aboard =",Aboard
print "Aexact =",Aexact
print "Acirc =",Acirc
print "Error =",(100.*(Acirc-Aexact)/Aexact),"%"
So good to about 1%. If we want to make it an even better approximation, we increase the number of darts. To improve the speed/efficiency, we can shrink the size of the rectangular board so it is just big enough for our circular dartboard to fit inside. This minimizes the number of "wrong" darts that get thrown.
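We don't have to throw the darts by hand, either. Here is a minimal sketch (an illustration, not part of the original board example) that lets numpy throw the darts for us, assuming the circular dartboard is centered on the 4-by-4 board; watch the error shrink as the number of darts grows. Your numbers will differ from run to run since the throws are random.
In [ ]:
#A sketch: simulate the dartboard example, assuming the circle is centered on the board
l = w = 4.
r = 1.375
Aexact = np.pi*r**2
for N in (100, 10000, 1000000):
    dx = l*(np.random.random_sample(N) - 0.5)   #dart positions measured from the board center
    dy = w*(np.random.random_sample(N) - 0.5)
    Nin = (dx**2 + dy**2 <= r**2).sum()         #darts that landed inside the circle
    Acirc = (Nin/float(N))*(l*w)
    print("N = %7d  Acirc = %.4f  Error = %.2f %%" % (N, Acirc, 100.*abs(Acirc - Aexact)/Aexact))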
The darts were thrown "randomly" in this case because I am not a good dart player, but it turns out that we can also improve our efficiency by being "selectively random". For example, we can try to target our dart throws such that we don't get overlapping hits. We could also try to ensure that we uniformly fill the space in the box with dart throws using as few darts as possible so that we don't accidentally get too many regions with more samples in them than others. Choosing how to sample the region of interest is an interesting but complicated topic in and of itself. We won't go into detail in this course, but you can read more about it by googling "importance sampling".
For this example, we did some of the work ourselves - throwing the darts randomly and counting the number of darts inside/outside the boundary we care about. Generally, we want to let the computer do all of the work. This method of Monte Carlo integration is known as the "Hit or Miss" or "Rejection Sampling" method. Let's outline the steps we need to implement it:
Step 1: Define the function and plot it so we know what we are working with.
Step 2: Construct a bounding box that encloses the function between the limits of integration.
Step 3: Generate random sample points uniformly distributed within the box.
Step 4: Count the number of sample points that fall below the function (the "hits").
Step 5: Estimate the integral as the area of the box times the fraction of points that are hits.
We'll start with a simple 1-dimensional example and implement the most basic hit or miss method using truly random samples within our domain. Let's try to write the code for the series of steps described above.
Here is the function we'll work with:
$f(x) = x^{1/2} + \sin(x)$
Let's define it and plot it so we have an idea of what we are looking at (Step 1).
In [ ]:
f = lambda x: x**(0.5) + np.sin(x)
In [ ]:
x = np.arange(0.,15.,0.001)
In [ ]:
plt.plot(x,f(x),lw=3)
plt.ylabel(r'$x^{1/2} + \sin(x)$',fontsize=20)
plt.xlabel(r'$x$', fontsize = 20)
plt.xlim(0.,15.)
plt.show()
Suppose we want to integrate this function between the limits $(1.7,12.4)$. We can create the bounding box (Step 2) for our random sampling from these limits and the maximum value of the function within that range.
We want to create a mask that will select only values within our limits of integration and then find the maximum value of the function in that range to set the box.
In [ ]:
xmin = 1.7
xmax = 12.4
#find the maximum value of f within the boundary to set ymax
subx = x[np.logical_and((x > xmin),(x < xmax))]
ymax = f(subx).max()*1.05
ymin = 0.
Note the use of the np.logical_and function to be able to set the mask with two constraints, rather than just one.
In [ ]:
#Replot the function
plt.plot(x,f(x),lw=3)
plt.ylabel(r'$x^{1/2} + \sin(x)$',fontsize=20)
plt.xlabel(r'$x$', fontsize = 20)
plt.xlim(0.,15.)
#plot the box
plt.plot([xmin, xmin], [ymin, ymax], color='k', linestyle='--') #vertical left
plt.plot([xmax, xmax], [ymin, ymax], color='k', linestyle='--') #vertical right
plt.plot([xmin, xmax], [ymax, ymax], color='k', linestyle='--') #horizontal top
plt.show()
(Step 3) is to create random sample points within our boundary. You should investigate the random library in numpy. Let's create a set of sample points along the x and y axes. The random_sample() function returns random numbers within the bounds [0.,1.). The ")" means it generates numbers up to, but not including, the end of the range. In other words, for a random sample between 0. and 1. you will sometimes get 0. but never 1., only numbers very close to it, e.g. 0.9999.
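As a quick sanity check of that claim, draw a large sample and look at its extremes (a sketch; the exact minimum and maximum will differ from run to run):
In [ ]:
s = np.random.random_sample(1000000)
print("min = %f  max = %f" % (s.min(), s.max()))   #max gets close to, but never reaches, 1.0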
In [ ]:
help(np.random.random_sample)
To get the values within our limits, we need to scale and translate the values into our range, as suggested in the help:
In [ ]:
N=10000
samples_x = xmin + (xmax-xmin)*np.random.random_sample(N);
samples_y = ymin + (ymax-ymin)*np.random.random_sample(N);
In [ ]:
#Replot the function
plt.plot(x,f(x),lw=3)
plt.ylabel(r'$x^{1/2} + \sin(x)$',fontsize=20)
plt.xlabel(r'$x$', fontsize = 20)
plt.xlim(0.,15.)
#Plotting all of them is not recommended
#plt.plot(samples_x,samples_y,'g.')
#Just show the first 500 points or it gets hard to see
plt.plot(samples_x[:500],samples_y[:500],'g.')
#plot the box
plt.plot([xmin, xmin], [ymin, ymax], color='k', linestyle='--') #vertical left
plt.plot([xmax, xmax], [ymin, ymax], color='k', linestyle='--') #vertical right
plt.plot([xmin, xmax], [ymax, ymax], color='k', linestyle='--') #horizontal top
plt.show()
The fraction of points $(x_i,y_i)$ that satisfy the condition $y_i \leq f(x_i)$ is an estimate of the ratio of the integral of $f(x)$ to the area of the rectangle.
Now we just have to keep track of those. We can create a mask on samples_y:
In [ ]:
newmask = (samples_y < f(samples_x))
In [ ]:
newmask.sum()
And replot the points that lie below the function for proof:
In [ ]:
#Replot the function
plt.plot(x,f(x),lw=3)
plt.ylabel(r'$x^{1/2} + \sin(x)$',fontsize=20)
plt.xlabel(r'$x$', fontsize = 20)
plt.xlim(0.,15.)
#Just show the first 500 points or it gets hard to see
plt.plot(samples_x[:500],samples_y[:500],'g.')
#Use the mask to show the ones below the function in red
plt.plot(samples_x[:500][newmask[:500]],samples_y[:500][newmask[:500]],'r.')
#plot the box
plt.plot([xmin, xmin], [ymin, ymax], color='k', linestyle='--') #vertical left
plt.plot([xmax, xmax], [ymin, ymax], color='k', linestyle='--') #vertical right
plt.plot([xmin, xmax], [ymax, ymax], color='k', linestyle='--') #horizontal top
#plt.savefig("MCIntegration.png")
plt.show()
Now we can compute our integral from (Step 5): the area of the rectangle times the ratio of the number of red dots (the hits) to the total number of dots.
In [ ]:
I = (xmax-xmin)*(ymax-ymin)*newmask.sum()/newmask.size
print I
How well did we do? Let's evaluate our exact function between the two endpoints.
The exact (indefinite) analytical solution to our integral is
$$ \int f(x)\,dx = \int x^{1/2}\, dx + \int \sin(x)\, dx = \frac{2}{3}x^{3/2} - \cos(x) $$
We can evaluate this at the two endpoints to obtain the exact solution:
In [ ]:
Iana = lambda x: (2./3.)*x**(3./2.) - np.cos(x)
Iexact = Iana(xmax)-Iana(xmin)
Ierr = 100*np.abs(Iexact-I)/Iexact
print "I_exact =",Iexact
print "I_hitormiss =",I
print "Error on I =",Ierr,"%"
Not too bad, but how many points would it take to reduce the error below 0.001%? Let's put our work into an interact object and explore the result as a function of $N$.
In [ ]:
from IPython.html.widgets import interact, interactive
from IPython.html import widgets
from IPython.utils.traitlets import link
In [ ]:
def hit_or_miss(N=10000,xmin=1.7,xmax=12.4):
    #Set up the function
    f = lambda x: x**(0.5) + np.sin(x)
    x = np.arange(0.,15.,0.001)
    Iana = lambda x: (2./3.)*x**(3./2.) - np.cos(x)
    Iexact = Iana(xmax)-Iana(xmin)
    #find the maximum value of f within the boundary to set ymax
    subx = x[np.logical_and((x > xmin),(x < xmax))]
    ymin = 0.
    if (subx.size > 0):
        ymax = f(subx).max()*1.05
    else:
        ymax = 0.
    ##########################################################
    #Here's the integration part:
    samples_x = xmin + (xmax-xmin)*np.random.random_sample(N)
    samples_y = ymin + (ymax-ymin)*np.random.random_sample(N)
    newmask = (samples_y < f(samples_x))
    I = (xmax-xmin)*(ymax-ymin)*newmask.sum()/newmask.size
    Ierr = 100*np.abs(Iexact-I)/Iexact
    ##########################################################
    #Print our results:
    print "I_exact =",Iexact
    print "I_hitormiss =",I
    print "Error on I =",Ierr,"%"
    #Plot our function
    plt.plot(x,f(x),lw=3)
    plt.ylabel(r'$x^{1/2} + \sin(x)$',fontsize=20)
    plt.xlabel(r'$x$', fontsize = 20)
    plt.xlim(0.,15.)
    #Just show the first 500 points or it gets hard to see
    plt.plot(samples_x[:500],samples_y[:500],'g.')
    #Use the mask to show the ones below the function in red
    plt.plot(samples_x[:500][newmask[:500]],samples_y[:500][newmask[:500]],'r.')
    #plot the box
    plt.plot([xmin, xmin], [ymin, ymax], color='k', linestyle='--') #vertical left
    plt.plot([xmax, xmax], [ymin, ymax], color='k', linestyle='--') #vertical right
    plt.plot([xmin, xmax], [ymax, ymax], color='k', linestyle='--') #horizontal top
    plt.show()
In [ ]:
v = interact(hit_or_miss,N=(1000,100000,100),xmin=(0.,15.),xmax=(0.,15.))
What do you observe when you vary the number of samples?
It turns out that the uncertainty in the Monte Carlo integration result depends only in part on the number of samples. It also depends on the variance in the function. For functions with rapidly changing values (and hence, higher variance), increasing the random samples, well, "randomly", doesn't really help. To improve the results we would have to be selective about our random numbers. (Importance sampling would be a useful technique here.)
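We can make this concrete with a quick, non-interactive convergence study, reusing the hit or miss steps above for the same function and limits. This is only a sketch; the errors are themselves random, so your numbers will differ from run to run, but on average they should fall off roughly as $1/\sqrt{N}$.
In [ ]:
f = lambda x: x**(0.5) + np.sin(x)
Iana = lambda x: (2./3.)*x**(3./2.) - np.cos(x)
xmin, xmax = 1.7, 12.4
ymin, ymax = 0., f(np.arange(xmin,xmax,0.001)).max()*1.05
Iexact = Iana(xmax) - Iana(xmin)
for N in (10**3, 10**4, 10**5, 10**6):
    sx = xmin + (xmax-xmin)*np.random.random_sample(N)
    sy = ymin + (ymax-ymin)*np.random.random_sample(N)
    hits = (sy < f(sx)).sum()                 #number of points below the curve
    I = (xmax-xmin)*(ymax-ymin)*float(hits)/N
    print("N = %7d  I = %.5f  Error = %.4f %%" % (N, I, 100.*np.abs(I - Iexact)/Iexact))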
The hit or miss method is not the only (nor the most efficient) way to perform Monte Carlo integration. There is another method, based on the mean value theorem of integral calculus, which states that the definite integral over an interval $(a,b)$ equals $(b-a)$ times the average value of the integrand $f(x)$ in the range $a \leq x \leq b$.
To determine this average, we sample the $x_i$ at random (instead of at regular intervals) and compute the values of $f(x_i)$. For the one-dimensional integral $\int f(x) dx$, the estimate of the integral in the sample mean method is given by
$$ I_N = (b-a)\langle f \rangle $$
with $\langle f \rangle$ the usual definition of the mean:
$$ \langle f \rangle = \frac{1}{N}\sum\limits_{i=1}^N f(x_i) $$
The $x_i$ are random numbers distributed uniformly in the interval $a \leq x_i \leq b$, and $N$ is the number of trials. This is formally identical to the rectangular approximation using regular, evenly spaced samples $x_i$, except that the $N$ points are chosen with random spacing.
We can estimate the uncertainty in our numerical result using
$$ \mathrm{unc}_N = (b-a) \sqrt{\frac{\langle f^2 \rangle - \langle f \rangle^2}{N}} $$
which is related to the standard deviation of the randomly distributed values of $f(x_i)$. The fact that the uncertainty falls off as $1/\sqrt{N}$ means we would have to increase our number of samples by a factor of 100 to improve the accuracy of the result by a factor of 10.
Since the computation time scales with the number of samples, other techniques such as importance sampling are needed if much higher accuracy is desired. Nevertheless, for higher dimensional integrals this random sampling method does a better job than the rectangular approximation: it is more efficient and easier to implement.
To calculate the integral using the sample mean method, just take the randomly sampled $x_i$ values, compute the function $f(x_i)$ at all of those points, take the average $\langle f \rangle$, and multiply by the interval $(b-a)$. This approximates the area under the curve reasonably well.
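Here is that recipe in its minimal form, as a quick sketch using the same function, limits, and uncertainty formula as above:
In [ ]:
f = lambda x: x**(0.5) + np.sin(x)
a, b = 1.7, 12.4
N = 10000
xi = a + (b-a)*np.random.random_sample(N)   #uniform random samples in [a,b)
fi = f(xi)
I = (b-a)*fi.mean()                         #I_N = (b-a)<f>
unc = (b-a)*np.sqrt(((fi**2).mean() - fi.mean()**2)/N)
print("I = %.4f +/- %.4f" % (I, unc))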
Now let's wrap the same sample mean calculation, together with a plot, into a function that we can explore interactively, just as we did for hit or miss, with $N$ = 10000 as the default.
In [ ]:
def sample_mean(N=10000,xmin=1.7,xmax=12.4):
    #Set up the function
    f = lambda x: x**(0.5) + np.sin(x)
    x = np.arange(0.,15.,0.001)
    Iana = lambda x: (2./3.)*x**(3./2.) - np.cos(x)
    Iexact = Iana(xmax)-Iana(xmin)
    ##########################################################
    #Here's the integration part:
    samples_x = xmin + (xmax-xmin)*np.random.random_sample(N)
    approx = f(samples_x)
    I = approx.mean()*(xmax-xmin)
    #Numerical uncertainty
    Iunc = (xmax-xmin)*np.sqrt( ((approx**2).mean()-(approx.mean())**2)/N )
    #Comparison with exact result
    Ierr = 100*np.abs(Iexact-I)/Iexact
    ##########################################################
    #find the maximum value of f within the boundary to set ymax
    ymax = approx.max()
    ymin = 0.
    #Print our results
    print "I_exact =", Iexact
    print "I_samplemean =", I, " +/- ",Iunc
    print "Error on I =", Ierr, "%"
    #Plot our function
    plt.plot(x,f(x),lw=3)
    plt.ylabel(r'$x^{1/2} + \sin(x)$',fontsize=20)
    plt.xlabel(r'$x$', fontsize = 20)
    plt.xlim(0.,15.)
    #plot the box
    plt.plot([xmin, xmin], [ymin, ymax], color='k', linestyle='--') #vertical left
    plt.plot([xmax, xmax], [ymin, ymax], color='k', linestyle='--') #vertical right
    plt.plot([xmin, xmax], [ymax, ymax], color='k', linestyle='--') #horizontal top
    #Plot the average value of the function from the samples on the same graph for comparison:
    plt.plot((xmin,xmax),(approx.mean(),approx.mean()),'r-',label="Average value of function at sample points")
    plt.legend(loc="upper left")
    plt.show()

v2 = interact(sample_mean,N=(1000,100000,100),xmin=(0.,15.,0.001),xmax=(0.,15.,0.001))
The area under the average is approximately equal to the area under the curve. The uncertainty is still controlled by the variance of the function, but notice that we now only have to sample half as many random numbers (just the $x_i$, with no $y_i$) to get a result that is nearly equivalent to, if not a little better than, our "hit-or-miss" result. This makes the sample mean method much more advantageous. Play with the xmin and xmax sliders to see how the function average and the boundary box change. For example, let xmin = ~7 and xmax = ~9. How do the estimated uncertainty and percent error compare when the function has smaller variance?
Many problems in physics involve averaging over many variables. For example, suppose we know the position and velocity dependence of the total energy of ten interacting particles. In three dimensions each particle has three velocity components and three position components. Hence the total energy is a function of 60 variables, and a calculation of the average energy per particle involves computing a $d$ = 60 dimensional integral. (More accurately, the total energy is a function of 60 − 6 = 54 variables if we use center of mass and relative coordinates.) If we divide each coordinate into $p$ intervals, there would be $p^{60}$ points to sum. Clearly, standard numerical methods such as Simpson’s rule would be impractical for this example.
Because the computational time is roughly proportional to $N$ in both the classical and Monte Carlo methods, for low dimensions, the classical numerical methods such as Simpson’s rule are preferable to Monte Carlo methods unless the domain of integration is very complicated. However, the error in the conventional numerical methods increases with dimension, so Monte Carlo methods are essential for higher dimensional integrals.
The general method for evaluating multidimensional integrals such as
$$ I = \int\int\int f(x,y,z)\, dx\, dy\, dz $$
is the same as before. Form a volume that encloses the region of interest and randomly sample points within the volume. Compute the value of the function at each of those points, take the average, and multiply by the total volume to get the estimate of the integral.
For example, in 3 dimensions, we can solve
$$\int_0^{9/10}\int_0^1\int_0^{11/10} (4 - x^2 - y^2 - z^2)\,dx\,dy\,dz$$
which has exact solution 14817/5000 = 2.9634.
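Each term integrates easily, so the quoted exact value follows directly. Writing the three upper limits as $a = 9/10$, $b = 1$, $c = 11/10$ and the box volume as $V = abc = 0.99$:
$$ \int\!\!\int\!\!\int (4 - x^2 - y^2 - z^2)\,dx\,dy\,dz = 4V - \frac{V}{3}\left(a^2 + b^2 + c^2\right) = 3.96 - 0.33\times 3.02 = 2.9634 $$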
Let's look at the evolution of the uncertainty as we increase the number of samples.
In [ ]:
fxyz = lambda x,y,z: 4. - x**2 - y**2 - z**2
xa = 0.; xb = 9./10.
ya = 0.; yb = 1.
za = 0.; zb = 11./10.
i = (10,100,1000,10000,100000,1000000)
for N in i:
    x = (xb-xa)*np.random.random_sample(N)
    y = (yb-ya)*np.random.random_sample(N)
    z = (zb-za)*np.random.random_sample(N)
    I = (xb-xa)*(yb-ya)*(zb-za)*fxyz(x,y,z).mean()
    err = (xb-xa)*(yb-ya)*(zb-za)*np.sqrt(((fxyz(x,y,z)**2).mean() - (fxyz(x,y,z).mean())**2)/N)
    exact = 14817./5000
    print "Exact =",exact,"I=",I," +/-",err,"percentErr =",100.*np.abs(I-exact)/exact
The advantages of this method of integration are that it extends to any number of dimensions with essentially the same few lines of code, its $1/\sqrt{N}$ uncertainty does not depend on the dimension, and it handles complicated integration regions easily.
Let's try one in even higher dimensions. We'll need more symbols to represent our growing number of coordinates so let's use the standard subscript notation $\vec{x} = (x_i)$ for $i = 1,2,...,n$.
$$ \int_0^{7/10}\int_0^{4/5}\int_0^{9/10}\int_0^{1}\int_0^{11/10}(6-x_1^2-x_2^2-x_3^2-x_4^2-x_5^2)\,dx_1\,dx_2\,dx_3\,dx_4\,dx_5$$
which has the exact solution 63987/25000 = 2.55948.
In [ ]:
func5D = lambda x1,x2,x3,x4,x5: 6 - x1**2 - x2**2 - x3**2 - x4**2 - x5**2
L = 7./10; W = 4./5; H = 9./10; P = 1.; Q = 11./10
exact = 63987./25000
for N in (10,100,1000,10000,100000,1000000):
    x1 = L*np.random.random_sample(N)
    x2 = W*np.random.random_sample(N)
    x3 = H*np.random.random_sample(N)
    x4 = P*np.random.random_sample(N)
    x5 = Q*np.random.random_sample(N)
    I = (L*W*H*P*Q)*func5D(x1,x2,x3,x4,x5).mean()
    unc = L*W*H*P*Q*np.sqrt(((func5D(x1,x2,x3,x4,x5)**2).mean() - (func5D(x1,x2,x3,x4,x5).mean())**2)/N)
    err = 100*np.abs(I - exact)/exact
    print "Integral=%.5f, +/- %.6f, percent err = %.3f, N = %d"%(I,unc,err,N)
And finally, for a function whose integral has no simple closed-form solution:
$$ \int_0^{7/10}\int_0^{4/5}\int_0^{9/10}\int_0^{1}\int_0^{11/10}\sqrt{6-x_1^2-x_2^2-x_3^2-x_4^2-x_5^2}dx_1dx_2dx_3dx_4dx_5$$
In [ ]:
func5D = lambda x1,x2,x3,x4,x5: np.sqrt(6 - x1**2 - x2**2 - x3**2 - x4**2 - x5**2)
L = 7./10; W = 4./5; H = 9./10; P = 1.; Q = 11./10
for N in (10,100,1000,10000,100000,1000000):
    x1 = L*np.random.random_sample(N)
    x2 = W*np.random.random_sample(N)
    x3 = H*np.random.random_sample(N)
    x4 = P*np.random.random_sample(N)
    x5 = Q*np.random.random_sample(N)
    I = (L*W*H*P*Q)*func5D(x1,x2,x3,x4,x5).mean()
    unc = L*W*H*P*Q*np.sqrt(((func5D(x1,x2,x3,x4,x5)**2).mean() - (func5D(x1,x2,x3,x4,x5).mean())**2)/N)
    print "Integral=%.5f, +/- %.6f, N = %d"%(I,unc,N)
The estimate converges more quickly, suggesting that in this case the variance of the integrand, rather than the sample size, dominates the uncertainty; the square root keeps the function values in a narrow range, so the variance is small.
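One quick way to check that interpretation is to compare the spread of the sampled function values for the two integrands directly. This is just a sketch, reusing the box dimensions L, W, H, P, Q from the cells above; g5D simply redefines the polynomial integrand so we can evaluate it with and without the square root.
In [ ]:
g5D = lambda x1,x2,x3,x4,x5: 6 - x1**2 - x2**2 - x3**2 - x4**2 - x5**2   #the polynomial integrand again
N = 100000
x1 = L*np.random.random_sample(N); x2 = W*np.random.random_sample(N)
x3 = H*np.random.random_sample(N); x4 = P*np.random.random_sample(N)
x5 = Q*np.random.random_sample(N)
vals = g5D(x1,x2,x3,x4,x5)
print("std of f       = %.4f" % vals.std())            #integrand without the square root
print("std of sqrt(f) = %.4f" % np.sqrt(vals).std())   #integrand with the square root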