N-Body Simulation Using Python

For our first online lesson, we're going to explore the following:

How to read code, and make code more readable.
How to use visualization to help debug scientific code.
What kinds of tests are most useful when working with scientific software.
Some useful features of object-oriented programming.

The example we'll use for this lesson is a (very) simple simulation of the orbits of the four Jovian planets around the Sun. Please take a few minutes before we meet online to read through this notebook with a partner, and answer the questions embedded in it. (The "with a partner" part is important: we'd like you to compare thoughts as you go along.)

We'll start by reviewing the equations we will implement:

The gravitational force between two bodies: $$\vec{F} = \frac{G m_{1} m_{2} \vec{r}_{12} }{|\vec{r}_{12}|^{3}}$$

where $\vec{r}_{12} = \vec{r}_{2} - \vec{r}_{1} $.

Replacing force $F$ with change in momentum: $$m_{1} \frac{d\vec{v}_{1}}{dt} = \frac{G m_{1}m_{2}\vec{r}_{12}}{|\vec{r}_{12}|^{3}}$$

To intgrate these equation we will solve for $d\vec{v}_{1}$: $$d\vec{v}_{1} = \frac{G m_{2}\vec{r}_{12}}{|\vec{r}_{12}|^{3}}dt $$

Since we are dealing with an N body problem, we sum across the N planets: $$\vec{v}_{1} = \sum_{i\neq1}^{N} \frac{G m_{i}\vec{r}_{1i}}{|\vec{r}_{1i}|^{3}}dt $$

A important diagnostic for our simulation will be energy conservation. The sum of potential and kinetic energies should be conserved.

The total poential energy of the system is: $$U=\sum_{i\neq j}^{N} \frac{Gm_{i}m_{j}}{|\vec{r}_{ij}|} $$

The kinetic energy of each planet is: $$V=\sum_{i}^N \frac{m_{i}|\vec{v}_{i}|^{2}}{2} $$

Please note in the code we will set $G$ to 1 i.e. ignore it.

We'll start our program with the main function that controls everything else:



In [1]:

    
def main(bodies, timestep, num_steps):
    '''Main driver for really simple N-body simulation.'''

    # Checkpoint original energy and time.                                                                                                                                  
    e_original = total_energy(bodies)
    t_original = time.time()

    # Evolve.                                                                                                                                                               
    advance(timestep, num_steps, bodies)

    # Get final energy and time.                                                                                                                                                             
    t_final = time.time()
    e_final = total_energy(bodies)
    
    # Report.
    d_energy = 100 * abs((e_final - e_original) / e_original)
    d_time = t_final - t_original
    print 'Energy residual: %f - %f (%f %%)' % (e_final, e_original, d_energy)
    print 'Elapsed time: %f' % d_time

The simulation itself will be done by a function called advance; everything else here is bookkeeping. What's important is that it's readable bookkeeping: even if you've never used the time library, you can probably figure out what time.time() does, and you can guess from their names what the total_energy function and the d_energy variable are for.

Question 1: do you prefer e_original and e_final, energy_original and energy_final, or energy_i (for "initial") and energy_f? What does your reading partner prefer?

So what's in the advance function?



In [4]:

    
def advance(dt, num_steps, bodies):
    '''Advance the simulation num_steps times, in steps dt long.'''

    for step in range(num_steps):

        # Update velocities of bodies.                                                                                                                                      
        for (left, right) in pairs(bodies):
            d_pos = left.pos - right.pos
            mag = dt * (d_pos.squared() ** -1.5)
            left_force = left.mass * mag
            right_force = right.mass * mag
            left.vel -= d_pos * right_force
            right.vel += d_pos * left_force

        # Move bodies.                                                                                                                                                      
        for b in bodies:
            b.pos += b.vel * dt

The outer loop is straightforward: it just counts off timesteps. In each timestep, we have one block of code to figure out each body's ΔV, and another block that moves all the bodies. (We have to do all the ΔV calculations before moving anything so that we're taking consistent snapshots of our world.)

Question 2: do you think this would be more readable if advance was written as:

def advance(dt, num_steps, bodies):
    for step in range(num_steps):
        for (left, right) in pairs(bodies):
            update_velocities(left, right)
        for b in bodies:
            update_position(b)

i.e., if we took the inline calculations out into functions of their own?

Question 3: do you think the velocity update calculations are correct? Why or why not?

What isn't obvious here is how celestial bodies are represented. Inside the inner loop, where left and right are both celestial bodies, we are using their positions (as left.pos and right.pos), their masses (left.mass and right.mass), and their velocities (left.vel and right.vel) in various calculations. For the moment, we'll just assume that we have a way to represent and do math on vectors and scalars; we'll come back later and fill this in.

Before we do that, though, let's fill in our other calculation function:



In [5]:

    
def total_energy(bodies):
    '''Calculate the total potential and kinetic energy of the system.'''

    energy = 0.0

    # Potential energy.                                                                                                                                                     
    for (left, right) in pairs(bodies):
        d_pos = left.pos - right.pos
        energy -= (left.mass * right.mass) / (d_pos.squared() ** 0.5)

    # Kinetic energy.                                                                                                                                                        
    for b in bodies:
        energy += b.mass * b.vel.squared() / 2.0

    return energy

Again, this is pretty straightforward: there's a loop over pairs of bodies to calculate potential energy, and another loop over individual bodies to calculate kinetic energy.

We now have to figure out how to store celestial bodies and their vector properties. Bodies are easiest, so we'll create a class for that first:



In [6]:

    
class Body(object):
    '''Represent a celestial body's name, mass, position, and velocity.'''
    def __init__(self, name, mass, pos, vel):
        self.name = name
        self.mass = mass
        self.pos = pos
        self.vel = vel

Here, we've defined a class called Body that stores a name, a mass, a position, and a velocity. The specially-named constructor __init__ tells Python how to initialize an object of this class. When we use the name of the class as if it were a function, like this:



In [7]:

    
example = Body('Pluto', 1.0, 100.0, 0.001)

Python does the following:

Create an object without any associated data values.
Call Body.__init__ with that object as its first parameter, and 'Pluto', 1.0, 100.0, and 0.001 as its other parameters.
Give us back the object to assign to example when Body.__init__ is finished.

We can check that the constructor has done its job by printing example's data:



In [8]:

    
print 'example.name', example.name
print 'example.mass', example.mass
print 'example.pos', example.pos
print 'example.vel', example.vel









    



example.name Pluto
example.mass 1.0
example.pos 100.0
example.vel 0.001

This is the simplest possible use of object-oriented programming: all we've done is create a container for named properties, without any special abilities.

Question 3: inside the constructor, we use expressions like self.mass = mass to assign the value passed in as the parameter mass to the object's data member self.mass. Some people feel that giving constructor parameters the same names as data members helps keep them straight; other people find it confusing, and would write the constructor as:

def __init__(self, the_name, the_mass, the_pos, the_vel):
    self.name = the_name
    self.mass = the_mass
    self.pos = the_pos
    self.vel = the_vel

Which do you find easier to read and understand?

But in our program, we want position and velocity to be vectors, not scalars, so we need to define another class for that. We also need to teach those vectors how to subtract themselves from one another, how to calculate dot products, and everything else we've been assuming so far. Here's what the code looks like:



In [9]:

    
class Vec3(object):
    '''Represent a 3-vector (used for positions, velocities, etc.).'''

    def __init__(self, x, y, z):
        '''Construct vector from XYZ components.'''
        self.x = x
        self.y = y
        self.z = z

    def __add__(self, other):
        '''Add with another vector, returning a new vector as result.'''
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        '''Subtract another vector, returning a new vector as result.'''
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, scalar):
        '''Multiply by scalar, returning a new vector as result.'''
        return Vec3(self.x * scalar, self.y * scalar, self.z * scalar)

    def __div__(self, scalar):
        '''Divide by a non-zero scalar, returning a new vector as result.'''
        return self * (1. / scalar)

    def squared(self):
	'''Sum of squares of components.'''
        return self.x * self.x + self.y * self.y + self.z * self.z

This is the most complicated piece of object-oriented code we've seen so far, but in fact there's only one new idea. When Python sees an expression like:

x + 2

the first thing it does is ask the value that x refers to whether it knows how to do addition. The way it asks is to see if it has a method called __add__. If that's there, then Python lets the object handle addition itself. If it isn't, it tries to do the addition itself. (The real story is slightly more complicated, but this is the basic idea.)

Similarly, when Python sees:

my_position = your_position * 2

it asks your_position if it has a method called __mul__. If so, it calls your_position.__mul__(2), and assigns the result to my_position. Programmers can use these specially-named methods to control how Python does array indexing, function calling, and almost everything else. In our case, we're using it to make our 3D vectors behave like real mathematical vectors.

Question 4: explain to your reading partner what methods are called when we run the code below.

alpha = Vec3(0.0, 1.0, 2.0)
beta = Vec3(5.5, 6.6, 7.7)
gamma = alpha * 3 + beta

How many Vec3 objects have been created in total by the time these three lines are finished?

Question 5: why do __add__, __mul__, __sub__, and __div__ create new vectors? Why don't they just update the vector they're being called on?

Question 6: why does __div__ use `self (1./scalar)instead ofVec3(self.x/scalar, self.y/scalar, self.z/scalar)`?*

OK, let's do some simulation. We'll borrow values from the Computer Benchmark Benchmarks Game:



In [10]:

    
from math import pi
import time

SOLAR_MASS = 4 * pi * pi
DAYS_PER_YEAR = 365.24
TIMESTEP = 0.01

BODIES = [
    Body('Sun',
         SOLAR_MASS,
         Vec3(0.0, 0.0, 0.0),
         Vec3(0.0, 0.0, 0.0)),

    Body('Jupiter',
         9.54791938424326609e-04 * SOLAR_MASS,
         Vec3( 4.84143144246472090e+00,
              -1.16032004402742839e+00,
              -1.03622044471123109e-01),
         Vec3( 1.66007664274403694e-03 * DAYS_PER_YEAR,
               7.69901118419740425e-03 * DAYS_PER_YEAR,
              -6.90460016972063023e-05 * DAYS_PER_YEAR)),

    Body('Saturn',
         2.85885980666130812e-04 * SOLAR_MASS,
         Vec3( 8.34336671824457987e+00,
               4.12479856412430479e+00,
              -4.03523417114321381e-01),
         Vec3(-2.76742510726862411e-03 * DAYS_PER_YEAR,
               4.99852801234917238e-03 * DAYS_PER_YEAR,
               2.30417297573763929e-05 * DAYS_PER_YEAR)),

    Body('Uranus',
         4.36624404335156298e-05 * SOLAR_MASS,
         Vec3( 1.28943695621391310e+01,
              -1.51111514016986312e+01,
              -2.23307578892655734e-01),
         Vec3( 2.96460137564761618e-03 * DAYS_PER_YEAR,
               2.37847173959480950e-03 * DAYS_PER_YEAR,
              -2.96589568540237556e-05 * DAYS_PER_YEAR)),

    Body('Neptune',
         5.15138902046611451e-05 * SOLAR_MASS,
         Vec3( 1.53796971148509165e+01,
              -2.59193146099879641e+01,
               1.79258772950371181e-01),
         Vec3( 2.68067772490389322e-03 * DAYS_PER_YEAR,
               1.62824170038242295e-03 * DAYS_PER_YEAR,
              -9.51592254519715870e-05 * DAYS_PER_YEAR)),
]

And as a last step, we'll write a function that takes a list and returns all the unique pairs of elements from it:



In [11]:

    
def pairs(vals):
    '''Create list of all unique pairs of elements in vals.'''
    result = []
    for x in range(len(vals) - 1):
        temp = vals[x+1:]
        for y in temp:
            result.append([vals[x], y])
    return result

Here's a simple test that shows what the function does:



In [12]:

    
print pairs(['a', 'b', 'c'])









    



[['a', 'b'], ['a', 'c'], ['b', 'c']]

Let's run the program:



In [13]:

    
main(BODIES, TIMESTEP, 100)









    



Energy residual: -0.169267 - -0.169290 (0.013735 %)
Elapsed time: 0.012000

And again for a longer period:



In [14]:

    
main(BODIES, TIMESTEP, 1000)









    



Energy residual: -0.169297 - -0.169267 (0.018040 %)
Elapsed time: 0.114000

Is that right? The only honest answer is that we don't know. If we want to convince ourselves that this code is operating correctly (as opposed to merely not crashing) we have some more work to do. One approach is to check that important physical quantities are conserved as well as energy:

Question 7: write a function that calculates the total momentum of the bodies in the system, and see how well it is conserved for simulations of various lengths.

Another approach is visualization. We can use a plotting library like matplotlib to draw the paths traced by the planets. If our simulation is working correctly, these ought to be closed ellipses.

Question 8: pair program with your partner—four hands, one keyboard—to do the following:

Modify advance to record the positions of the planets at each time step. You can do this by creating a list and appending values to it (so that it grows bit by bit), or by figuring out in advance how many values you're going to record and creating a NumPy array of the right size (so that all the memory is allocated at once). You can even write the coordinates to a file that you read back in later, although this will be much slower.
Write a function called draw that plots the XY positions of the planets after the simulation has completed. One way to do this is to create a scatter plot with dots showing the positions the planets were in at different times; another is to join up those dots with lines to create proper orbits. Whatever kind of display you choose, you should look at what matplotlib and pyplot want as input before you decide how to solve part 1, so that you create data in the right format for plotting.

How large or small were the changes you had to make to advance? Do the planets seem to be tracing closed paths? </em>

Profiling

So far we have timed a single run of our code. The run time can be influenced by what the OS is doing. Just like in measurements it can be useful to take several runs and average across them.



In [25]:

    
%%timeit
%%capture
main(BODIES, TIMESTEP, 100)









    



100 loops, best of 3: 10.5 ms per loop

To understand which parts of our code are inefficient we need more detail than the total run runtime. %%prun breaks down the runtime for each function in our code.



In [49]:

    
%%prun
main(BODIES, TIMESTEP, 100)









    



Energy residual: -0.169307 - -0.169297 (0.005558 %)
Elapsed time: 0.018000

14446 function calls in 0.018 seconds

Ordered by: internal time

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)

    1    0.007    0.007    0.018    0.018 <ipython-input-4-3274ad7f158a>:1(advance)        
 6020    0.003    0.000    0.003    0.000 <ipython-input-9-0a80e1eb825d>:4(__init__)
 2500    0.003    0.000    0.004    0.000 <ipython-input-9-0a80e1eb825d>:18(__mul__)
 2020    0.002    0.000    0.003    0.000 <ipython-input-9-0a80e1eb825d>:14(__sub__)
 1500    0.002    0.000    0.002    0.000 <ipython-input-9-0a80e1eb825d>:10(__add__)
  102    0.001    0.000    0.001    0.000 <ipython-input-11-fbe07d7faa9c>:1(pairs)
 1030    0.001    0.000    0.001    0.000 <ipython-input-9-0a80e1eb825d>:26(squared)
 1020    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
  103    0.000    0.000    0.000    0.000 {range}
    2    0.000    0.000    0.000    0.000 <ipython-input-5-0fbbe459b4cb>:1(total_energy)
    1    0.000    0.000    0.018    0.018 <ipython-input-1-b089cc6275f3>:1(main)
    4    0.000    0.000    0.000    0.000 iostream.py:196(write)
  102    0.000    0.000    0.000    0.000 {len}
    4    0.000    0.000    0.000    0.000 {method 'decode' of 'str' objects}
    4    0.000    0.000    0.000    0.000 {_codecs.utf_8_decode}
    4    0.000    0.000    0.000    0.000 {method 'write' of '_io.StringIO' objects}
    4    0.000    0.000    0.000    0.000 {isinstance}
    4    0.000    0.000    0.000    0.000 iostream.py:86(_is_master_process)
    4    0.000    0.000    0.000    0.000 iostream.py:95(_check_mp_mode)
    4    0.000    0.000    0.000    0.000 utf_8.py:15(decode)
    6    0.000    0.000    0.000    0.000 {time.time}
    1    0.000    0.000    0.018    0.018 <string>:2(<module>)
    4    0.000    0.000    0.000    0.000 {nt.getpid}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    1    0.000    0.000    0.000    0.000 {abs}

In the ouput table above the interesting columns are the first 'ncall', the second 'tottime' and the last 'filename:lineno(function)'. We spend the most time in advance because it runs the numerical integration loop, notice that it is only called once. A lot of time is spent in our vector functions: init, mul, sub and add.

Vectorizing



In [ ]: