Benchmarking. Julia provides several tools for measuring code performance. Perhaps the simplest is the @time or @elapsed macro, which is placed before an expression, like “@time randn(1000)”. The @time macro prints the time but returns the value of the following expression. The @elapsed macro discards the following expression’s return value and instead returns the elapsed time spent evaluating the expression. For finer control, you can investigate the tic(), toc() and toq() functions (see the Julia manual for details).
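As a minimal illustration of the difference between the two macros (any small expression works here):
x = @time randn(1000)     # prints the timing; x holds the vector of random numbers
t = @elapsed randn(1000)  # t holds the elapsed time in seconds; the random vector is discarded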
Q1: For several common mathematical functions, calculate how many million evaluations your computer performs per second. Try at least three arithmetic operations, two trig functions, and a logarithm. For example,
N = 1000000;
println("rand: ", 1./(@elapsed x = rand(N)));
println(".+: ", 1./(@elapsed x.+x));
In [1]:
N = 1000000
x = rand(N)
println(".+ ",1./(@elapsed 1.+x))
println("+ ",1./(@elapsed x+x))
println(".* ",1./(@elapsed x.*x))
println("sin ",1./(@elapsed sin(x)))
println("tan ",1./(@elapsed tan(x)))
println("log ",1./(@elapsed log(x)))
println("sin(tan(log(x+x))) ",1./(@elapsed sin(tan(log(x+x)))))
Q2: For the questions below, we’ll consider the log_likelihood function and synthetic data from Lab/HW #1, which was probably very similar to the following.
In [3]:
srand(42);
Nobs = 100;
z = zeros(Nobs);
sigma = 2. * ones(Nobs);
y = z + sigma .* randn(Nobs);
normal_pdf(z, y, sigma) = exp(-0.5*((y-z)/sigma)^2)/(sqrt(2*pi)*sigma);
function log_likelihood(y::Array, sigma::Array, z::Array)
    n = length(y);
    sum = zero(y[1]);
    for i in 1:n
        sum = sum + log(normal_pdf(y[i],z[i],sigma[i]));
    end;
    return sum;
end
Out[3]:
2a) Write a function, calc_time_log_likelihood(Nobs::Int, M::Int = 1), that takes one required argument (the number of observations) and one optional argument (the number of times to repeat the calculation), initializes the y, sigma and z arrays with the same values each call (with the same Nobs), and returns the time required to evaluate the log_likelihood function M times.
In [4]:
function calc_time_log_likelihood(Nobs::Int, M::Int=1)
    # Let's steal Eric's code to initialize the dummy arrays
    srand(42);
    z = zeros(Nobs);
    sigma = 2. * ones(Nobs);
    y = z + sigma .* randn(Nobs)
    return @elapsed [log_likelihood(y,sigma,z) for i in 1:M]  # I still love list comprehensions so, so much!
end
Out[4]:
2b) Time how long it takes to run log_likelihood using Nobs=100 and M = 1. Time it again. Compare the two. Why was there such a big difference?
In [4]:
println(calc_time_log_likelihood(100,1))
println(calc_time_log_likelihood(100,1))
There was a pretty big difference the very first time I ran it. Then I ran the cell again and Julia is too smart (I even restarted the kernel), so it doesn't look like there was a big difference. But the first time the function was called it had to compile, increasing the overhead; for subsequent calls it was already compiled in memory and is just evaluated.
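A minimal way to see the compilation cost in isolation (assuming a fresh kernel, so the throwaway function f below has not been compiled yet; f is defined only for this timing test):
f(x) = sum(log(x))
x = rand(100)
println("first call (includes JIT compilation): ", @elapsed f(x))
println("second call (already compiled):        ", @elapsed f(x))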
2c) If you were only calling this function a few times, then speed wouldn’t matter. But if you were calling it with a very large number of observations or many times, then the performance could be important. Plot the run time versus the size of the array. The following example demonstrates how to use the PyPlot module, “comprehensions” to construct arrays easily, and the map function to apply a function to an array of values.
In [4]:
using PyPlot
n_list = [ 2^i for i=1:10 ]
elapsed_list = map(calc_time_log_likelihood,n_list)
plot(log10(n_list), log10(elapsed_list), color="orange", linewidth=2, marker="^", markersize=12);
xlabel("log N"); ylabel("log (Time/s)");
2d) Does the run time increase linearly with the number of observations? If not, try to explain the origin of the non-linearity. What does this imply for how you will go about future timing?
No; it is nonlinear up to N ~ 15 and becomes linear after that. I am not sure why.
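If the flattening at small N comes from fixed per-call overhead (the function call itself and the comprehension inside calc_time_log_likelihood), one rough way to check would be to compare a tiny case against the per-element cost of a large one:
t_small = calc_time_log_likelihood(2)       # should be dominated by fixed overhead
t_large = calc_time_log_likelihood(2^20)    # should be dominated by per-element work
println("small-N time:              ", t_small)
println("large-N time per element:  ", t_large/2^20)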
Assertions: Sometimes a programmer calls a function with arguments that either don’t make sense or represent a case that the function was not originally designed to handle properly. The worst possible function behavior in such a case is returning an incorrect result without any warning that something bad has happened. Returning an error at the end is better, but can make it difficult to figure out the problem. Generally, the earlier the problem is spotted, the easier it will be to fix the problem. Therefore, good developers often include assertions to verify that the function arguments are acceptable.
2e) Time your current function with Nobs = 100 and M = 100. Then add @assert statements to check that the lengths of the y, z and sigma arrays match. Retime with the same parameters and compare. What does this imply for your future decisions about how often to insert assert statements in your code?
In [15]:
Nobs = 100
M = 100
println("Original: ",calc_time_log_likelihood(Nobs,M))
function asserted_log_likelihood(y::Array, sigma::Array, z::Array)
    n = length(y);
    sum = zero(y[1]);
    @assert length(y) == length(sigma) == length(z)
    for i in 1:n
        sum = sum + log(normal_pdf(y[i],z[i],sigma[i]));
    end;
    return sum;
end
function asserted_calc_time_log_likelihood(Nobs::Int, M::Int=1)
    # Let's steal Eric's code to initialize the dummy arrays
    srand(42);
    z = zeros(Nobs);
    sigma = 2. * ones(Nobs);
    y = z + sigma .* randn(Nobs)
    @assert length(y) == length(sigma) == length(z)
    return @elapsed [asserted_log_likelihood(y,sigma,z) for i in 1:M]  # I still love list comprehensions so, so much!
end
println("Asserted: ",calc_time_log_likelihood(Nobs,M))
The assertions seem to speed the code up a bit; why? Does the assertion stop Julia from doing something extra under the hood?
2f) Write a version of the log_likelihood and calc_time_log_likelihood functions that take a function as an argument, so that you could use them for distributions other than the normal distribution. Retime and compare. What does this imply for your future decisions about whether to write generic functions that take another function as an argument?
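A possible sketch of 2f, reusing the normal_pdf defined above; the names ending in _generic are just placeholders, and the pdf argument is assumed to take its arguments in the same order as the call inside log_likelihood:
function log_likelihood_generic(pdf::Function, y::Array, sigma::Array, z::Array)
    n = length(y)
    sum = zero(y[1])
    for i in 1:n
        sum = sum + log(pdf(y[i], z[i], sigma[i]))
    end
    return sum
end
function calc_time_log_likelihood_generic(pdf::Function, Nobs::Int, M::Int=1)
    srand(42)
    z = zeros(Nobs)
    sigma = 2. * ones(Nobs)
    y = z + sigma .* randn(Nobs)
    return @elapsed [log_likelihood_generic(pdf, y, sigma, z) for i in 1:M]
end
println("generic: ", calc_time_log_likelihood_generic(normal_pdf, 100, 100))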
2g) Write a version of the log_likelihood function that makes use of your knowledge about the normal distribution, so as to avoid calling the exp() function. Retime and compare. How could this influence your future decisions about whether to write generic or specific functions?
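A sketch of 2g: taking the log of normal_pdf analytically gives log p = -0.5*((y-z)/sigma)^2 - log(sqrt(2*pi)*sigma), so the exp() call (and the log of its result) is never needed. The name log_likelihood_normal is just a placeholder:
function log_likelihood_normal(y::Array, sigma::Array, z::Array)
    n = length(y)
    sum = zero(y[1])
    for i in 1:n
        # log of the normal pdf written out directly, so exp() is never evaluated
        sum = sum - 0.5*((y[i]-z[i])/sigma[i])^2 - log(sqrt(2*pi)*sigma[i])
    end
    return sum
end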
2h) Optional (important for IDL & Python gurus): Rewrite the above function without using any for loops. Retime and compare. What does this imply for choosing whether to write loops or vectorized code?
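A sketch of a loop-free (vectorized) version for 2h, using the same log-space form as in 2g; the name is again a placeholder:
log_likelihood_vec(y::Array, sigma::Array, z::Array) = sum(-0.5*((y-z)./sigma).^2 - log(sqrt(2*pi)*sigma))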
2i) Optional (potentially useful for IDL & Python gurus): Try to improve the performance using either the Devectorize and/or NumericExpressions modules. Retime and compare. Does this change your conclusions from part 2h? If so, how?
I'm running low on the time I set aside for homework, so I am going to start on this problem, which looks more fun.
Q3: Consider integrating the equations of motion for a test particle orbiting a star fixed at the origin. In 2D, there are two 2nd order ODEs:
$$\frac{d^2 r_x}{dt^2} = a_x = -GM r_x/r^3$$
$$\frac{d^2 r_y}{dt^2} = a_y = -GM r_y/r^3,$$
where $r = \sqrt{r_x^2+r_y^2}$ and $a_x$ and $a_y$ are the accelerations in the x and y directions.
Numerically, it is generally easier to integrate 1st order ODEs, so we transform these into:
\begin{eqnarray}
\frac{dv_x}{dt} & = & a_x = -GM r_x/r^3 \\
\frac{dv_y}{dt} & = & a_y = -GM r_y/r^3 \\
\frac{dr_x}{dt} & = & v_x \\
\frac{dr_y}{dt} & = & v_y
\end{eqnarray}
Euler’s method is a simplistic algorithm for integrating a set of 1st order ODEs with state stored in a vector x:
$$
r(t_{n+1}) = r(t_n) + dt * \frac{dr}{dt}|_{t_n} \\
v(t_{n+1}) = v(t_n) + dt * \frac{dv}{dt}|_{t_n}
$$
3a) Write a function integrate_euler!(state, dt, duration) that integrates a system (described by state) using Euler’s method and steps of size dt for an interval of time given by duration. It would probably be wise to write at least two additional functions. Have your function return a two-dimensional array containing a list of the states of the system during your integration.
Shouldn't the ! be dropped, since the function takes a vector as input but returns a two-dimensional list of states rather than modifying the original vector in place?
In [145]:
function derrivatives(state::Array, GM)
    # state = (r_x, r_y, v_x, v_y); returns [dr_x/dt, dr_y/dt, dv_x/dt, dv_y/dt]
    r = sqrt(state[1]^2 + state[2]^2)
    dvx = -GM*state[1]/r^3
    dvy = -GM*state[2]/r^3
    drx = state[3]
    dry = state[4]
    return [drx, dry, dvx, dvy]
end
function update_state(state::Array, ds::Array, dt)
    newstate = [state[1]+dt*ds[1], state[2]+dt*ds[2], state[3]+dt*ds[3], state[4]+dt*ds[4]]
    # Tried to make this a mutating function (!), but assigning into state kept giving an
    # InexactError, probably because the initial state [1,0,0,1] is an Int array and so
    # cannot hold the updated Float64 values.
    return newstate
end
function integrate_euler(state::Array, dt, duration, GM=1)
    # state vector = (r_x, r_y, v_x, v_y)
    # I like dictionaries, but I couldn't get a dictionary of arrays to work here,
    # so each component gets its own array.
    r_x, r_y, v_x, v_y = [], [], [], []
    N = duration/dt
    for i in 0:N
        ds = derrivatives(state, GM)
        state = update_state(state, ds, dt)
        r_x = vcat(r_x, [state[1]])
        r_y = vcat(r_y, [state[2]])
        v_x = vcat(v_x, [state[3]])
        v_y = vcat(v_y, [state[4]])
    end
    return (r_x, r_y, v_x, v_y)
end
Out[145]:
3b) Use the above code to integrate a system with an initial state of (r_x, r_y, v_x, v_y) = (1, 0, 0, 1) for roughly 3 orbits (i.e., 3*2pi time units if you set GM=1) using approximately 200 time steps per orbit. Inspect the final state and comment on the accuracy of the integration.
In [156]:
orbit = integrate_euler([1,0,0,1],2pi/200,6pi);
println("X:",orbit[1][end]," Y:",orbit[2][end]," Vx:",orbit[3][end]," Vy:",orbit[4][end])
That final state looks pretty far from (1, 0, 0, 1), where three full orbits should have brought it back.
3c) Plot the trajectory of the test particle (e.g., x vs y). Is the calculated behavior accurate?
In [155]:
using PyPlot
plot(orbit[1],orbit[2])
xlabel("X")
ylabel("Y")
Out[155]:
Clearly there is an unseen companion perturbing the orbit! Either that or the accuracy of the integration is not great.
3d) Write a new function integrate_leapfrog!(state, dt, duration) that is analogous to integrate_euler, but uses the “leapfrog” or modified Euler’s integration method: $$ r(t_{n+1/2}) = r(t_n) + dt/2 * \frac{dr}{dt}|_{t_n} \\ v(t_{n+1}) = v(t_n) + dt * \frac{dv}{dt}|_{t_{n+1/2}} \\ r(t_{n+1}) = r(t_{n+1/2}) + dt/2 * \frac{dr}{dt}|_{t_{n+1/2}} $$
In [148]:
function leapfrog(state::Array, dt)
    # half-step drift: advance the positions by dt/2 using the current velocities
    Rx = state[1] + .5*dt*state[3]
    Ry = state[2] + .5*dt*state[4]
    return [Rx, Ry, state[3], state[4]]
end
function update_state_leapfrog(state::Array, ds, dt)
    # kick the velocities by a full dt using the accelerations at the half step,
    # then advance the positions by the remaining dt/2.
    # Note: ds[1] and ds[2] are the pre-kick velocities, so this second drift uses
    # v(t_n) rather than the updated v(t_n+1).
    newstate = [state[1]+.5*dt*ds[1], state[2]+.5*dt*ds[2], state[3]+dt*ds[3], state[4]+dt*ds[4]]
    # Tried to make this a mutating function (!), but assigning into state kept
    # giving an InexactError (see update_state above).
    return newstate
end
function integrate_leapfrog(state::Array, dt, duration, GM=1)
    # state vector = (r_x, r_y, v_x, v_y)
    r_x, r_y, v_x, v_y = [], [], [], []
    N = duration/dt
    for i in 0:N
        state = leapfrog(state, dt)                    # drift positions by dt/2
        ds = derrivatives(state, GM)                   # accelerations at the half step
        state = update_state_leapfrog(state, ds, dt)   # kick velocities, finish the drift
        r_x = vcat(r_x, [state[1]])
        r_y = vcat(r_y, [state[2]])
        v_x = vcat(v_x, [state[3]])
        v_y = vcat(v_y, [state[4]])
    end
    return (r_x, r_y, v_x, v_y)
end
Out[148]:
3e) Repeat the integration from part 3b, but using the integrate_leapfrog code. Inspect the final end state and make a similar plot. Describe the main difference in the results and explain on what basis you would choose one over the other.
In [159]:
orbitL = integrate_leapfrog([1,0,0,1],2pi/200,6pi);
println("X:",orbitL[1][end]," Y:",orbitL[2][end]," Vx:",orbitL[3][end]," Vy:",orbitL[4][end])
using PyPlot
plot(orbitL[1],orbitL[2])
xlabel("X")
ylabel("Y")
figure()
plot(orbit[1],orbit[2],label="Euler")
plot(orbitL[1],orbitL[2],label="Leapfrog")
xlabel("X")
ylabel("Y")
legend(loc="lower left")
Out[159]:
They both look bad; I don't think a case can be made one way or the other.
3f) Time how long your code takes for a similar integration of 100 orbits. How long would it take to integrate for 4.5*10^9 orbits (e.g., the age of the Earth)?
In [164]:
time_euler = @elapsed integrate_euler([1,0,0,1],2pi/200,200pi)
time_leapfrog = @elapsed integrate_leapfrog([1,0,0,1],2pi/200,200pi)
# 100 orbits were timed, so scale by 4.5e9/100 = 4.5e7 to estimate 4.5e9 orbits
time_age_euler = time_euler * 4.5e7
time_age_leapfrog = time_leapfrog * 4.5e7
println("Euler: ", time_euler, " sec; Time for 4.5e9 orbits: ", time_age_euler/86400., " days")
println("Leapfrog: ", time_leapfrog, " sec; Time for 4.5e9 orbits: ", time_age_leapfrog/86400., " days")
3g) Repeat the integration from part 3e, but using a larger dt. Inspect the final end state. How much less accurate is the integration that used a larger dt?
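A sketch of what 3g asks for, rerunning the leapfrog integration with a time step ten times larger (2pi/20 instead of 2pi/200; the exact factor is just one choice of "larger"):
orbit_big_dt = integrate_leapfrog([1,0,0,1], 2pi/20, 6pi);
println("X:", orbit_big_dt[1][end], " Y:", orbit_big_dt[2][end], " Vx:", orbit_big_dt[3][end], " Vy:", orbit_big_dt[4][end])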
3h) Make a log-log plot of the accuracy of the integration versus the integration duration for durations spanning at least a few orders of magnitude. Estimate the slope (by eye is fine). What are the implications for the accuracy of a long-term integration?
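A possible sketch for 3h. For the initial conditions (1, 0, 0, 1) with GM = 1 the exact orbit is a circle of radius 1, so the deviation of the final radius from 1 is used here as a simple accuracy measure (an energy error would work equally well):
using PyPlot
n_orbits = [1, 3, 10, 30, 100]
errors = Float64[]
for n in n_orbits
    o = integrate_leapfrog([1,0,0,1], 2pi/200, n*2pi)
    # the exact orbit keeps r = 1, so |r - 1| at the end measures the accumulated error
    push!(errors, abs(sqrt(o[1][end]^2 + o[2][end]^2) - 1.0))
end
plot(log10(n_orbits*2pi), log10(errors), marker="o")
xlabel("log (duration)"); ylabel("log (final radius error)")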
3i) List a few ideas for how you could accelerate the integrations. Comment on how likely each of your ideas is to make a significant difference. (If you try it out, report the new time for 100 orbits, so we can compare.) Discuss any drawbacks of each idea to accelerate the integrations.
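One idea likely to make a real difference: preallocate the output instead of growing it with vcat (each vcat copies the whole history, so the total cost grows quadratically with the number of steps). A rough sketch of that change for the Euler version, reusing the derrivatives and update_state functions defined above:
function integrate_euler_prealloc(state::Array, dt, duration, GM=1)
    N = int(duration/dt)
    out = zeros(N, 4)            # one row per step: (r_x, r_y, v_x, v_y)
    for i in 1:N
        ds = derrivatives(state, GM)
        state = update_state(state, ds, dt)
        out[i, :] = state
    end
    return out
end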
Q4: Consider a modern laptop CPU with 4 GB ($=4*2^{30}$ bytes) of usable memory, capable of performing 20 Gflops/second. Assume it uses 8 bytes of memory to store each floating point number (“double precision”). [Based on Oliveira & Stewart’s Writing Scientific Software, Chapter 5, Problem #6.]
4a) What is the largest matrix that the above computer could fit into its available memory at one time?
4b) If we use LU factorization to solve a linear system, estimate how long would it take to solve the maximum size linear system that would fit into memory at once. You may assume the computation is maximally efficient, the computer reaches peak performance and the LU decomposition requires $\sim \frac{2}{3} n^3$ flops.
4c) For a modern laptop, does memory or compute time limit the size of system that can be practically solved with LU decomposition?
4d) For real life problems, what other considerations are likely to limit performance?
4e) How could one practically solve even larger systems?
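A quick back-of-the-envelope sketch for 4a and 4b, assuming the whole 4 GB can hold a single dense double-precision matrix and the full 20 Gflop/s is sustained:
mem   = 4*2^30               # bytes of usable memory
n_max = floor(sqrt(mem/8))   # largest n with an n-by-n matrix of 8-byte doubles fitting in memory
flops = (2/3)*n_max^3        # approximate cost of the LU factorization
t_lu  = flops / 20e9         # seconds at 20 Gflop/s
println("largest n: ", n_max)
println("LU time:   ", t_lu, " seconds (", t_lu/3600, " hours)")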