Performance

Julia is carefully designed with performance in mind. However, it is just as possible to write slow code in Julia as in any other dynamic language! The difference is that with some relatively minor tweaks, we can see dramatic increases in performance in Julia. A detailed discussion of performance-related issues can be found in the Julia manual.

We have already seen the first two (related) tweaks:

(1) Don't work in global scope

(2) Wrap everything in a function
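The effect of these two tweaks can be sketched with a hypothetical micro-benchmark (the function name `sum_to` is illustrative, not from the original). In global scope the variables are untyped, so every operation is dynamically dispatched; inside a function the compiler can infer concrete types:

```julia
# Global-scope version: `total` is an untyped global, so each
# iteration incurs dynamic dispatch.
total = 0.0
@time for i in 1:10^6
    total += i
end

# Function version: types are inferred, so the loop compiles
# to fast machine code.
function sum_to(n)
    total = 0.0
    for i in 1:n
        total += i
    end
    total
end

sum_to(10)          # compile the function first
@time sum_to(10^6)  # then time it
```

The timings will vary by machine and Julia version, but the function version is typically orders of magnitude faster than the global loop.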

Type stability

An important concept is that of type stability. Let's take as an (over-simplified) example the calculation of collisions of a particle with a fixed disc.

Fix the centre of the disc at $(0,0)$ with radius $r$. Suppose the particle starts at $(x, y)$ with $x$ large and negative and moves to the right, with velocity $(1, 0)$ for simplicity. We wish to calculate the collision with the disc and return the collision point. However, there may not be a collision, so we return false in that case:


In [47]:
function find_collision(x, y, r)
    collision = false
    
    if abs(y) < r
        x_collision = -sqrt(r^2 - y^2)
        collision = (x_collision, y)
    end
    
    collision
    
end


Out[47]:
find_collision (generic function with 1 method)

In [48]:
find_collision(-10, 0.2, 1)


Out[48]:
(-0.9797958971132712,0.2)

In [49]:
find_collision(-10, 0.9, 1)


Out[49]:
(-0.4358898943540673,0.9)

In [50]:
find_collision(-10, 0.99, 1)


Out[50]:
(-0.14106735979665894,0.99)

In [51]:
find_collision(-10, 1.1, 1)


Out[51]:
false

Let's run this many times for random initial heights $y$:


In [52]:
function run(N)
    for i in 1:N
        y = -10. + 20*rand()
        find_collision(-10., y, 1.)
    end
end


Out[52]:
run (generic function with 1 method)

In [53]:
run(10)  # compile the function first before running

In [54]:
@time run(10^8)


elapsed time: 2.010927848 seconds (640155280 bytes allocated, 21.50% gc time)

The calculation seems pretty fast, but note the huge amount of memory allocated.

Let's rewrite it to return the same type, even when there is no collision. For example, we could say that the collision occurs at $\infty$:


In [43]:
function find_collision2(x, y, r)
    
    collision = (Inf, Inf)
    
    if abs(y) < r
    
        x_collision = -sqrt(r^2 - y^2)
        collision = (x_collision, y)
    end
    
    collision
end


Out[43]:
find_collision2 (generic function with 1 method)

In [44]:
function run2(N)
    for i in 1:N
        y = -10. + 20*rand()
        find_collision2(-10., y, 1.)
    end
end


Out[44]:
run2 (generic function with 1 method)

In [45]:
run2(10)

In [46]:
@time run2(10^8)


elapsed time: 0.914406089 seconds (96 bytes allocated)

We see that the execution time is cut in half, and the excessive allocation has disappeared entirely. (For more complicated functions, removing a type instability often gives an even more dramatic improvement.)

The difference is that in the first version, the type of the variable collision can change from Bool to a tuple, whereas in the second version it is always a tuple of Float64s. Type instability is the first thing to look for when a function allocates excessively.

(3) Avoid type instability.

A tool: @code_warntype

A useful tool for detecting type instability is the @code_warntype macro, available from Julia v0.4 onwards. For now, it is best run from the REPL. We will see an example using the above functions.
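As a minimal self-contained sketch (the function `unstable` is invented for illustration), here is the kind of instability that @code_warntype flags:

```julia
# A deliberately type-unstable function: the return type depends
# on the *value* of x, not just its type.
function unstable(x)
    if x > 0
        return x       # returns a Float64
    end
    return false       # returns a Bool
end

# At the REPL:
@code_warntype unstable(1.0)
# The inferred return type is a Union of Float64 and Bool;
# the REPL highlights such non-concrete types in red.
```

Any red (non-concrete) type in the output is a candidate for the kind of rewrite we applied to find_collision above.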

Another tool: Profiling

The simplest mechanism for profiling is the @time macro that we have been using: it prints the time taken and the allocations performed, but does not return this information. The related macros @elapsed and @allocated return, respectively, the elapsed time and the amount of memory allocated.
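A sketch of capturing these quantities programmatically (the function `sum_squares` is a made-up workload):

```julia
# A small workload to measure.
function sum_squares(n)
    s = 0.0
    for i in 1:n
        s += i^2
    end
    s
end

sum_squares(10)                    # compile first

t = @elapsed sum_squares(10^6)     # elapsed time in seconds (Float64)
b = @allocated sum_squares(10^6)   # bytes allocated during the call

println("time: $t s, allocated: $b bytes")
```

Because these macros return values rather than printing them, they are convenient for collecting timings in a loop or comparing implementations automatically.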

Julia has a built-in profiler:


In [55]:
@profile sin(10)


Out[55]:
-0.5440211108893698

In [56]:
Profile.print()


1 task.jl; anonymous; line: 340
 1 ...3/IJulia/src/IJulia.jl; eventloop; line: 123
  1 ...src/execute_request.jl; execute_request_0x535c5df2; line: 157
   1 loading.jl; include_string; line: 97
    1 profile.jl; anonymous; line: 14

There is a package, ProfileView, that gives a graphical view of the profile information.
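A sketch of a typical ProfileView workflow, assuming the package is installed (e.g. via Pkg.add("ProfileView")); the function `work` is an invented workload:

```julia
using ProfileView

# An invented workload worth profiling.
function work(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    s
end

work(10)            # compile first
Profile.clear()     # discard any earlier samples
@profile work(10^7)
ProfileView.view()  # opens a flame-graph-style view of the samples
```

Each horizontal bar in the view is a stack frame, with width proportional to the time spent there, which makes hot-spots easy to see at a glance.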

Using a combination of these tools, it should be possible to pin down performance hot-spots.

(4) Profile!