@time, @elapsed and @inbounds

In this notebook, we demonstrate how fast Julia is compared to other dynamically typed languages.
Read the text Why Julia? (3 min)
Read the Performance tips section of the Julia manual. (20 min)
The reader should understand how Julia's just-in-time (JIT) compiler, which is built on LLVM, affects the speed of execution of programs. The reader should be able to write simple, but fast, programs containing loops.
Some examples are taken from The Julia Manual.
In [1]:
function f(n)
    s = 0
    for i = 1:n
        s += i/2
    end
    s
end
Out[1]:
For fast execution, a function must first be compiled. Compilation is performed automatically when the function is invoked for the first time, so the first call can be made with some trivial choice of arguments.
The timing can be done with two macros, @time and @elapsed: @time prints the elapsed time and memory allocation and returns the value of the expression, while @elapsed discards the value and returns the elapsed time in seconds as a Float64.
In [2]:
?@time
Out[2]:
In [3]:
?@elapsed
Out[3]:
In [4]:
@time f(1)
Out[4]:
In [5]:
# This run is much faster, since the function is already compiled
@elapsed f(1)
Out[5]:
Let us now run a large computation. Notice the unnaturally high memory allocation and the large share of time spent on garbage collection.
In [6]:
# Notice the unnaturally high byte allocation!
@time f(1000000)
Out[6]:
In [7]:
# One more run with @elapsed; we shall be using @time from now on
@elapsed f(1000000)
Out[7]:
Since your computer can perform several gigaflops (billions of floating-point operations per second), this is rather slow. The slowness is due to type instability: the variable s starts out as an integer (s = 0), while from the first iteration onward s += i/2 produces a Float64. Because the compiler cannot settle on a single type for s, every intermediate value must be boxed, which causes constant memory allocation and deallocation (garbage collection). The remedy is very simple: initialize s as a floating-point number, and the execution becomes more than 10 times faster with almost no memory allocation (and, consequently, no garbage collection).
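A quick way to confirm the instability, independent of any timing, is the @code_warntype macro, which marks variables whose type the compiler cannot infer (the exact printout depends on the Julia version):

@code_warntype f(10)   # the accumulator s is reported as a union of Int64 and Float64

Running the same check on the corrected function f1 defined below shows s inferred as a plain Float64.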
In [8]:
function f1(n)
    # Same computation as f, but s is initialized as a Float64, so the loop is type-stable
    s = 0.0
    for i = 1:n
        s += i/2
    end
    s
end
Out[8]:
In [9]:
@time f1(1)
Out[9]:
In [10]:
@time f1(1000000)
Out[10]:
@time can also be invoked with function-call syntax, with parentheses around the expression:
In [11]:
@time(f1(1000000))
Out[11]:
In [12]:
@time s2=f1(1000000)
Out[12]:
In [13]:
@time(s2=f1(1000000))
Out[13]:
The exponential moving average is a fast one-pass formula (each data point of the given data set $A$ is accessed only once) often used in high-frequency on-line trading (see Online Algorithms in High-Frequency Trading for more details). Notice that the output array $X$ is allocated in advance.
Using return in the last line is optional here.
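Written out, the recurrence computed by fexpma below is
$$X_1 = A_1, \qquad X_k = (1-\alpha)\,A_k + \alpha\,X_{k-1}, \quad k = 2,\dots,n,$$
where $\alpha$ is the exponential forgetting parameter.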
In [14]:
function fexpma{T}( A::Vector{T}, α::T )
    # fast exponential moving average: X - moving average,
    # A - data, alpha - exponential forgetting parameter
    n = length(A)
    X = Array{T}(n) # Declare X
    β = one(T)-α
    X[1] = A[1]
    for k = 2:n
        X[k] = β*A[k] + α*X[k-1]
    end
    return X
end
Out[14]:
In [15]:
# First run for compilation
fexpma([1.0],0.5)
Out[15]:
We now generate some large data:
In [16]:
# Big random slightly increasing sequence
A=[rand() + 0.00001*k*rand() for k=1:20_000_000]
Out[16]:
In [17]:
@time X=fexpma(A,0.9)
Out[17]:
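The loop can be sped up a bit further with the @inbounds macro, which tells the compiler to skip array bounds checking inside the loop. This is safe here because every index stays within 1:n; with @inbounds an out-of-bounds access is not caught, so use it only when you are sure of the index range.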
In [18]:
function fexpma{T}( A::Vector{T}, α::T )
    # fast exponential moving average: X - moving average,
    # A - data, alpha - exponential forgetting parameter
    n = length(A)
    X = Array{T}(n) # Declare X
    β = one(T)-α
    X[1] = A[1]
    @inbounds for k = 2:n
        X[k] = β*A[k] + α*X[k-1]
    end
    return X
end
Out[18]:
In [19]:
@time X=fexpma(A,0.9)
Out[19]:
Similar Matlab programs give the following timings for two versions of the function, first without preallocating $X$ and then with preallocation. The latter version is about nine times faster.
function X = fexpma0( A,alpha )
% fast exponential moving average: X - moving average, A - data, alpha - exponential forgetting parameter
n=length(A);
beta=1-alpha;
X(1)=A(1);
for k=2:n
    X(k)=beta*A(k)+alpha*X(k-1);
end
>> A=rand(20000000,1)+0.00001*[1:20000000]'.*rand(20000000,1);
>> tic, X=fexpma0(A,0.9); toc
Elapsed time is 3.073359 seconds.
function X = fexpma( A,alpha )
% fast exponential moving average: X - moving average, A - data, alpha - exponential forgetting parameter
n=length(A);
X=zeros(n,1); % Allocate X in advance
beta=1-alpha;
X(1)=A(1);
for k=2:n
    X(k)=beta*A(k)+alpha*X(k-1);
end
>> tic, X=fexpma(A,0.9); toc
Elapsed time is 0.320976 seconds.
Let us plot the data $A$ and its exponential moving average $X$. The data set is too large for a meaningful direct plot, so in Julia we can use the @manipulate macro to slide through it. It takes a while to load the packages Winston (for plotting) and Interact, but this is needed only for the first invocation.
In [21]:
using Winston
using Interact
In [22]:
@manipulate for k=1:1000:length(A)-1000   # stop early enough that the slice k:k+1000 stays in bounds
    plot(collect(k:k+1000),A[k:k+1000],"r.",
         collect(k:k+1000),X[k:k+1000],"b")
end
Out[22]:
More details about optimizing your programs are given in the Profiling Notebook.
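As a minimal sketch of what profiling looks like (assuming a Julia 0.6-era setup, where @profile and the Profile module are available without loading an extra package):

@profile f1(10^7)   # run the function while the statistical profiler collects samples
Profile.print()     # print the sampled call tree
Profile.clear()     # reset the profile buffer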
The following example is from Pre-allocating outputs. The functions loopinc() and loopinc_prealloc() both compute $\sum_{i=2}^{10000001}i$, the second one being 15 times faster:
In [23]:
function xinc(x)
    # allocates and returns a new 3-element array on every call
    return [x, x+1, x+2]
end

function loopinc()
    y = 0
    for i = 1:10^7
        ret = xinc(i)
        y += ret[2]
    end
    y
end

function xinc!{T}(ret::AbstractVector{T}, x::T)
    # fills a preallocated array in place instead of allocating a new one
    ret[1] = x
    ret[2] = x+1
    ret[3] = x+2
    nothing
end

function loopinc_prealloc()
    ret = Array{Int}(3)   # allocate the output buffer once
    y = 0
    for i = 1:10^7
        xinc!(ret, i)
        y += ret[2]
    end
    y
end
Out[23]:
In [24]:
@time loopinc()
Out[24]:
In [26]:
@time loopinc_prealloc() # Second run, after compilation
Out[26]:
The following example is from Access arrays in memory order, along columns.
Multidimensional arrays in Julia are stored in column-major order, which means that arrays are stacked one column at a time. This convention for ordering arrays is common in many languages like Fortran, Matlab, and R (to name a few). The alternative to column-major ordering is row-major ordering, which is the convention adopted by C and Python (numpy) among other languages. The ordering can be verified using the vec() function or the syntax [:]:
In [27]:
B = rand(0:9,4,3)
Out[27]:
In [28]:
B[:]
Out[28]:
In [29]:
vec(B)
Out[29]:
The ordering of arrays can have significant performance effects when looping over them: loops should be organized so that successively accessed elements are close to each other in memory.
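As a minimal illustration of this rule (the function names sum_by_columns and sum_by_rows are ours, not from the manual), a doubly-nested loop whose inner loop walks down a column touches memory contiguously, while the opposite loop order accesses it with a large stride:

function sum_by_columns(M)
    s = zero(eltype(M))
    for j = 1:size(M,2)        # outer loop over columns
        for i = 1:size(M,1)    # inner loop walks down one column (contiguous memory)
            s += M[i,j]
        end
    end
    s
end

function sum_by_rows(M)
    s = zero(eltype(M))
    for i = 1:size(M,1)        # outer loop over rows
        for j = 1:size(M,2)    # inner loop jumps across columns (strided memory access)
            s += M[i,j]
        end
    end
    s
end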
The following functions accept a Vector and return a square Array, filling either the columns (copy_cols) or the rows (copy_rows) with copies of the input vector.
In [30]:
function copy_cols{T}(x::Vector{T})
    n = size(x, 1)
    out = Array{eltype(x)}(n, n)
    for i=1:n
        out[:, i] = x   # writes one contiguous column at a time
    end
    out
end

function copy_rows{T}(x::Vector{T})
    n = size(x, 1)
    out = Array{eltype(x)}(n, n)
    for i=1:n
        out[i, :] = x   # writes one row, i.e. strided, non-contiguous memory
    end
    out
end
Out[30]:
In [31]:
copy_cols([1.0,2])
copy_rows([1.0,2])
Out[31]:
In [32]:
x=rand(5000) # generate a random vector
Out[32]:
In [33]:
@time C=copy_cols(x) # We generate a large matrix
Out[33]:
In [34]:
@time D=copy_rows(x); # This is a few times slower
In [35]:
?repmat
Out[35]:
In [36]:
@time C1=repmat(x,1,5000);