Design considerations

As a JIT ("just-in-time") compiled language, Julia is designed for good performance. Well-written Julia code is currently expected to run within about a factor of 2 of the speed of corresponding C code.

However, attaining this performance requires following certain principles when writing code; see the Performance tips section of the Julia manual for more details.

Profiling

When profiling, always run each function once with the correct argument types before timing it, since the first call also includes the time taken to compile the function for those argument types.
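The effect of compilation on a first call can be seen with a small sketch. The function `work` is a hypothetical stand-in, not part of the original notebook, and `@elapsed` is used instead of `@time` only so that the two measurements can be stored and compared:

```julia
# Hypothetical example function; any non-trivial function would do.
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    s
end

t_first  = @elapsed work(1000)   # first call: includes compilation for an Int argument
t_second = @elapsed work(1000)   # second call: execution time only

println("first call:  ", t_first, " seconds")
println("second call: ", t_second, " seconds")
```

The second measurement is normally much smaller than the first, and it is the one that reflects the actual running time.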


In [2]:
@time sin(10)


elapsed time: 6.941e-6 seconds (96 bytes allocated)
Out[2]:
-0.5440211108893698

In [3]:
a = 3


Out[3]:
3

No global variables please!

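Non-constant global variables are slow in Julia, since their type may change at any time and the compiler therefore cannot generate specialized code for them: do not use global variables!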

Your main program should be wrapped in a function. Any time you are tempted to use globals, just send them as arguments to functions, and return them if necessary.
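A minimal sketch of the two styles follows. The names `x_global`, `add_global` and `add_arg` are hypothetical and chosen only for illustration:

```julia
x_global = 0.5   # a non-constant global: its type could change at any time

# Reads the global inside the loop; the compiler cannot assume its type.
function add_global(n)
    s = 0.0
    for i in 1:n
        s += x_global
    end
    s
end

# Receives the value as an argument; the function is compiled
# specifically for the type of x that is passed in.
function add_arg(x, n)
    s = 0.0
    for i in 1:n
        s += x
    end
    s
end

add_global(10), add_arg(x_global, 10)   # warm up both before timing them
```

Both functions return the same value; timing them with @time for a large n is a simple way to see the cost of the global on a given Julia version.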

If you have many variables to pass around, wrap them in a type, e.g. called State.
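As a sketch of this idea, the following wraps a few values in a State type and passes it through functions. The field names and the functions `advance` and `simulate` are hypothetical. The code assumes a recent Julia release, where composite types are declared with `struct`; in the Julia version used for this notebook the keyword was `type` (or `immutable`).

```julia
# Hypothetical container for the variables of a simulation.
struct State
    position::Float64
    velocity::Float64
    dt::Float64
end

# Returns a new State instead of modifying global variables.
function advance(s::State)
    State(s.position + s.velocity * s.dt, s.velocity, s.dt)
end

# The "main program", wrapped in a function that takes and returns the state.
function simulate(s::State, nsteps::Int)
    for _ in 1:nsteps
        s = advance(s)
    end
    s
end

final = simulate(State(0.0, 2.0, 0.5), 4)
println(final.position)
```

Because every field has a concrete declared type, functions that receive a State can be compiled to specialized code.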

Type stability

The second important idea for gaining performance is that of type stability.

Any calculation is slowed down by variables whose type can change during the calculation, because of the extra work that must be done at run time to check the types of those variables. (This is one of the main reasons for the slowness of Python and the necessity for type declarations in Cython to gain speed.)

A simple example (due to Leah Hanson) is the following pair of almost-identical functions:


In [4]:
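function sum1(N::Int)
    total = 0          # starts as an Int...

    for i in 1:N
        total += i/2   # ...but becomes a Float64 here: type-unstable
    end

    total
end

function sum2(N::Int)
    total = 0.0        # a Float64 from the start

    for i in 1:N
        total += i/2   # stays a Float64: type-stable
    end

    total
end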


Out[4]:
sum2 (generic function with 1 method)

We must first run the functions once each to compile them, before looking at any timings:


In [5]:
sum1(10), sum2(10)


Out[5]:
(27.5,27.5)

[Happily, they produce the same result!]


In [6]:
N = 10000000

@time sum1(N)
@time sum2(N)


elapsed time: 0.656544448 seconds (320108552 bytes allocated, 36.27% gc time)
elapsed time: 0.039772714 seconds (96 bytes allocated)
Out[6]:
2.50000025e13

The second version is consistently over 10 times faster than the first version, due simply to type stability. It also allocates almost no memory. The first version allocates an enormous amount of memory (in fact, it is allocating and deallocating all the time), and spends a large fraction of its time in garbage collection.

The packages Lint.jl and TypeCheck.jl can help to detect type problems in code.

To help with type stability, there are functions zero(x) and one(x) that return a zero and a one, respectively, of the same type as the variable x:


In [2]:
x = 1
zero(x)


Out[2]:
0

In [7]:
y = 0.5
zero(y)


Out[7]:
0.0

In [8]:
x = BigFloat("0.1")
one(x)


Out[8]:
1e+00 with 256 bits of precision
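One way these functions might be used is to initialize an accumulator with the element type of the data, so that its type never changes inside the loop. The function `mysum` below is a hypothetical illustration, not part of the original notebook:

```julia
# Sums the elements of v with an accumulator of the same type as the elements.
function mysum(v)
    total = zero(eltype(v))   # correctly-typed zero, e.g. 0 for Int, 0.0 for Float64
    for x in v
        total += x
    end
    total
end

println(mysum([1, 2, 3]))     # integer input
println(mysum([0.5, 1.5]))    # floating-point input
```

The same source code is then type-stable for both integer and floating-point arrays.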

Exploring the guts

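Julia gives us access to basically every step in the compilation process, from the lowered code (code_lowered) through the type-inferred code (code_typed) and the LLVM intermediate representation (code_llvm) to the native assembly (code_native):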


In [34]:
code_lowered(sum1, (Int,))


Out[34]:
1-element Array{Any,1}:
 :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i},{{:N,:Any,0},{:total,:Any,2},{:#s246,:Any,18},{:#s245,:Any,2},{:#s244,:Any,18},{:i,:Any,18}},{}}, :(begin  # In[23], line 2:
        total = 0 # line 4:
        #s246 = colon(1,N)
        #s245 = top(start)(#s246)
        unless top(!)(top(done)(#s246,#s245)) goto 1
        2: 
        #s244 = top(next)(#s246,#s245)
        i = top(tupleref)(#s244,1)
        #s245 = top(tupleref)(#s244,2) # line 5:
        total = total + i / 2
        3: 
        unless top(!)(top(!)(top(done)(#s246,#s245))) goto 2
        1: 
        0:  # line 8:
        return total
    end))))

In [35]:
code_lowered(sum2, (Int,))


Out[35]:
1-element Array{Any,1}:
 :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i},{{:N,:Any,0},{:total,:Any,2},{:#s246,:Any,18},{:#s245,:Any,2},{:#s244,:Any,18},{:i,:Any,18}},{}}, :(begin  # In[23], line 12:
        total = 0.0 # line 14:
        #s246 = colon(1,N)
        #s245 = top(start)(#s246)
        unless top(!)(top(done)(#s246,#s245)) goto 1
        2: 
        #s244 = top(next)(#s246,#s245)
        i = top(tupleref)(#s244,1)
        #s245 = top(tupleref)(#s244,2) # line 15:
        total = total + i / 2
        3: 
        unless top(!)(top(!)(top(done)(#s246,#s245))) goto 2
        1: 
        0:  # line 18:
        return total
    end))))

In [36]:
code_typed(sum1, (Int,))


Out[36]:
1-element Array{Any,1}:
 :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i,:_var0,:_var1},{{:N,Int64,0},{:total,Any,2},{:#s246,UnitRange{Int64},18},{:#s245,Int64,2},{:#s244,(Int64,Int64),18},{:i,Int64,18},{:_var0,Int64,18},{:_var1,Int64,18}},{}}, :(begin  # In[23], line 2:
        total = 0 # line 4:
        #s246 = $(Expr(:new, UnitRange{Int64}, 1, :(top(getfield)(Intrinsics,:select_value)(top(sle_int)(1,N::Int64)::Bool,N::Int64,top(box)(Int64,top(sub_int)(1,1))::Int64)::Int64)))::UnitRange{Int64}
        #s245 = top(getfield)(#s246::UnitRange{Int64},:start)::Int64
        unless top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool goto 1
        2: 
        _var0 = #s245::Int64
        _var1 = top(box)(Int64,top(add_int)(#s245::Int64,1))::Int64
        i = _var0::Int64
        #s245 = _var1::Int64 # line 5:
        total = total::Union(Int64,Float64) + top(box)(Float64,top(div_float)(top(box)(Float64,top(sitofp)(Float64,i::Int64))::Float64,top(box)(Float64,top(sitofp)(Float64,2))::Float64))::Float64::Float64
        3: 
        unless top(box)(Bool,top(not_int)(top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool))::Bool goto 2
        1: 
        0:  # line 8:
        return total::Union(Int64,Float64)
    end::Union(Int64,Float64)))))

In [37]:
code_typed(sum2, (Int,))


Out[37]:
1-element Array{Any,1}:
 :($(Expr(:lambda, {:N}, {{:total,:#s246,:#s245,:#s244,:i,:_var0,:_var1},{{:N,Int64,0},{:total,Float64,2},{:#s246,UnitRange{Int64},18},{:#s245,Int64,2},{:#s244,(Int64,Int64),18},{:i,Int64,18},{:_var0,Int64,18},{:_var1,Int64,18}},{}}, :(begin  # In[23], line 12:
        total = 0.0 # line 14:
        #s246 = $(Expr(:new, UnitRange{Int64}, 1, :(top(getfield)(Intrinsics,:select_value)(top(sle_int)(1,N::Int64)::Bool,N::Int64,top(box)(Int64,top(sub_int)(1,1))::Int64)::Int64)))::UnitRange{Int64}
        #s245 = top(getfield)(#s246::UnitRange{Int64},:start)::Int64
        unless top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool goto 1
        2: 
        _var0 = #s245::Int64
        _var1 = top(box)(Int64,top(add_int)(#s245::Int64,1))::Int64
        i = _var0::Int64
        #s245 = _var1::Int64 # line 15:
        total = top(box)(Float64,top(add_float)(total::Float64,top(box)(Float64,top(div_float)(top(box)(Float64,top(sitofp)(Float64,i::Int64))::Float64,top(box)(Float64,top(sitofp)(Float64,2))::Float64))::Float64))::Float64
        3: 
        unless top(box)(Bool,top(not_int)(top(box)(Bool,top(not_int)(#s245::Int64 === top(box)(Int64,top(add_int)(top(getfield)(#s246::UnitRange{Int64},:stop)::Int64,1))::Int64::Bool))::Bool))::Bool goto 2
        1: 
        0:  # line 18:
        return total::Float64
    end::Float64))))
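The key difference between the two typed listings is the inferred type of total: a Union of Int64 and Float64 for sum1, and plain Float64 for sum2. A compact way to check this without reading the full listing is sketched below. It assumes a recent Julia release and uses `Base.return_types`, which is not exported and is not used in the original notebook; the two functions are repeated so that the block is self-contained.

```julia
function sum1(N::Int)
    total = 0
    for i in 1:N
        total += i/2
    end
    total
end

function sum2(N::Int)
    total = 0.0
    for i in 1:N
        total += i/2
    end
    total
end

# Inferred return type of each function for an Int argument.
rt1 = Base.return_types(sum1, (Int,))[1]
rt2 = Base.return_types(sum2, (Int,))[1]

println(rt1, "  concrete: ", isconcretetype(rt1))   # a Union is not a concrete type
println(rt2, "  concrete: ", isconcretetype(rt2))
```

A non-concrete inferred type is a sign of type instability. Note that recent Julia versions optimize small unions such as this one, so the timing gap between sum1 and sum2 may be much smaller there than in the measurements shown earlier.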

In [38]:
code_llvm(sum1, (Int, ))


define %jl_value_t* @"julia_sum1;19538"(i64) {
top:
  %1 = alloca [5 x %jl_value_t*], align 8
  %.sub = getelementptr inbounds [5 x %jl_value_t*]* %1, i64 0, i64 0
  %2 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 2, !dbg !2590
  store %jl_value_t* inttoptr (i64 6 to %jl_value_t*), %jl_value_t** %.sub, align 8
  %3 = load %jl_value_t*** @jl_pgcstack, align 8, !dbg !2590
  %4 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 1, !dbg !2590
  %.c = bitcast %jl_value_t** %3 to %jl_value_t*, !dbg !2590
  store %jl_value_t* %.c, %jl_value_t** %4, align 8, !dbg !2590
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2590
  %5 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 3
  store %jl_value_t* null, %jl_value_t** %5, align 8
  %6 = getelementptr [5 x %jl_value_t*]* %1, i64 0, i64 4
  store %jl_value_t* null, %jl_value_t** %6, align 8
  store %jl_value_t* inttoptr (i64 140474354759232 to %jl_value_t*), %jl_value_t** %2, align 8, !dbg !2591
  %7 = icmp sgt i64 %0, 0, !dbg !2592
  br i1 %7, label %L, label %L3, !dbg !2592

L:                                                ; preds = %top, %L
  %8 = phi %jl_value_t* [ %16, %L ], [ inttoptr (i64 140474354759232 to %jl_value_t*), %top ], !dbg !2592
  %"#s245.0" = phi i64 [ %9, %L ], [ 1, %top ]
  %9 = add i64 %"#s245.0", 1, !dbg !2592
  store %jl_value_t* %8, %jl_value_t** %5, align 8, !dbg !2593
  %10 = sitofp i64 %"#s245.0" to double, !dbg !2593
  %11 = fmul double %10, 5.000000e-01, !dbg !2593
  %12 = call %jl_value_t* @alloc_2w(), !dbg !2593
  %13 = getelementptr inbounds %jl_value_t* %12, i64 0, i32 0, !dbg !2593
  store %jl_value_t* inttoptr (i64 140474354684320 to %jl_value_t*), %jl_value_t** %13, align 8, !dbg !2593
  %14 = getelementptr inbounds %jl_value_t* %12, i64 1, i32 0, !dbg !2593
  %15 = bitcast %jl_value_t** %14 to double*, !dbg !2593
  store double %11, double* %15, align 8, !dbg !2593
  store %jl_value_t* %12, %jl_value_t** %6, align 8, !dbg !2593
  %16 = call %jl_value_t* @jl_apply_generic(%jl_value_t* inttoptr (i64 140474385387040 to %jl_value_t*), %jl_value_t** %5, i32 2), !dbg !2593
  store %jl_value_t* %16, %jl_value_t** %2, align 8, !dbg !2593
  %17 = icmp eq i64 %"#s245.0", %0, !dbg !2593
  br i1 %17, label %L3, label %L, !dbg !2593

L3:                                               ; preds = %L, %top
  %18 = phi %jl_value_t* [ inttoptr (i64 140474354759232 to %jl_value_t*), %top ], [ %16, %L ]
  %19 = load %jl_value_t** %4, align 8, !dbg !2594
  %20 = getelementptr inbounds %jl_value_t* %19, i64 0, i32 0, !dbg !2594
  store %jl_value_t** %20, %jl_value_t*** @jl_pgcstack, align 8, !dbg !2594
  ret %jl_value_t* %18, !dbg !2594
}
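In the listing for sum1, the calls to alloc_2w and jl_apply_generic inside the loop are a heap allocation for the boxed floating-point value and a dynamically dispatched call to +; these account for the allocation and the slowdown measured earlier. For comparison, the LLVM code for sum2 can be generated in the same way. The sketch below assumes a recent Julia release, where `code_llvm` lives in the InteractiveUtils standard library and accepts an IO argument, and it captures the listing in a string instead of showing it, since the exact output depends on the Julia and LLVM versions:

```julia
using InteractiveUtils   # provides code_llvm outside the REPL in recent Julia versions

function sum2(N::Int)
    total = 0.0
    for i in 1:N
        total += i/2
    end
    total
end

# Capture the LLVM listing for sum2 called with an Int argument.
ir = sprint(code_llvm, sum2, (Int,))

println(ir)
```

For a type-stable function one would expect the loop to consist of plain integer and floating-point instructions, without allocation or generic calls.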

In [39]:
code_native(sum1, (Int,))


	.section	__TEXT,__text,regular,pure_instructions
Filename: In[23]
Source line: 2
	push	RBP
	mov	RBP, RSP
	push	R15
	push	R14
	push	R13
	push	R12
	push	RBX
	sub	RSP, 56
	mov	R12, RDI
	mov	QWORD PTR [RBP - 80], 6
Source line: 2
	movabs	RCX, 4463631920
	mov	RAX, QWORD PTR [RCX]
	mov	QWORD PTR [RBP - 72], RAX
	lea	RAX, QWORD PTR [RBP - 80]
	mov	QWORD PTR [RCX], RAX
	mov	QWORD PTR [RBP - 56], 0
	mov	QWORD PTR [RBP - 48], 0
	movabs	RAX, 140474354759232
Source line: 2
	mov	QWORD PTR [RBP - 64], RAX
	test	R12, R12
	jle	121
	mov	EBX, 1
Source line: 5
	movabs	R13, 4451150720
	movabs	R15, 140474354684320
	movabs	RCX, 4605169152
	vmovsd	XMM0, QWORD PTR [RCX]
	vmovsd	QWORD PTR [RBP - 88], XMM0
	movabs	R14, 4450807376
	mov	QWORD PTR [RBP - 56], RAX
	call	R13
	mov	QWORD PTR [RAX], R15
	vcvtsi2sd	XMM0, XMM0, RBX
	vmulsd	XMM0, XMM0, QWORD PTR [RBP - 88]
	vmovsd	QWORD PTR [RAX + 8], XMM0
	mov	QWORD PTR [RBP - 48], RAX
	movabs	RDI, 140474385387040
	lea	RSI, QWORD PTR [RBP - 56]
	mov	EDX, 2
	call	R14
Source line: 4
	inc	RBX
Source line: 5
	dec	R12
	mov	QWORD PTR [RBP - 64], RAX
	jne	-67
Source line: 8
	mov	RCX, QWORD PTR [RBP - 72]
Source line: 2
	movabs	RDX, 4463631920
Source line: 8
	mov	QWORD PTR [RDX], RCX
	add	RSP, 56
	pop	RBX
	pop	R12
	pop	R13
	pop	R14
	pop	R15
	pop	RBP
	ret

Profiling

Simple profiling of a function may be achieved using the @time macro.

A detailed profile may be obtained using @profile; the collected samples are then displayed with Profile.print().

A graphical view is available via the ProfileView.jl package.


In [10]:
@profile sum1(10000000)


Out[10]:
2.50000025e13

In [11]:
f(N) = sum1(N)


Out[11]:
f (generic function with 1 method)

In [12]:
@profile f(10000000)


Out[12]:
2.50000025e13
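The @profile macro only collects samples; to see them, the results have to be printed. A minimal sketch of a complete profiling session follows. It assumes a recent Julia release, where Profile is a standard library that must be loaded with `using Profile`, and it repeats the definition of sum1 so that the block is self-contained:

```julia
using Profile

function sum1(N::Int)
    total = 0
    for i in 1:N
        total += i/2
    end
    total
end

sum1(10)                    # run once first, so that compilation is not profiled

Profile.clear()             # discard any samples from earlier runs
@profile sum1(10_000_000)   # collect samples while the function runs
Profile.print()             # show the call tree with the sample count for each line
```

Lines with the highest sample counts are where most of the time is spent. If the function finishes too quickly, few or no samples are collected, and a larger argument is needed.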
