Parallel programming in Julia

Chase Coleman & Spencer Lyon

3-4-16

Basics

Julia has built-in support for parallel programming

To add more computing processes use the addprocs function


In [2]:
addprocs(4)


Out[2]:
4-element Array{Int64,1}:
 2
 3
 4
 5

Each process has a unique id (integer)

You can see that we added processes with ids 2 through 5

You can also add processes on remote machines.

See the docstring for addprocs for more info
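
One way to do this is to pass addprocs a list of machine specifications that Julia connects to over SSH. A minimal sketch (the host names are hypothetical, and each machine needs passwordless SSH access and a Julia installation):

addprocs(["node1.example.com", "node2.example.com"])  # hypothetical hosts: one worker on each
addprocs([("node1.example.com", 4)])                  # four workers on node1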

To get the number of active processes use the nprocs function


In [3]:
nprocs()


Out[3]:
5

When you have n active processes, typically n-1 will be used for computation

The first process (with id 1) is used to direct the computation
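
You can check which process a piece of code runs on with myid; a quick sketch (the remotecall_fetch(id, f, ...) argument order shown is the Julia 0.4 one):

myid()                      # returns 1 on the master process
remotecall_fetch(2, myid)   # returns 2: the same call, run on worker 2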

Other processes are called workers:


In [4]:
workers()


Out[4]:
4-element Array{Int64,1}:
 2
 3
 4
 5

pmap

One of the easiest ways to get started with parallel programming in Julia is the pmap function

In its simplest form pmap takes two arguments: a function and a collection (array or tuple)

The function is applied in parallel to each item in the collection


In [7]:
# TODO find a more compelling econ example

args = (rand(200, 200), rand(400, 400), rand(200, 200), rand(400, 400))

# first a serial version
@time for X in args
    svd(X)
end


  0.201885 seconds (98 allocations: 21.513 MB, 1.02% gc time)

In [8]:
# now in parallel
@time pmap(svd, args);


  0.105647 seconds (1.28 k allocations: 6.189 MB)

We have 4 workers and 4 arrays, so why didn't we get a 4x speedup?

Notice that two arrays were 200x200 and two were 400x400, so the computational load is unbalanced

Julia gave each worker one array, but the workers with smaller arrays had less work to do and finished early

This means some workers sat idle for part of the total computation time

Also, there is a (small) overhead in passing each argument to a worker and passing the result back to process 1
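
As a sketch, giving the workers equally sized problems balances the load, and the parallel timing typically gets much closer to the ideal (exact timings depend on your machine):

# four equally sized matrices => a balanced workload for the 4 workers
balanced = [rand(300, 300) for i in 1:4]

@time map(svd, balanced);   # serial baseline
@time pmap(svd, balanced);  # parallel: one matrix per worker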

@parallel loops

Julia also has the ability to make a for loop run in parallel

To do this use the @parallel macro

There are subtleties to using @parallel

We will cover only basic examples here

Consult the documentation for more information

There are two possible syntaxes for @parallel:

The first is

@parallel for ...
end

and simply executes the body of the for loop in parallel

The second is

@parallel (f) for ...
end

It is the same, but it also applies a reduction function f to the result of each iteration

The value contributed by each iteration is the value of the last expression in the loop's body

The function f should take two values and return one

Let's see this in practice


In [10]:
@parallel for i = 1:10
    @show randn()
end;


	From worker 3:	randn() = -1.4823226833572487
	From worker 3:	randn() = 0.008485042894568029
	From worker 2:	randn() = -1.4372949820146432
	From worker 2:	randn() = 0.14678523736405727
	From worker 2:	randn() = -1.293961883026427
	From worker 5:	randn() = -0.39493431470157886
	From worker 5:	randn() = 0.2494670221840125
	From worker 4:	randn() = -0.6202425665423612
	From worker 4:	randn() = -1.853569956218026
	From worker 3:	randn() = 1.5581361796835485

In [21]:
# now with (+) as the reduction function
total = @parallel (+) for i=1:10
    @show randn()
end
println(total)


	From worker 2:	randn() = -1.1271781195591901
	From worker 2:	randn() = 0.9025559215992444
	From worker 3:	randn() = -0.912358216201542
	From worker 2:	randn() = 0.009225014074008112
	From worker 3:	randn() = 0.8497725800879736
	From worker 3:	randn() = -1.005123841699878
	From worker 5:	randn() = -0.3685509736298961
	From worker 5:	randn() = -1.637375058113114
	From worker 4:	randn() = -1.8505401231390373
	From worker 4:	randn() = -0.7754764372191437
-5.915049253800575
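
A classic illustration of the reduction form is a Monte Carlo estimate of pi; a small sketch, where each iteration contributes a 0 or a 1 and (+) adds the contributions:

n = 10_000_000
hits = @parallel (+) for i = 1:n
    Int(rand()^2 + rand()^2 <= 1)  # 1 if the random point lands inside the quarter circle
end
println(4 * hits / n)              # approximately pi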

One issue with @parallel is that variables used in the loop are copied to the worker processes, but the copies are never sent back to process 1

That means code like this will not work as you might expect:


In [22]:
a = zeros(10)
@parallel for i=1:10
    a[i] = i
end
a


Out[22]:
10-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

For that to work we need an array that provides shared memory access across processes...

SharedArray

A SharedArray is an array whose memory can be shared across all processes on the same machine

This comes with a number of benefits over Array for parallel computing.

Some of them are:

  • Save on the overhead of passing arrays to worker processes
  • Update arrays in a predictable way from @parallel loops

Let's see an example


In [30]:
a = SharedArray(Int, 1000)
@parallel for i in eachindex(a)
    a[i] = i
end
println(a[end-10:end])


[0,0,0,0,0,0,0,0,0,0,0]

I know what you're thinking: "wait, you told me that this example should work"

Well it did...


In [31]:
println(a[end-10:end])


[990,991,992,993,994,995,996,997,998,999,1000]

... but there's a caveat

An @parallel loop with a SharedArray will run asynchronously

This means the instructions are sent to the workers, and then the main process simply continues without waiting for the workers to finish

To fix this problem we need to tell Julia to @sync all computations that happen in the loop:


In [32]:
b = SharedArray(Int, 1000)
@sync @parallel for i in eachindex(b)
    b[i] = i
end
println(b[end-10:end])


[990,991,992,993,994,995,996,997,998,999,1000]

Because SharedArray data is available to multiple processes, you need to be careful about how and when it is accessed
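
One helpful pattern (a sketch, using the Julia 0.4 names localindexes and remotecall_fetch(id, f, ...)) is to partition work by the chunk each process is assigned; you can inspect that assignment like this:

# print the index range each participating process is assigned in the SharedArray b from above
for p in procs(b)
    println(p, " => ", remotecall_fetch(p, localindexes, b))
end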

For more details see the SharedArray docs

More info

We've only scratched the surface of Julia's parallel computing capabilities.

For more information see these references: