In [2]:
addprocs(4)
Out[2]:
4-element Array{Int64,1}:
 2
 3
 4
 5
Each process has a unique id (an integer). You can see that we added processes with ids from 2 to 5.
You can also add processes on remote machines. See the docstring for addprocs for more info.
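As a hedged sketch (the hostname, username, and worker count here are hypothetical), adding workers over SSH looks like this; it assumes passwordless ssh access and a julia binary on the remote PATH:

# launch 2 workers on a (hypothetical) remote machine over SSH
addprocs([("user@remote.host", 2)])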
To get the number of active processes, use the nprocs function:
In [3]:
nprocs()
Out[3]:
5
When you have n active processes, typically n-1 will be used for computation. The first process (with id 1) is used to direct the computation. The other processes are called workers:
In [4]:
workers()
Out[4]:
4-element Array{Int64,1}:
 2
 3
 4
 5
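As a quick sketch (not part of the original notebook), you can confirm this division of labor with myid(), which returns the id of the current process, and @everywhere, which runs an expression on every process:

myid()                        # returns 1: we are on the master process
@everywhere println(myid())   # each of the 5 processes prints its own id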
In [7]:
# TODO find a more compelling econ example
args = (rand(200, 200), rand(400, 400), rand(200, 200), rand(400, 400))
# first a serial version
@time for X in args
    svd(X)
end
In [8]:
# now in parallel
@time pmap(svd, args);
We have 4 workers and had 4 arrays, so why didn't we get a 4x speedup? Notice that two arrays were 200x200 and two were 400x400, so the computational load was unbalanced. Julia gave each worker one array, but the smaller arrays required less work, so those workers finished early. This means some processes were idle during part of the total computation time. There is also (small) overhead in passing data to the workers and passing the results back to process 1.
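To isolate the load-balancing effect, here is a sketch (not from the original) that times pmap on equally sized arrays, so each worker gets roughly the same amount of work:

# four arrays of the same size => roughly equal work per worker
balanced = (rand(300, 300), rand(300, 300), rand(300, 300), rand(300, 300))
@time pmap(svd, balanced);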
There are subtleties to using @parallel; we will cover only basic examples here. Consult the documentation for more information.

There are two possible syntaxes for @parallel:
The first is

@parallel for ...
end

which simply executes the body of the for loop in parallel.

The second is

@parallel (f) for ...
end

It is the same, but also applies a reduction function f to the result of each iteration. The result of each iteration is the value of the last statement in the loop's body. The function f should take two elements and return one.
Let's see this in practice
In [10]:
@parallel for i = 1:10
    @show randn()
end;
In [21]:
# now with a (+) reduction
total = @parallel (+) for i = 1:10
    @show randn()
end
println(total)
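A classic use of the reduction form (our own illustrative sketch, not from the original notebook) is a Monte Carlo estimate of pi: each iteration produces a 0 or 1, and (+) sums them across all workers:

n = 10_000_000
hits = @parallel (+) for i = 1:n
    # 1 if a random point in the unit square lands inside the quarter circle
    Int(rand()^2 + rand()^2 <= 1)
end
println(4 * hits / n)  # should be close to 3.14159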
One issue with @parallel is that all variables used in the loop are copied to the worker processes, but not copied back to process 1. That means code like this will not work as you might expect:
In [22]:
a = zeros(10)
@parallel for i = 1:10
    a[i] = i
end
a
Out[22]:
10-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
For that to work we need a SharedArray: an array that provides shared memory access across processes. This comes with a number of benefits over Array for parallel computing. Some of them are:

- the data lives in shared memory, so every process on the machine can read and write it directly
- they can be written to from within @parallel loops

Let's see an example
In [30]:
a = SharedArray(Int, 1000)
@parallel for i in eachindex(a)
    a[i] = i
end
println(a[end-10:end])
I know what you're thinking: "Wait, you told me that this example should work!"

Well, it did...
In [31]:
println(a[end-10:end])
... but there's a caveat
An @parallel
loop with a SharedArray
will run asynchronously
This means the instructions will be sent to the workers, and then the main process will just continue without waiting for workers to finish
To fix this problem we need to tell Julia to @sync all computations that happen in the loop:
In [32]:
b = SharedArray(Int, 1000)
@sync @parallel for i in eachindex(b)
    b[i] = i
end
println(b[end-10:end])
Because SharedArray data is available to multiple processes, you need to be careful about how and when it is accessed. For more details, see the SharedArray docs.
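As a small illustration of how the work is divided (a sketch under this notebook's setup), we can record which worker wrote each element. @parallel hands each worker a contiguous block of indices, so no element is written by two processes:

d = SharedArray(Int, 12)
@sync @parallel for i in eachindex(d)
    d[i] = myid()   # record the id of the worker that wrote this element
end
println(d)          # e.g. [2,2,2,3,3,3,4,4,4,5,5,5] with 4 workers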
We've only scratched the surface of Julia's parallel computing capabilities.
For more information, see the parallel computing section of the Julia documentation.