Introduction to Julia

Outline

  • Julia as a language for numerical computation
  • Functional aspects
  • Types
  • Multiple Dispatch
  • Parallelism
  • Extensibility

A language for numerical computation

Julia's syntax is very similar to that of languages such as Matlab, Python, or Scilab, so switching to Julia is fast

  • it does not require vectorized code to run fast, thanks to its LLVM-based JIT compiler

In [1]:
using LinearAlgebra, Test
function f1(n::Int)
    A=zeros(Int64, n,n);
    for j=1:n  # loop over columns first: Julia arrays are column-major, as in Fortran
        for i=1:n
            A[i,j] = i^j
        end
    end
    return A
end    
@elapsed f1(10^4)


Out[1]:
1.377285742

In [2]:
function f2(n::Int) 
   A = collect(1:n).^collect(1:n)'
   return A
end
#@test f1(3) == f2(3)
@elapsed f2(10^4)


Out[2]:
1.445059204
  • arrays (and mutable structs) have reference semantics: assignment binds a new name to the same object, it does not copy

In [3]:
a = [1,2]
b = a 
a[1] = 4
b


Out[3]:
2-element Array{Int64,1}:
 4
 2
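Since b is only another name for the same array, mutating a is visible through b. To get an independent array, use copy (a minimal sketch):

a = [1,2]
c = copy(a)   # c owns its own data
a[1] = 4
c             # still [1, 2]
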
  • indices start at 1 and finish at end; indexing uses square brackets [ ]

In [4]:
a = rand(2,2)
a[:, end] .= 1
a


Out[4]:
2×2 Array{Float64,2}:
 0.890128  1.0
 0.654133  1.0
  • it supports broadcasting, i.e. an operation between arrays of sizes (1,n) and (m,n) is expanded elementwise to size (m,n)

In [5]:
using Statistics
a=rand(3,3)
μ=mean(a, dims=1)
mean(a.-μ, dims=1)


Out[5]:
1×3 Array{Float64,2}:
 1.85037e-17  1.11022e-16  1.85037e-17
  • genuine 1-D arrays are supported (distinct from n×1 matrices)

In [6]:
typeof(rand(3,1))


Out[6]:
Array{Float64,2}

In [7]:
typeof(rand(3))


Out[7]:
Array{Float64,1}
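An n×1 matrix can be collapsed to a genuine 1-D vector with vec (a minimal sketch):

v = vec(rand(3,1))   # drop the trailing singleton dimension
typeof(v)            # Array{Float64,1}
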
  • Unicode symbols for notation and some operations

In [8]:
√Complex(-1)


Out[8]:
0.0 + 1.0im
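Besides √, many Unicode symbols are ordinary identifiers or operators (π, ≈ for isapprox, ÷ for integer division); a small sketch:

θ = π/4
sin(θ)^2 + cos(θ)^2 ≈ 1.0   # ≈ is isapprox
7 ÷ 2, 7 % 2                # integer division and remainder → (3, 1)
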

Functional aspects

  • list comprehensions

In [9]:
hilbert(n::Int) = reshape([ 1//(i+j-1) for i=1:n for j=1:n],n,n)
H₃=hilbert(3) 
det(H₃)


Out[9]:
1//2160
  • Anonymous functions : (x -> x*x)(2)
  • support for map, reduce, and filter functions (filter in the next cell; a map/reduce sketch follows it)

In [10]:
# seeking integers sandwiched between a perfect square and a perfect cube
supSquares = filter(x->abs(√(x-1)-round(√(x-1))) < 1e-12, 1:100)   # x-1 is a perfect square
infCubes = filter(x->abs(∛(x+1)-round(∛(x+1))) < 1e-12, 1:100)     # x+1 is a perfect cube
ξ = intersect(supSquares, infCubes)


Out[10]:
1-element Array{Int64,1}:
 26
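The cell above uses filter; map and reduce follow the same style (a minimal sketch):

map(x -> x^2, 1:5)        # [1, 4, 9, 16, 25]
reduce(+, 1:5)            # 15
mapreduce(abs2, +, 1:5)   # 55, map and reduce fused in one pass
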
  • functions support variadic arguments (collected as tuples) and keyword arguments (see the sketch after the next cell)
  • functions are not supposed to modify their arguments; those that do follow the ! naming convention, like sort!

In [11]:
s=[4,1,3]; println("sort(s)=",sort(s));println("s=",s); sort!(s); println("s=",s)


sort(s)=[1, 3, 4]
s=[4, 1, 3]
s=[1, 3, 4]
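A minimal sketch of variadic and keyword arguments (the function name g is ours):

g(x, rest...; scale=1) = scale * (x + sum(rest))   # rest is collected as a tuple
g(1, 2, 3)             # 6
g(1, 2, 3; scale=10)   # 60
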
  • supports metaprogramming: as in Lisp, "code is data and data is code"

In [12]:
e = :(2+2); println(typeof(e)); println(e.args); println(eval(e))


Expr
Any[:+, 2, 2]
4
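Macros receive such expressions and transform them before evaluation; a minimal sketch (the macro name @twice is ours):

macro twice(ex)
    return :( ($(esc(ex))) + ($(esc(ex))) )   # splice the expression in twice
end
@twice 2+2   # expands to (2+2) + (2+2) == 8
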

Types

Julia has a hierarchy of types, organized as a tree whose root is Any


In [13]:
subtypes(Real)


Out[13]:
4-element Array{Any,1}:
 AbstractFloat     
 AbstractIrrational
 Integer           
 Rational          

In [83]:
println("supertype of Real → ", supertype(Real))
println("supertype of Number → ",supertype(Number))


supertype of Real → Number
supertype of Number → Any
  • x<:y is the operator for the relation "x is a subtype of y"

In [14]:
Int64 <: Real


Out[14]:
true
  • some types are abstract (e.g. Real) → they cannot be instantiated

In [15]:
typeof(Real(1.2))


Out[15]:
Float64

some types are concrete (e.g. Int64) → they have no subtypes except Union{}


In [16]:
subtypes(Int64)


Out[16]:
0-element Array{Type,1}

In [17]:
Union{}<:Int64


Out[17]:
true
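The distinction can be checked with the predicates isabstracttype and isconcretetype (a quick sketch):

isabstracttype(Real), isconcretetype(Int64)   # (true, true)
isconcretetype(Real)                          # false: Real cannot be instantiated
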

To extend basic types, Julia supports

  • composite types through struct and mutable struct

In [85]:
struct zutos
    x::Int
    a::Array{Int64,1}
end 
w = zutos(2, [1,2])


Out[85]:
zutos(2, [1, 2])

In [86]:
w.x = 4


setfield! immutable struct of type zutos cannot be changed

Stacktrace:
 [1] setproperty!(::zutos, ::Symbol, ::Int64) at ./Base.jl:21
 [2] top-level scope at In[86]:1

In [87]:
append!(w.a,3)'   # the Array field itself remains mutable


Out[87]:
1×3 Adjoint{Int64,Array{Int64,1}}:
 1  2  3
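The mutable struct counterpart allows field reassignment (a minimal sketch; the type name mzutos is ours):

mutable struct mzutos
    x::Int
end
m = mzutos(2)
m.x = 4   # allowed, unlike the immutable zutos above
m.x       # 4
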
  • tuple types

In [21]:
typeof((3,"haha"))


Out[21]:
Tuple{Int64,String}

In [22]:
Tuple{Int64,String} <: Tuple{Real,String}


Out[22]:
true
  • union types

In [23]:
Real <: Union{String, Real}


Out[23]:
true
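A Union type can also appear in a method signature to accept either member (a minimal sketch; the function name describe is ours):

describe(x::Union{Int,String}) = "got $(typeof(x))"
describe(1), describe("one")   # ("got Int64", "got String")
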
  • parametric types

In [24]:
struct tableau{T<:Number}
    n::Int64
    a::Ptr{T}
end
a = tableau{Int64}(10, 0)


Out[24]:
tableau{Int64}(10, Ptr{Int64} @0x0000000000000000)

☢ For efficiency, parametric types are invariant, except Tuples (which are covariant) ☢


In [25]:
Array{Int64} <: Array{Real}


Out[25]:
false
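Tuples, by contrast, are covariant in their parameters (a quick check):

Tuple{Int64} <: Tuple{Real}   # true
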

Multiple Dispatch

  • When Julia calls a function, it selects the method according to the types of all the arguments: thus, we can define a function piecewise with respect to all of its arguments
  • no automatic or magic conversions: for operators, arguments are promoted to a common type (the promotion rules are user-definable) and the implementation specific to that type is used

In [26]:
@code_lowered 1+2.


Out[26]:
CodeInfo(
1 ─ %1 = Base.promote(x, y)
│   %2 = Core._apply(Base.:+, %1)
└──      return %2
)
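Promotion to a common type is driven by Base.promote_rule and Base.convert; a minimal sketch with a hypothetical Money type:

struct Money <: Number
    cents::Int
end
Base.:+(a::Money, b::Money) = Money(a.cents + b.cents)
Base.convert(::Type{Money}, x::Int) = Money(100x)       # 1 unit = 100 cents
Base.promote_rule(::Type{Money}, ::Type{Int}) = Money
Money(250) + 3   # promoted to Money(250) + Money(300) == Money(550)
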
  • let us define f

In [27]:
f(x::Float64,y::Float64) = 2x + y
f(x::Int,y::Int) = 2x + y
println(f(2.,3.));
println(f(2,3));
println(f(2,3.));


7.0
7
MethodError: no method matching f(::Int64, ::Float64)
Closest candidates are:
  f(::Int64, !Matched::Int64) at In[27]:2
  f(!Matched::Float64, ::Float64) at In[27]:1

Stacktrace:
 [1] top-level scope at In[27]:5
  • but if we add a more generic method for f

In [89]:
f(x::Real, y::Real) = 2x + y
f(2.0, 3)


Out[89]:
7.0

In [29]:
methods(f)


Out[29]:
# 3 methods for generic function f:
  • f(x::Int64, y::Int64) in Main at In[27]:2
  • f(x::Float64, y::Float64) in Main at In[27]:1
  • f(x::Real, y::Real) in Main at In[28]:1

Parallelism

Julia supports parallelism in several ways

  • native distributed parallelism with one master process and several workers
  • native shared-memory parallelism → macros @threads and @spawn, plus wait
  • distributed parallelism through the MPI.jl package
  • GPU support through dedicated packages (not covered here)

Native distributed parallelism

  • low level routines

In [30]:
using Distributed
addprocs(3)
nprocs()


Out[30]:
4

In [31]:
res1 = @spawnat 2 sum(rand(1000,1000).^2)
res2= @spawnat 3 sum(rand(1000,1000).^2)
fetch(res1)+fetch(res2)


Out[31]:
666097.769417492
  • high level routines

In [32]:
@everywhere function f3(n::Int)
    res = @distributed (+) for i = 1:n
        sin(i*π/n)
    end
    return res
end
@time f3(200000000)


  2.105860 seconds (1.31 M allocations: 66.216 MiB, 0.63% gc time)
Out[32]:
1.273239544735062e8

In [33]:
function f4(n::Int)
    res = 0
    for i = 1:n
        res += sin(i*π/n)
    end
    return res
end
@time f4(200000000)


  3.601152 seconds (23.80 k allocations: 1.254 MiB)
Out[33]:
1.2732395447358306e8
  • example with pmap

In [34]:
@everywhere using Statistics 
@everywhere function cost_mean(x)
   s = zero(eltype(x))
   for i = 1:100
      s += mean(log(sin(exp(xi))) for xi in x)
   end
   return s
end
X = [rand(100_000) for j=1:20];
t1 = @elapsed map(cost_mean, X)
t2 = @elapsed pmap(cost_mean, X)
println("$t1, $t2")


8.996415173, 4.107689785

Threads support

  • Thread support is still experimental but works: ⚠️ remember to set export JULIA_NUM_THREADS=4 before launching Julia. A new task system (used by @spawn below) was introduced with Julia 1.3.

In [35]:
using Base.Threads 
println(Base.Threads.nthreads())


4

In [47]:
import Base.Threads.@spawn

# sort the elements of `v` in place, from indices `lo` to `hi` inclusive
function psort!(v, lo::Int=1, hi::Int=length(v))
    if lo >= hi                       # 1 or 0 elements; nothing to do
        return v
    end
    if hi - lo < 100000               # below some cutoff, run in serial
        sort!(view(v, lo:hi), alg = MergeSort)
        return v
    end

    mid = (lo+hi)>>>1                 # find the midpoint

    half = @spawn psort!(v, lo, mid)  # task to sort the lower half; will run
    psort!(v, mid+1, hi)              # in parallel with the current call sorting
                                      # the upper half
    wait(half)                        # wait for the lower half to finish

    temp = v[lo:mid]                  # workspace for merging

    i, k, j = 1, lo, mid+1            # merge the two sorted sub-arrays
    @inbounds while k < j <= hi
        if v[j] < temp[i]
            v[k] = v[j]
            j += 1
        else
            v[k] = temp[i]
            i += 1
        end
        k += 1
    end
    @inbounds while k < j
        v[k] = temp[i]
        k += 1
        i += 1
    end

    return v
end


Out[47]:
psort! (generic function with 3 methods)

In [48]:
a = rand(20000000);
b = copy(a); @time sort!(b, alg = MergeSort);   # single-threaded


  2.136703 seconds (111.25 k allocations: 81.999 MiB)

In [49]:
b = copy(a); @time sort!(b, alg = MergeSort);


  2.054423 seconds (11 allocations: 76.294 MiB)

In [50]:
b = copy(a); @time psort!(b);    # multi-threaded


  1.002210 seconds (324.72 k allocations: 702.344 MiB, 5.35% gc time)

In [51]:
b = copy(a); @time psort!(b);


  0.841734 seconds (3.82 k allocations: 686.936 MiB)

In [38]:
using Base.Threads

f(x) = sin(x) + cos(x)

function serial(n)
    s = 0.0
    for x = 1:n
        s += f(x)
    end
    return s
end

@elapsed serial(10^7)


Out[38]:
0.689574566

In [39]:
function threads(n)
    res_vec = zeros(nthreads())
    @threads for i ∈ 1:nthreads()
        res_vec[i] = local_sum(threadid(), n, nthreads())
    end
    sum(res_vec)
end

function local_sum(id, n, nthread)
    out = 0.0
    l = 1 + div(n * (id-1), nthread)
    u = div(n * id, nthread)
    for x ∈ l:u
        out += f(x)
    end
    out
end

@elapsed threads(10^7)


Out[39]:
0.239562285

Extensibility

  • Julia can be extended through packages: Pkg.add("MyPackage") and then using MyPackage
    • you can use PyCall to interact with Python → the PyPlot package uses this mechanism
    • use MPI or GPUs through packages
  • you can call functions from C or Fortran using ccall

In [52]:
io = open("skel.c","w")
write(io, "int ajoute2(int x) { return x+2; }")
close(io)
run(`gcc -o ajoute2.so --shared skel.c`);
w = ccall((:ajoute2, "./ajoute2.so"), Int32, (Int32,), 12)
run(`rm ajoute2.so skel.c`)
println("w = $w")


w = 14

In [70]:
# example with matplotlib
using PyCall
using PyPlot
plt = pyimport("matplotlib.pyplot")
x = range(0;stop=2*pi,length=1000); y = sin.(3*x + 4*cos.(2*x));
plt.plot(x, y, color="red", linewidth=2.0, linestyle="--")
plt.show()


Calling a Fortran MPI code through Julia for optimization

  • we want to minimize
$$ f(x_1,...,x_{\#procs}) = \sum_{k=1}^{\#procs} \sum_{i=1}^{p_k}|x^k_i-c^k_i|^2 $$
  • we use the following Julia code
...
function simu(x::Array{Float64,1})
    f = Ref{Float64}(0.)
    c = cos.(1:n)[slice[1]:slice[2]]
    df = zeros(slice[2]-slice[1]+1)
    @assert  size(c,1) == size(x, 1)
    ccall((:compute_error, "./libpar_error.so"), 
          Cvoid, (Ref{Int32}, Ptr{Float64}, Ptr{Float64}, Ref{Float64}, Ptr{Float64}), size(x, 1), x, c, f, df)
    return f[], df
end

function cost(x::Array{Float64,1})
    f,G = simu(x)
    return f
end

function grad(x::Array{Float64,1})
    f,G = simu(x)
    return G
end 

function main()
    MPI.Init()
    comm = MPI.COMM_WORLD
    nb_procs = MPI.Comm_size(comm)
    rank = MPI.Comm_rank(comm)
    global n, slice
    n = parse(Int32, ARGS[1])
    x = zeros(n)
    slice = compute_slice(n, Int32(nb_procs), Int32(rank))
    x[:] = 1:n
    x_loc = x[slice[1]:slice[2]]
    f,df = simu(x_loc)

    res = optimize(cost, grad, x_loc, BFGS(); inplace =false)
    x_min = Optim.minimizer(res)
    x_glob = MPI.Gatherv(x_min, Cint[5,5], 0, comm)
    if (rank == 0)
        println(" |sol - cos(1:$n)| = ", norm(x_glob-cos.(1:n)))
    end
    MPI.Finalize()
end
...
  • and the evaluation of the cost is done by a Fortran code
module par_error
    use mpi
    use iso_c_binding, only: c_int32_t, c_double
    public:: compute_error
contains

    subroutine compute_error(n, x, c, f, df) bind(C, name="compute_error")
        implicit none
        integer(c_int32_t), intent(in) :: n  ! size of the array
        real(c_double), intent(in) :: x(n), c(n) !
        real(c_double), intent(out) :: f
        real(c_double), intent(out) :: df(n)

        real(c_double) ::  f_loc
        integer(c_int32_t) :: code

        f = 0.d0
        ! computing objective
        f_loc = 0.5d0 * sum( (x - c)**2 )
        call MPI_ALLREDUCE( f_loc, f, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, code )

        ! computing gradient and ...
        df = x - c
    end subroutine compute_error

end module par_error

In [80]:
a = read(`env LD_PRELOAD="/home/fux/sources/nahasketa/julia/interfaces/fortran/withMPI/libpar_error.so" mpirun -np 2 julia /home/fux/sources/nahasketa/julia/interfaces/fortran/withMPI/test_optim.jl  10`, String)


Out[80]:
"cost function at x0 = 206.1740675143132\n |sol - cos(1:10)| = 9.961119953842058e-16\n"

To conclude

  • Julia is a fast language for numerical computation: a good replacement for Matlab, R, or Python
  • since 2012, the community (numerical analysts and computer scientists) has been growing
  • in August 2018, version 1.0 was released
  • have a look at https://julialang.org/