sudo add-apt-repository ppa:staticfloat/juliareleases
sudo add-apt-repository ppa:staticfloat/julia-deps
sudo apt-get update
sudo apt-get install julia
git clone https://github.com/JuliaLang/julia.git
cd julia
git checkout release-0.5 # unless you want the latest master
make -j 4
In [ ]:
# Note that (unlike numpy) the vectors are column vectors, not row vectors
A = [1, 2, 3, 4]
In [ ]:
# 1-based indexing, as in MATLAB
first(A) == A[1]
In [ ]:
# Some more MATLAB influence
A.^2 # not A**2
In [ ]:
# Just showing off unicode
2 ∈ A
In [ ]:
# Checking subsets
b = Set([2, 3])
a = Set(A)
Same as in Python, we can check whether b is a subset of a with b < a. But if you got used to the beauty of $\in$ you might find it hard to settle for <. So let's define our first function.
In [ ]:
⊂(a, b) = a < b
In [ ]:
# Test drive
b < a && b ⊂ a
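As an aside, Base already ships a unicode subset operator - ⊆, an alias for issubset (which checks for a not necessarily proper subset):
In [ ]:
# ⊆ is Base's issubset; our ⊂ above is the strict variant
b ⊆ a, b ⊂ a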
Short answer - to address the two-language problem.
Julia's approach
The following iterative sequence is defined for the set of positive integers: $n \to n/2$ ($n$ even), $n \to 3n + 1$ ($n$ odd).
Using the rule above and starting with 13, we generate the following sequence: 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1. Which starting number, under one million, produces the longest chain? Note that whether the chain terminates for arbitrary $n$ is an unproved conjecture.
In [ ]:
"""Compute the Collatz chain for number n."""
function collatz_chain(n)
k = 1
while n > 1
n = isodd(n) ? 3n+1 : n >> 1
k += 1
# println(n)
end
k
end
"""Which of the number [1, stop) has the longest Collatz chain."""
function solve_euler(stop)
n, N, N_max = 1, 0, 0
while n < stop
value = collatz_chain(n)
if value > N_max
N = n
N_max = value
end
n += 1
end
(N, N_max)
end
In [ ]:
# Let's time it
N = 1000000
t0 = tic()
answer = solve_euler(N)
t1 = toc();
answer, t1
In [ ]:
@time solve_euler(N)
Julia is done in about 0.5 s; Python needs ~24 s. So with Julia I am onto the next problem, while with Python it is back to the drawing board.
In [ ]:
# Julia code is a translation of the definition
"""Color the Julia set. Code from https://www.youtube.com/watch?v=PsjANO10KgM"""
function julia(z, c)
    for n in 1:80
        if abs2(z) > 4
            return n - 1
        end
        z = z*z + c
    end
    return 80
end
"""Color Julia set. Prealoc version"""
function julia(x, y, c)
J = zeros(length(x), length(y))
index = 0
for r in y, i in x
index += 1
J[index] = julia(complex(r, i), c)
end
J
end
In [ ]:
# We explore a few points
cs = (complex(-0.06, 0.67), complex(0.279, 0), complex(-0.4, 0.6), complex(0.285, 0.01))
x = collect(1:-0.002:-1);
y = collect(-1.5:0.002:1.5);
Js = []
# Evaluate fractal generation
t0 = tic()
for c in cs
    # No preallocation
    push!(Js, [julia(complex(r, i), c) for i in x, r in y]);
    # Preallocated version
    # push!(Js, julia(x, y, c));
end
t1 = toc()
@printf("Generated in %.4f s\n", t1);
println("Image size $(size(Js[1], 1))x$(size(Js[1], 2))")
In [ ]:
# Here we use Python interop (more on that later) to plot the fractals
using PyPlot
for J in Js
    figure(figsize=(12, 8))
    imshow(J, cmap="viridis", extent=[-1.5, 1.5, -1, 1])
end
show()
Python requires close to 5 s to generate the fractals. With either of the Julia versions the execution time is close to 1 s. Note that at this point we have two julia methods. Which one gets called when the code is run is our first encounter with multiple dispatch.
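We can ask the dispatcher to list both methods explicitly:
In [ ]:
# Both signatures registered for julia
methods(julia)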
In [ ]:
# Reading from file. How many words in the MIT licence?
LICENCE = open("../LICENSE")
content = lowercase(readstring(LICENCE))
words = split(content, r"\W", keep=false)
close(LICENCE)
length(words)
In [ ]:
# We could have done this with bash.
# Shell interoperability via ` (backtick)
cmd = pipeline(`cat ../LICENSE`, `wc -w`)
run(cmd) # 172 vs 171, less than 1% error is okay:)
In [ ]:
# Are all these words unique?
unique = Dict()
for word in words
    unique[word] = get(unique, word, 0) + 1
end
length(keys(unique))
In [ ]:
# What are the most common?
for (_, (k, v)) in zip(1:5, sort(collect(unique), by=last, rev=true))
    println("$(k) $(v)")
end
Note that collect(unique) makes a list by iterating over the dictionary. This iteration is over key => value pairs. This is unlike Python. Let's see what else is different.
In [ ]:
# Iteration is over (key, value) pairs
for (key, value) in zip(1:4, collect(unique))
    println("$key => $value")
end
In [ ]:
# : is => in the dictionary construction
dict = Dict(1 => "one", 2 => "two")
In [ ]:
# in checks for (key, value) pair
haskey(dict, 1), (1 => "en") ∈ dict
In [ ]:
# String concatenation is done with *. The rationale: + suggests a commutative operation, and concatenation is not commutative
"Julia " * " Language"
In [ ]:
# A direct consequence of this is ...
"Julia"^5
In [ ]:
# / is for floating point division. Integer division is done by div
/(7, 4), div(7, 4), 7÷4
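As a small aside: if you want neither of the above, Julia also has exact rational arithmetic via the // operator.
In [ ]:
# // builds an exact Rational instead of dividing
7//4, 7//4 + 1//4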
In [ ]:
# Indexing starts at one and the last element of a range is included. Python's 0..last-1 and Julia's 1..last have the same length
a = [1, 2, 3]
try
    a[0]
catch BoundsError
    println("No no, a[1]=", a[1], " ", a[end] == a[3] == 3)
end
collect(1:10)[10] == 10
In [ ]:
# Convenience
x = 4
y = 2x # no *
y == 8
In [ ]:
# More convenience. Save one "for"
[(i, j) for i in 1:4, j in "abcd"]
In [ ]:
# Contrast this with Python
A = reshape(collect(1:25), (5, 5))
In [ ]:
# In numpy you get to iterate over rows of a matrix (for vectorization). In Julia this is not the default.
entries = [e for e in A]
# Moreover, columns are contiguous, so accessing rows is not efficient.
# Finally, you iterate over values, not references
for a in A
    a += 1  # modifies only the local copy
end
A
In [ ]:
for i in eachindex(A)
    A[i] += 1
end
A
In [ ]:
# Column-major linear indexing; recall that every entry was just incremented by 1
A[8] == 9
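To see where a linear index lands, ind2sub converts it to (row, column) subscripts - the linear index 8 sits in row 3 of column 2 of the 5x5 array:
In [ ]:
# Column-major: index 8 = column 2, row 3
ind2sub(size(A), 8)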
In [ ]:
A = [1 2; 3 4] # 2x2 matrix
B = [1, 2] # 2-element vector
C = [1 2] # 1x2 matrix
Finally, a quick look at functions.
In [ ]:
# One lined named function definition
f(x, y) = 2x + 4y
f(1, 0) == 2
In [ ]:
# Implicit return: the value of the last expression is returned
function f(name)
    if isequal(name, "Miro")
        return "slav"
    end
    "!"
end
(f("Miro"), f("You")) == ("slav", "!")
In [ ]:
# Variable positional arguments
mysum(args...) = join(map(string, args), "+")
mysum(10, 20)
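Splatting works the other way around too - an existing collection can be unpacked into the varargs slot:
In [ ]:
# ... unpacks the array into individual arguments
mysum([1, 2, 3]...)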
In [ ]:
# Keyword arguments, with a default value
foo(x; op=+) = op(x, 1)
In [ ]:
[foo(1), foo(2; op=-), foo(4; op=*)]
In [ ]:
# Lambdas: remove vowels
vowels = Set(['a', 'e', 'i', 'o', 'u', 'y'])
map(word -> filter(l -> (l ∉ vowels), word), keys(unique))
In [ ]:
# Multiline lambdas
map(word -> begin
        w = Set(word)
        cc = length(setdiff(w, vowels))
        vc = length(w) - cc
        (word, (vc, cc))
    end, keys(unique))
In [ ]:
using PyCall
In [ ]:
# Let's load numpy and have it compute the eigenvalues of some matrix for us
@pyimport numpy.linalg as npla
A = diagm([1, 2, 3, 4]) # Julia matrix
eigv, eigw = npla.eig(A) # Passed to Python, no copying
eigv
In [ ]:
# Let's have numpy make the matrix and then let Julia take over for the actual work
@pyimport numpy as np
Apy = np.diag([1, 2, 3, 4]) # A Julia array passed to numpy, coming back as a Julia matrix
eigvals(Apy)
In [ ]:
using PyPlot
# Julia has native packages for plotting, but why give up matplotlib
x = linspace(-1, 1, 100)
y = sin(2π*x)
figure()
plot(x, y)
xlabel("\$x\$") # Note that dollars have to be escaped for they have special role in Julia's string (interpolation)
show()
In [ ]:
# Your code is parsed and type annotated (types can be inferred).
simplef(x, y) = x+y
@code_warntype simplef(1, 2) # Note x, y have types here
In [ ]:
# When f is called with specific values, the JIT compiles the function. With the type annotations the compiler can
# reason about the code and generate efficient machine code.
@code_llvm simplef(1, 2) # Note @julia_simplef_FOO(i64, i64)
In [ ]:
# If called with values of the same types, we get the cached compiled function
@code_llvm simplef(1, 2) # Note @julia_simplef_FOO(i64, i64)
In [ ]:
# Different value types will invoke JIT again
@code_llvm simplef(1, 2.) # @julia_simplef_BAR(i64, double)
In [ ]:
map(typeof, (1, 2., [1, 2.]))
In [ ]:
# This reads as: is lhs an instance of rhs
1::Number, 1::Real, 1::Int
In [ ]:
# But
try
    1::Int8
catch TypeError
    println("Type of 1 is $(typeof(1)) and that is not a subtype of Int8")
end
The type hierarchy is emerging.
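A few explicit subtype checks along one branch of the hierarchy, from Int up to Any:
In [ ]:
# <: is the subtype relation
(Int <: Signed, Signed <: Integer, Integer <: Real, Real <: Number, Number <: Any)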
In [ ]:
depth = 0
function printtypes(T)
    global depth
    if isleaftype(T)
        println("-"^(2*depth), "$T")
    else
        println("-"^depth, "$T")
        depth += 1
        map(printtypes, subtypes(T))
        depth -= 1
    end
end
# printtypes(Any); # Long wait. Listen to the fan - silly ideas make sounds
If types are specified in the function's signature, the compiler (first compiles and then) calls the appropriate method based on the types of all the arguments. That is, unlike in OO programming, no single type owns the method; there is nothing special about the first argument - think Int.__add__(self, other) in Python.
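A minimal sketch of dispatch on all argument types (g is a made-up function, not from the rest of the notebook):
In [ ]:
# The method is selected from the types of all arguments
g(x::Int, y::Int) = "both Int"
g(x::Int, y::Float64) = "Int and Float64"
(g(1, 2), g(1, 2.0))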
In [ ]:
# + is a function +(x, y) = ... . Which one is called based on args
@which +(1, 3)
In [ ]:
@which 0x1+0x3
In [ ]:
@which 1+2.
New types can be defined easily. In the example we'd like to represent functions given by their sine series.
In [ ]:
"""Polynomial with value c[i]*sin(i*pi*x)"""
immutable SinePolynomial{T<:Real}
coefs::Vector{T}
end
In [ ]:
# If not specified otherwise our parent is Any - the very tip of the type tree
supertype(SinePolynomial)
In [ ]:
# Our type is composite i.e. has fields
fieldnames(SinePolynomial)
To add 'its methods' we define functions on the new type. These are added to the dispatch system and called when needed.
In [ ]:
import Base: +, -, call
# Add () for evaluating the polynomial
# Note that we defined SinePolynomial such that the coefficients can have any Real type.
# That (concrete) type is referred to by T
(p::SinePolynomial{T}){T}(x::Real) = sum([c*sin(k*π*x) for (k, c) in enumerate(p.coefs)])
# We can add/subtract polynomials. This is simple because of orthogonality.
# Again, we can add p, q with coefficients of 2 different types
function +{T, S}(p::SinePolynomial{T}, q::SinePolynomial{S})
    np, nq = length(p.coefs), length(q.coefs)
    n = max(np, nq)
    p = [p.coefs; zeros(T, n-np)]
    q = [q.coefs; zeros(S, n-nq)]
    SinePolynomial(p+q)
end
function -{T, S}(p::SinePolynomial{T}, q::SinePolynomial{S})
    np, nq = length(p.coefs), length(q.coefs)
    n = max(np, nq)
    p = [p.coefs; zeros(T, n-np)]
    q = [q.coefs; zeros(S, n-nq)]
    SinePolynomial(p-q)
end
# See that + has been added to the lookup table for dispatching
methods(+)
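A quick sanity check with coefficients of two different types (the values are chosen arbitrarily here): addition pads the shorter coefficient vector, and the call overload evaluates the series.
In [ ]:
p = SinePolynomial([1, 2])     # Int coefficients
q = SinePolynomial([0.5])      # Float64 coefficients
r = p + q
(r.coefs, r(0.25))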
In [ ]:
import Base.*
# We'd better prevent multiplication
*{T, S}(p::SinePolynomial{T}, q::SinePolynomial{S}) = throw(ArgumentError("No multiplication until CosinePolynomial"))
p = SinePolynomial([1, 2, 3])
q = SinePolynomial([2, -1])
try
    p*q
catch ArgumentError
    println("Is it okay? ", applicable(*, p, q))
    nothing
end
In [ ]:
# New in Julia 0.5, vectorized notations
p = SinePolynomial([1]) + SinePolynomial([0, 1])
x = linspace(-1, 1, 100)
y = p.(x)
using PyPlot
figure()
plot(x, y)
show()
In [ ]:
# Type instability of s
function unstable(n)
    s = 0  # s is an Int
    for i in 1:n
        s += s/i  # s becomes a Float64
    end
end
# Type stability of s
function stable(n)
    s = 0.0  # s is a Float64
    for i in 1:n
        s += s/i  # s stays a Float64
    end
end
Let's compare the speed
In [ ]:
@time stable(1000000)
In [ ]:
@time unstable(1000000)
In [ ]:
# Run @code_warntype here as well to see the (un)certainty
@code_llvm stable(3)
In [ ]:
@code_llvm unstable(3)
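Following the comment above, @code_warntype makes the uncertainty explicit - in the unstable version the inferred type of s is reported as a union of Int64 and Float64:
In [ ]:
@code_warntype unstable(3)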
For calling into C there is ccall. Its arguments are the function name, the library, the return type, a tuple of input types, and the values of the input parameters.
In [ ]:
# Example: setenv via libc
function my_setenv(var::AbstractString, value::AbstractString)
    ccall((:setenv, "libc"), Cint, (Cstring, Cstring, Cint), var, value, 1)
end
# We will write getenv to check it
function my_getenv(var::AbstractString)
    unsafe_string(ccall((:getenv, "libc"), Cstring, (Cstring, ), var))
end
In [ ]:
# Test drive
var = "Today"
wdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
for w in wdays
    my_setenv(var, w)
    println("Today is $(my_getenv(var))")
end
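As a cross-check, Julia's own ENV reads the same process environment that libc's setenv just modified, so we should see the last weekday set above:
In [ ]:
# ENV looks up the process environment via getenv
ENV["Today"]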
Note the use of unsafe_string above - it copies a NUL-terminated C string (a Cstring) into a Julia String.
Next, polynomial evaluation using the GNU Scientific Library: parametrized functions, restricted types (only Real coefficients). The signature of gsl_poly_eval is double gsl_poly_eval(const double c[], const int len, const double x).
In [ ]:
# Polynomial evaluation using the GNU scientific library. Parametrized functions, restricted types
"""Evaluate polynomial c[1]*x^0 + c[2]*x^1 + ... at x."""
function polyval{T<:Real, S<:Real}(c::Vector{T}, x::S)
    c = convert(Vector{Float64}, c)
    n = length(c)
    value = ccall((:gsl_poly_eval, "libgsl"), Cdouble, (Ptr{Cdouble}, Cint, Cdouble), c, n, x)
    Float64(value)
end
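A quick check against evaluating the same polynomial by hand (coefficients chosen by us):
In [ ]:
c = [1.0, 2.0, 3.0]  # 1 + 2x + 3x^2
polyval(c, 0.5) ≈ 1 + 2*0.5 + 3*0.5^2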
In [ ]:
using PyPlot
# Plot some random polynomials
x = collect(-1:0.1:1)
figure()
for i in 1:4
    c = rand(5)
    y = map(x -> polyval(c, x), x)
    plot(x, y)
end
show()
So Julia can lean on Python packages (PyCall) and on existing C/Fortran libraries (ccall).
In [ ]:
# And many more ...
Pkg.available()
Let's show off a few linear algebra features.
In [ ]:
# The favourite Poisson problem on the unit interval. Convergence of CG1 FEM
A = 0 # Keep A outside the for loop
for ncells in [2^i for i in 8:14]
    h = 1/ncells
    x = Float64[h*i for i in 0:ncells]
    F = sin(2π*x)
    U = sin(2π*x)/4/π^2
    # Based on the type of A a specialized solver will be called
    A = SymTridiagonal(fill(2, ncells+1), fill(-1, ncells))/h
    b = h*F
    u = A\b
    e = norm(u - U)
    println(e)
end
# Which factorize was dispatched
@which factorize(A)
In [ ]:
# And now the eigenvalues. Convergence of the smallest eigenvalue of -u'' = λ*u in (0, 1) with homog. Dirichlet bcs
for ncells in [2^i for i in 3:12]
    h = 1/ncells
    x = Float64[h*i for i in 0:ncells]
    A = SymTridiagonal(fill(2, ncells+1), fill(-1, ncells))/h
    # Here instead of solving the GEVP for A, M we lump the mass matrix,
    # let Ml := sqrt(inv(lump(M))) and solve the symmetrized problem Ml*A*Ml'
    Ml = spdiagm(sqrt(1./[1.+h/6; fill(h, ncells-1); 1.+h/6]))
    B = SymTridiagonal(Ml*A*Ml) # Here again we dispatch to a specialized method
    eigw = eigvals(B)
    lmin = minimum(eigw)
    println("System size $(length(eigw)), error in λ=$(abs(lmin-π^2))")
end
For Fourier transforms Julia uses FFTW (the Fastest Fourier Transform in the West), co-written - and also bound for Julia - by Steven G. Johnson, a PhD supervisor of Jeff Bezanson, one of the creators of Julia.
In [ ]:
using PyCall
@pyimport scipy.fftpack as pyfft
# Show that FFT works
v = rand(20)
fft_jl = FFTW.r2r(v, [FFTW.REDFT10])
fft_py = pyfft.dct(v)
norm(fft_jl - fft_py)
In [ ]:
a = parse("1+2")
b = :(1+2)
In [ ]:
a == b
The structure can be manipulated.
In [ ]:
dump(a)
In [ ]:
# Here we change + to -
a.args[1] = :-
In [ ]:
# Consequences of that manipulation
eval(a) == -1
Let's have a look at how this functionality can be useful.
In [ ]:
# Suppose that we would like to define operations for this data type
type Foo
    value::Number
end
try
    Foo(2) + Foo(3)
catch MethodError
    println("+ is not defined")
end
In [ ]:
# We would write the following code for every method
import Base.+
+(a::Foo, b::Foo) = Foo(a.value + b.value)
(Foo(2) + Foo(3)).value == 5
In [ ]:
# But this can be automated
import Base: -, *, /
In [ ]:
ops = map(Symbol, [-, *, /])  # ops, so as not to shadow Base.methods
codes = []
for op in ops
    code = quote
        $(op)(a::Foo, b::Foo) = Foo(($op)(a.value, b.value))
    end
    @show code
    push!(codes, code)
end
In [ ]:
# And now eval
map(eval, codes)
# And since the three chief virtues of a programmer are Laziness, Impatience and Hubris, we generate the tests too
for op in ops
    @assert eval(:($(op)(Foo(2), Foo(3)).value == $(op)(2, 3)))
end
This way of generating code is quite common in Julia's codebase; see e.g. the wrapping of BLAS routines. When Julia parses the code and builds an AST, we can intercept the process before the code is compiled and work on the built expressions. To this end we use special functions called macros.
In [ ]:
# A simple example - pad the body with timing statements.
# We do this before compilation, otherwise 1 + 2 would be evaluated first.
macroexpand(:(@time 1 + 2))
Here is a more impressive example from the standard library, which computes the inverse of the error function. The idea is that based on the value of the argument a different polynomial approximation is used. The polynomials are evaluated using Horner's rule, and using a macro the code for the polynomial evaluation is inlined.
In [ ]:
# We try to write the same code, but the polynomial evaluation is an ordinary (non-inlined) function call
function _horner(x, p...)
    c = p[end]
    for i in length(p)-1:-1:1
        c = p[i] + c*x
    end
    c
end
function _erfinv(x::Float64)
    a = abs(x)
    if a >= 1.0
        if x == 1.0
            return Inf
        elseif x == -1.0
            return -Inf
        end
        throw(DomainError())
    elseif a <= 0.75 # Table 17 in Blair et al.
        t = x*x - 0.5625
        return x * _horner(t, 0.16030_49558_44066_229311e2,
                              -0.90784_95926_29603_26650e2,
                              0.18644_91486_16209_87391e3,
                              -0.16900_14273_46423_82420e3,
                              0.65454_66284_79448_7048e2,
                              -0.86421_30115_87247_794e1,
                              0.17605_87821_39059_0) /
                   _horner(t, 0.14780_64707_15138_316110e2,
                              -0.91374_16702_42603_13936e2,
                              0.21015_79048_62053_17714e3,
                              -0.22210_25412_18551_32366e3,
                              0.10760_45391_60551_23830e3,
                              -0.20601_07303_28265_443e2,
                              0.1e1)
    elseif a <= 0.9375 # Table 37 in Blair et al.
        t = x*x - 0.87890625
        return x * _horner(t, -0.15238_92634_40726_128e-1,
                              0.34445_56924_13612_5216,
                              -0.29344_39867_25424_78687e1,
                              0.11763_50570_52178_27302e2,
                              -0.22655_29282_31011_04193e2,
                              0.19121_33439_65803_30163e2,
                              -0.54789_27619_59831_8769e1,
                              0.23751_66890_24448) /
                   _horner(t, -0.10846_51696_02059_954e-1,
                              0.26106_28885_84307_8511,
                              -0.24068_31810_43937_57995e1,
                              0.10695_12997_33870_14469e2,
                              -0.23716_71552_15965_81025e2,
                              0.24640_15894_39172_84883e2,
                              -0.10014_37634_97830_70835e2,
                              0.1e1)
    else # Table 57 in Blair et al.
        t = 1.0 / sqrt(-log(1.0 - a))
        return _horner(t, 0.10501_31152_37334_38116e-3,
                          0.10532_61131_42333_38164_25e-1,
                          0.26987_80273_62432_83544_516,
                          0.23268_69578_89196_90806_414e1,
                          0.71678_54794_91079_96810_001e1,
                          0.85475_61182_21678_27825_185e1,
                          0.68738_08807_35438_39802_913e1,
                          0.36270_02483_09587_08930_02e1,
                          0.88606_27392_96515_46814_9) /
               (copysign(t, x) *
                _horner(t, 0.10501_26668_70303_37690e-3,
                           0.10532_86230_09333_27531_11e-1,
                           0.27019_86237_37515_54845_553,
                           0.23501_43639_79702_53259_123e1,
                           0.76078_02878_58012_77064_351e1,
                           0.11181_58610_40569_07827_3451e2,
                           0.11948_78791_84353_96667_8438e2,
                           0.81922_40974_72699_07893_913e1,
                           0.40993_87907_63680_15361_45e1,
                           0.1e1))
    end
end
In [ ]:
# Make sure it works
erfinv(0.002) - _erfinv(0.002)
In [ ]:
# Is there a difference in performance?
@time [erfinv(x) for x in -1:0.00001:1];
In [ ]:
# _erfinv is about 2x slower. Note the difference in memory
@time [_erfinv(x) for x in -1:0.00001:1];
Some more examples: the beginnings of a SymPy-like symbolic differentiation.
In [ ]:
dtable = Dict(:sin => :cos, :cosh => :(sinh))
macro ∂(f)
    if f.head == :call
        quote
            $(f.args[2]) -> eval(Expr(:call, $(dtable[f.args[1]]), $(f.args[2])))
        end
    end
end
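A quick check before plotting: the derivative of sin evaluated at 0 should be cos(0) = 1.
In [ ]:
(@∂ sin(x))(0.0) == 1.0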
In [ ]:
using PyPlot
x = linspace(1, -1, 100)
y1 = (@∂ sin(x)).(x)
y2 = (@∂ cosh(x)).(x)
figure()
plot(x, y1)
plot(x, y2)
show()
And finally, Domain Specific Languages, cf. UFL of FEniCS. Suppose we want to write many if-elseif-...-elseif statements.
In [ ]:
macro cond(cases...)
    _cond(cases)
end
function _cond(cases)
    @assert all(case -> length(case.args) == 2, cases)
    pred, tvalue = first(cases).args
    if length(cases) == 1
        return Expr(:if, pred, tvalue)
    else
        return Expr(:if, pred, tvalue, _cond(cases[2:end]))
    end
end
In [ ]:
x = 1
macroexpand(:(@cond (10 < 2, x+5) (3 < 4, x-2)))
In [ ]:
@cond (10 < 2, x+5) (3 < 4, x-2)
In [ ]:
# Still some work to be done ...
@cond (3 < 4, (0 < 2, 3))
Until this point we have generated code only by using expressions - by operating on the AST, manipulating symbols. There is however an additional stage where we can hook into the compiler: at this stage the types (of some nodes), not the values, of the AST are known. This makes it possible to generate code specialized on types. This is an example of staged programming.
In [ ]:
# Let us define a type intended to represent a function approximated by a polynomial of degree N
type PolyF{N}
    body::Function
end
In [ ]:
# evaluation
import Base.call
(f::PolyF{N}){N}(x::Real) = f.body(x)
f = PolyF{1}(x -> 2x)
f(2)
In [ ]:
function gauss_legendre(degree::Int)
    #=
    The implementation is based on the eigenvalues of a matrix T that is similar to
    the Jacobi matrix J. Both are tridiagonal, but the T matrix is symmetric and
    its main diagonal is 0.
    =#
    diag = zeros(degree)
    up_diag = Float64[k/sqrt(4*k^2-1) for k=1:degree-1]
    T = SymTridiagonal(diag, up_diag)
    xq, ev = eig(T)
    wq = reshape(2*ev[1, :].^2, degree)
    return xq, wq
end
gauss_legendre(1)
In [ ]:
# This is a bit like FIAT and FFC combined
using StaticArrays
# function integrate{N}(f::PolyF{N}) # drop @generated if you want to see the generated code
@generated function integrate{N}(f::PolyF{N})
    # Let's generate the quadrature points and weights at compile time
    xq, wq = gauss_legendre(N)
    size = length(xq)
    # Other option, using statically sized vectors:
    #quote
    #    wq = SVector{$(size), Float64}($(wq))
    #    fxq = SVector{$(size), Float64}(f.($(xq)))
    #    sum(fxq.*wq)
    #end
    quote
        sum(f.($(xq)).*$(wq))
    end
end
integrate(PolyF{10}(x -> cos(x)))
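A sanity check: Gauss-Legendre quadrature integrates over $(-1, 1)$, so the result should match $\int_{-1}^{1} \cos x \, dx = 2\sin 1$.
In [ ]:
integrate(PolyF{10}(x -> cos(x))) ≈ 2sin(1)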
In [ ]:
# See about the code
@code_llvm(integrate(PolyF{10}(x -> cos(x))))
In [ ]:
# JIT kicks in for new types; note that a new N (degree) implies a new type
@time integrate(PolyF{22}(x -> cos(x)))
In [ ]:
# Call it again same degree - no JIT
@time integrate(PolyF{22}(x -> cos(x)))
In [ ]:
# And again, with a different body but the same degree - no JIT
@time integrate(PolyF{22}(x -> sin(x)))