Julia



Kyle Barbary


UC Berkeley

Physics Project Scientist

Berkeley Institute for Data Science Fellow


GitHub: @kbarbary

What's the big deal?

Timing relative to C (lower is better). http://julialang.org

What does it look like?


In [48]:
"""approximate π^2/6 with the first n terms in an infinite series."""
function pi_sum(n)
    sum = 0.0
    for k in 1:n
        sum += 1.0 / (k*k)
    end
    return sum
end


Out[48]:
pi_sum (generic function with 1 method)

In [49]:
sqrt(6 * pi_sum(10000))


Out[49]:
3.1414971639472147

How does it achieve speed?

  • Just-in-time (JIT) compilation
  • Careful language design
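A quick way to see the JIT's type specialization at work (a minimal sketch; `f` here is a toy function, not from the benchmark above): Julia compiles a separate native-code method for each combination of argument types, and the built-in `@code_native` / `@code_llvm` macros let you inspect the result.

```julia
f(x) = x + x

# The first call with each argument type triggers compilation of a
# specialization for that type; later calls reuse the compiled code.
f(1)      # compiles and runs f(::Int64)
f(1.0)    # compiles and runs f(::Float64)

# Inspect the generated machine code (output omitted here):
# @code_native f(1)
```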

Julia solves the "two-language problem"

  • Traditional dynamic scientific languages offer high productivity, but require dropping down to C/C++/Fortran for speed
  • Julia is both dynamic (high productivity) and high performance

... but wait, there's more!

  • Multiple dispatch (similar to function overloading in C++)
  • Parametric functions and types (think templates in C++)
  • Generic programming
  • Metaprogramming (think #define in C, but more powerful)
  • Call C / Fortran functions directly with no overhead and no wrappers!
  • Multithreading (experimental in v0.5; target for v1.0)
  • Solid package manager and vibrant package ecosystem
  • MIT licensed
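Two of these features in a minimal sketch (`describe` is a made-up function, not a real API): multiple dispatch selects a method based on the types of all arguments, and `ccall` invokes a C library function with no wrapper code.

```julia
# Multiple dispatch: the method is chosen by the types of *all* arguments,
# not just the first one (as in single-dispatch OO languages).
describe(x::Int, y::Int)     = "two integers"
describe(x::Float64, y::Int) = "a float and an integer"
describe(x, y)               = "something else"

describe(1, 2)       # -> "two integers"
describe(1.0, 2)     # -> "a float and an integer"
describe("a", 'b')   # -> "something else"

# Call a C function directly (assumes a Unix-like system where libc's
# strlen is visible in the running process):
ccall(:strlen, Csize_t, (Cstring,), "hello")   # -> 5
```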

Julia for Pythonistas

One of the best things about coming to Julia from Python is that the languages are quite similar in semantics. Specifically, the way variables are assigned and passed to functions is identical. While you have to remember the surface syntax differences, you don't have to re-learn how to think about your code.

Assignment of names


In [50]:
a = [1.0, 2.0, 3.0, 4.0] # some array


Out[50]:
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In [51]:
b = a  # assign the name "b" to the same array that "a" points to.
b[1] = 5.0  # modify the first element in that array
b


Out[51]:
4-element Array{Float64,1}:
 5.0
 2.0
 3.0
 4.0

In [52]:
a  # change is reflected in a


Out[52]:
4-element Array{Float64,1}:
 5.0
 2.0
 3.0
 4.0

Function calls: pass by sharing


In [53]:
# define a function that modifies an array
function double!(x)
    for i=1:length(x)
        x[i] *= 2.0
    end
end


Out[53]:
double! (generic function with 1 method)

In [54]:
a = [1.0, 2.0, 3.0, 4.0]


Out[54]:
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In [55]:
double!(a)
println(a)  # modification is reflected to caller,
            # because there was only ever one array!


[2.0,4.0,6.0,8.0]

Summary: Your hard work learning Python can transfer well to Julia.

For more on how both languages treat names and values, http://nedbatchelder.com/text/names1.html is a great reference.

Julia: fixing Python annoyances

  • Python lists & ndarrays
  • Defining efficient small classes
  • optimizing bottlenecks

Julia unifies Python "lists" and ndarrays

In Python, most of us are heavy users of NumPy, which provides an ndarray class for homogeneous arrays. On the other hand, we also have Python's built-in list type, a heterogeneous 1-d sequence. It can be awkward dealing with two types that have such overlapping functionality.

x = [1, 'two', 3.0]  # heterogeneous
y = np.array([1.0, 2.0, 3.0])  # homogeneous

(I end up starting a lot of functions with x = np.asarray(x).)

Julia arrays: homogeneous and heterogeneous


In [56]:
# equivalent of Python list or ndarray with dtype='object'
a = [1.0, 2, "three", 4+0im]


Out[56]:
4-element Array{Any,1}:
  1.0     
  2       
   "three"
 4+0im    

In [57]:
typeof(a)  # a is an array of heterogeneous objects


Out[57]:
Array{Any,1}

In [58]:
map(typeof, a)


Out[58]:
4-element Array{Any,1}:
 Float64       
 Int64         
 ASCIIString   
 Complex{Int64}

In [59]:
# equivalent of Python ndarray with dtype=float64
b = [1.0, 2.0, 3.0, 4.0]
typeof(b)


Out[59]:
Array{Float64,1}

In [60]:
# array only takes up 4 * 8 bytes
sizeof(b)


Out[60]:
32

Arrays easily extensible to new "dtypes"

You can't do this (efficiently) in NumPy.


In [61]:
immutable Point
    x::Float64
    y::Float64
end

In [62]:
x = [Point(1., 2.), Point(3., 4.), Point(5., 6.)]


Out[62]:
3-element Array{Point,1}:
 Point(1.0,2.0)
 Point(3.0,4.0)
 Point(5.0,6.0)

In [63]:
sizeof(x)  # points are stored efficiently in-line


Out[63]:
48

This often means that you can design the code much more naturally than in Python.

For performance in Python, you'd have to do something like

class Points(object):
    """A container for two arrays giving x and y coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add_offset(self, x_offset, y_offset):
        self.x += x_offset
        self.y += y_offset

    # ... other methods that operate element-wise

What you really want is a Point object, but if you write classes that way in Python, performance will suffer.
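In Julia the natural design, an array of Point objects, stays fast, because the points are stored inline. A sketch of what that looks like (the `add_offset` name is made up to mirror the Python class above; `Point2` avoids clashing with the `Point` defined earlier, and `struct` is the modern spelling of the talk-era `immutable`):

```julia
struct Point2
    x::Float64
    y::Float64
end

# Element-wise offset over an array of points: reads like the math,
# and compiles to a tight loop over inline-stored structs.
add_offset(pts::Vector{Point2}, dx, dy) = [Point2(p.x + dx, p.y + dy) for p in pts]

pts = [Point2(1.0, 2.0), Point2(3.0, 4.0)]
moved = add_offset(pts, 10.0, 0.0)   # offsets every x by 10.0
```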

Writing performance-sensitive code

(the big win)

Suppose you're doing some array operations, and it turns out to be a bottleneck:


In [64]:
# two 200 x 200 matrices
n = 200
A = rand(n, n)
B = rand(n, n);

In [65]:
f(A, B) = 2A + 3B + 4A.*A  # function we want to optimize


Out[65]:
f (generic function with 1 method)

In [66]:
using TimeIt

In [74]:
@timeit f(A, B);


1000 loops, best of 3: 321.80 µs per loop

Python version

We get similar performance in Python:

In [5]: n = 200

In [6]: from numpy.random import rand

In [7]: A = rand(n, n);

In [8]: B = rand(n, n);

In [9]: %timeit 2*A + 3*B + 4*A*A
1000 loops, best of 3: 354 µs per loop

But if we needed to optimize this further in Python, we'd have to reach for a specialized tool such as Cython or Numba.

Optimize in Julia

Using loops


In [68]:
function f2(A, B)
    length(A) == length(B) || error("array length mismatch")
    C = similar(A, promote_type(eltype(A),eltype(B)))
    @inbounds for i=1:length(C)
        C[i] = 2A[i] + 3B[i] + 4A[i]*A[i]
    end
    return C
end


Out[68]:
f2 (generic function with 1 method)

In [75]:
@timeit f2(A, B);


10000 loops, best of 3: 66.63 µs per loop

Using loops and pre-allocated memory


In [70]:
function f3!(A, B, C)
    length(A) == length(B) == length(C) || error("array length mismatch")
    @inbounds for i=1:length(C)
        C[i] = 2A[i] + 3B[i] + 4A[i]*A[i]
    end
end


Out[70]:
f3! (generic function with 1 method)

In [71]:
C = similar(A, promote_type(eltype(A),eltype(B)))
@timeit f3!(A, B, C);


10000 loops, best of 3: 54.51 µs per loop

Sane built-in package manager

Package and dependency management are super important. Julia's Pkg is declarative (like conda). It's not the mess that pip is!

Pkg.add("Cosmology")

would add "Cosmology" to the requirements:


In [72]:
;cat ~/.julia/v0.4/REQUIRE


IJulia
Cosmology
ERFA
ForwardDiff
Requests
HTTPClient
DocOpt
Example
Gadfly
Winston
Logging
SIUnits
PyCall

Julia resolves the dependency graph automatically, installing a version of each required package that satisfies all of the constraints.

Try it out:

  • In the browser: http://juliabox.com

  • Julia on Lawrencium cluster:

    module load julia
  • At NERSC/Cori (experimental support):

    module load julia

Thanks for listening!

Julia downsides

  • Less mature package ecosystem (But rapidly expanding. Plus, PyCall is pretty good.)
  • Slower module loading (but improving with precompilation, and likely to improve further).
  • Dynamically dispatched code (when Julia can't infer the types) can be slow.
  • Language is still changing. Currently at v0.5; be ready to update code for v1.0 (1-2 years away).
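To make the dynamic-dispatch caveat concrete, here is a minimal sketch (made-up function names): when a function's return type depends on a runtime value, Julia cannot infer a concrete type, and downstream code falls back to slower dynamically dispatched calls. The built-in `@code_warntype` macro flags such type instabilities.

```julia
# Type-unstable: returns Int64 or Float64 depending on a runtime value,
# so code using the result must dispatch dynamically.
unstable(flag) = flag ? 1 : 1.0

# Type-stable rewrite: one concrete return type, fully inferrable.
stable(flag) = flag ? 1.0 : 2.0

# Diagnose with the built-in macro (output omitted):
# @code_warntype unstable(true)
```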