Welcome to Julia. This short presentation covers the following topics:
Julia is a new scientific language designed for "greedy, unreasonable, demanding" programmers who want speed, flexibility, and usability.
It's a young language - in open development for about three years - but it shows a lot of promise. In particular, Julia promises to help solve the "two-language problem". In science, one often wants to try things out on a small scale, develop code interactively, and tinker, before settling down to make it run fast on large amounts of data. Many scientists and programmers find that they need to move between two languages to do this: a dynamic language like Python that is easy to code in, and a fast language like C that gives numerical power. For many of us who aren't the best programmers in the world (myself definitely included), doing this incurs a significant overhead as we struggle with two different languages. Julia aims to remove the need for this division.
Here are benchmark results from a number of algorithms implemented in various languages. It's important to note that these benchmarks aren't based on writing optimal code in every situation. Rather, they're intended to show what sort of raw speed you can get by writing the same algorithm (which may be relatively naïve) in different languages. In general, Julia can provide speeds to within a factor of two of C, and orders of magnitude faster than dynamic languages like Python or Matlab.
In [1]:
# mathematical operations
[1,2,3] + [4,5,6]
# => [5, 7, 9]
# nice short functions
f(x) = 7x
f(3)
# => 21
# arrays
onetwothree = [1,2,3]
notseven = [1:6; 8:10]
square = [1 2 3;
          4 5 6;
          7 8 9]
# indexing
square[2,3]
Out[1]:
Julia has a very good "foreign function interface" - it's easy to work with stuff written in other languages. This is important for a young language, as it makes it possible to draw on the vast libraries of Python code that don't yet have equivalents in Julia.
In [2]:
using PyCall
@pyimport astropy.io.fits as fits

# open a FITS file through Python's astropy
hdulist = fits.open("data/fits-spectrum.fits")
# grab the data from one HDU and look up the DATAMAX field
hdrdata = hdulist[2][:data]
hdrdata["DATAMAX"]
Out[2]:
In [3]:
using PyPlot

# evaluate f on a grid and draw a surface plot via matplotlib
f(x, y) = (1 - x^2) + (y - x^2)^2
xs = linspace(-2, 2)
ys = linspace(-1, 3)
zs = [f(x,y) for x in xs, y in ys];
surf(xs, ys, zs, cmap="Spectral_r")
Out[3]:
Of course, Julia has its own plotting libraries like Winston and Gadfly (below).
In [4]:
import Gadfly
funcs = [sin, cos, besselj1]
Gadfly.plot(funcs, 0, 8)
Out[4]:
You can download Julia and install it on your own computer. As of the time of writing, version 0.3.6 is the stable version.
You can also try Juno, a nice editor bundled with Julia.
Advantages:
Disadvantages:
Thanks to Roderick, we have our own installation of Julia on the IoA cluster. See the tutorial for a little more information on setting this up.
Activate it by doing
module load git python/2.7
module load julia
Following this, typing julia will start the Julia command line. It looks like this:
Advantages:
Disadvantages:
With a Google account (Gmail, Google+, Youtube, etc) you can use JuliaBox, sync your notebooks to Google Drive or connect a Github account. JuliaBox has a large number of common packages already installed.
Advantages:
Disadvantages:
Much of the secret to Julia's speed comes from its clever system of types. Many more restrictive languages feature strong type systems as well, but Julia is nice in that it doesn't make you think about types unless you want to. If you don't specify a type, Julia will try to work it out for you, and will do its best to make your code fast regardless.
In [5]:
# call the C standard library's clock function directly
ccall((:clock, "libc"), Int32, ())
Out[5]:
The diagram above shows how part of the type system is laid out. All numbers belong to an appropriate type, and Julia knows how to perform the most efficient operations on any particular type. When you write your functions, you have the option of specifying the types of data you're going to provide. This has a couple of advantages:
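As a minimal sketch of my own (the function and values here are illustrative, not from the presentation), you can annotate one method so it applies only to certain types, and leave an untyped fallback for everything else:

```julia
# typed method: applies only to integers
double(x::Integer) = 2x

# untyped fallback: Julia infers the types and compiles fast code anyway
double(x) = 2 * x

double(3)    # => 6, dispatches to the Integer method
double(1.5)  # => 3.0, falls back to the generic method
```

Julia picks the most specific method that matches the argument types, so the annotated version wins whenever an integer is passed.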
I am doing work involving the equations of state (EOS) of planetary interiors. Some equations of state are simple: they just need a function evaluation, or an interpolation. Some need numerical inversion, so they need additional information to evaluate. And some are piecewise, consisting of many equations of state stacked together. By laying my code out in a type hierarchy, I can choose whether the functions I write will apply to all of these, or just some, as is appropriate.
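A hypothetical sketch of what such a hierarchy might look like - the names here are illustrative, not my actual research code - using the same abstract/concrete type syntax as the Living/Dead example below:

```julia
# abstract parent type for all equations of state
abstract EOS

# a simple EOS: just needs a function evaluation
type FunctionEOS <: EOS
    f::Function
end

# a piecewise EOS: several EOSes stacked together
type PiecewiseEOS <: EOS
    pieces::Vector{EOS}
end

# a method that applies to every EOS...
describe(e::EOS) = println("Some equation of state")
# ...and one specialised to the simple case only
pressure(e::FunctionEOS, rho) = e.f(rho)
```

Writing `e::EOS` makes a function apply to the whole family, while `e::FunctionEOS` restricts it to one branch of the hierarchy.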
Let's say we want to classify things according to whether they're Living or Dead. Let's further say that we want to make two types of living things: Animals, for which we know a species name and a noise, and Plants, for which we only know a species name. Here's a hierarchy of types for that.
Let's further imagine that we want to define two functions here: one called isalive that tells us if a particular thing is alive, and another called say that tells us what noise a particular thing makes. We want say to work on every Living thing, even Plants, but we don't want it to work for dead things.
How would that look?
In [ ]:
abstract Living
abstract Dead

type Skeleton <: Dead end

type Plant <: Living
    species
end

type Animal <: Living
    species
    noise
end;
We just defined the types, making Plants and Animals Living. We also made a Dead Skeleton type, because why not.
In [ ]:
randy = Animal("squirrel", "scritch scritch")
palmy = Plant("palm tree")
skelly = Skeleton();
Then we make some instances of those types. Randy the squirrel, who goes "scritch scritch". Palmy the palm tree, who doesn't have a noise. And Skelly the skeleton.
In [ ]:
isalive(l) = isa(l, Living)
say(a::Animal) = println("The $(a.species) goes '$(a.noise)'.")
say(l::Living) = println("I don't know what a $(l.species) sounds like.");
We write our functions. The isalive function just checks if something is Living. But say is more interesting: we wrote it twice, once for Animals in particular and again as a fallback Living case. Now everything should behave as expected:
In [6]:
println(isalive(palmy))
println(isalive(skelly))
say(randy)
say(palmy);
And finally, we should get an error when we try to call the function say on something it's not defined to handle.
In [7]:
say(skelly)
Here we roll a 6-sided die one million times and sum the results. You can see that making the for-loop into a parallel for-loop is a simple matter of adding an @parallel tag, and we also have the option of adding a reduction function at the end. In this case we choose to sum all the values with (+). If we started Julia on multiple cores, the work would be distributed automatically among them.
In [8]:
N = 1_000_000
roll_a_die() = rand(1:6)
total = @parallel (+) for i=1:N
    roll_a_die()
end
average = total / N
Out[8]:
If you never want to have to rewrite your code, Julia is not for you. Things will change - though at this stage, the language is stable enough that the changes will only be minor. Far more significant will be how packages change over time.