This tutorial follows the Welcome to Julia talk.
It assumes that you come with some knowledge of Python, IDL, Matlab, or a similar interactive (interpreted) language. That is, this is not a "how to program" tutorial, because you know that already. The syntax of Julia is similar to other interpreted languages and we won't be spending much time discussing it, except to point out differences.
You might also benefit from a little bit of knowledge about compiled languages like C or Fortran, at least the idea of types: why telling the computer what type of values you are giving it is important. However, we'll go through this as well if time permits, as it's key to understanding how to write performant code.
I/O
Exercise 1: Filling the gaps: calling Python
Macros
Exercise 2: Working with data: FITS images and arrays
Types
Exercise 3: Easier programming with types and multiple dispatch
3 methods to use Julia:
> module load git python/2.7
> module load julia
> julia
The following command line options might be useful:
> julia -pN
: parallel processing; replace N with the number of cores to use> julia <filename>
: execute a file> julia -e "<code>"
: execute a piece of codeThe REPL (Read-Eval-Print-Loop) will be familiar if you've used an interactive language before. Commands entered at the REPL will be executed immediately, so you can use it as a calculator or for tinkering. You can also load packages and script files from within the REPL.
Here's what the REPL looks like:
Jupyter is the new name of the IPython Notebook, an interactive browser-based programming environment. It now supports multiple languages including Julia, Python, and R (hence the name). I recommend that you load IPython 3.0 to gain access to Jupyter, because the default version on the IoA system is older (2.1).
> module load python27_modules/ipython/3.0
> ipython notebook
Jupyter is the thing you are looking at now. It is similar to the REPL in that it interactive, but it also stores a full graphical history of your work. It is ideal for showing code or presenting results without having to write a full report. You can write in LaTeX and Markdown, and can also use it to make slideshows, like Welcome to Julia. Notebooks that you make can be shared online. In fact, you can view all the notebooks in this tutorial by going to http://nbviewer.ipython.org/github/swt30/ioa-julia-tutorials/.
quit()
pwd()
cd("dir")
mv("f1", "f2")
workspace()
ans
in the REPLShell mode or "Shelling out": press ; to change the REPL to shell mode
shell> ls
julia> *press ;*
Help: press ? to change to help mode
help?> logspace
julia> *press ?*
Shell commands can also be created with backticks like this:
In [1]:
cmd = `echo hello`
run(cmd)
In [2]:
for i=1:3
println(i)
end
In [3]:
# random 2x2 matrix
rand(2, 2)
Out[3]:
In [4]:
# if we want a 100x100 random matrix, but don't want to print it
rand(100, 100);
logsp
and hitting TAB
\alpha
and hitting TAB
a couple of times. Try other symbol names like \lambda
or \pi
. See what happens if you hit enter to evaluate these. All Unicode symbols can be used as, or as part of, variable names. Some like \pi
have predefined values, while others like \ge
(≥) are functions.
In [5]:
λ = 3700 # \lambda
10λ
Out[5]:
In [6]:
π # \pi
Out[6]:
In [7]:
a = 5
b = 5
a ≥ b # \ge
Out[7]:
In [8]:
3 ∈ (1,2,4) # \in
Out[8]:
Modules in Julia can be loaded in a couple of different ways depending on what you want to do with them.
Load modules with using
when you want to access them fully. Loading a module with using
brings in the module and makes its functions immediately available. If any function names clash, you'll be told.
In [9]:
using PyPlot
All the functions that PyPlot chooses to export, like plot()
, can now be called directly. Some functions internal to the module may not be exported by it, but you can always access these by doing PyPlot.function
.
using PyPlot
and then typing PyPlot.
and pressing TAB
to see all the functions available. (In the notebook this will bring up a drop-down menu and you'll need to press ENTER
to choose)get_cmaps()
, a function exported by PyPlot, to print the colourmaps available. In the notebook you get a pretty colour chart.
In [10]:
get_cmaps()
Out[10]:
find_backend()
, a function that PyPlot doesn't make available. This function is internal to the module so if you want to access it, you have to do PyPlot.find_backend()
. The idea is that you shouldn't usually need to access these internal functions, though.
In [11]:
find_backend() # will tell you it's not defined
In [12]:
import DataFrames
# we want a function called countne that counts things that aren't equal
a1 = [1, 2, 3, 4, 5]
a2 = [1, 6, 7, 4, 9]
DataFrames.countne(a1, a2)
Out[12]:
In [13]:
import StatsBase: sample, geomean
# we want functions for sampling and the geometric mean
# you can also write: import StatsBase.sample, StatsBase.geomean
colours = ["red", "blue", "green"]
sample(colours, 5)
Out[13]:
In [14]:
geomean([1, 2, 3])
Out[14]:
In [15]:
import Base.length # you would do this if you want to add to the `length` function
The function include
is like copying and pasting a script right where you put the include
. Easy! If speed is at all important here, you will want to write this script as a function, something like main()
, and then call that. This is for performance reasons (more later).
In [16]:
;cat scripts/hello.jl
In [17]:
include("scripts/hello.jl")
In [18]:
x = 3
y = 7
z = x + y
Out[18]:
In [19]:
sin(pi/2)
Out[19]:
You can write simple functions by putting them on one line and using an equals sign. You can omit *
signs if the thing in front is a number and the thing afterwards is a variable (or something in brackets)
In [20]:
f(x) = 3x^2 - 4x + 1
f(3)
Out[20]:
In [21]:
3(2 + 1)
Out[21]:
In [22]:
2^3
Out[22]:
div(a,b)
if you need integer division.)
In [23]:
7/9
Out[23]:
In [24]:
7/9 + 14/9
Out[24]:
//
In [25]:
7//9
Out[25]:
In [26]:
7//9 + 14//9
Out[26]:
true
and false
, with no capitals.
In [27]:
3 < 0
Out[27]:
Use the typeof
function to see the type of a value.
In [28]:
typeof(9)
Out[28]:
In [29]:
typeof(7.1)
Out[29]:
Some types are parameterised. The parameters will show up in curly braces. For example, here's a rational number constructed with real integers in the numerator and denominator:
In [30]:
typeof(7//9)
Out[30]:
We'll talk more about types later, but for now it suffices to know that functions will only work on certain types. This should come as no surprise: you can't divide a string by a number.
In [31]:
typeof(1.8 + 3im)
Out[31]:
In [32]:
"hello" / 2
Understanding error messages is a good goal for the early stages of understanding a language. In Julia, at first you'll often be getting errors that look like the above. This message tells you that the function '/'
(i.e. division) doesn't know what to do if presented with two things that have the type of ASCIIString and Int.
In [33]:
# Some functions make arrays
linspace(0, 5, 10)
Out[33]:
In [34]:
# Type them in yourself
[1, 6, 2] # 1D array = column vector
Out[34]:
In [35]:
# Comprehensions like in Python
[x^2 for x=1:10]
Out[35]:
The 1:10 we saw above makes a "range" which can be iterated over like xrange
in python. You can also use them to make arrays directly.
In [36]:
# Use ranges to make arrays
[1:5] # this syntax will change, so alternatives are...
collect(1:5)
[x for x=1:5]
Out[36]:
In [37]:
# Also do row vectors and matrices
[2 3 5] # using spaces makes a 2D array: it's now a row vector
Out[37]:
In [38]:
[1 2 3;
4 5 6;
7 8 9] # spaces and semicolons make a matrix
Out[38]:
Arrays are indexed using square brackets starting from 1.
Yes, from 1.
Don't worry, it will be okay.
In [39]:
a = [1,2,3,4,5]
a[1], a[3]
Out[39]:
In [40]:
a[0]
Slices can be made using the colon, :
. They include both ends (unlike Python where 1:2 would only capture one number). To get the end of an array, use the keyword end
.
In [41]:
a[end]
Out[41]:
In [42]:
a[2:end-1]
Out[42]:
Arrays are also indexed in column-major (Fortran) order. That means that you specify the row first, then the column. I believe this is the same as IDL, Fortran, and Matlab, but different to C/C++ and Python.
In [43]:
c = [1 2 3;
4 5 6;
7 8 9]
c[2, 3]
Out[43]:
Because Julia handles its arrays in a mathematically consistent fashion, you can't multiply matrices of incompatible shapes.
In [44]:
v1 = [1, 2, 5]
v2 = [3, 6, 4]
v1 + v2 # adding two matching-sized vectors is fine
Out[44]:
In [45]:
# dot product (\dot): x⋅y == dot(x,y) == x * y'
x ⋅ y
Out[45]:
In [46]:
v1 * v2 # but this will cause an error
Since we're often working with things of equal shape and want to do elementwise operations, we can use dotted versions of operators. There are dotted arithmetic operators:
.*
.^
etc, as well as dotted comparisons:
.==
.>
.!=
and so on. These will apply the operation to matching elements in the arrays. (The dotted operators also do broadcasting)
In [47]:
v1 .* v2
Out[47]:
In [48]:
v1 .>= v2
Out[48]:
Mixing types in arrays produces "Any" arrays, which are less efficient. Arrays are generally nicer to work with when the things inside them are the same type. If you want to force an array to be a particular type, put the type in front of it: Float64[1,2,3]
In [49]:
[3, "hello", 'a']
Out[49]:
For storing disparate sets of values you might want to use tuples instead. The type of a tuple matches the types of the things inside it.
In [50]:
(3, "hello", 'a')
Out[50]:
In [51]:
typeof(ans)
Out[51]:
In [52]:
println("hi")
println("hello")
print("hey")
print("sup")
In Julia, double quotes are used for strings. Single quotes are reserved for referring to single characters. You can't write 'hello'
, for instance.
Use $ signs to interpolate values into strings.
In [54]:
name = "Scott"
"Hi $name"
Out[54]:
You can also join strings by using the *
symbol between them.
In [55]:
data = readdlm("data/data.txt")
Out[55]:
In [56]:
scott_actual_height = 1.85
data[2,2] = scott_actual_height
file = open("data/corrected_data.txt", "w")
writedlm(file, data)
close(file)
In [57]:
;cat data/corrected_data.txt
In [58]:
using PyCall
@pyimport scipy.optimize as so
@pyimport scipy.interpolate as si
g(x) = cos(x) - x
so.newton(g, 1.)
Out[58]:
In [59]:
interp = si.interp1d([1,2,3,4,5], [2,3,5,8,11])
pycall(interp, PyAny, [1.5, 2.5, 3.5, 4.5])
# calling the interp function requires some more jiggling (see PyCall docs)
# so really you'd want to use a Julia library for this
Out[59]:
In [60]:
using PyPlot
In [61]:
waves(x, y) = sin(x) * cos(y)
xs = linspace(-2, 2)
ys = linspace(-2 ,2)
zs = [waves(x, y) for x in xs, y in ys]
surf(xs, ys, zs, cmap="bone")
Out[61]:
Now let's use our knowledge of how to call Python from Julia to solve a simple least-squares fitting problem using standard Scipy functions.
In [62]:
poly(x) = x^2 - 7x + 3
Out[62]:
Or a more extensive syntax:
In [63]:
function long_poly(x)
y = x^2
z = 7x
y + z + 3
end
Out[63]:
Whichever you choose, the last thing you write is the return value (but you can use return
at any time to force an early return). There are also anonymous functions ("lambda functions") which are nice for including in one-liners. The syntax uses an arrow (->
).
In [64]:
x -> x^2
Out[64]:
In [65]:
map(x -> x^2, 1:5)
Out[65]:
In [66]:
# if you're familiar with "logical indexing", you can do
a = [-2, 3, 6, -8, -1]
a[a .> 0]
# to get numbers in an array larger than zero.
# But you can also use a filter function:
gzero(x) = x > 0
filter(gzero, a)
# to avoid defining and naming a function, make it anonymous
filter(x -> x>0, a)
Out[66]:
Finally, Julia has tuple unpacking like Python, so you can return multiple values by just returning a tuple. These can be made without parentheses if you just put a comma in.
In [67]:
switch(x, y) = y, x
switch(3, 7)
Out[67]:
In [68]:
a, b = switch(3, 7)
a
Out[68]:
In [69]:
# @evalpoly rewrites polynomials to use the computationally efficient Horner's method
poly1(p) = @evalpoly p 9 2 6 4 7 1 8 3
poly2(p) = 9 + 2p + 6p^2 + 4p^3 + 7p^4 + 1p^5 + 8p^6 + 3p^7
@time p1 = [poly1(x) for x in 1:1000]
@time p2 = [poly2(x) for x in 1:1000]
@assert p1 == p2
Here we used a macro called @evalpoly
to rewrite a polynomial expression into a more efficient form. We also used @time
to check how long something took and how much memory it used, as well as @assert
to check that something was true. If the two results weren't the same, @assert
would have raised an error for us. Using assertions is therefore a useful way to check your assumptions in the middle of code.
Other useful macros include:
@parallel
: make a for loop into a parallel for loop@__FILE__
: the path to the current file@profile
: see which bits of code take the longest@which
: tells you which version of a particular function is being used when you call it in a particular way@edit
: edit the relevant function definition@vectorize_1arg
and @vectorize_2arg
: make a function of a single variable automatically work on arrays (and they're fast) @pyimport
once PyCall
is loaded: make a Python package available in Julia@show
: show the value of something on screen as it's done. Nice for debugging, and also check out...@debug
and @bp
once Debug
is loaded, for stopping inside code and examining what's going onControl works like any other language. Key features:
end
if
... elseif
... else
... end
for
loops work with either x in collection
or x=collection
. This looks more natural in some cases like x=1:10
begin
/end
for multi-statement code blocks, or you can put them on one line with semicolons:x = 3; sx = sin(x); cx = cos(x)
module ModuleName
module_code_goes_here
end
and access them by using
or import
ing them.
Let's now try an exercise which has us working with FITS files, and do a simple performance check to see whether it's faster to call a Python library or to use a Julia library to read in FITS.
Writing as much as your code as possible in functions is always a good idea (and this doesn't just go for Julia). The JIT compiler specialises code based on functions. If you write small functions that do one thing each then they are more easily understood by you, more easily optimized by the compiler, and more re-usable.
To make functions fast, there are two important rules.
Here's a contrived example of how type stability affects things.
In [70]:
# an array of ten million random numbers
A = rand(10000000)
# this function is not type stable
function badround(x)
if x >= 0.5
return 1.0 # a floating point number
else
return 0 # an integer
end
end
# this function is type-stable
function goodround(x)
if x >= 0.5
return 1
else
return 0
end
end
Out[70]:
In [71]:
# run them once to compile them
@show badround(0.7)
@show goodround(0.7);
In [72]:
# time each one
@time map(badround, A)
@time map(goodround, A);
Other things to do to ensure performance include:
const
like this:
In [73]:
const N_ITERATIONS = 100
Out[73]:
The compiler will have trouble optimizing code using global variables if you don't do this, as they could theoretically change types at any time if you don't declare them const
.
Other things that help:
In [74]:
typeof(3)
Out[74]:
In [75]:
typeof(7.8)
Out[75]:
But everything in Julia has a type.
In [76]:
typeof("hello")
Out[76]:
In [77]:
typeof([1,7,4])
Out[77]:
In [78]:
typeof(typeof)
Out[78]:
In [79]:
typeof(`echo hello`)
Out[79]:
In [80]:
typeof(quote
print("hello")
end)
Out[80]:
All of these types exist in a type hierarchy, where the top type is Any
. You will need just a few commands to use standard types:
typeof(x)
to get the type of things::
to annotate types<:
to check subtypesisa(x, type)
to check if something is a particular type convert(totype, x)
to convert between types
In [81]:
typeof(3)
Out[81]:
In [82]:
x = 3
x::Int # declares that x is an integer here; anything else will cause an error
Out[82]:
In [83]:
yell(x::String) = uppercase(x) * "!" # declares that this function will only work on strings
yell("yeehaw")
Out[83]:
In [84]:
yell(3)
In [85]:
Float64 <: FloatingPoint # checks if Float64 is a subtype of FloatingPoint
Out[85]:
In [86]:
isa(3.7, FloatingPoint) # checks if the value 3.7 is a FloatingPoint
Out[86]:
In [87]:
convert(FloatingPoint, 3) # converts the integer 3 to a FloatingPoint
Out[87]:
You often don't need to declare types at all, but there are some situations where they can help. These include when defining your own types (so they can be stored efficiently), in certain inner loops (so they can be iterated efficiently), and whenever you're getting a value from somewhere where it's not obvious what the type should be from the source.
This is all well and good, and you can use these commands to play with types, but the real magic happens when you write your own: suddenly you have a really nice way to handle doing the same things to lots of different types of data. The great benefit is that you can do this without sacrificing performance and without having to constantly convert between types or check what type you're using.
A type
in Julia is a bit like a struct
in C, or a class
in other languages. But unlike object oriented languages where you would write a class and many methods to go with it, in Julia the type just holds some values. Later on, you will write functions that behave differently on different types.
Abstract types, written abstract
, are like categories that you group other types into. You declare them like this:
In [88]:
abstract Thing
abstract Living <: Thing
abstract Dead <: Thing
When you define any type, you can mark its supertype (parent) using the <:
notation seen above. Abstract types are used only to help define the type hierarchy. You can't do much with them right away besides test if things are subtypes. However, that's pretty useful by itself, as we'll see.
A normal type
holds a number of fields that you define:
In [89]:
type Skeleton <: Dead end
type Plant <: Living
species
end
type Animal <: Living
species
noise
end
Finally, there are immutable types. Calling something immutable
means it's like a constant type: if you try to change it, you'll get an error. We won't worry about those at the moment.
You make instances of types by just calling the type name and listing the fields:
In [90]:
randy = Animal("squirrel", "scritch scritch")
palmy = Plant("palm tree")
skelly = Skeleton()
Out[90]:
In [91]:
typeof(randy)
Out[91]:
In [92]:
typeof(skelly)
Out[92]:
Then you can access the fields of an instance by using a dot:
In [93]:
palmy.species
Out[93]:
In [94]:
randy.species = "brown squirrel"
Out[94]:
That's quite useful for handling data. You can easily make types that handle collections of different data: for example, an equation and associated boundary values, or coordinates, or whatever. However, multiple dispatch makes it even more worthwhile to use your own types in your code.
If you've programmed in an object-oriented language before, you'll know that you can provide objects with "methods": functions that apply specifically to that object.
Multiple dispatch takes this idea and makes it way better. In Julia, you can write the same function many times and have it do different things depending on the types of the arguments.
In [95]:
double(x::Number) = 2x
double(s::String) = "$s$s"
@show typeof("ha")
double("ha")
Out[95]:
In [96]:
@show typeof(3)
double(3)
Out[96]:
Only the versions that you need are compiled when the program is run. That means that, above, only the version relevant to Int64 and ASCIIString types have been compiled. This is what I mean by Julia being able to "specialise" at function boundaries.
This becomes very powerful when you start to write functions in ways that do conceptually the same thing but procedurally something different. For example, Julia can be fast because all of its mathematical code uses this concept: it will choose the version of some mathematical procedure which is computationally optimal for a particular type. That could be something simple like just returning 1 immediately if you call length(v) where v is, say, a UnitVector
, rather than calculating it like the more general case.
Let's imagine we want two functions that work on our Thing
s from above.
say
, which will tell us what noise something makes.isalive
, which returns true
if something is alive and false
otherwise.How would we write these?
In [97]:
# isalive just checks if the thing is a Living type
isalive(l) = isa(l, Living)
Out[97]:
In [98]:
# using `say` on an animal makes the noise of the animal...
say(a::Animal) = println("The $(a.species) goes '$(a.noise)'.")
# but `say` on another living thing tells us that we don't know what noise it makes
say(l::Living) = println("I don't know what a $(l.species) sounds like.")
Out[98]:
In [99]:
isalive(randy)
Out[99]:
In [100]:
isalive(skelly)
Out[100]:
In [101]:
say(randy)
In [102]:
say(palmy)
Since we didn't define a method of say
that covers dead things, this shouldn't be a surprise:
In [103]:
say(skelly)
Julia's package management is super simple. From within Julia:
Pkg.init()
to set up your package directoryPkg.status()
to see what's installedPkg.add("PkgName")
and Pkg.rm("PkgName")
to add and remove packagesPkg.edit()
to edit your list of packages (add and remove several at once)Pkg.update()
updates all packagesIf for whatever reason you need to reset your Julia packages and start again, just remove ~/.julia
. Everything is installed into there.
The thing that is most likely to confuse you is that Julia's package manager is "declarative": it figures out what packages you need and installs only those. That means that if a Package A is required by Package B and Package B is updated to no longer require Package A, Package A will be removed from your system.
The way the Julia package manager works doesn't currently support system-wide installs very well. That means that you won't see the IoA-wide packages when you run Pkg.status(). You can still call them by doing using
or import
though, and they will work just fine. If you ever run into problems with the system-wide packages, you can always install your own versions. They will go into your ~/.julia
folder and be used in preference to the IoA version.
These packages are installed IoA-wide to get you started:
ipython notebook
Some of these are under active development and if your key concern is stability, you might prefer to avoid them. The Python interfaces are quite solid, though.
Julia could be for you if:
Julia is not for you if: