Purpose of Julia Study Circle

  • Become proficient enough to use Julia for bioinformatics work.
  • Identify when and for whom Julia is useful for bioinformatics.

Motto: Useful Julia

Practical

Topics

  • Introduction (two sessions), Rasmus
  • Parallelization, Gabriel
  • Running R code from Julia, J

Other possible topics to choose from:

  • Plotting
  • Parsing various (column-based) formats
  • Distributions
  • Data Frames
  • Unit testing
  • Julia JIT compiler
  • Interesting packages

Please propose your own!

Purpose of Julia

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

Overview:

  • Fast
  • Consistent design
  • Readable code
  • Complete programming language. Most of Julia is written in Julia. (The developers did not reinvent the wheel, though: Julia relies on established libraries such as BLAS and LAPACK.)
  • Useful for scientific computing
  • More "precise" than e.g. python and R (more syntax errors, fewer incorrect results?)
  • Support for calling code written in other languages (C, python, R, etc.)
  • Package system

Some implications:

  • Understanding why Julia is fast helps in understanding its language design choices.
  • Since most of Julia is written in Julia, your code will be just as fast as the standard library (if written correctly).

Benchmarks

benchmark      Fortran    Julia  Python  R       Matlab  Octave   Mathematica  JavaScript     Go     LuaJIT           Java
version        gcc 5.1.1  0.4.0  3.4.3   3.2.2   R2015b  4.0.0    10.2.0       V8 3.28.71.19  go1.5  gsl-shell 2.3.1  1.8.0_45
fib            0.70       2.11   77.76   533.52  26.89   9324.35  118.53       3.36           1.86   1.71             1.21
parse_int      5.05       1.45   17.02   45.73   802.52  9581.44  15.02        6.06           1.20   5.77             3.35
quicksort      1.31       1.15   32.89   264.54  4.92    1866.01  43.23        2.70           1.29   2.03             2.60
mandel         0.81       0.79   15.32   53.16   7.58    451.81   5.13         0.66           1.11   0.67             1.35
pi_sum         1.00       1.00   21.99   9.56    1.00    299.31   1.69         1.01           1.00   1.00             1.00
rand_mat_stat  1.45       1.66   17.93   14.56   14.52   30.93    5.95         2.30           2.96   3.27             3.92
rand_mat_mul   3.48       1.02   1.14    1.57    1.12    1.12     1.30         15.07          1.42   1.16             2.36

Benchmark times relative to C (smaller is better, C performance = 1.0).
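
To give a feel for what such a micro-benchmark measures, here is a minimal sketch of timing a naive recursive Fibonacci function, similar in spirit to the fib row. The input value 20 and the use of @elapsed are illustrative assumptions, not the actual benchmark harness, and the code is written for current Julia releases rather than 0.4.

```julia
# Naive doubly recursive Fibonacci: deliberately inefficient,
# so the timing is dominated by function-call overhead.
fib(n) = n < 2 ? n : fib(n - 1) + fib(n - 2)

fib(20)                      # the first call also compiles the function
elapsed = @elapsed fib(20)   # seconds taken by a second, already compiled call

println("fib(20) = ", fib(20), ", time: ", elapsed, " s")
```

Timing the second call rather than the first keeps compilation time out of the measurement.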

The Two Language Problem

Low-level languages:

  • Fast
  • Longer development times

Examples:

  • C
  • C++
  • Fortran
  • Java
  • etc.

High-level languages:

  • Slow
  • Shorter development times

Examples:

  • Python
  • R
  • Matlab
  • etc.

Since high-level languages tend to be slow, packages are often implemented in a low-level language.

This creates a huge wall between users and developers of the language.

Julia's approach:

By design, Julia allows you to range from tight low-level loops, up to a high-level programming style, while sacrificing some performance, but gaining the ability to express complex algorithms easily. This continuous spectrum of programming levels is a hallmark of the Julia approach to programming and is very much an intentional feature of the language.

More on the two language problem: https://www.youtube.com/watch?v=QTbPtKxDquc
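
As a small illustration of the spectrum of programming levels, the sketch below computes a sum of squares twice: once as a tight explicit loop and once as a high-level one-liner. The function names sumsq_loop and sumsq_high are made up for this example, and the code targets current Julia releases.

```julia
# Low-level style: explicit loop with an accumulator.
function sumsq_loop(v)
    total = zero(eltype(v))   # accumulator of the same type as the elements
    for x in v
        total += x * x
    end
    return total
end

# High-level style: the same computation expressed in one line.
sumsq_high(v) = sum(x -> x^2, v)

data = [1.0, 2.0, 3.0]
println(sumsq_loop(data))
println(sumsq_high(data))
```

Both versions are ordinary Julia code, so either style can be used in the same program without switching languages.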

Suggested reading before next session

Notable differences from other languages: http://docs.julialang.org/en/release-0.4/manual/noteworthy-differences/

Documentation (read or skim through introduction): http://docs.julialang.org/en/release-0.4/manual/introduction/

This session

  • Get familiar with Julia syntax.
  • How to think in Julia.
  • Various big and small things that I consider important in a programming language.
  • A little bit on what makes Julia stand out compared to other programming languages.

Some code

Simple expressions


In [56]:
1+5


Out[56]:
6

In [2]:
1/5


Out[2]:
0.2

In [3]:
"hello"


Out[3]:
"hello"

In [4]:
println("hello")


hello

In [5]:
a = 3
b = 2
a^b


Out[5]:
9

Help

Write ?function to get help


In [6]:
?println


search: println print_with_color print print_joined print_escaped

Out[6]:
println(x)

Print (using print) x followed by a newline.

String interpolation

Use $ to interpolate variables and expressions into strings.


In [57]:
a = 8
"a is equal to $a"


Out[57]:
"a is equal to 8"

In [8]:
a = "Hello"
b = "world"
println("$a, $(uppercase(b))")


Hello, WORLD
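
A further sketch of the same mechanism: any expression can go inside $( ), and a backslash in front of the dollar sign produces a literal $. The variable names are made up for illustration.

```julia
n = 3
price = 12.5

total = "Total for $n items: $(n * price)"   # an expression inside $( )
literal = "Price: \$5"                       # \$ gives a literal dollar sign

println(total)     # Total for 3 items: 37.5
println(literal)   # Price: $5
```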

Arrays

1-indexed.


In [58]:
v = [1, 4, 9, 16]


Out[58]:
4-element Array{Int64,1}:
  1
  4
  9
 16

In [59]:
v[3]


Out[59]:
9

In [11]:
v[end]


Out[11]:
16

In [12]:
v[2:end]


Out[12]:
3-element Array{Int64,1}:
  4
  9
 16

In [13]:
sum(v)


Out[13]:
30
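
The array used above can also be built with a comprehension. The following is a minimal sketch with made-up variable names, written for current Julia releases.

```julia
squares = [i^2 for i in 1:4]   # array comprehension: the squares of 1 to 4

first_element = squares[1]     # indexing starts at 1, not 0
last_element = squares[end]    # `end` stands for the last valid index

push!(squares, 25)             # arrays can grow in place
println(squares)
```

Asking for squares[0] would raise a BoundsError, because index 0 does not exist.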

Loops

Standard for loops:


In [14]:
for i=1:3
    println(i)
end


1
2
3

Iterate over array:


In [15]:
for s in ["Hello", "Salaam aleekum", "Hola", "Hej", "Bonjour"]
    println(s)
end


Hello
Salaam aleekum
Hola
Hej
Bonjour

Convenient way if you want the index too:


In [16]:
for (i,s) in enumerate(["Hello", "Salaam aleekum", "Hola", "Hej", "Bonjour"])
    println("$i: $s")
end


1: Hello
2: Salaam aleekum
3: Hola
4: Hej
5: Bonjour

Nested loops:


In [17]:
for i=1:2, j=11:12
    println((i,j))
end


(1,11)
(1,12)
(2,11)
(2,12)

And while loops:


In [18]:
x = rand(linspace(-1,1))


Out[18]:
-0.9183673469387755

In [19]:
while -1 < x < 100
    x *= 2
end
x


Out[19]:
-1.836734693877551

Functions

Simple function definition


In [20]:
f(x,y) = x+y-1


Out[20]:
f (generic function with 1 method)

Function call


In [21]:
f(2,8)


Out[21]:
9

A second way to define functions


In [22]:
function g(x)
    sin(x)^2+cos(x)^2
end


Out[22]:
g (generic function with 1 method)

In [23]:
g(2)


Out[23]:
1.0

Default values for function arguments


In [24]:
normpdf(x, μ=0, σ=1) = 1/(σ*√(2π)) * exp(-(x-μ)^2/(2σ^2))


Out[24]:
normpdf (generic function with 3 methods)

Call with default parameters.


In [25]:
normpdf(2)


Out[25]:
0.05399096651318806

Call with μ = -1


In [26]:
normpdf(2,-1)


Out[26]:
0.0044318484119380075

Keyword arguments

Example: sort a list in reverse order.


In [27]:
?sort


search: sort sort! sortrows sortperm sortcols sortperm! Cshort issorted

Out[27]:
sort(A, dim, [alg=<algorithm>,] [by=<transform>,] [lt=<comparison>,] [rev=false])

Sort a multidimensional array A along the given dimension.

sort(v, [alg=<algorithm>,] [by=<transform>,] [lt=<comparison>,] [rev=false])

Variant of sort! that returns a sorted copy of v leaving v itself unmodified.


In [28]:
a = rand(1:100,5)


Out[28]:
5-element Array{Int64,1}:
 71
 27
 69
 48
 75

In [29]:
sort(a)


Out[29]:
5-element Array{Int64,1}:
 27
 48
 69
 71
 75

Keyword arguments are specified as "keyword=value" in the argument list.


In [30]:
sort(a,rev=true)


Out[30]:
5-element Array{Int64,1}:
 75
 71
 69
 48
 27

Define them like this:

function f(x, y, z; something=2, somethingelse="hello")
    ...
end
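
A complete, runnable version of this pattern might look as follows. The function describe and its keywords are hypothetical and only serve to show the syntax.

```julia
# Positional arguments come before the semicolon, keyword arguments after it.
function describe(name; greeting="Hello", punctuation="!")
    return "$greeting, $name$punctuation"
end

println(describe("Julia"))                                    # Hello, Julia!
println(describe("Julia", greeting="Hej"))                    # Hej, Julia!
println(describe("Julia", punctuation="?", greeting="Hola"))  # Hola, Julia?
```

Keyword arguments can be given in any order, and those left out keep their default values.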

Multiple return values


In [64]:
sincos(α) = (sin(α),cos(α))

x,y = sincos(π/3)


Out[64]:
(0.8660254037844386,0.5000000000000001)

More on return values

The value of the last expression evaluated in a function is its return value.


In [32]:
function h(x)
    x = x+1
    x*x
end

h(3)


Out[32]:
16

It works for if statements too.


In [33]:
function sgn(x)
    if x<0
        -1
    elseif x>0
        1
    else
        0
    end
end


Out[33]:
sgn (generic function with 1 method)

In [34]:
println(sgn(-3))
println(sgn(5.2))
println(sgn(0))


-1
1
0
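
An explicit return is still available and is useful for leaving a function early. A minimal sketch, with a hypothetical helper first_negative:

```julia
# Index of the first negative element, or 0 if there is none.
function first_negative(v)
    for (i, x) in enumerate(v)
        if x < 0
            return i   # leave the function immediately
        end
    end
    0                  # reached only if the loop finishes; this is the return value
end

println(first_negative([3, -1, -2]))   # 2
println(first_negative([1, 2]))        # 0
```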

Functions as parameters


In [35]:
list = ["This", "is", "Julia"]


Out[35]:
3-element Array{ASCIIString,1}:
 "This" 
 "is"   
 "Julia"

map applies the function to every element in the array.


In [36]:
map(length,list)


Out[36]:
3-element Array{Int64,1}:
 4
 2
 5

Anonymous functions

Also called lambda functions.


In [37]:
map( x->'s' in x, list )


Out[37]:
3-element Array{Bool,1}:
  true
  true
 false

Bonus - we can use mathematical symbols.


In [38]:
map( x->'s'∈x, list )


Out[38]:
3-element Array{Bool,1}:
  true
  true
 false

Operators are functions

Just with a convenient syntax.


In [39]:
1+4


Out[39]:
5

Is the same as


In [40]:
+(1,4)


Out[40]:
5

In [41]:
v = [1, 2, 3, 4]
u = [100, 200, 300, 400]

map(+,u,v)


Out[41]:
4-element Array{Int64,1}:
 101
 202
 303
 404

That is, we can use operators anywhere we can use functions.
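
A few more places where an operator is passed around like any other function, as a sketch with made-up variable names:

```julia
v = [1, 2, 3, 4]
u = [100, 200, 300, 400]

total = reduce(+, v)         # fold the array with +, gives 10
product = reduce(*, v)       # fold the array with *, gives 24
differences = map(-, u, v)   # elementwise subtraction, gives [99, 198, 297, 396]

add = (+)                    # an operator can be bound to a name
println(add(1, 4))           # 5
```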

Julia Coding style and conventions

More here: http://docs.julialang.org/en/release-0.4/manual/style-guide/

Naming conventions

Modules and types

Capitalized camel case:

module SparseMatrix
type MyType

Functions

Lowercase:

function maximum()

Multiple words together:

function isequal()

Underscores when necessary and for combinations of concepts:

function remotecall_fetch()

Avoid abbreviations:

function indexin()

(not idxin())

Functions modifying input variables

Functions with ! at the end of the name may make changes to the input variables.


In [42]:
u = [1, 2, 3]


Out[42]:
3-element Array{Int64,1}:
 1
 2
 3

In [43]:
fill!(u,-1)
u


Out[43]:
3-element Array{Int64,1}:
 -1
 -1
 -1
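
The same convention applies to your own functions. Below is a sketch of a mutating function and a non-mutating counterpart. The names double! and double are hypothetical.

```julia
# The ! signals that the argument v is changed in place.
function double!(v)
    for i in eachindex(v)
        v[i] *= 2
    end
    return v
end

# Non-mutating counterpart: works on a copy and leaves its input untouched.
double(v) = double!(copy(v))

u = [1, 2, 3]
w = double(u)               # u is unchanged here
unchanged = (u == [1, 2, 3])
double!(u)                  # now u itself is modified
println(u)
```

Offering both variants lets the caller choose between convenience and avoiding an extra allocation.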

Example from Julia Base

function eye(T::Type, m::Integer, n::Integer)
    a = zeros(T,m,n)
    for i = 1:min(m,n)
        a[i,i] = one(T)
    end
    return a
end

Multiple Dispatch

A central part of the Julia language design. Several versions of the same function can coexist.

Relies on types, which we will cover next session...

Longer summary here: http://nbviewer.jupyter.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22

What is dispatch?

We are used to dispatch in many situations.

Compare $$\exp: \mathbb{R} \to \mathbb{R}$$ to $$\exp: \mathbb{C} \to \mathbb{C}$$

Mathematically, these are different functions, but we refer to them by the same name. Many programming languages handle this distinction well.


In [44]:
exp(2)


Out[44]:
7.38905609893065

In [45]:
exp(1+2im)


Out[45]:
-1.1312043837568135 + 2.4717266720048188im

Multiple dispatch generalizes this to multiple arguments. (This is not common in other programming languages.)

Example


In [46]:
f(a, b) = "fallback"
f(a::Number, b::Number) = "a and b are both numbers"
f(a::Number, b) = "a is a number"
f(a, b::Number) = "b is a number"
f(a::Integer, b::Integer) = "a and b are both integers"


Out[46]:
f (generic function with 5 methods)

In [47]:
f(0.3,5)


Out[47]:
"a and b are both numbers"

In [48]:
f(2,"hello")


Out[48]:
"a is a number"

In [49]:
f("abc",2)


Out[49]:
"b is a number"

In [50]:
f(10,12)


Out[50]:
"a and b are both integers"

In [51]:
f([1,2,3], 1.9)


Out[51]:
"b is a number"

In [52]:
f([1,2,3],"abc")


Out[52]:
"fallback"

Julia finds the most specific parameter pattern that applies and calls that implementation.

Each implementation of a function is called a method.


In [53]:
methods(f)


Out[53]:
5 methods for generic function f:
  • f(a::Integer, b::Integer) at In[46]:5
  • f(a::Number, b::Number) at In[46]:2
  • f(a::Number, b) at In[46]:3
  • f(a, b::Number) at In[46]:4
  • f(a, b) at In[46]:1

This is everywhere in Julia. Even + is a generic function.


In [55]:
methods(+)


Out[55]:
171 methods for generic function +:

How should we think about Multiple Dispatch?

Functions describe a protocol or an idea.
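
One way to picture this, as a sketch only: a single function name, here area, stands for the idea, and each type provides its own method. The shape types are hypothetical, and the struct keyword is the syntax of current Julia releases (0.4 used type and immutable instead).

```julia
# Hypothetical shape types.
struct Circle
    radius::Float64
end

struct Rectangle
    width::Float64
    height::Float64
end

# One idea, "area", with one method per type.
area(c::Circle) = π * c.radius^2
area(r::Rectangle) = r.width * r.height

# Generic code relies only on the protocol, not on the concrete types.
total_area(shapes) = sum(area, shapes)

shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
println(total_area(shapes))
```

A new shape only needs its own area method for total_area to work with it.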

Packages

List of packages: http://pkg.julialang.org

Install packages:

Pkg.add("PackageName")

Import packages:

using PackageName

Some useful packages

  • Clustering
  • CSV
  • Convex
  • DataArrays
  • DataFrames
  • Distributions
  • ExcelReaders
  • FastaIO
  • MAT
  • MATLAB
  • MLBase
  • MixedModels
  • MultivariateStats
  • PValueAdjust
  • PyCall
  • RCall
  • RDatasets
  • Stan

Useful Collections of Packages

Development environments

Juliabox - https://www.juliabox.org

IJulia (locally)

Julia REPL + a good text editor (Sublime?)

Juno (IDE for Julia)