Julia from a Pythonic perspective

Julia and Python have a lot of similarities...

https://twitter.com/profjsb/status/523325641702117377

Julia has similar semantics to Python

One of the best things about coming to Julia from Python is that the two languages have very similar semantics. In particular, the way variables are assigned and passed to functions is identical. While you do have to learn the surface syntax differences, you don't have to re-learn how to think about your code.

Assignment of names


In [5]:
a = [1.0, 2.0, 3.0, 4.0] # some array


Out[5]:
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In [4]:
b = a  # assign the name "b" to the same array that 'a' is pointing to.
b[1] = 5.0  # modify the first element in that array
a  # change is reflected in a


Out[4]:
4-element Array{Float64,1}:
 5.0
 2.0
 3.0
 4.0
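
Just as in Python, if you want an independent array you have to ask for a copy explicitly. A quick sketch (not part of the original session):

c = copy(a)   # allocate a new array with the same contents
c[1] = -1.0   # modify the copy...
a[1]          # ...the original still holds 5.0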

Function calls: pass by sharing


In [9]:
# define a function that modifies an array in place
# (the trailing ! marks mutating functions by convention)
function double!(x)
    for i=1:length(x)
        x[i] *= 2.0
    end
end


Out[9]:
double! (generic function with 1 method)

In [11]:
a = [1.0, 2.0, 3.0, 4.0]


Out[11]:
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In [12]:
double!(a)
println(a)  # modification is reflected to caller, because there was only ever one array!


[2.0,4.0,6.0,8.0]

In [14]:
# do it again just for fun.
double!(a)
println(a)


[4.0,8.0,12.0,16.0]

Your hard work learning Python will transfer well to Julia.

For more on how both languages treat names and values, http://nedbatchelder.com/text/names1.html is a great reference.

Julia unifies Python "lists" and ndarrays

In Python, most of us are heavy users of numpy, which provides an ndarray class for homogeneous arrays. On the other hand, we also have Python's built-in list type, a heterogeneous 1-d sequence. It can sometimes be awkward dealing with two types that have such overlapping functionality. I end up starting a lot of functions with x = np.asarray(x).

In Julia, heterogeneous and homogeneous arrays are unified into a single (parameterized) type:


In [19]:
# equivalent of Python list or ndarray with dtype='object'
a = [1.0, 2, "three", 4+0im]


Out[19]:
4-element Array{Any,1}:
  1.0     
  2       
   "three"
 4+0im    

In [17]:
typeof(a)  # a is an array of heterogeneous objects


Out[17]:
Array{Any,1}

In [18]:
map(typeof, a)


Out[18]:
4-element Array{Any,1}:
 Float64       
 Int64         
 ASCIIString   
 Complex{Int64}

In [22]:
# equivalent of Python ndarray with dtype=float64
b = [1.0, 2.0, 3.0, 4.0]
typeof(b)


Out[22]:
Array{Float64,1}

In [27]:
# array takes up only 4 * 8 = 32 bytes, just as a NumPy float64 array would
sizeof(b)


Out[27]:
32
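
This compact layout works because Float64 is a "bits type" that can be stored inline. A quick check (not from the original session):

isbits(Float64)   # true: elements are stored unboxed, inline in the array
sizeof(Float64)   # 8 bytes each, so 4 elements take 4 * 8 = 32 bytes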

Arrays easily extensible to new "dtypes"

You can't do this efficiently in NumPy: an ndarray of custom Python objects just stores pointers to boxed values (though people are working on this).


In [29]:
immutable Point
    x::Float64
    y::Float64
end

In [31]:
x = [Point(0., 0.), Point(0., 0.), Point(0., 0.)]


Out[31]:
3-element Array{Point,1}:
 Point(0.0,0.0)
 Point(0.0,0.0)
 Point(0.0,0.0)

In [33]:
sizeof(x)  # Points are stored efficiently in-line: 3 points * 16 bytes = 48


Out[33]:
48

This often means that you can design the code much more naturally than in Python. For performance in Python, you'd have to do something like

class Points(object):
    """A container for two arrays giving x and y coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, i):
        return (self.x[i], self.y[i])

    # ... other methods that operate element-wise

What you really want is a Point object, but if you write classes that way in Python, performance will suffer.
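
In Julia, by contrast, you just write ordinary functions over Point and map them across the array. A minimal sketch (the dist function is illustrative, not taken from any package):

# hypothetical helper: distance of a point from the origin
dist(p::Point) = sqrt(p.x^2 + p.y^2)

pts = [Point(3.0, 4.0), Point(0.0, 1.0)]
map(dist, pts)  # -> [5.0, 1.0]; the Points live unboxed inside the array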

Real-world example of this pattern: https://github.com/kbarbary/SkyCoords.jl

Sane built-in package manager

Packaging and dependency management are super important. Julia's built-in Pkg is declarative (like conda). It's not the mess that pip is!

Pkg.add("Cosmology")

would add "Cosmology" to the requirements:


In [36]:
;cat ~/.julia/v0.4/REQUIRE


IJulia
Cosmology
ERFA
ForwardDiff
Requests
HTTPClient
DocOpt
Example
Gadfly
Winston

Julia resolves the dependency graph and installs versions of each package that satisfy all of the requirements:


In [42]:
;ls ~/.julia/v0.4


AperturePhotometry
ArrayViews
Benchmarks
BinDeps
Blosc
BufferedStreams
Cairo
Calculus
Celeste
Clustering
Codecs
Colors
ColorTypes
Compat
Compose
Conda
Contour
Cosmology
DataArrays
DataFrames
DataStructures
Dates
Dierckx
Distances
Distributions
Docile
DocOpt
DualNumbers
DustExtinction
ERFA
Example
FileIO
FITSIO
FixedPointNumbers
ForwardDiff
Gadfly
GaussianMixtures
Graphics
Grid
GZip
HDF5
Hexagons
HTTPClient
HttpCommon
HttpParser
IJulia
ImmutableArrays
IniFile
Iterators
JLD
JSON
KernelDensity
LibCURL
Libz
Loess
MbedTLS
Measures
META_BRANCH
METADATA
NaNMath
NestedSampling
Nettle
Optim
PDMats
PSFModels
Reexport
Requests
REQUIRE
SHA
Showoff
SkyCoords
SloanDigitalSkySurvey
SortingAlgorithms
SourceExtract
StatsBase
StatsFuns
TimeIt
Tk
URIParser
WCS
Winston
WoodburyMatrices
ZMQ
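
Day-to-day management goes through a handful of functions in the built-in Pkg module. A sketch of the Julia 0.4-era API:

Pkg.status()       # list installed packages and their versions
Pkg.update()       # fetch METADATA and upgrade packages to satisfy REQUIRE
Pkg.rm("Winston")  # drop a package from REQUIRE
Pkg.pin("Gadfly")  # hold a package at its currently installed version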

Writing performance-sensitive code (the big win)

Suppose you're doing some array operations, and it turns out to be a bottleneck:


In [43]:
# two 200 x 200 matrices
n = 200
A = rand(n, n)
B = rand(n, n);

In [44]:
f(A, B) = 2A + 3B + 4A.*A  # function we want to optimize


Out[44]:
f (generic function with 1 method)

In [45]:
using TimeIt

In [46]:
@timeit f(A, B);


1000 loops, best of 3: 312.71 µs per loop

Python version

We get similar performance in Python:

In [5]: n = 200

In [6]: from numpy.random import rand

In [7]: A = rand(n, n);

In [8]: B = rand(n, n);

In [9]: %timeit 2 * A + 3 * B + 4 * A * A
1000 loops, best of 3: 354 µs per loop

But if we needed to optimize this further, we'd have to reach for a specialized tool such as Cython, Numba, ...

Optimize in Julia

Using loops


In [57]:
function f2(A, B)
    length(A) == length(B) || error("array length mismatch")
    # allocate the output with the promoted element type of the inputs
    C = similar(A, promote_type(eltype(A), eltype(B)))
    # @inbounds skips bounds checks; safe because lengths were checked above
    @inbounds for i=1:length(C)
        C[i] = 2A[i] + 3B[i] + 4A[i]*A[i]
    end
    return C
end


Out[57]:
f2 (generic function with 1 method)

In [58]:
@timeit f2(A, B);


10000 loops, best of 3: 50.29 µs per loop

Using loops and pre-allocated memory


In [54]:
function f3!(A, B, C)
    length(A) == length(B) == length(C) || error("array length mismatch")
    # fill the pre-allocated output C; no allocation happens in this function
    @inbounds for i=1:length(C)
        C[i] = 2A[i] + 3B[i] + 4A[i]*A[i]
    end
end


Out[54]:
f3! (generic function with 1 method)

In [56]:
C = similar(A, promote_type(eltype(A),eltype(B)))  # pre-allocate the output once
@timeit f3!(A, B, C);


10000 loops, best of 3: 33.98 µs per loop
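
It's worth a quick sanity check that the optimized versions agree with the vectorized original (an illustrative check, not in the original session):

maximum(abs(f(A, B) - C))  # expect 0.0, or at most floating-point roundoff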

Julia Downsides

Less mature package ecosystem

  • But rapidly expanding. Plus, PyCall is pretty good (see the example below).
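
For instance, PyCall imports and calls Python modules directly (this assumes the PyCall package and SciPy are installed):

using PyCall
@pyimport scipy.optimize as so
so.newton(x -> cos(x) - x, 1.0)  # find the root of cos(x) - x with SciPy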

Slower start-up

  • Julia is about 5x - 10x slower to start up than Python (but this will probably improve in the future). It's not great for very short-running scripts.
  • Module loading is still generally slower than in Python (this is improving with "precompilation" and will probably improve more in the future).

Dynamically dispatched code is slower than in Python

  • If Julia can't infer a concrete type, the code can be quite a bit slower than the Python equivalent (see the sketch below).
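
A minimal illustration, reusing @timeit from above (a hypothetical micro-benchmark; exact timings will vary):

# the same loop, with and without a concrete element type
function mysum(x)
    s = 0.0
    for v in x
        s += v
    end
    return s
end

v_f64 = rand(10^6)                   # Vector{Float64}: concrete element type
v_any = convert(Vector{Any}, v_f64)  # Vector{Any}: every element is boxed

@timeit mysum(v_f64)  # fast: the loop compiles to tight machine code
@timeit mysum(v_any)  # much slower: each += is dispatched at run time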

Small arrays are slower than in Python

  • If you write Python-style array-oriented code in Julia, it is likely going to be a bit slower.
  • Python's memory management is really very good.

Binary dependency story is pretty good, but not as foolproof as conda

  • However, it's easier to make Julia packages than conda packages, and there's no dealing with two separate package managers or separate conda channels.

The language is still changing

  • Be ready to update your code once every ~9 months for the next few years (usually straightforward).