Visualization and Learning in Julia

Tom Breloff

https://github.com/tbreloff

Outline

  • Background
  • Julia packages
  • Plots.jl
  • Live demos

My background

  • BA Mathematics and Economics (U. of Rochester)
  • MS Mathematics (NYU Courant Institute)
  • Trader, researcher, quant, developer
  • High-speed algorithmic arbitrage trading and market making
  • Machine learning and visualization enthusiast
  • Lifelong programmer (since 4th grade)

My academic interests

  • Machine learning
  • Neural nets
  • Reservoir computing
  • Spiking neural models
  • Visualization
  • Software design
  • Data pipelines

Before Julia

  • Python and C/C++
  • MATLAB and Java (so many files!!)
  • Throughout the years: Mathematica, Go, R, C#, JavaScript, Visual Basic/Excel, Lisp, Erlang, ...

Things I like

  • Python
    • Solid packages
    • Easy to get stuff done
  • C/C++
    • Fast (when you put in the effort)
  • MATLAB
    • Great matrix operations
    • Easy visualizations
  • Java
    • Hmmm...
      public static boolean DoTheFunctionNamesReallyNeedToBeLongerThanThatMaryPoppinsSong() {
        return true; 
      }

Why Julia?

  • Easy and concise
  • Fast with little effort
  • Solid vector/matrix support, but more flexible
  • Macros and staged functions
  • Multiple dispatch is powerful
  • So much more!

(Slow clap...)

Julia's Package Ecosystem

Top packages by stars

Package         GitHub stars   2-week change   Type
Gadfly          732            14              Plotting
IJulia          732            11              Workflow
Mocha           496            36              Learning
DataFrames      230            12              Data Structures
PyCall          204            4               Language Wrapper
JuMP            182            5               Optimization
Escher          135            10              GUIs
Optim           131            4               Optimization
Morsel          128            -1              Web (deprecated)
Distributions   125            7               Statistics

Statistics and Learning in Julia

  • Stats (mostly in JuliaStats)
    • StatsBase
    • Distributions
    • DataFrames, DataArrays, NullableArrays
    • MultivariateStats, GLM
    • OnlineStats
    • many more...

Statistics and Learning in Julia

  • Optimization (mostly in JuliaOpt)
    • MathProgBase
    • JuMP
    • Optim
    • Convex
    • NLopt

Statistics and Learning in Julia

  • Machine learning
    • Mocha
    • MXNet
    • GeneticAlgorithms
    • Orchestra
    • TextAnalysis
    • Clustering
    • many more...

Visualization in Julia

Gadfly, PyPlot, Vega, Winston, UnicodePlots, Qwt, Bokeh, Immerse, GLPlot ...

  • Interactive: Immerse, PyPlot, Qwt
  • Fast: GLPlot
  • Easy/concise: UnicodePlots, Winston, Qwt
  • Pretty: Gadfly, Vega, Bokeh
  • Native: Gadfly, Winston, UnicodePlots
  • Features: PyPlot

Why do I have to choose one?!?

What makes good code design?

Good design: AbstractArray

Many concrete array-types:

  • Dense arrays
  • Sparse arrays
  • Ranges
  • Distributed arrays
  • Shared arrays
  • GPU arrays
  • Custom data structures

Common code is implemented once for AbstractArray, and all concrete types get the benefit.


In [52]:
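# a vector of 1:n that hides one surprise at a random position
type ScaryVec <: AbstractArray{Int,1}
    boo::Int   # index of the surprise element
    n::Int     # length of the vector
    ScaryVec(n::Integer) = new(rand(1:n), n)
end

# the only two methods we implement ourselves
Base.size(sv::ScaryVec) = (sv.n,)
# deliberately returns a String at index `boo`, despite the declared Int element type
Base.getindex(sv::ScaryVec, i::Integer) = (i == sv.boo ? "BOO!" : i)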

sv = ScaryVec(5)


Out[52]:
5-element ScaryVec:
 1      
  "BOO!"
 3      
 4      
 5      

In [53]:
filter(x -> isa(x, Number), sv)


Out[53]:
4-element Array{Int64,1}:
 1
 3
 4
 5

Good design: AbstractArray

  • Inheriting from AbstractArray gives you a lot "for free":
    • Iteration (map, for x in ..., filter, ...)
    • Operations
    • Printing
    • etc.
  • Few methods to implement... only what's needed.
  • Abstractions put overlapping functionality in one place
    • Easy to code
    • Easy to maintain

Imagine if there were no AbstractArray...

Gadfly : ____________ :: ScaryVec : AbstractArray

Thinking of graphics packages as concrete types, we see that we have many different types, but no abstraction linking them together.
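
To make the analogy concrete, here is a toy sketch of what a linking abstraction could look like, assuming Julia 1.x syntax. Every name in it (`AbstractBackend`, `TextBackend`, `CsvBackend`, `render`, `plot_function`) is hypothetical and chosen for illustration; it is not how Plots.jl or any plotting package is actually implemented.

```julia
# Hypothetical names throughout; this is not how Plots.jl is implemented.
abstract type AbstractBackend end

struct TextBackend <: AbstractBackend end   # stand-in for one plotting package
struct CsvBackend  <: AbstractBackend end   # stand-in for a second, unrelated package

# Backend-specific code: one small method per backend.
render(::TextBackend, xs, ys) = join(["($x, $y)" for (x, y) in zip(xs, ys)], " ")
render(::CsvBackend,  xs, ys) = join(["$x,$y" for (x, y) in zip(xs, ys)], "\n")

# Common code: written once against the abstract type.
function plot_function(backend::AbstractBackend, f, xs)
    ys = map(f, xs)              # evaluate the function at each x
    return render(backend, xs, ys)
end

plot_function(TextBackend(), x -> x^2, 1:3)   # "(1, 1) (2, 4) (3, 9)"
plot_function(CsvBackend(),  x -> x^2, 1:3)   # "1,1\n2,4\n3,9"
```

The user-facing function never mentions a concrete backend, so switching backends means changing one argument rather than rewriting the calling code.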

Plots.jl

The AbstractArray of plotting...


In [5]:
# setup... choose Gadfly as the backend, set some session defaults
using Plots
gadfly()
default(size=(600,500), legend=false)

# create parametric functions
fx(u) = 1.6sin(u)^3
fy(u) = 0.3 + 1.5cos(u) - 0.6cos(2u) - 0.25cos(3u) - cos(4u)/8

# plot and annotate
p = plot(fx, fy, 0, 2π, line=(5,:darkred), xlim=(-2,2), ylim=(-2,2))
annotate!(0, 0.25, text(" I ♡\nPlots", 45, -0.1π, :darkred));

In [2]:
p



Out[2]:
[Figure: parametric heart-shaped curve annotated "I ♡ Plots"; both axes run from -2 to 2]

In [3]:
# use the same parametric functions to create a custom marker shape
us = linspace(0, 2π, 100)
heart = Shape([(fx(u), fy(u)) for u in us])

# generate some data
n = 50
xy() = 4rand(2) - 2

# add a title
title!("Let me count the ways...")

# add a new series
scatter!(1, z=1:n, marker=(heart,15,:reds))

# animations!
anim = Animation()
for i in 1:n
    x, y = xy()
    
    # add to a series after creation
    push!(p, 2, x, y)
    
    # easy annotations
    annotate!(x, y, text(i))
    
    # save an animation frame
    frame(anim)
end

In [4]:
gif(anim, "iheartplots.gif", fps=3)


INFO: Saved animation to /Users/tom/.julia/v0.4/ExamplePlots/examples/meetup/iheartplots.gif
Out[4]:
[Animation: iheartplots.gif]

One problem...

When the abstraction comes after the concrete implementations, it's a lot more work to fit them together. Oops. Better late than never!!

OnlineStats/OnlineAI

  • What is "Online"?
  • Why is it important?
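
One common reading of "online" is that an estimate is updated one observation at a time, in constant memory, without storing or revisiting earlier data. The sketch below shows that idea for a running mean and variance using Welford's update, assuming Julia 1.x syntax. `RunningStats`, `fit_one!`, and `variance` are hypothetical names for this example and are not the OnlineStats.jl API.

```julia
# Hand-rolled sketch; `RunningStats` and `fit_one!` are hypothetical names,
# not the OnlineStats.jl API.
mutable struct RunningStats
    n::Int          # observations seen so far
    mean::Float64   # running mean
    m2::Float64     # running sum of squared deviations from the mean
end
RunningStats() = RunningStats(0, 0.0, 0.0)

# Welford's update: fold in one observation in O(1) time and memory.
function fit_one!(s::RunningStats, x::Real)
    s.n += 1
    delta = x - s.mean
    s.mean += delta / s.n
    s.m2 += delta * (x - s.mean)
    return s
end

# Unbiased sample variance; NaN when fewer than two observations have been seen.
variance(s::RunningStats) = s.n > 1 ? s.m2 / (s.n - 1) : NaN

s = RunningStats()
for x in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)   # pretend these arrive one at a time
    fit_one!(s, x)
end
s.mean        # ≈ 5.0
variance(s)   # ≈ 32/7
```

Because the state is three numbers no matter how much data has been seen, the same code can run over a stream or a dataset too large to hold in memory.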

Fun with data - UCI Wine Quality Dataset

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Come collaborate:

  • Plots.jl
  • OnlineStats.jl
  • OnlineAI.jl
  • LearnBase.jl
  • Unums.jl

or get in touch: