Julia Packages and Packaging System

Chase Coleman & Spencer Lyon

3-4-16

Managing Packages

One of the tools that Julia provides is a built-in package manager.

All of the package manager commands live in the Pkg module and are called as Pkg.command(arg).

Two types of packages

  • Registered: listed in the central METADATA registry; usually a mature package with some sense of approval from the community
  • Unregistered: not listed in METADATA; usually less mature, and possibly still developing basic functionality

Adding and Removing Registered Packages

Registered packages can be added and removed with Pkg.add("PackageName") and Pkg.rm("PackageName").


In [ ]:
# Pkg.add("Distributions")

In [ ]:
# Pkg.rm("Distributions")

Adding and Removing Unregistered Packages

Unregistered packages can be added with Pkg.clone("git_repo_url") and removed with Pkg.rm("PackageName").


In [ ]:
Pkg.rm("PlotlyJS")

In [1]:
Pkg.clone("https://github.com/spencerlyon2/PlotlyJS.jl.git")


INFO: Cloning PlotlyJS from https://github.com/spencerlyon2/PlotlyJS.jl.git
INFO: Computing changes...
INFO: No packages to install, update or remove
INFO: Package database updated

Updating Packages

Packages can be updated to their most recent versions with Pkg.update().

Running this command will update:

  • Your local METADATA, which tracks all versions of registered packages
  • Registered packages to their latest versions
  • Unregistered packages to the most recent commit on their active branch

It does not update "dirty" packages, i.e. packages whose git working tree has uncommitted changes (git status is not clean).

Distributions.jl

This is THE package for working with probability distributions and random variables.

It is also an excellent example of how to properly leverage multiple dispatch over different types.

We will demonstrate how to use this package.
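To make the multiple-dispatch point concrete, here is a minimal plain-Julia sketch (Julia 1.x syntax, no packages). The types MyNormal and MyUniform and the functions mymean, myvar, and mystd are hypothetical stand-ins, not Distributions.jl's API. The pattern they show is one generic function with a separate method for each distribution type, plus generic code written once for the abstract type.

```julia
# Hypothetical stand-ins for distribution types (not the Distributions.jl types)
abstract type MyDistribution end

struct MyNormal <: MyDistribution
    μ::Float64
    σ::Float64
end

struct MyUniform <: MyDistribution
    a::Float64
    b::Float64
end

# One generic function with one method per type: Julia picks the method
# from the type of the argument at run time (multiple dispatch)
mymean(d::MyNormal) = d.μ
mymean(d::MyUniform) = (d.a + d.b) / 2

myvar(d::MyNormal) = d.σ^2
myvar(d::MyUniform) = (d.b - d.a)^2 / 12

# Written once for the abstract type, this works for every subtype
# because it only relies on myvar
mystd(d::MyDistribution) = sqrt(myvar(d))

for d in (MyNormal(0.0, 2.0), MyUniform(0.0, 1.0))
    println(typeof(d), ": mean = ", mymean(d), ", std = ", mystd(d))
end
```

Adding a new distribution type only requires new mymean and myvar methods. mystd then works for it without any changes.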


In [3]:
using Distributions

What Distributions Are Included?

There are about 70 distributions included in the package.

It is hard to find a distribution that is not included.


In [ ]:
# subtypes returns only the direct subtypes of a type
ncud = length(subtypes(Distributions.ContinuousUnivariateDistribution))
ntd = length(subtypes(Distributions.Distribution))

println("There are $(ncud) Continuous Univariate Distributions")
println("There are $(ntd) Total Distributions")

Using Distributions

Our example will not demonstrate everything that Distributions.jl can do.

The package does much more, but this should give you an idea of the kinds of things you can do with it. See the documentation for more information.

Example Methods

The documentation has a more complete list of methods. We list a few below to give you an idea of what is included:

  • Parameter Retrieval: params, scale, shape, dof
  • Standard Statistics: mean, median, std, skewness, kurtosis, entropy, mgf
  • Probability Evaluation: insupport, pdf, cdf, likelihood, quantile
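The probability-evaluation methods fit together simply: quantile inverts cdf, and insupport decides where pdf is nonzero. Here is a plain-Julia sketch (1.x syntax, no packages) using a hypothetical exponential distribution with rate λ. The names MyExponential, mypdf, mycdf, myquantile, and myinsupport are illustrative only and are not Distributions.jl's API.

```julia
# Hypothetical exponential distribution with rate λ (not the Distributions.jl type)
struct MyExponential
    λ::Float64
end

# The support of the exponential distribution is x >= 0
myinsupport(d::MyExponential, x::Real) = x >= 0

# Density and cumulative distribution function, zero outside the support
mypdf(d::MyExponential, x::Real) = myinsupport(d, x) ? d.λ * exp(-d.λ * x) : 0.0
mycdf(d::MyExponential, x::Real) = myinsupport(d, x) ? 1 - exp(-d.λ * x) : 0.0

# quantile inverts the cdf: solving p = 1 - exp(-λx) for x gives x = -log(1 - p) / λ
myquantile(d::MyExponential, p::Real) = -log1p(-p) / d.λ

d = MyExponential(2.0)
println(mypdf(d, 0.0))                  # λ * exp(0) = 2.0
println(mycdf(d, myquantile(d, 0.25)))  # round trip: should be close to 0.25
```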

Below we create two distributions to experiment with: a standard normal and a Student's t distribution with 5 degrees of freedom.


In [ ]:
nrv = Normal(0.0, 1.0)
tdist = TDist(5)

Evaluating Statistics


In [ ]:
# Look up each function from its name (a Symbol) with eval
for f in (:mean, :median, :std, :skewness, :kurtosis, :entropy)
    ftup = (eval(f)(nrv), eval(f)(tdist))
    println("Normal and T ", f, " are ", ftup)
end

In [ ]:
# Functions are first-class values, so we can loop over them directly (no eval needed)
for f in (mean, median, std, skewness, kurtosis, entropy)
    ftup = (f(nrv), f(tdist))
    println("Normal and T ", f, " are ", ftup)
end
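The two loops above compute the same thing. The second is the more idiomatic one because Julia functions are ordinary values. Here is a small plain-Julia sketch (1.x syntax) of the same contrast on a made-up data vector.

```julia
data = [3.0, 1.0, 4.0, 1.0, 5.0]

# Functions are ordinary values, so a tuple of them can be looped over
for f in (sum, minimum, maximum, length)
    println(f, " => ", f(data))
end

# The symbol-based version must call eval to turn :sum into the function sum;
# eval always works in the module's global scope and adds overhead
for s in (:sum, :minimum)
    println(s, " => ", eval(s)(data))
end
```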

Evaluating Probabilities


In [ ]:
# pdf, logpdf, cdf, and logcdf treat 0.25 as a point x,
# while quantile treats it as a probability p
for f in (pdf, logpdf, cdf, logcdf, quantile)
    ftup = (f(nrv, 0.25), f(tdist, 0.25))
    println("Normal and T ", f, " are ", ftup)
end

PlotlyJS.jl

Julia has many usable plotting packages, but no standard one (yet).

One that seems promising (and feels natural in Julia) is PlotlyJS.jl.

Disclaimer: Spencer is involved in developing PlotlyJS.jl. Chase wrote this part, so don't worry: this is an unbiased opinion.

Flexibility

We will only cover a fraction of what this library can do.

For more information, see the PlotlyJS.jl examples page and the plot attribute reference.

Other good plotting options include PyPlot.jl, Gadfly.jl, and Plots.jl.


In [2]:
using PlotlyJS


Plotly javascript loaded.

Making a simple plot

We will first make a line plot, since you will need one for your homework.

Next, we create a short function that takes a distribution and returns a line trace of its pdf.


In [4]:
using Distributions
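# Return a line trace of the pdf of d; the caller passes it to plot
function plot_distribution(d::Distribution)
    # Evaluate the pdf on 100 points between the 0.1% and 99.9% quantiles of d
    p_001, p_999 = quantile(d, 1e-3), quantile(d, 1-1e-3)
    x = collect(linspace(p_001, p_999, 100))
    y = pdf(d, x)
    t1 = scatter(;x=x, y=y, showlegend=false)

    return t1
end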


Out[4]:
plot_distribution (generic function with 1 method)

In [5]:
plot(plot_distribution(Normal(0, 1)))


Out[5]:

Making a Histogram

PlotlyJS also supports making histograms.

Below we will create a short function which takes a distribution and plots a histogram of random draws.


In [6]:
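function hist_distribution(d::Distribution, N=10_000)
    # Draw N random samples from d
    y = rand(d, N)

    # Scale bar heights as a density so the histogram is comparable to the pdf
    t2 = histogram(;x=y, histnorm="probability density",
                    showlegend=false, nbinsx=250, opacity=0.6)
    return t2
end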


Out[6]:
hist_distribution (generic function with 2 methods)
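The option histnorm="probability density" rescales bar heights so that the total area of the bars is 1. That is what makes the histogram comparable to a pdf. Here is a plain-Julia sketch (1.x syntax) of that normalization with equal-width bins. The helper density_histogram is hypothetical. Plotly chooses its own bin edges, so only the normalization step is meant to match.

```julia
# Hypothetical helper: equal-width histogram normalized to a probability density
function density_histogram(samples::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(samples)
    @assert hi > lo "need at least two distinct sample values"
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for s in samples
        # Map each sample to a bin index; the maximum goes into the last bin
        i = min(floor(Int, (s - lo) / width) + 1, nbins)
        counts[i] += 1
    end
    # Divide by N * width so that sum(density) * width == 1
    density = counts ./ (length(samples) * width)
    edges = [lo + k * width for k in 0:nbins]
    return edges, density
end

samples = randn(10_000)
edges, density = density_histogram(samples, 50)
println("total area = ", sum(density) * (edges[2] - edges[1]))
```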

In [7]:
plot(hist_distribution(Normal(0, 1)))


Out[7]:

Surface Plots

PlotlyJS can also draw 3D surfaces. The function below plots the same 15 × 6 grid of heights three times in one figure: once as given, once shifted up by 1, and once shifted down by 1.

In [8]:
function multiple_surface()
    z1 = Vector[[8.83, 8.89, 8.81, 8.87, 8.9, 8.87],
                [8.89, 8.94, 8.85, 8.94, 8.96, 8.92],
                [8.84, 8.9, 8.82, 8.92, 8.93, 8.91],
                [8.79, 8.85, 8.79, 8.9, 8.94, 8.92],
                [8.79, 8.88, 8.81, 8.9, 8.95, 8.92],
                [8.8, 8.82, 8.78, 8.91, 8.94, 8.92],
                [8.75, 8.78, 8.77, 8.91, 8.95, 8.92],
                [8.8, 8.8, 8.77, 8.91, 8.95, 8.94],
                [8.74, 8.81, 8.76, 8.93, 8.98, 8.99],
                [8.89, 8.99, 8.92, 9.1, 9.13, 9.11],
                [8.97, 8.97, 8.91, 9.09, 9.11, 9.11],
                [9.04, 9.08, 9.05, 9.25, 9.28, 9.27],
                [9, 9.01, 9, 9.2, 9.23, 9.2],
                [8.99, 8.99, 8.98, 9.18, 9.2, 9.19],
                [8.93, 8.97, 8.97, 9.18, 9.2, 9.18]]
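    # Two copies of the surface, shifted up and down by 1
    z2 = map(x->x+1, z1)
    z3 = map(x->x-1, z1)
    # Only the first surface shows a color scale; the other two are slightly transparent
    trace1 = surface(z=z1)
    trace2 = surface(z=z2, showscale=false, opacity=0.9)
    trace3 = surface(z=z3, showscale=false, opacity=0.9)
    plot([trace1, trace2, trace3])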
end


Out[8]:
multiple_surface (generic function with 1 method)

In [9]:
multiple_surface()


Out[9]:

Setting Plot Attributes

We often want to add titles, labels, or other information to a plot.

It helps to know that PlotlyJS builds each figure from two parts:

  • Traces: store the plot data and how it should be displayed
  • Layout: figure-wide settings such as the title, axes, and legend
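Under the hood, a plotly.js figure is plain data: an array of traces plus one layout object. Keyword names such as xaxis_title, passed to Layout in the function below, are an underscore shorthand for nested attributes (here, title inside xaxis). Here is a plain-Julia sketch (1.x syntax) of that structure using only Dicts. The helper nest_attributes is hypothetical and is not PlotlyJS.jl's implementation.

```julia
# Hypothetical helper: expand underscore keywords into nested Dicts,
# e.g. xaxis_title="x" becomes "xaxis" => Dict("title" => "x")
function nest_attributes(; kwargs...)
    out = Dict{String,Any}()
    for (key, val) in kwargs
        parts = split(String(key), '_')
        d = out
        # Walk (or create) one nested Dict per prefix
        for p in parts[1:end-1]
            d = get!(d, String(p), Dict{String,Any}())
        end
        d[String(parts[end])] = val
    end
    return out
end

# A figure as plain data: a vector of traces plus one layout
layout = nest_attributes(title="Normal", xaxis_title="x", xaxis_showgrid=true)
trace = Dict{String,Any}("type" => "scatter", "x" => [1, 2, 3], "y" => [2, 4, 6])
figure = Dict{String,Any}("data" => [trace], "layout" => layout)
println(figure["layout"]["xaxis"])
```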

Let's write another function that combines the traces returned by the two previous functions and adds layout information.


In [6]:
function full_plot_distribution(d::Distribution, N=10000;
                                xlim=(quantile(d, 1e-3), quantile(d, 1-1e-3)))
    # Create multiple traces which will go on plot
    t1 = plot_distribution(d)
    t2 = hist_distribution(d, N)

    # Create layout; underscore keywords set nested attributes (xaxis_title sets the x-axis title)
    l = Layout(;title="$(typeof(d))", 
                xaxis_range=xlim, xaxis_title="x",
                yaxis_title="Probability Density of x",
                xaxis_showgrid=true, yaxis_showgrid=true,
                legend_y=1.15, legend_x=0.7)
    
    return plot([t1, t2], l)
end


Out[6]:
full_plot_distribution (generic function with 2 methods)

In [7]:
full_plot_distribution(Normal(0, 1))


Out[7]:

Subplots

Combine plots into subplots the same way you would build an array: [p1 p2] places two plots side by side.


In [ ]:
p1 = full_plot_distribution(Normal(0, 1), xlim=(-3, 3))
p2 = full_plot_distribution(TDist(5), xlim=(-3, 3))
[p1 p2]
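The array syntax works because Julia lowers [a b] to hcat(a, b) and [a; b] to vcat(a, b). Any type can opt in by adding methods to those functions. Here is a minimal plain-Julia sketch (1.x syntax) with hypothetical Panel and PanelGrid types. It only illustrates the mechanism and is not PlotlyJS.jl's subplot implementation.

```julia
# Hypothetical stand-ins for a single plot and a grid of subplots
struct Panel
    name::String
end

struct PanelGrid
    panels::Matrix{Panel}
end

# [a b] lowers to hcat(a, b) and [a; b] lowers to vcat(a, b)
Base.hcat(a::Panel, b::Panel) = PanelGrid(reshape([a, b], 1, 2))
Base.vcat(a::Panel, b::Panel) = PanelGrid(reshape([a, b], 2, 1))

side_by_side = [Panel("Normal") Panel("TDist")]
stacked = [Panel("Normal"); Panel("TDist")]
println(size(side_by_side.panels), " ", size(stacked.panels))
```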

Interpolations

Interpolation is a common need in numerical work. Interpolations.jl is an extremely fast interpolation package built around splines.

Have a look at their benchmarks.


In [ ]:
using Interpolations

Create Interpolator

There are multiple types of interpolators. We will focus on B-splines, created with BSpline(...).

See the docs for information on the other types.

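By default, an interpolator is defined only on the index domain [1, Npts], where Npts is the number of data points.

BSpline(Linear()) specifies the type of interpolation you want (here, a linear B-spline).

OnGrid() specifies where the data points lie relative to the grid.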
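On the index domain, a linear B-spline draws straight lines between neighboring data points. Here is a minimal plain-Julia sketch (1.x syntax) of evaluating such an interpolant at a fractional index t in [1, Npts]. The helper linear_interp_index is hypothetical and is not part of Interpolations.jl.

```julia
# Hypothetical helper (not Interpolations.jl's API): piecewise-linear
# interpolation of y at a fractional index t in [1, length(y)]
function linear_interp_index(y::AbstractVector{<:Real}, t::Real)
    n = length(y)
    1 <= t <= n || throw(DomainError(t, "t must lie in [1, $n]"))
    i = min(floor(Int, t), n - 1)  # left node; t == n uses the last segment
    w = t - i                      # weight in [0, 1] on the right node
    return (1 - w) * y[i] + w * y[i + 1]
end

y = [0.0, 10.0, 20.0, 15.0]
println(linear_interp_index(y, 1.0))  # reproduces y[1]
println(linear_interp_index(y, 2.5))  # halfway between y[2] and y[3]
```

Indices outside [1, Npts] raise an error. That is the sketch's analogue of the default domain restriction described above.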


In [ ]:
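x = linspace(-1.0, 1.0, 50)
y = sin(collect(x))
# Linear B-spline through the data; by default it is indexed by 1:50, not by x
itp = interpolate(y, BSpline(Linear()), OnGrid())
# At the data points the interpolant should reproduce y
# (named err rather than diff to avoid shadowing Base.diff)
err = maxabs([itp[i] for i in 1:50] - y)
println("The max absolute difference is: ", err)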

Change Interpolator Scale

Since interpolators are defined on [1, Npts] by default, we need to map that index domain onto our own domain (here [-1, 1]).

We will use the scale function to do that.
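Conceptually, rescaling only changes which coordinates you evaluate with. A point x on an evenly spaced grid xs corresponds to the fractional index 1 + (x - first(xs)) / h, where h is the grid spacing. Here is a plain-Julia sketch (1.x syntax) of that composition. The helpers to_index, interp_on_index, and scaled_interp are hypothetical and are not Interpolations.jl's API.

```julia
# Hypothetical helpers (not Interpolations.jl's API)

# Map a point x of the evenly spaced grid xs onto the index domain [1, length(xs)]
to_index(xs::AbstractRange, x::Real) = 1 + (x - first(xs)) / step(xs)

# Piecewise-linear interpolation of y at a fractional index t
function interp_on_index(y::AbstractVector{<:Real}, t::Real)
    i = clamp(floor(Int, t), 1, length(y) - 1)  # left node of t's segment
    w = t - i                                   # weight on the right node
    return (1 - w) * y[i] + w * y[i + 1]
end

# Rescaling composes the two: map x to an index, then interpolate
scaled_interp(xs::AbstractRange, y::AbstractVector{<:Real}, x::Real) =
    interp_on_index(y, to_index(xs, x))

xs = range(-1.0, 1.0, length=50)
ys = sin.(xs)
err = maximum(abs(scaled_interp(xs, ys, x) - sin(x)) for x in xs)
println("The max absolute difference is: ", err)
```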


In [ ]:
itp_scaled = scale(itp, x)
diff_scaled = maxabs([itp_scaled[el] for el in x] - y)
println("The max absolute difference is: ", diff_scaled)

Evaluate Derivatives

We can also evaluate the derivative (gradient) of the interpolant at a point. Below we use x = 0, where the derivative of sin is cos(0) = 1.
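For a linear spline the derivative is easy to reason about. Between two nodes the interpolant is a straight line, so its derivative is the slope of the segment containing x. Here is a plain-Julia sketch (1.x syntax) with a hypothetical helper linear_interp_derivative, which is not part of Interpolations.jl.

```julia
# Hypothetical helper (not Interpolations.jl's API).
# Between two nodes the linear interpolant is a straight line, so its
# derivative at x is the slope of the segment containing x.
# At an interior node this picks the segment to the right (the derivative jumps there).
function linear_interp_derivative(xs::AbstractRange, ys::AbstractVector{<:Real}, x::Real)
    h = step(xs)
    t = 1 + (x - first(xs)) / h                  # position on the index domain
    i = clamp(floor(Int, t), 1, length(ys) - 1)  # left node of the segment
    return (ys[i + 1] - ys[i]) / h
end

xs = range(-1.0, 1.0, length=50)
ys = sin.(xs)
println(linear_interp_derivative(xs, ys, 0.0))  # close to cos(0) = 1
```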


In [ ]:
gradient(itp_scaled, 0.0)

Interpolations.jl Speed

Evaluate a linear spline on 1,000,000 points

  • scipy.InterpolatedUnivariateSpline : 12.4 ms
  • Interpolations.jl : 1.2 μs

Here are some other recommended packages

Some are useful, others are fun.

They appear in no particular order.

  • [DataFrames.jl](https://github.com/JuliaStats/DataFrames.jl)
  • [NLopt.jl](https://github.com/JuliaOpt/NLopt.jl)
  • [NLsolve.jl](https://github.com/EconForge/NLsolve.jl)
  • [Optim.jl](https://github.com/JuliaOpt/Optim.jl)
  • [HDF5.jl](https://github.com/JuliaLang/HDF5.jl)
  • [JLD.jl](https://github.com/JuliaLang/JLD.jl)
  • [QuantEcon.jl](https://github.com/QuantEcon/QuantEcon.jl)
  • [Gadfly.jl](https://github.com/dcjones/Gadfly.jl)
  • [PyPlot.jl](https://github.com/stevengj/PyPlot.jl)
  • [Distances.jl](https://github.com/JuliaStats/Distances.jl)
  • [IJulia.jl](https://github.com/JuliaLang/IJulia.jl)
  • [Interact.jl](https://github.com/JuliaLang/Interact.jl)
  • [DistributedArrays.jl](https://github.com/JuliaParallel/DistributedArrays.jl)
