Julia Packages and Packaging System

Chase Coleman & Spencer Lyon

3-4-16

Managing Packages

Julia ships with a built-in package manager.

All package manager commands live in the Pkg module and are called as Pkg.command(arg).

Two types of packages

  • Registered: A package listed in METADATA; generally more mature, with some degree of approval from the community
  • Unregistered: A package not listed in METADATA; often less mature and possibly still developing basic functionality

Adding and Removing Registered Packages

Registered packages can be added with Pkg.add("PackageName") and removed with Pkg.rm("PackageName").


In [ ]:
# Pkg.add("Distributions")

In [ ]:
# Pkg.rm("Distributions")

Adding and Removing Unregistered Packages

Unregistered packages can be added with Pkg.clone("git_repo_url") and removed with Pkg.rm("PackageName").


In [ ]:
# Pkg.rm("PlotlyJS")

In [ ]:
# Pkg.clone("https://github.com/spencerlyon2/PlotlyJS.jl.git")

Updating Packages

Packages can be updated to their most recent version with Pkg.update().

Running this command will update:

  • Your local METADATA, which tracks all versions of registered packages
  • Registered packages to latest version
  • Unregistered packages to most recent commit on active branch

Pkg.update() does not update "dirty" packages, meaning those whose git status is not clean.

Distributions.jl

This is THE package for dealing with distributions and random variables.

It is also an excellent example of how to properly leverage multiple dispatch over different types.
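As a rough illustration of the dispatch pattern, here is a toy sketch that does not use Distributions.jl at all. Every name in it (ToyDistribution, ToyUniform, ToyExponential, toy_mean, describe) is hypothetical. It is written in Julia 1.x syntax; the Julia version used in these slides would write `abstract` and `immutable` instead of `abstract type ... end` and `struct`.

```julia
# Toy illustration of dispatching on distribution types (Julia 1.x syntax).
# All names here are hypothetical and not part of Distributions.jl.
abstract type ToyDistribution end

struct ToyUniform <: ToyDistribution
    a::Float64   # lower bound
    b::Float64   # upper bound
end

struct ToyExponential <: ToyDistribution
    θ::Float64   # scale parameter
end

# One generic function, with one method per concrete type.
toy_mean(d::ToyUniform) = (d.a + d.b) / 2
toy_mean(d::ToyExponential) = d.θ

# Code written against the abstract type works for every subtype.
describe(d::ToyDistribution) = string(typeof(d), " has mean ", toy_mean(d))

println(describe(ToyUniform(0.0, 2.0)))
println(describe(ToyExponential(3.0)))
```

Adding a new distribution then only requires a new type and a new toy_mean method; describe does not change.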

We will demonstrate how to use this package.


In [1]:
using Distributions

What Distributions Are Included?

There are ~70 different distributions that are included in the package.

It is hard to find a distribution that is not included.


In [2]:
ncud = length(subtypes(Distributions.ContinuousUnivariateDistribution))
ntd = length(subtypes(Distributions.Distribution))

println("There are $(ncud) Continuous Univariate Distributions")
println("There are $(ntd) Total Distributions")


There are 46 Continuous Univariate Distributions
There are 66 Total Distributions

Using Distributions

Our example will not demonstrate everything that Distributions.jl can do.

The package does much more, but the example should give you an idea of what is possible. You can read the docs for more information.

Example Methods

You can find a more complete list of possible methods in the documentation, but we list a few of the methods below to give you an idea of what is included:

  • Parameter Retrieval: params, scale, shape, dof
  • Standard Statistics: mean, median, std, skewness, kurtosis, entropy, mgf
  • Probability Evaluation: insupport, pdf, cdf, likelihood, quantile
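A short sketch of a few of these methods, assuming Distributions.jl is installed. The Gamma and Exponential distributions are chosen only for illustration and are not used elsewhere in these slides.

```julia
using Distributions

g = Gamma(2.0, 3.0)          # shape 2, scale 3
n = Normal(0.0, 1.0)

# Parameter retrieval
println(params(n))           # tuple of (mean, standard deviation)
println(shape(g), " ", scale(g))
println(dof(TDist(5)))       # degrees of freedom

# Probability evaluation
println(insupport(Exponential(1.0), -1.0))   # negative values are outside the support
q90 = quantile(n, 0.9)       # the argument is a probability
println(cdf(n, q90))         # cdf undoes quantile, up to rounding
```

Note that quantile takes a probability, while pdf and cdf take a point in the support.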

Below we create two distributions to experiment with.


In [3]:
nrv = Normal(0.0, 1.0)
tdist = TDist(5)


Out[3]:
Distributions.TDist(ν=5.0)

Evaluating Statistics


In [4]:
for f in (:mean, :median, :std, :skewness, :kurtosis, :entropy)
    ftup = (eval(f)(nrv), eval(f)(tdist))
    println("Normal and T ", f, " are ", ftup)
end


Normal and T mean are (0.0,0.0)
Normal and T median are (0.0,0.0)
Normal and T std are (1.0,1.2909944487358056)
Normal and T skewness are (0.0,0.0)
Normal and T kurtosis are (0.0,6.0)
Normal and T entropy are (1.4189385332046727,1.6275026724143997)
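Some of these numbers can be checked against closed-form expressions without any packages. The sketch below uses the standard formulas for Student's t with ν degrees of freedom and for the standard normal. The kurtosis reported above is the excess kurtosis, which is why the normal value is 0.0.

```julia
# Closed-form checks of the statistics printed above (no packages needed).
ν = 5

t_std = sqrt(ν / (ν - 2))              # std of Student's t, valid for ν > 2
t_excess_kurtosis = 6 / (ν - 4)        # excess kurtosis, valid for ν > 4
normal_entropy = 0.5 * (log(2π) + 1)   # entropy of N(0, 1), in nats

println(t_std)
println(t_excess_kurtosis)
println(normal_entropy)
```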

Evaluating Probabilities


In [5]:
for f in (:pdf, :logpdf, :cdf, :logcdf, :quantile)
    ftup = (eval(f)(nrv, 0.25), eval(f)(tdist, 0.25))
    println("Normal and T ", f, " are ", ftup)
end


Normal and T pdf are (0.38666811680284924,0.36572004265594327)
Normal and T logpdf are (-0.9501885332046728,-1.0058871490503956)
Normal and T cdf are (0.5987063256829237,0.5937329346279383)
Normal and T logcdf are (-0.5129840754094305,-0.521325665725598)
Normal and T quantile are (-0.6744897501960817,-0.7266868438004229)
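In the loop above, 0.25 is a point in the support for pdf, logpdf, cdf, and logcdf, but a probability for quantile. The normal pdf and logpdf values can be reproduced from the standard-normal density formula; normal_pdf and normal_logpdf below are hypothetical helper names, not part of Distributions.jl.

```julia
# Standard normal density and log-density written out by hand.
normal_pdf(x) = exp(-x^2 / 2) / sqrt(2π)
normal_logpdf(x) = -x^2 / 2 - log(2π) / 2

println(normal_pdf(0.25))
println(normal_logpdf(0.25))
```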

PlotlyJS.jl

Julia has many usable plotting packages, but no standard one (yet).

One that seems promising (and feels natural) in Julia is PlotlyJS.jl.

Disclaimer: Spencer is involved in developing PlotlyJS.jl -- Chase is writing this so don't worry, this is an unbiased opinion.

Flexibility

We will only cover a fraction of what this library can do.

For more information, see the examples page and the plot attribute page.

Other good plotting options include PyPlot.jl, Gadfly.jl, and Plots.jl.


In [6]:
using PlotlyJS


Plotly javascript loaded.

Making a simple plot

We will first make a line plot, because that will be required for your homework.

In our next slide we will create a short function which takes a distribution and plots its pdf.


In [7]:
function plot_distribution(d::Distribution)
    # Plot between the 0.1% and 99.9% quantiles so the x range stays finite
    p_001, p_999 = quantile(d, 1e-3), quantile(d, 1-1e-3)
    x = collect(linspace(p_001, p_999, 100))
    y = pdf(d, x)
    # Line trace of the pdf
    t1 = scatter(;x=x, y=y, showlegend=false)

    return t1
end


Out[7]:
plot_distribution (generic function with 1 method)

In [8]:
plot(plot_distribution(Normal(0, 1)))


Out[8]:

Making a Histogram

PlotlyJS also supports making histograms.

Below we will create a short function which takes a distribution and plots a histogram of random draws.


In [9]:
function hist_distribution(d::Distribution, N=10_000)
    y = rand(d, N)

    t2 = histogram(;x=y, histnorm="probability density",
                    showlegend=false, nbinsx=250, opacity=0.6)
    return t2
end


Out[9]:
hist_distribution (generic function with 2 methods)

In [10]:
plot(hist_distribution(Normal(0, 1)))


Out[10]:

Setting Plot Attributes

We often want to add titles, labels, or other information to a plot.

Here it makes sense to mention that PlotlyJS constructs figures in two parts.

  • traces: Stores plot data and how it should be displayed
  • Layout: Figure wide settings

Let's write another function that combines two traces from the previous functions and adds layout information.


In [11]:
function full_plot_distribution(d::Distribution, N=10_000;
                                xlim=(quantile(d, 1e-3), quantile(d, 1-1e-3)))
    # Create multiple traces which will go on plot
    t1 = plot_distribution(d)
    t2 = hist_distribution(d, N)

    # Create layout
    l = Layout(;title="$(typeof(d))", 
                xaxis_range=xlim, xaxis_title="x",
                yaxis_title="Probability Density of x",
                xaxis_showgrid=true, yaxis_showgrid=true,
                legend_y=1.15, legend_x=0.7)
    
    return plot([t1, t2], l)
end


Out[11]:
full_plot_distribution (generic function with 2 methods)

In [12]:
full_plot_distribution(Normal(0, 1))


Out[12]:

Subplots

Combine plots in the same way you would build an array.


In [13]:
p1 = full_plot_distribution(Normal(0, 1), xlim=(-3, 3))
p2 = full_plot_distribution(TDist(5), xlim=(-3, 3))
[p1 p2]


Out[13]:

Interpolations

Interpolation is important. Interpolations.jl is an extremely fast interpolation package built around splines.

Have a look at their benchmarks


In [16]:
using Interpolations

Create Interpolator

There are multiple types of interpolators. We will focus on BSpline.

See the docs for information on the other types.

By default, interpolators are defined only on the index range [1, Npts].

BSpline(Linear()) specifies the type of interpolation you want.

OnGrid() specifies where the points lie.


In [17]:
x = linspace(-1.0, 1.0, 50)
y = sin(collect(x))
itp = interpolate(y, BSpline(Linear()), OnGrid())
diff = maxabs([itp[i] for i in 1:50] - y)
println("The max absolute difference is: ", diff)


The max absolute difference is: 0.0

Change Interpolator Scale

Since interpolators are defined by default on [1, Npts], we need to rescale the interpolator to our own domain.

We will use the scale function to do that.


In [18]:
itp_scaled = scale(itp, x)
diff_scaled = maxabs([itp_scaled[el] for el in x] - y)
println("The max absolute difference is: ", diff_scaled)


The max absolute difference is: 0.0
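To make the index grid and the rescaling concrete, here is a hand-rolled sketch of linear interpolation in plain Julia 1.x. It is not how Interpolations.jl is implemented, and lin_interp_index and lin_interp_scaled are hypothetical helper names.

```julia
# Linear interpolation on the index grid 1:length(y); t may be fractional.
function lin_interp_index(y::AbstractVector, t::Real)
    n = length(y)
    1 <= t <= n || throw(DomainError(t, "index must lie in [1, $n]"))
    lo = min(floor(Int, t), n - 1)   # left grid index, clamped so lo + 1 <= n
    w = t - lo                       # weight on the right neighbour
    return (1 - w) * y[lo] + w * y[lo + 1]
end

# Map a point x in [first(xs), last(xs)] to a fractional index, then interpolate.
function lin_interp_scaled(xs::AbstractRange, y::AbstractVector, x::Real)
    first(xs) <= x <= last(xs) || throw(DomainError(x, "x is outside the grid"))
    # clamp guards against rounding pushing the index just past the ends
    t = clamp(1 + (x - first(xs)) / step(xs), 1, length(xs))
    return lin_interp_index(y, t)
end

xs = range(-1.0, stop=1.0, length=50)
ys = sin.(xs)

println(lin_interp_index(ys, 1.5))        # halfway between ys[1] and ys[2]
println(lin_interp_scaled(xs, ys, 0.3))   # close to sin(0.3)
```

The second function is the only place where the domain enters: it converts x to an index and then reuses the index-based interpolator.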

Evaluate Derivatives

We can also evaluate the derivatives of splines.


In [19]:
gradient(itp_scaled, 0.0)


Out[19]:
1-element Array{Float64,1}:
 0.999931
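The result is slightly below cos(0) = 1 because the derivative of a linear spline is the slope of the segment containing the point. A quick check in plain Julia, assuming the 50-point grid on [-1, 1] used above, so that 0 sits midway between the grid points -h and h with h = 1/49:

```julia
# Slope of the linear segment of sin between the grid points -h and h.
h = 1 / 49
segment_slope = (sin(h) - sin(-h)) / (2h)
println(segment_slope)
```

The slope equals sin(h)/h, which approaches 1 as the grid gets finer.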

Interpolations.jl Speed

Time to evaluate a linear spline on 1,000,000 points:

  • scipy.interpolate.InterpolatedUnivariateSpline : 12.4 ms
  • Interpolations.jl : 1.2 $\mu$s

Here are some other recommended packages

Some are useful, others are fun.

They appear in no particular order.

  • [DataFrames.jl](https://github.com/JuliaStats/DataFrames.jl)
  • [NLopt.jl](https://github.com/JuliaOpt/NLopt.jl)
  • [NLsolve.jl](https://github.com/EconForge/NLsolve.jl)
  • [Optim.jl](https://github.com/JuliaOpt/Optim.jl)
  • [HDF5.jl](https://github.com/JuliaLang/HDF5.jl)
  • [JLD.jl](https://github.com/JuliaLang/JLD.jl)
  • [QuantEcon.jl](https://github.com/QuantEcon/QuantEcon.jl)
  • [Gadfly.jl](https://github.com/dcjones/Gadfly.jl)
  • [PyPlot.jl](https://github.com/stevengj/PyPlot.jl)
  • [Distances.jl](https://github.com/JuliaStats/Distances.jl)
  • [IJulia.jl](https://github.com/JuliaLang/IJulia.jl)
  • [Interact.jl](https://github.com/JuliaLang/Interact.jl)
  • [DistributedArrays.jl](https://github.com/JuliaParallel/DistributedArrays.jl)
