Packages and Julia: a case study using ApproXD.jl

  • What are packages?
  • What are modules?
  • Who can write a package?
    • Should you write a package?
  • Where can I find them?

Agenda for today

  1. What are modules?
  2. What is ApproXD.jl good for?
  3. using ApproXD.jl
  4. Another great interpolation package: Grid.jl

1. What are modules?

  • Think of code units: a module groups functions together.
  • Good way to organize code.
    • Some operations (functions) may not be useful for the user to see/use.
    • Variable names: each module is a separate namespace, thus less scope for name clashes
    • keep your workspace tidy
  • Julia packages are modules.

In [1]:
# our little Test module: Testmod
module Testmod
    
    # has a function
    f(x) = x + 23
    function f(x::ASCIIString) 
        println("you say $x?")
        return x
    end

    # maybe a custom type?
    type Testtype
        number::Float64
        word::ASCIIString
    end

    # a function for that type maybe?
    function g(t::Testtype)
    #apply f for both number and word and return both
        num = f(t.number)
        word = f(t.word)
        return (num,word)
    end

end

In [5]:
# let's use it
# can do either 
# "using Testmod" or
# prefix with module name:

#Testtype(2.2,"WHAT")  # does that work?

t = Testmod.Testtype(2.2,"WHAT");   
Testmod.g(t)


you say WHAT?
Out[5]:
(25.2,"WHAT")
  • The julia manual is an excellent source to consult on more info regarding modules.
  • It's very instructive to look at other people's modules. I recommend DataFrames.jl.
  • We'll look at ApproXD.jl instead.

2. What is ApproXD.jl good for?

The Lininterp object

  • linear interpolation up to 4D
  • caching mechanim for repeat evaluations

In [8]:
# you need to install the package:
# Pkg.clone("https://github.com/floswald/ApproXD.jl")
using ApproXD

# our true function
f(x) = exp(-x.^2./2)

using Gadfly
plot(f,-3,3)


Out[8]:
x -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 -9.0 -8.8 -8.6 -8.4 -8.2 -8.0 -7.8 -7.6 -7.4 -7.2 -7.0 -6.8 -6.6 -6.4 -6.2 -6.0 -5.8 -5.6 -5.4 -5.2 -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 8.2 8.4 8.6 8.8 9.0 -10 -5 0 5 10 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 f(x) -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

In [9]:
# our grid of points 
g = linspace(-3,3,11)
# our set of values
v = f(g)

# i.e. we actually "know" this
plot(x=g,y=v,Geom.point)


Out[9]:
x -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 -9.0 -8.8 -8.6 -8.4 -8.2 -8.0 -7.8 -7.6 -7.4 -7.2 -7.0 -6.8 -6.6 -6.4 -6.2 -6.0 -5.8 -5.6 -5.4 -5.2 -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 8.2 8.4 8.6 8.8 9.0 -10 -5 0 5 10 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 y -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

In [10]:
# and so now lets connect the dots
# must give an array of grids, even if just one
gs = Array{Float64,1}[]
push!(gs,g)
L = Lininterp(v,gs)


Out[10]:
linear interpolation object
dimensions: [11]
approximates 1 functions at given point
has active cache: false
currently evaluates functions [1]
accelerator hits 0% of attemps

In [11]:
newx = linspace(-3,3,100)
newv = zeros(100)
for i in 1:100 
    newv[i] = getValue(L,[newx[i]])[1]   # we want the "first" returned value
end;

# let's see:
plot(layer(f,-3,3),
layer(x=g,y=v,Geom.point,Theme(default_color=color("red"))),
layer(x=newx,y=newv,Geom.line,Theme(default_color=color("red"))))


Out[11]:
x -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 -9.0 -8.8 -8.6 -8.4 -8.2 -8.0 -7.8 -7.6 -7.4 -7.2 -7.0 -6.8 -6.6 -6.4 -6.2 -6.0 -5.8 -5.6 -5.4 -5.2 -5.0 -4.8 -4.6 -4.4 -4.2 -4.0 -3.8 -3.6 -3.4 -3.2 -3.0 -2.8 -2.6 -2.4 -2.2 -2.0 -1.8 -1.6 -1.4 -1.2 -1.0 -0.8 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6 7.8 8.0 8.2 8.4 8.6 8.8 9.0 -10 -5 0 5 10 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 y -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 -1.00 -0.95 -0.90 -0.85 -0.80 -0.75 -0.70 -0.65 -0.60 -0.55 -0.50 -0.45 -0.40 -0.35 -0.30 -0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 -1 0 1 2 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

In [13]:
g


Out[13]:
11-element Array{Float64,1}:
 -3.0
 -2.4
 -1.8
 -1.2
 -0.6
  0.0
  0.6
  1.2
  1.8
  2.4
  3.0

In [14]:
# what about more dimensions?

f(x,y) = 0.75.*x + 1.13.*y    # choose a linear function so that we know true values everywhere
xg,yg  = linspace(-3,3,11),linspace(0,3,14)
v      = Float64[f(i,j) for i in xg, j in yg]
gs     = Array{Float64,1}[]

push!(gs,xg,yg)
L = Lininterp(v,gs)


Out[14]:
linear interpolation object
dimensions: [11,14]
approximates 1 functions at given point
has active cache: false
currently evaluates functions [1]
accelerator hits 0% of attemps

In [15]:
using FactCheck
facts("here i'm testing getValue") do
    randx = rand() * (3-(-3)) + (-3)  # a random number in [-3,3]
    randy = rand() * (3-(-0)) + (-0)  # a random number in [0,3]
    @fact getValue(L,[randx,randy])[1] => roughly(f(randx,randy),atol=1e-6)
end


here i'm testing getValue
1 fact verified.
Out[15]:
delayed_handler (generic function with 4 methods)

What about more dimensions?

Suppose you have the following Bellman equation:

$$V(a,z,y,p), \\ (a,z,y,p) \in \mathbb{R}^4$$
  • you compute an approximation to this continuous function on a discrete set of points
  • but you want to then evaluate the function away from those grid points.
  • you will choose 4 uni-dimensional grids and compute the function at their tensor grid (or at all combinations)
  • additional to that, your solution produces optimal consumption and savings functions $$c^*(a,z,y,p) \\ s^*(a,z,y,p) \\ (a,z,y,p) \in \mathbb{R}^4$$ which are defined on the same space.
  • The costliest operation in linear interpolation is to find the bracket of grid points that contains a point of interest $x$.
  • The lininterp object has a cache mechanism. It remembers the bracket of the last evaluation and starts to search in that bracket.
  • Additionally, if you want to know the value of several functions at a point, i.e. the values of $V,c^*,s^*$ at $x=(a,z,y,p)$, lininterp allows you to
  • store all function values for $V,c^*,s^*$
  • locate the bracket for $x=(a,z,y,p)$ just once
  • return all three function values

In [16]:
# example from test/test_lininterp.jl
# evaluate 3 4D functions at x
using FactCheck
testfun = function()
	lbs = [1.0,2.0,-1,4]
	ubs = [3.0,5.0,3,18.0]

	gs = Array{Float64,1}[]
	push!(gs, linspace(lbs[1],ubs[1],3))
	push!(gs, linspace(lbs[2],ubs[2],4))
	push!(gs, linspace(lbs[3],ubs[3],5))
	push!(gs, linspace(lbs[4],ubs[4],9))

	myfun1(i1,i2,i3,i4) = 0.5*i1 + 2*i2 + 3*i3 + i4/0.98
	myfun2(i1,i2,i3,i4) = i1 + 0.2*i2 + 3*i3 + i4
	myfun3(i1,i2,i3,i4) = 0.7*i1 + 1.5*i2 + 2*i3 + i4/0.2
	
	vs1 = Float64[ myfun1(i,j,k,m) for i in gs[1], j in gs[2], k in gs[3], m in gs[4] ]
	vs2 = Float64[ myfun2(i,j,k,m) for i in gs[1], j in gs[2], k in gs[3], m in gs[4] ]
	vs3 = Float64[ myfun3(i,j,k,m) for i in gs[1], j in gs[2], k in gs[3], m in gs[4] ]

	vs = Array{Float64}[]
    push!(vs,vs1,vs2,vs3)

	L = Lininterp(vs,gs)
    
    # test
    facts("testing 4D interpolation") do
        x = rand() * (ubs[1]-lbs[1]) + lbs[1]
		y = rand() * (ubs[2]-lbs[2]) + lbs[2]
		z = rand() * (ubs[3]-lbs[3]) + lbs[3]
		w = rand() * (ubs[4]-lbs[4]) + lbs[4]

        v = getValue(L,[x,y,z,w]) # returns all 3 function values!
        
		@fact v[1] - myfun1(x,y,z,w) => roughly(0.0,atol=1e-6)
		@fact v[2] - myfun2(x,y,z,w) => roughly(0.0,atol=1e-6)
		@fact v[3] - myfun3(x,y,z,w) => roughly(0.0,atol=1e-6)
    end
end
testfun()


testing 4D interpolation
3 facts verified.
Out[16]:
delayed_handler (generic function with 4 methods)

The FSpaceXD approximation

  • the package also contains a Spline approximator.
  • Given a set of data on a grid,
    • compute the spline basis functions or arbitrary degree $\Phi$
    • compute the approximating coefficients $c$ such that $$\hat{f}(x_1,x_2,\dots,x_d) = \Phi(x_1,x_2,\dots,x_d) \times c$$
  • under the restriction that spline basis functions are square (i.e. number of knots = number of evaluation points), there is a highly efficient way to compute the otherwise prohibitively expensive tensor product of spline basis functions
    • need to solve equation $$f = \Phi c$$
    • key result: $$ c = [\phi_d^{-1} \otimes \phi_{d-1}^{-1} \otimes \dots \phi_1^{-1}] f $$
  • see package documentation
  • see Miranda and Fackler page 130

In [ ]:
# from test/test_FSpaceXD.jl
tfun = function()
    
    facts("testing FSpaceXD") do
        ndims = 4

        # bounds
        lb = [-1.1,-1.5,-0.9,-1.0]
        ub = [1.2,1.6,0.9,1]

        # number of eval points and basis functions:
        # we require square basis matrices!
        npoints = [13,13,13,13]

        # number of basis funcs
        nbasis = npoints

        # splien degrees
        degs = [3,3,3,3]

        # implies a number of knots for each spline
        # remember the restriction that nknots == ncoefs
        nknots = {i => nbasis[i] - degs[i] + 1 for i=1:ndims}

        # eval points
        points = {i => linspace(lb[i],ub[i],npoints[i]) for i=1:ndims}

        # set up ApproXD
        bsp = Dict{Integer,BSpline}()
        for i in 1:ndims
            bsp[i] = BSpline(nknots[i],degs[i],lb[i],ub[i])
        end

        # set of basis functions
        d = Dict{Integer,Array{Float64,2}}()
        for i=1:ndims
            d[i] = full(getBasis(points[i],bsp[i]))
        end

        # set of INVERSE basis functions
        id = Dict{Integer,Array{Float64,2}}()
        for k in collect(keys(d))
            id[k] = inv(d[k])
        end

        #  get a function
        function f(x,y,z,w) 
            sin(sqrt(x^2+y^2)) + (z-w)^3
        end

        y = Float64[f(i,j,k,w) for i in points[1], j in points[2], k in points[3], w in points[4]]

        yvec = y[:]

        # get coefs using the function
        mycoef = getTensorCoef(id,yvec)

        # setup the FSpace
        fx = FSpaceXD(ndims,mycoef,bsp)

        rval1 = lb[1] + 0.3 
        rval2 = lb[2] + 0.23
        rval3 = lb[3] + 0.111
        rval4 = lb[4] + 0.099
        println("approx value = $(getValue(fx,[rval1,rval2,rval3,rval4]))")
        println("true value = $(f(rval1,rval2,rval3,rval4))")
        @fact getValue(fx,[rval1,rval2,rval3,rval4]) => roughly(f(rval1,rval2,rval3,rval4),atol=3e-3)
    end
end
# run it
tfun()

Another possibility: Grid.jl


In [ ]: