Parametric bootstrap in Julia

Fabrizzio Sanchez suggested bootstrapping a statistical model as a non-trivial task on which to demonstrate the speed of Julia.

In this notebook I will demonstrate a parametric bootstrap, which means that samples are drawn from the probability model with parameters set at their estimated values from the original model, and the model refit.

A simple example is a linear model which can be fit using the GLM package.



In [1]:

using GLM, RDatasets
ls = dataset("datasets","LifeCycleSavings")




WARNING: Base.String is deprecated, use AbstractString instead.
likely near /home/bates/.julia/v0.4/RDatasets/src/dataset.jl:1
WARNING: Base.String is deprecated, use AbstractString instead.
likely near /home/bates/.julia/v0.4/RDatasets/src/dataset.jl:1
WARNING: Base.String is deprecated, use AbstractString instead.
likely near /home/bates/.julia/v0.4/RDatasets/src/datasets.jl:1

Out[1]:




In [2]:

m1 = lm(SR ~ Pop15 + Pop75 + DPI + DDPI, ls)




Out[2]:

DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}},Float64}:

Coefficients:
Estimate   Std.Error   t value Pr(>|t|)
(Intercept)       28.5661     7.35452   3.88416   0.0003
Pop15           -0.461193    0.144642  -3.18851   0.0026
Pop75             -1.6915      1.0836    -1.561   0.1255
DPI          -0.000336902 0.000931107 -0.361829   0.7192
DDPI             0.409695    0.196197   2.08818   0.0425



We will take the fitted values and the estimate of the scale parameter from this model as the mean and scale parameter of a multivariate normal distribution from which to simulate new response vectors.

Getting the fitted values turned out to be more complicated than I thought it would be because I hadn't written the appropriate method. For now we will use the somewhat mysterious extractor m1.model.rr.mu to get this.

We will use the IsoNormal type of multivariate normal distribution from the Distributions package for simulating the responses.



In [6]:

fitted(m1)




LoadError: fitted is not defined for DataFrames.DataFrameRegressionModel{GLM.LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredQR{Float64}},Float64}.

in error at ./error.jl:21
in fitted at /home/bates/.julia/v0.4/StatsBase/src/statmodels.jl:18




In [3]:

d = IsoNormal(m1.model.rr.mu,scale(m1.model))




LoadError: MethodError: convert has no method matching convert(::Type{PDMats.ScalMat{Float64}}, ::Float64)
This may have arisen from a call to the constructor PDMats.ScalMat{Float64}(...),
since type constructors fall back to convert methods.
Closest candidates are:
PDMats.ScalMat{T<:AbstractFloat}(::Any, !Matched::Any, !Matched::Any)
call{T}(::Type{T}, ::Any)
convert{T}(::Type{T}, !Matched::T)

in call at /home/bates/.julia/v0.4/Distributions/src/multivariate/mvnormal.jl:64



Getting the coefficients.

A simple method of obtaining the coefficients from fitting the model matrix to a new simulated response is the lm method for a model matrix and a response.



In [4]:

X = copy(m1.model.pp.X) # copy the X matrix from the predictor field, pp




Out[4]:

50x5 Array{Float64,2}:
1.0  29.35  2.87  2329.68   2.87
1.0  23.32  4.41  1507.99   3.93
1.0  23.8   4.43  2108.47   3.82
1.0  41.89  1.67   189.13   0.22
1.0  42.19  0.83   728.47   4.56
1.0  31.72  2.85  2982.88   2.43
1.0  39.74  1.34   662.86   2.67
1.0  44.75  0.67   289.52   6.51
1.0  46.64  1.06   276.65   3.08
1.0  47.64  1.14   471.24   2.8
1.0  24.42  3.93  2496.53   3.99
1.0  46.31  1.19   287.77   2.19
1.0  27.84  2.37  1681.25   4.32
⋮
1.0  21.44  4.54  3299.49   3.01
1.0  23.49  3.73  2630.96   2.7
1.0  43.42  1.08   389.66   2.96
1.0  46.12  1.21   249.87   1.13
1.0  23.27  4.46  1813.93   2.01
1.0  29.81  3.43  4001.89   2.45
1.0  46.4   0.9    813.39   0.53
1.0  45.25  0.56   138.33   5.14
1.0  41.12  1.73   380.47  10.23
1.0  28.13  2.72   766.54   1.88
1.0  43.69  2.07   123.58  16.71
1.0  47.2   0.66   242.69   5.08




In [5]:

srand(1234321)
lm(X,rand(d))




Out[5]:

LinearModel{DensePredQR{Float64}}:

Coefficients:        Estimate  Std.Error  t value Pr(>|t|)
x1       28.5583    8.82106  3.23751   0.0023
x2     -0.445019   0.173485 -2.56518   0.0137
x3      -2.39471    1.29968 -1.84255   0.0720
x4   0.000373479 0.00111678 0.334426   0.7396
x5      0.449379    0.23532  1.90965   0.0626



Suppose that we want the coefficient estimates, $\widehat{\beta}$, and the scale parameter estimate, $\widehat{\sigma}$, from, say, 10000 simulated responses for this model.

For a single fit we can extract both of these and concatenate them as a vector



In [6]:

sm1 = lm(X,rand(d))
[coef(sm1), scale(sm1)]




Out[6]:

6-element Array{Float64,1}:
30.153
-0.509894
-1.85234
-0.00174832
0.737757
4.1995



Let's express this as a function



In [7]:

samp1(X::Matrix,d) = (sm = lm(X,rand(d)); [coef(sm), scale(sm)])




Out[7]:

samp1 (generic function with 1 method)




In [8]:

samp1(X,d)




Out[8]:

6-element Array{Float64,1}:
38.8541
-0.657691
-2.45837
-0.00162179
0.277361
3.63202



Creating the parametric bootstrap sample

The simplest way to create a parametric bootstrap sample of these parameter estimates is as a comprehension.



In [9]:

srand(1234321)
bss = [samp1(X,d) for i in 1:10000]




Out[9]:

10000-element Array{Any,1}:
[28.5583,-0.445019,-2.39471,0.000373479,0.449379,4.56095]
[35.944,-0.621564,-2.35589,-0.0012479,0.598529,4.14837]
[21.9009,-0.343171,-0.86067,-9.93608e-5,0.338381,4.10657]
[33.1855,-0.601662,-1.77152,-0.000105169,0.398659,3.27316]
[17.9789,-0.231693,-0.386903,-0.000723807,0.371328,3.69092]
[31.442,-0.516094,-2.32955,0.000473851,0.274775,3.83084]
[17.6364,-0.292026,-0.440675,-0.000368788,0.828189,3.94549]
[33.2162,-0.51244,-2.8588,8.82325e-5,0.373106,4.17757]
[12.508,-0.148361,-0.244755,-0.000232026,0.877372,4.6156]
[33.1391,-0.538018,-3.09807,-0.000228105,0.362138,3.62479]
[29.9126,-0.476695,-1.169,-0.000781248,0.175824,4.16407]
[30.0718,-0.479997,-2.0634,-0.000215239,0.455825,4.23013]
[21.1611,-0.370234,-1.25906,0.000697624,0.827424,3.91826]
⋮
[29.7568,-0.503739,-1.68198,-9.8295e-5,0.514151,4.23564]
[40.0399,-0.673688,-2.69232,-0.00111253,0.312692,3.84436]
[26.1465,-0.357683,-1.81777,-0.000553048,0.157578,4.28691]
[32.3954,-0.548962,-2.91014,0.000659737,0.51586,2.50244]
[23.1756,-0.327187,-0.69978,7.07523e-5,0.20963,3.70527]
[29.086,-0.471942,-1.49713,-0.000662799,0.399184,2.99092]
[37.6333,-0.625821,-3.35728,0.000467882,0.255695,4.19263]
[28.9012,-0.483732,-2.56316,0.000625508,0.708579,3.96592]
[26.5054,-0.439392,-2.0225,0.000442202,0.61712,4.27044]
[17.5618,-0.255279,-1.47908,0.00119981,0.885358,3.06848]
[21.5199,-0.329395,-1.27995,0.000940134,0.233601,3.5673]
[28.1422,-0.431297,-1.82568,-0.000431981,0.423766,3.95991]




In [10]:

@time [samp1(X,d) for i in 1:10000];




elapsed time: 2.263262407 seconds (329511888 bytes allocated, 16.73% gc time)



As we see, this produces the sample in the form of an array of vectors, as opposed to, say, a matrix. Furthermore this process is not blazingly fast, allocates a lot of memory and spends about 1/6 of the time in garbage collection.

What is happening here is that all the work of creating the model object and factoring the model matrix is (unnecessarily) being done for every sample.

Using the factorization

As indicated by the rather long name of the object type for the fitted models, a dense QR decomposition is used to obtain the coefficients.



In [11]:

qrd = m1.model.pp.qr;
typeof(qrd)




Out[11]:

QRCompactWY{Float64} (constructor with 1 method)




In [12]:

srand(1234321)
y1 = rand(d)
β = qrd\y1




Out[12]:

5-element Array{Float64,1}:
28.5583
-0.445019
-2.39471
0.000373479
0.449379



We see we get the same coefficient estimates using the pre-computed QR decomposition of the model matrix. To get $\widehat{\sigma}$ we could use these coefficient estimates, generate the fitted values and residuals and assumulate the sum-of-squared residuals. However, there is a faster way using the vector $\bf Q^\prime y$. The coefficient estimates satisfy $\bf R\widehat{\beta}=Q_1^\prime y$ and the residual sum of squares is $\|\bf Q_2^\prime y\|^2$ where $\bf Q_1$ is the first $p$ columns of the $n\times n$ matrix $\bf Q$ and $\bf Q_2$ is the last $n-p$ columns.

This method would not be practical for large $n$ if we needed to generate and store $\bf Q$ explicitly. But we don't.



In [13]:

Q = qrd[:Q]
typeof(Q)




Out[13]:

QRCompactWYQ{Float64} (constructor with 1 method)




In [14]:

R = Triangular(qrd[:R],:U)




Out[14]:

5x5 Triangular{Float64,Array{Float64,2},:U,false}:
-7.07107  -248.121   -16.214    -7825.96  -26.5702
0.0        64.0621   -8.20847  -5244.98   -0.960775
0.0         0.0      -3.77617  -1659.93    0.871339
0.0         0.0       0.0       4224.22   -5.12174
0.0         0.0       0.0          0.0   -19.3819



We can create both $\widehat{\sigma}$ and $\widehat{\beta}$ in a simple function.



In [15]:

function betasigmahat(qrd,y::Vector)
n,p = size(qrd)
length(y) == n || throw(DimensionMismatch(""))
qty = qrd[:Q]'y
sigmahat = sqrt(Base.sumabs2(sub(qty,(p+1):n))/(n-p))
Triangular(qrd[:R],:U)\sub(qty,1:p), sigmahat
end




Out[15]:

betasigmahat (generic function with 1 method)




In [16]:

betasigmahat(qrd,y1)




Out[16]:

([28.5583,-0.445019,-2.39471,0.000373479,0.449379],4.560948146915455)



We can wrap all this in a function to create the bootstrap sample



In [17]:

function boot(m::LinearModel,nsim::Integer)
qrd = m.pp.qr
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
n,p = size(qrd)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
betas = Array(Float64,(size(qrd,2),nsim))
sigmas = Array(Float64,nsim)
for j in 1:nsim
qty = qmat'rand(d)
betas[:,j] = rmat\sub(qty,1:p)
sigmas[j] = sqrt(Base.sumabs2(sub(qty,(p+1):n))/df)
end
betas, sigmas
end




Out[17]:

boot (generic function with 1 method)




In [18]:

srand(1234321)
boot(m1.model,1)  # check that the results are calculated properly




Out[18]:

(
5x1 Array{Float64,2}:
28.5583
-0.445019
-2.39471
0.000373479
0.449379   ,

[4.56095])




In [19]:

srand(1234321)
boot(m1.model,10000)




Out[19]:

(
5x10000 Array{Float64,2}:
28.5583       35.944      21.9009      …  21.5199       28.1422
-0.445019     -0.621564   -0.343171       -0.329395     -0.431297
-2.39471      -2.35589    -0.86067        -1.27995      -1.82568
0.000373479  -0.0012479  -9.93608e-5      0.000940134  -0.000431981
0.449379      0.598529    0.338381        0.233601      0.423766   ,

[4.56095,4.14837,4.10657,3.27316,3.69092,3.83084,3.94549,4.17757,4.6156,3.62479  …  4.28691,2.50244,3.70527,2.99092,4.19263,3.96592,4.27044,3.06848,3.5673,3.95991])




In [20]:

gc();@time boot(m1.model,10000);




elapsed time: 0.105056267 seconds (24635424 bytes allocated)



Reducing memory allocation

We are doing much better in the boot function except that we are allocating a lot of memory. We can avoid this by using mutating functions. For example each call to rand(d) allocates a vector of length n which eventually must be garbage collected. This is not a terrible burden if $n=50$ but when $n=100000$ you don't want to have this wasted allocation occurring nsim times.

The mutating version of rand avoids this.



In [21]:

function boot1(m::LinearModel,nsim::Integer)
qrd = m.pp.qr
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
n,p = size(qrd)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
betas = Array(Float64,(size(qrd,2),nsim))
sigmas = Array(Float64,nsim)
y = Array(Float64,n)
for j in 1:nsim
qty = qmat'rand!(d,y)
betas[:,j] = rmat\sub(qty,1:p)
sigmas[j] = sqrt(Base.sumabs2(sub(qty,(p+1):n))/df)
end
betas, sigmas
end




Out[21]:

boot1 (generic function with 1 method)




In [22]:

srand(1234321)
boot1(m1.model,1)




Out[22]:

(
5x1 Array{Float64,2}:
28.5583
-0.445019
-2.39471
0.000373479
0.449379   ,

[4.56095])




In [23]:

srand(1234321);gc();@time boot1(m1.model,10000)




elapsed time: 0.103414702 seconds (20153008 bytes allocated)

Out[23]:

(
5x10000 Array{Float64,2}:
28.5583       35.944      21.9009      …  21.5199       28.1422
-0.445019     -0.621564   -0.343171       -0.329395     -0.431297
-2.39471      -2.35589    -0.86067        -1.27995      -1.82568
0.000373479  -0.0012479  -9.93608e-5      0.000940134  -0.000431981
0.449379      0.598529    0.338381        0.233601      0.423766   ,

[4.56095,4.14837,4.10657,3.27316,3.69092,3.83084,3.94549,4.17757,4.6156,3.62479  …  4.28691,2.50244,3.70527,2.99092,4.19263,3.96592,4.27044,3.06848,3.5673,3.95991])



We have cut down on the memory allocated (24 Mb total down to 20 Mb total) but only by about 1/6. The other allocations are occurring in the multiplication by $\bf Q$ and the solutions with respect to $\bf R$. There are mutating versions of both these operations. A multiplication of the form $\bf Q^\prime y$ ends up calling the relevant method for At_mul_B. The general function is Ac_mul_B indicating the conjugate transpose. The two functions are identical for real matrices and we will use the more general name Ac_mul_B and its mutating version Base.Ac_mul_B!. Similarly, R\qty ends up calling A_ldiv_B with mutating version Base.A_ldiv_B!.



In [24]:

function boot2(m::LinearModel,nsim::Integer)
qrd = m.pp.qr
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
n,p = size(qrd)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
betas = Array(Float64,(size(qrd,2),nsim))
sigmas = Array(Float64,nsim)
y = Array(Float64,n)
for j in 1:nsim
Base.Ac_mul_B!(qmat,rand!(d,y))
Base.A_ldiv_B!(rmat,copy!(sub(betas,:,j),sub(y,1:p)))
sigmas[j] = sqrt(Base.sumabs2(sub(y,(p+1):n))/df)
end
betas, sigmas
end




Out[24]:

boot2 (generic function with 1 method)




In [25]:

srand(1234321)
boot2(m1.model,1)




Out[25]:

(
5x1 Array{Float64,2}:
28.5583
-0.445019
-2.39471
0.000373479
0.449379   ,

[4.56095])




In [26]:

srand(1234321); gc(); @time samp2 = boot2(m1.model,10000);




elapsed time: 0.099737582 seconds (14835392 bytes allocated)



It doesn't look like we have saved a lot but remember that we are allocating 4 Mb of that 14 Mb for the result.

Preallocating the result

A common idiom in Julia is to create a mutating version of a function that overwrites one or more arguments and a non-mutating version that allocates the object to be overwritten and calls the mutating version.

We first create the mutating boot2! method then define the non-mutation version to allocate the storage and call the mutating version.



In [27]:

function boot2!(betas::Matrix, sigmas::Vector, m::LinearModel)
qrd = m.pp.qr
n,p = size(qrd)
size(betas,2) == length(sigmas) && size(betas,1) == p || throw(DimensionMismatch(""))
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
y = Array(Float64,n)
for j in 1:length(sigmas)
Base.Ac_mul_B!(qmat,rand!(d,y))
Base.A_ldiv_B!(rmat,copy!(sub(betas,:,j),sub(y,1:p)))
sigmas[j] = sqrt(Base.sumabs2(sub(y,(p+1):n))/df)
end
betas, sigmas
end
function boot2(m::LinearModel, nsim::Integer)
boot2!(Array(Float64,(size(m.pp.qr,2),nsim)),Array(Float64,nsim),m)
end




Out[27]:

boot2 (generic function with 1 method)




In [28]:

srand(1234321)
boot2!(zeros(5,1),zeros(1),m1.model)




Out[28]:

(
5x1 Array{Float64,2}:
28.5583
-0.445019
-2.39471
0.000373479
0.449379   ,

[4.56095])




In [29]:

betas = zeros(5,10000)
sigmas = zeros(10000)
srand(1234321); gc(); @time boot2!(betas,sigmas,m1.model);




elapsed time: 0.099889196 seconds (14355200 bytes allocated)



So despite all these heroic efforts we seem to be stuck allocating about 14 Mb in this case.

Profiling to find the memory allocation bottlenecks.

We should profile to see what is happening. To get a reasonable execution time we up the number of samples from 10,000 to 100,000.



In [30]:

betas = zeros(5,100_000)
sigmas = zeros(100_000)
srand(1234321)
gc()
@profile boot2!(betas,sigmas,m1.model);




In [31]:

using ProfileView



At this point we would typically run ProfileViews.view() but the resulting plot doesn't display well in an IJulia notebook. Also, when @profile is called through IJulia there are several functions specific to the IJulia interface that show up.

Running this in the REPL does produce a profile flame plot which shows, as expected, that essentially all the time is being used in the inner loop. Roughly 1/3 in the Ac_mul_B! call, nearly 2/3 in the A_ldiv_B! call and a small amount in the sumabs2 call.

Furthermore, only a small fraction of the time in the A_ldiv_B! call is actually spent in the calculation. Most of the time is spent in the sub function. I misunderstood how it worked. This frequently happens to me, which is why @profile and ProfileView.view() are so valuable.

It is likely that view from the ArrayViews package will be faster and allocate less memory. (Because the capabilities in ArrayViews are likely to break existing code, they will not be introduced into the base system until after release 0.3.0 is out.)



In [32]:

using ArrayViews
function boot3!(betas::Matrix, sigmas::Vector, m::LinearModel)
qrd = m.pp.qr
n,p = size(qrd)
size(betas,2) == length(sigmas) && size(betas,1) == p || throw(DimensionMismatch(""))
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
y = Array(Float64,n)
for j in 1:length(sigmas)
Base.Ac_mul_B!(qmat,rand!(d,y))
Base.A_ldiv_B!(rmat,copy!(view(betas,:,j),view(y,1:p)))
sigmas[j] = sqrt(Base.sumabs2(view(y,(p+1):n))/df)
end
betas, sigmas
end
function boot3(m::LinearModel, nsim::Integer)
boot3!(Array(Float64,(size(m.pp.qr,2),nsim)),Array(Float64,nsim),m)
end




Out[32]:

boot3 (generic function with 1 method)




In [33]:

srand(1234321)
boot3!(zeros(5,1),[0.],m1.model)




Out[33]:

(
5x1 Array{Float64,2}:
28.5583
-0.445019
-2.39471
0.000373479
0.449379   ,

[4.56095])




In [34]:

betas = zeros(5,10_000)
sigmas = zeros(10_000)
srand(1234321)
gc()
@time boot3!(betas,sigmas,m1.model);




elapsed time: 0.062589559 seconds (5361040 bytes allocated)



The profile indicates that we are unlikely to make this much faster as almost all the time is being spent in the BLAS or LAPACK functions, plus the simulation from the multivariate normal. We are still allocating over 5 Mb during the execution but I'm not sure exactly where this is occuring.

Parallel execution

In addition to allowing memory allocation to be more finely controlled, having a mutating version of the simulation function can help when parallelizing the simulation using distributed arrays or memory-mapped arrays.

A memory-mapped array corresponds to a stored file. Numbers are stored in their native binary representation.

Suppose that we were simulating a huge number of replications for this model or for more complex models.



In [35]:

nsim = 10_000_000
n, p = size(m1.model.pp.X)
s = open("/tmp/mmapbetas.bin","w+")
betas = mmap_array(Float64,(p,nsim),s)
typeof(betas)




Out[35]:

Array{Float64,2}