Fabrizzio Sanchez suggested bootstrapping a statistical model as a non-trivial task on which to demonstrate the speed of Julia.

In this notebook I will demonstrate a *parametric bootstrap*, which means that samples are drawn from the probability model with parameters set at their estimated values from the original model, and the model refit.

A simple example is a *linear model* which can be fit using the `GLM`

package.

```
In [1]:
```using GLM, RDatasets
ls = dataset("datasets","LifeCycleSavings")

```
Out[1]:
```

```
In [2]:
```m1 = lm(SR ~ Pop15 + Pop75 + DPI + DDPI, ls)

```
Out[2]:
```

We will take the fitted values and the estimate of the scale parameter from this model as the mean and scale parameter of a multivariate normal distribution from which to simulate new response vectors.

Getting the fitted values turned out to be more complicated than I thought it would be because I hadn't written the appropriate method. For now we will use the somewhat mysterious extractor `m1.model.rr.mu`

to get this.

We will use the `IsoNormal`

type of multivariate normal distribution from the `Distributions`

package for simulating the responses.

```
In [6]:
```fitted(m1)

```
```

```
In [3]:
```d = IsoNormal(m1.model.rr.mu,scale(m1.model))

```
```

`lm`

method for a model matrix and a response.

```
In [4]:
```X = copy(m1.model.pp.X) # copy the X matrix from the predictor field, pp

```
Out[4]:
```

```
In [5]:
```srand(1234321)
lm(X,rand(d))

```
Out[5]:
```

Suppose that we want the coefficient estimates, $\widehat{\beta}$, and the scale parameter estimate, $\widehat{\sigma}$, from, say, 10000 simulated responses for this model.

For a single fit we can extract both of these and concatenate them as a vector

```
In [6]:
```sm1 = lm(X,rand(d))
[coef(sm1), scale(sm1)]

```
Out[6]:
```

Let's express this as a function

```
In [7]:
```samp1(X::Matrix,d) = (sm = lm(X,rand(d)); [coef(sm), scale(sm)])

```
Out[7]:
```

```
In [8]:
```samp1(X,d)

```
Out[8]:
```

```
In [9]:
```srand(1234321)
bss = [samp1(X,d) for i in 1:10000]

```
Out[9]:
```

```
In [10]:
```@time [samp1(X,d) for i in 1:10000];

```
```

As we see, this produces the sample in the form of an array of vectors, as opposed to, say, a matrix. Furthermore this process is not blazingly fast, allocates a lot of memory and spends about 1/6 of the time in garbage collection.

What is happening here is that all the work of creating the model object and factoring the model matrix is (unnecessarily) being done for every sample.

```
In [11]:
```qrd = m1.model.pp.qr;
typeof(qrd)

```
Out[11]:
```

```
In [12]:
```srand(1234321)
y1 = rand(d)
β = qrd\y1

```
Out[12]:
```

We see we get the same coefficient estimates using the pre-computed QR decomposition of the model matrix. To get $\widehat{\sigma}$ we could use these coefficient estimates, generate the fitted values and residuals and assumulate the sum-of-squared residuals. However, there is a faster way using the vector $\bf Q^\prime y$. The coefficient estimates satisfy $\bf R\widehat{\beta}=Q_1^\prime y$ and the residual sum of squares is $\|\bf Q_2^\prime y\|^2$ where $\bf Q_1$ is the first $p$ columns of the $n\times n$ matrix $\bf Q$ and $\bf Q_2$ is the last $n-p$ columns.

This method would not be practical for large $n$ if we needed to generate and store $\bf Q$ explicitly. But we don't.

```
In [13]:
```Q = qrd[:Q]
typeof(Q)

```
Out[13]:
```

```
In [14]:
```R = Triangular(qrd[:R],:U)

```
Out[14]:
```

We can create both $\widehat{\sigma}$ and $\widehat{\beta}$ in a simple function.

```
In [15]:
```function betasigmahat(qrd,y::Vector)
n,p = size(qrd)
length(y) == n || throw(DimensionMismatch(""))
qty = qrd[:Q]'y
sigmahat = sqrt(Base.sumabs2(sub(qty,(p+1):n))/(n-p))
Triangular(qrd[:R],:U)\sub(qty,1:p), sigmahat
end

```
Out[15]:
```

```
In [16]:
```betasigmahat(qrd,y1)

```
Out[16]:
```

We can wrap all this in a function to create the bootstrap sample

```
In [17]:
```function boot(m::LinearModel,nsim::Integer)
qrd = m.pp.qr
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
n,p = size(qrd)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
betas = Array(Float64,(size(qrd,2),nsim))
sigmas = Array(Float64,nsim)
for j in 1:nsim
qty = qmat'rand(d)
betas[:,j] = rmat\sub(qty,1:p)
sigmas[j] = sqrt(Base.sumabs2(sub(qty,(p+1):n))/df)
end
betas, sigmas
end

```
Out[17]:
```

```
In [18]:
```srand(1234321)
boot(m1.model,1) # check that the results are calculated properly

```
Out[18]:
```

```
In [19]:
```srand(1234321)
boot(m1.model,10000)

```
Out[19]:
```

```
In [20]:
```gc();@time boot(m1.model,10000);

```
```

We are doing much better in the boot function except that we are allocating a lot of memory. We can avoid this by using mutating functions. For example each call to `rand(d)`

allocates a vector of length `n`

which eventually must be garbage collected. This is not a terrible burden if $n=50$ but when $n=100000$ you don't want to have this wasted allocation occurring `nsim`

times.

The mutating version of `rand`

avoids this.

```
In [21]:
```function boot1(m::LinearModel,nsim::Integer)
qrd = m.pp.qr
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
n,p = size(qrd)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
betas = Array(Float64,(size(qrd,2),nsim))
sigmas = Array(Float64,nsim)
y = Array(Float64,n)
for j in 1:nsim
qty = qmat'rand!(d,y)
betas[:,j] = rmat\sub(qty,1:p)
sigmas[j] = sqrt(Base.sumabs2(sub(qty,(p+1):n))/df)
end
betas, sigmas
end

```
Out[21]:
```

```
In [22]:
```srand(1234321)
boot1(m1.model,1)

```
Out[22]:
```

```
In [23]:
```srand(1234321);gc();@time boot1(m1.model,10000)

```
Out[23]:
```

`At_mul_B`

. The general function is `Ac_mul_B`

indicating the conjugate transpose. The two functions are identical for real matrices and we will use the more general name `Ac_mul_B`

and its mutating version `Base.Ac_mul_B!`

. Similarly, `R\qty`

ends up calling `A_ldiv_B`

with mutating version `Base.A_ldiv_B!`

.

```
In [24]:
```function boot2(m::LinearModel,nsim::Integer)
qrd = m.pp.qr
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
n,p = size(qrd)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
betas = Array(Float64,(size(qrd,2),nsim))
sigmas = Array(Float64,nsim)
y = Array(Float64,n)
for j in 1:nsim
Base.Ac_mul_B!(qmat,rand!(d,y))
Base.A_ldiv_B!(rmat,copy!(sub(betas,:,j),sub(y,1:p)))
sigmas[j] = sqrt(Base.sumabs2(sub(y,(p+1):n))/df)
end
betas, sigmas
end

```
Out[24]:
```

```
In [25]:
```srand(1234321)
boot2(m1.model,1)

```
Out[25]:
```

```
In [26]:
```srand(1234321); gc(); @time samp2 = boot2(m1.model,10000);

```
```

A common idiom in Julia is to create a mutating version of a function that overwrites one or more arguments and a non-mutating version that allocates the object to be overwritten and calls the mutating version.

We first create the mutating `boot2!`

method then define the non-mutation version to allocate the storage and call the mutating version.

```
In [27]:
```function boot2!(betas::Matrix, sigmas::Vector, m::LinearModel)
qrd = m.pp.qr
n,p = size(qrd)
size(betas,2) == length(sigmas) && size(betas,1) == p || throw(DimensionMismatch(""))
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
y = Array(Float64,n)
for j in 1:length(sigmas)
Base.Ac_mul_B!(qmat,rand!(d,y))
Base.A_ldiv_B!(rmat,copy!(sub(betas,:,j),sub(y,1:p)))
sigmas[j] = sqrt(Base.sumabs2(sub(y,(p+1):n))/df)
end
betas, sigmas
end
function boot2(m::LinearModel, nsim::Integer)
boot2!(Array(Float64,(size(m.pp.qr,2),nsim)),Array(Float64,nsim),m)
end

```
Out[27]:
```

```
In [28]:
```srand(1234321)
boot2!(zeros(5,1),zeros(1),m1.model)

```
Out[28]:
```

```
In [29]:
```betas = zeros(5,10000)
sigmas = zeros(10000)
srand(1234321); gc(); @time boot2!(betas,sigmas,m1.model);

```
```

So despite all these heroic efforts we seem to be stuck allocating about 14 Mb in this case.

```
In [30]:
```betas = zeros(5,100_000)
sigmas = zeros(100_000)
srand(1234321)
gc()
@profile boot2!(betas,sigmas,m1.model);

```
In [31]:
```using ProfileView

At this point we would typically run `ProfileViews.view()`

but the resulting plot doesn't display well in an IJulia notebook. Also, when `@profile`

is called through IJulia there are several functions specific to the IJulia interface that show up.

Running this in the REPL does produce a profile *flame plot* which shows, as expected, that essentially all the time is being used in the inner loop. Roughly 1/3 in the `Ac_mul_B!`

call, nearly 2/3 in the `A_ldiv_B!`

call and a small amount in the `sumabs2`

call.

Furthermore, only a small fraction of the time in the `A_ldiv_B!`

call is actually spent in the calculation. Most of the time is spent in the `sub`

function. I misunderstood how it worked. This frequently happens to me, which is why `@profile`

and `ProfileView.view()`

are so valuable.

It is likely that `view`

from the `ArrayViews`

package will be faster and allocate less memory. (Because the capabilities in `ArrayViews`

are likely to break existing code, they will not be introduced into the base system until after release 0.3.0 is out.)

```
In [32]:
```using ArrayViews
function boot3!(betas::Matrix, sigmas::Vector, m::LinearModel)
qrd = m.pp.qr
n,p = size(qrd)
size(betas,2) == length(sigmas) && size(betas,1) == p || throw(DimensionMismatch(""))
qmat = qrd[:Q]
rmat = Triangular(qrd[:R],:U)
df = float(n-p)
d = IsoNormal(m.rr.mu, scale(m))
y = Array(Float64,n)
for j in 1:length(sigmas)
Base.Ac_mul_B!(qmat,rand!(d,y))
Base.A_ldiv_B!(rmat,copy!(view(betas,:,j),view(y,1:p)))
sigmas[j] = sqrt(Base.sumabs2(view(y,(p+1):n))/df)
end
betas, sigmas
end
function boot3(m::LinearModel, nsim::Integer)
boot3!(Array(Float64,(size(m.pp.qr,2),nsim)),Array(Float64,nsim),m)
end

```
Out[32]:
```

```
In [33]:
```srand(1234321)
boot3!(zeros(5,1),[0.],m1.model)

```
Out[33]:
```

```
In [34]:
```betas = zeros(5,10_000)
sigmas = zeros(10_000)
srand(1234321)
gc()
@time boot3!(betas,sigmas,m1.model);

```
```

In addition to allowing memory allocation to be more finely controlled, having a mutating version of the simulation function can help when parallelizing the simulation using distributed arrays or memory-mapped arrays.

A memory-mapped array corresponds to a stored file. Numbers are stored in their native binary representation.

Suppose that we were simulating a huge number of replications for this model or for more complex models.

```
In [35]:
```nsim = 10_000_000
n, p = size(m1.model.pp.X)
s = open("/tmp/mmapbetas.bin","w+")
betas = mmap_array(Float64,(p,nsim),s)
typeof(betas)

```
Out[35]:
```