Case 2: Properties of OLS and simulation methods

by Milan Van den Heuvel, Ken Bastiaensen, Gonzalo Villa *Advanced Econometrics 2016-2017.

Strict set of GM assumptions:

  • X is deterministic, x is thus fixed over repeated samples
  • errors $\mu$ are normally distributed with assumed homoscedastic errors

Question: Give the small sample and asymptotic properties of the OLS estimator for $\beta$ and for the estimator of the standard errors.

Small sample properties:

  • OLS is the best unbiased estimator
  • The estimator is normally distributed (stems from the fact that $\hat{\beta}$ is linear function of the disturbance vector $\mu$)
  • The covariance matrix $\sigma^2(X'X)^{-1}$ can be estimated with an unbiased estimator of $\sigma^2$ given by:
$$\hat{\sigma}^2 = \frac{\hat{\mu}'\hat{\mu}}{N-K} = \frac{y'My}{N-K}$$

Asymptotic properties:

  • same under the GM conditions
  • $\bar{x}_N$ assymptotically approaches $N(\mu,\frac{\sigma^2}{N})$

Part 1: Properties of Monte Carlo simulations


In [1]:
using Distributions: Normal, TDist, ccdf, fit
using Plots
gc()

In [2]:
# include functions from file
include("functions_lib.jl");

General normality test (Jarque-Bera)


In [3]:
#testing for normality
function JB_test(X)
    E_X = mean(X,1)[1]
    σ = std(X,1)[1]
    n = length(X)
    S = sum((X - E_X).^3/σ^3)/n
    K = sum((X - E_X).^4/σ^4)/n
    return n*(S^2/6 + (K-3)^2/24)
end


Out[3]:
JB_test (generic function with 1 method)

In [4]:
# Note that we can use unicode for identifiers by using latex and tab completion (e.g. \beta+<TAB>)
β₀ = 10
β₁ = 1
β  = [β₀, β₁]
σ² = 1
T  = 25  # sample size
runs = 10_000 # underscore for readability, doesn't affect number


Out[4]:
10000

Deterministic X

We create a function to run MC simulations:

  1. specify a population = N(5,2) and draw a sample once to have a deterministic sample.
  2. simulate y by simulating errors with variance $\sigma^2$ (= 1 here).
  3. run ols and store results.
  4. The 'true' standard errors is the standard deviation over all estimated $\hat\beta$ (True SE = $\frac{1}{runs}\sum_{run=0}^{runs} se(\hat\beta_{run}$)).
  5. return true value and mean of estimated for $\beta$ as well as for its standard error.

In [5]:
# simple implementation
function mc_simple(β, σ², T, runs)
    K = length(β)
    
    # simulate X once, deterministically
    X  = hcat(ones(T), rand(Normal(5, 2), T, K-1)) #concatenation of column of ones for the constant terms and the randomly drawn x's for the beta terms 
    
    # variables with mc results
    β_mc    = zeros(runs, K)
    β_var_mc= zeros(runs, K)

    # pre-allocate memory to speed up value-allocation process
     = X * β
    μ_dist = Normal(0, σ²) 
    
    for run = 1:runs
        y =  + rand(μ_dist, T)
        result = ols(y, X)
        
        β_mc[run, :] = result.coefs
        β_var_mc[run, :] = diag(result.vcv)
    end
    
    return β_mc, β, mean(β_mc,1), sqrt(mean(β_var_mc,1)), std(β_mc,1)
end


Out[5]:
mc_simple (generic function with 1 method)

In [6]:
function fill_jarque_bera_simple(T)
    jb = zeros(1000)
    for i = 1:1000
        β_mcs, True_β, Est_β, True_σ, Est_σ = mc_simple(β, σ², T, runs)
        jb[i] = JB_test(β_mcs[:,1])
        i+=1
    end
end


Out[6]:
fill_jarque_bera_simple (generic function with 1 method)

In [7]:
jb = fill_jarque_bera_simple(25)

Repeated Jarque-Bera to find out power of the test.


In [8]:
function JB_prob_mass(jb)
    pass = 0
    for i = 1:length(jb)
        if jb[i] <= 5.99
            pass +=1
        end
    end
    return pass/length(jb)
end


Out[8]:
JB_prob_mass (generic function with 1 method)

In [9]:
JB_prob_mass(jb)


MethodError: no method matching length(::Void)
Closest candidates are:
  length(::SimpleVector) at essentials.jl:168
  length(::Base.MethodList) at reflection.jl:256
  length(::MethodTable) at reflection.jl:322
  ...

 in JB_prob_mass(::Void) at ./In[8]:3

In [10]:
histogram(jb)


BoundsError: attempt to access 0-element Array{RecipesBase.RecipeData,1} at index [1]

 in macro expansion at /Users/iubere/.julia/v0.5/Plots/src/series.jl:230 [inlined]
 in apply_recipe(::Dict{Symbol,Any}, ::Void) at /Users/iubere/.julia/v0.5/RecipesBase/src/RecipesBase.jl:238
 in _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{Void}) at /Users/iubere/.julia/v0.5/Plots/src/pipeline.jl:73
 in _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{Void}) at /Users/iubere/.julia/v0.5/Plots/src/plot.jl:171
 in (::Plots.#kw##plot)(::Array{Any,1}, ::Plots.#plot, ::Void) at ./<missing>:0
 in #histogram#360(::Array{Any,1}, ::Function, ::Void, ::Vararg{Void,N}) at /Users/iubere/.julia/v0.5/Plots/src/Plots.jl:139
 in histogram(::Void, ::Vararg{Void,N}) at /Users/iubere/.julia/v0.5/Plots/src/Plots.jl:139

Interlude: Julia speedups

You can profile the code to identify possible speedups. We see that most of the time is spent in solving OLS. Because X is deterministic and we only need to factorize it once. Changing this part of the code almost doubles the speed. See the bottom of this notebook.

Comparison to true standard errors

We see that the mean of the estimated standard errors are close to the 'true' standard errors, even when running only 100 simulations for 25 samples:


In [11]:
for T = [25, 50, 100, 500]
    β_mcs, True_β, Est_β, True_σ, Est_σ = mc_simple(β, σ², T, runs)
    println("For sample size: ", T, " True_β: ", True_β," Est_β: ", Est_β, " True_σ: ", True_σ, " Est_σ: ", Est_σ)
end


For sample size: 25 True_β: [10,1] Est_β: [9.99369 1.0014] True_σ: [0.421708 0.0777859] Est_σ: [0.418864 0.077747]
For sample size: 50 True_β: [10,1] Est_β: [9.99639 1.00057] True_σ: [0.381263 0.072868] Est_σ: [0.377425 0.0720451]
For sample size: 100 True_β: [10,1] Est_β: [9.99689 1.00091] True_σ: [0.2622 0.0493567] Est_σ: [0.262293 0.0494392]
For sample size: 500 True_β: [10,1] Est_β: [9.9978 1.00044] True_σ: [0.118758 0.0222236] Est_σ: [0.118951 0.0222358]

t-test

We now perform a t-test for several null hypothesis for $\beta_1 = 1; 0.9; 0.8$ and this for several sample sizes, we also report the p-values. Because we know that X is normally distributed we do not need to use the simulated t-stats since we know that the t-values will be student-t distributed and can immediately do a standard t-test.


In [12]:
runs = 10_000
for T = [25, 50, 100, 500, 10000]
    println("## T = ",T," ##")
    for β₁_hyp = [1, 0.9, 0.7, 0.5]
        β_mc, _, β_mean, _, β_se = mc_simple(β, σ², T, runs)
        K = size(β_mean)[1] #amount of estimated parameters = amount of d.o.f. lost
        ttest = (β_mean[2] - β₁_hyp) / β_se[2]
        pval  = 2 * ccdf(TDist(T-K), abs(ttest)) # what is the change that if you reject a correct null
        println("β₁=", β₁_hyp, "; T-test: ", ttest)
        println("β₁=", β₁_hyp, "; P-val: ", pval)
    end
end


## T = 25 ##
β₁=1.0; T-test: -0.0026523279194550897
β₁=1.0; P-val: 0.9979056746314129
β₁=0.9; T-test: 0.9179980575157168
β₁=0.9; P-val: 0.3677538892656238
β₁=0.7; T-test: 2.9506522096323557
β₁=0.7; P-val: 0.006975611222798675
β₁=0.5; T-test: 4.074471503173958
β₁=0.5; P-val: 0.00043644935223433084
## T = 50 ##
β₁=1.0; T-test: -0.0021686372898758197
β₁=1.0; P-val: 0.9982784842479009
β₁=0.9; T-test: 1.3621794614695921
β₁=0.9; P-val: 0.1793705323852936
β₁=0.7; T-test: 3.811654134756052
β₁=0.7; P-val: 0.00038625319716225037
β₁=0.5; T-test: 7.221560518999927
β₁=0.5; P-val: 3.0119369389036476e-9
## T = 100 ##
β₁=1.0; T-test: 0.010956475927006333
β₁=1.0; P-val: 0.9912802207803159
β₁=0.9; T-test: 1.88086777310573
β₁=0.9; P-val: 0.06292876687595954
β₁=0.7; T-test: 6.469380819162742
β₁=0.7; P-val: 3.775180166522832e-9
β₁=0.5; T-test: 9.679351750335796
β₁=0.5; P-val: 5.477792266100121e-16
## T = 500 ##
β₁=1.0; T-test: -0.005013061620867577
β₁=1.0; P-val: 0.9960021757355164
β₁=0.9; T-test: 4.6506278744199
β₁=0.9; P-val: 4.2434548662058795e-6
β₁=0.7; T-test: 12.769375988003297
β₁=0.7; P-val: 1.6482799316310413e-32
β₁=0.5; T-test: 22.331580688739827
β₁=0.5; P-val: 4.246560395511318e-77
## T = 10000 ##
β₁=1.0; T-test: -0.004862244259071587
β₁=1.0; P-val: 0.99612060265834
β₁=0.9; T-test: 20.212309027235296
β₁=0.9; P-val: 4.524613847163936e-89
β₁=0.7; T-test: 59.93188558831908
β₁=0.7; P-val: 0.0
β₁=0.5; T-test: 100.032201428183
β₁=0.5; P-val: 0.0

I (milan) am not sure if we need to take the 2,5% and 97,5% values from the t-values for every estimated parameter. Since we know X is normally distributed, we know the t-test is student-t distributed so I don't think this is necessary and the previous is enough


In [13]:
runs = 10_000
t_values = zeros(runs)
for T = [25, 50, 100, 500]
    println("## T = ",T," ##")
    for β₁_hyp = [1]
        β_mc, _, β_mean, _, β_se = mc_simple(β, σ², T, runs)
        t_values = (β_mc[:,2] - 1)/β_se[2]
        sort!(t_values)
        println("For β_hyp = ",β₁_hyp,", critical values: 2,5%: ",t_values[250], "97,5%: ", t_values[9750])
    end
end


## T = 25 ##
For β_hyp = 1, critical values: 2,5%: -2.011175672023387897,5%: 1.9550479941387047
## T = 50 ##
For β_hyp = 1, critical values: 2,5%: -1.931279321789643397,5%: 1.946246863031847
## T = 100 ##
For β_hyp = 1, critical values: 2,5%: -1.958797136030247697,5%: 1.9778901230705677
## T = 500 ##
For β_hyp = 1, critical values: 2,5%: -1.993299871164567397,5%: 1.9349167387552961

In [14]:
histogram(t_values)


Out[14]:
-2.5 0.0 2.5 0 200 400 600 800 1000 y1

In [15]:
function tstat_prob_mass(t_values)
    pass = 0
    for i = 1:length(t_values)
        if abs(t_values[i]) <= 1.96
            pass +=1
        end
    end
    return pass/length(t_values)
end


Out[15]:
tstat_prob_mass (generic function with 1 method)

In [16]:
tstat_prob_mass(t_values)


Out[16]:
0.9502

Conclusion

From these results we clearly see that under the strict assumptions of Gauss-Markov the OLS estimator for $\beta$ ($\hat{\beta}$) and the estimator for the standard deviation ($\hat{\sigma}$) of this estimator are unbiased and, even for small samples, very close to the true values. The estimated standard deviations can also be seen to decline with increasing sample size, thus the distribution of the estimated parameters grows more peaked around the real values. Since the error terms are pulled from a normal distribution and $\hat{\beta}$ is a weighted sum of these, where the weigting is deterministic, it is itself also normally distributed. This is proven by the Jarque-Bera test.

MC with stochastic variables

X now changes over each run, this causes some properties to change.

We know X and $\mu$ are still independent.

OLS estimator is still:

  • unbiased
  • efficient

but in small samples no longer necessarily normally distributed and the standard covariance matrix should be interpreted as being conditional on X. In large samples, the estimator will still be normally distributed.


In [17]:
function mc_stoch(β, σ², T, runs)
    K = length(β)
    
    # variables with mc results
    β_mc    = zeros(runs, length(β))
    β_var_mc= zeros(runs, length(β))

    # pre-allocate
    μ_dist = Normal(0, σ²)
    X_dist = Normal(5, 2)
    
    X = ones(T, K)
    for run = 1:runs
        # simulate inside the loop
        X[:, 2:end] = rand(X_dist, T, K-1)
        y = X*β + rand(μ_dist, T)
        result = ols(y, X)
        
        β_mc[run, :] = result.coefs
        β_var_mc[run,:] = diag(result.vcv)
    end
    
    return β_mc, β, mean(β_mc,1), sqrt(mean(β_var_mc,1)), std(β_mc,1)
end


Out[17]:
mc_stoch (generic function with 1 method)

In [18]:
β_mcs2, True_β, Est_β, True_σ, Est_σ = mc_stoch(β, σ², 25, 5000)
histogram(β_mcs2[:,1])


Out[18]:
8 9 10 11 12 0 100 200 300 400 500 y1

In [19]:
function fill_jarque_bera_stoch(β, σ², T, runs)
    jb = zeros(1000)
    for i = 1:1000
        β_mcs, True_β, Est_β, True_σ, Est_σ = mc_stoch(β, σ², T, runs)
        jb[i] = JB_test(β_mcs[:,1])
        i+=1
    end
end


Out[19]:
fill_jarque_bera_stoch (generic function with 1 method)

In [20]:
jb = fill_jarque_bera_stoch(β, σ², 25, runs)

In [21]:
JB_prob_mass(jb)


MethodError: no method matching length(::Void)
Closest candidates are:
  length(::SimpleVector) at essentials.jl:168
  length(::Base.MethodList) at reflection.jl:256
  length(::MethodTable) at reflection.jl:322
  ...

 in JB_prob_mass(::Void) at ./In[8]:3

In [22]:
histogram(jb)


BoundsError: attempt to access 0-element Array{RecipesBase.RecipeData,1} at index [1]

 in macro expansion at /Users/iubere/.julia/v0.5/Plots/src/series.jl:230 [inlined]
 in apply_recipe(::Dict{Symbol,Any}, ::Void) at /Users/iubere/.julia/v0.5/RecipesBase/src/RecipesBase.jl:238
 in _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{Void}) at /Users/iubere/.julia/v0.5/Plots/src/pipeline.jl:73
 in _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{Void}) at /Users/iubere/.julia/v0.5/Plots/src/plot.jl:171
 in (::Plots.#kw##plot)(::Array{Any,1}, ::Plots.#plot, ::Void) at ./<missing>:0
 in #histogram#360(::Array{Any,1}, ::Function, ::Void, ::Vararg{Void,N}) at /Users/iubere/.julia/v0.5/Plots/src/Plots.jl:139
 in histogram(::Void, ::Vararg{Void,N}) at /Users/iubere/.julia/v0.5/Plots/src/Plots.jl:139

In [23]:
runs = 10_000
for T = [25, 50, 100, 500]
    println("## T = ",T," ##")
    for β₁_hyp = [1, 0.9, 0.7, 0.5]
        _, β_mean, _, β_se = mc_stoch(β, σ², T, runs)
        K = size(β_mean)[1] #amount of estimated parameters = amount of d.o.f. lost
        ttest = (β_mean[2] - β₁_hyp) / β_se[2]
        pval  = 2 * ccdf(TDist(T-K), abs(ttest)) # what is the change that you reject a correct null
        println("β₁=", β₁_hyp, "; T-test: ", ttest)
        println("β₁=", β₁_hyp, "; P-val: ", pval)
    end
end


## T = 25 ##
β₁=1.0; T-test: 0.0
β₁=1.0; P-val: 1.0
β₁=0.9; T-test: 0.9384321697343082
β₁=0.9; P-val: 0.3577734874005212
β₁=0.7; T-test: 2.8120572268784727
β₁=0.7; P-val: 0.009891984889532036
β₁=0.5; T-test: 4.703698986384477
β₁=0.5; P-val: 9.741597156309991e-5
## T = 50 ##
β₁=1.0; T-test: 0.0
β₁=1.0; P-val: 1.0
β₁=0.9; T-test: 1.3718328428597035
β₁=0.9; P-val: 0.17649367298259364
β₁=0.7; T-test: 4.110531881357213
β₁=0.7; P-val: 0.00015338351447699385
β₁=0.5; T-test: 6.841761440659729
β₁=0.5; P-val: 1.2856780197548639e-8
## T = 100 ##
β₁=1.0; T-test: 0.0
β₁=1.0; P-val: 1.0
β₁=0.9; T-test: 1.9723408470946888
β₁=0.9; P-val: 0.05138891842744003
β₁=0.7; T-test: 5.902158978369037
β₁=0.7; P-val: 5.1532300787019635e-8
β₁=0.5; T-test: 9.86403718602204
β₁=0.5; P-val: 2.387783656908781e-16
## T = 500 ##
β₁=1.0; T-test: 0.0
β₁=1.0; P-val: 1.0
β₁=0.9; T-test: 4.459050075099171
β₁=0.9; P-val: 1.0182053643818617e-5
β₁=0.7; T-test: 13.362993555244458
β₁=0.7; P-val: 5.038132691737849e-35
β₁=0.5; T-test: 22.266904675867814
β₁=0.5; P-val: 9.643895036686288e-77

In [24]:
runs = 10_000
t_values = zeros(runs)
for T = [25, 50, 100, 500]
    println("## T = ",T," ##")
    for β₁_hyp = [1]
        β_mc, _, β_mean, _, β_se = mc_simple(β, σ², T, runs)
        t_values = (β_mc[:,2] - 1)/β_se[2]
        sort!(t_values)
        println("For β_hyp = ",β₁_hyp,", critical values: 2,5%: ",t_values[250], "97,5%: ", t_values[9750])
    end
end


## T = 25 ##
For β_hyp = 1, critical values: 2,5%: -1.981448866630208497,5%: 1.941091151635701
## T = 50 ##
For β_hyp = 1, critical values: 2,5%: -1.957006385221384497,5%: 1.9219808053886271
## T = 100 ##
For β_hyp = 1, critical values: 2,5%: -1.97577767987211797,5%: 1.9778019853475124
## T = 500 ##
For β_hyp = 1, critical values: 2,5%: -1.947045201791443597,5%: 1.9248063894784406

In [25]:
histogram(t_values)


Out[25]:
-2 0 2 0 250 500 750 y1

In [26]:
tstat_prob_mass(t_values)


Out[26]:
0.9525

Conclusion

The Jarque-Bera test shows that the power of it is lower because it is clear that there is a lot more probability mass beyond the critical value of 5,99 than in the deterministic case. The estimators are still unbiased and consistent.

Lagged Dependent Variable

Introducing lagged dependent variables makes it so that the assumption "X and $\mu$ are independent" has to be relaxed to $E[\mu_t|x_t] = 0$ or thus that the errors are contemporaneously independent with any explanatory variables.

The OLS estimator becomes:

  • Biased: $E[\hat{\beta}|X] = \beta + (X'X)^{-1}X'E[\mu|X]$ => $E[\hat{\beta}] = E_X(E[\hat{\beta}|X]) \neq \beta$
  • Consistent and asymptotically normally distributed: $plim\hat{\beta} = \beta + plim \frac{X'X}{T}^{-1} plim\frac{X'\mu}{T}$ = 0 because $plim\frac{X'\mu}{T} = E(x_t\mu_t) = 0$
  • $\hat{\sigma}^2 = \frac{\hat{\mu}'\hat{\mu}}{T-k}$ is still a consistent estimator for $\sigma^2$

In [27]:
# AR1 MC simulation
function mc_ar1(β, σ², T, runs)
    K = length(β)
    β₀, β₁ = β
    σ = σ² # = sqrt(σ²)
    
    # variables with mc averages
    β_mc     = zeros(runs, K)
    β_var_mc = zeros(runs, K)

    # pre-allocate
    y = zeros(T)
    X = ones(T, K) # fill second column with y_{t-1}
    y₀_dist = Normal(β₀/(1-β₁), sqrt(σ²/(1-β₁^2)))
    
    for run = 1:runs
        # simulate y
        y₀ = rand(y₀_dist) 
        y[1] = β₀ + β₁*y₀ + σ*randn() 
        for t = 2:T
            y[t] = β₀ + β₁*y[t-1] + σ*randn() 
        end
        # copy into X
        X[1,2] = y₀
        X[2:end, 2] = y[1:end-1]
        
        # ols
        result = ols(y, X)
        β_mc[run,:]    = result.coefs
        β_var_mc[run,:]= diag(result.vcv)
    end
    
    return β, mean(β_mc,1), sqrt(mean(β_var_mc,1)), std(β_mc,1)
end


Out[27]:
mc_ar1 (generic function with 1 method)

In [28]:
β₀, β₁ = 10, 0.1
σ² = 1
T = 1000
runs = 10_000
@time mc_ar1([β₀, β₁], σ², T, runs)


  0.832919 seconds (1.17 M allocations: 1.151 GB, 19.67% gc time)
Out[28]:
([10.0,0.1],
[10.0151 0.0987158],

[0.351278 0.0314841],

[0.352814 0.0316019])

Let's plot the bias


In [29]:
using Plots
gr();

In [30]:
Ts  = vcat(collect(10:10:90), collect(100:25:500))
β₁s = [0, 0.5, 0.9]
β̂ = [mc_ar1([β₀, β₁], σ², T, runs)[2][1] for T in Ts, β₁ in β₁s]


Out[30]:
26×3 Array{Float64,2}:
 10.9941  14.6665  47.8153
 10.5055  12.4088  29.8518
 10.3148  11.6217  23.7099
 10.2518  11.2978  20.117 
 10.2218  10.9851  17.9989
 10.1579  10.8395  16.6644
 10.1601  10.7212  15.8342
 10.1311  10.6556  15.0327
 10.1308  10.5689  14.2742
 10.0905  10.5009  13.9066
 10.0744  10.3914  13.0921
 10.0775  10.3431  12.5733
 10.0605  10.3021  12.2275
 10.0538  10.233   11.9413
 10.038   10.22    11.6707
 10.0423  10.1991  11.5436
 10.0482  10.1861  11.3962
 10.0357  10.1614  11.2768
 10.0464  10.1619  11.1753
 10.0286  10.1502  11.0631
 10.0229  10.1195  10.9764
 10.0223  10.1397  10.9571
 10.023   10.1125  10.8679
 10.0239  10.0953  10.846 
 10.0203  10.1026  10.7828
 10.0174  10.0975  10.7319

In [31]:
plot(Ts, β̂, label=string.(β₁s'))


Out[31]:
100 200 300 400 500 20 30 40 0.0 0.5 0.9

Given a certain sample size and estimated AR(1) coefficient, you can use the matrix for $\hat\beta$ (or the graph) to estimate the bias for $\beta_1$ (note that reported values are relative to 10).

Appendix: Julia performance profiling


In [32]:
# around 4s on my laptop
@time mc_simple(β, σ², T, 100_000);


  8.109794 seconds (11.30 M allocations: 12.243 GB, 23.10% gc time)

In [33]:
using ProfileView #run `Pkg.add("ProfileView")` if not yet installed


ArgumentError: Module ProfileView not found in current path.
Run `Pkg.add("ProfileView")` to install the ProfileView package.

 in require(::Symbol) at ./loading.jl:365
 in require(::Symbol) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?

In [34]:
Profile.clear()
@profile mc_simple(β, σ², T, 1_000)
ProfileView.view() #interactive graph with mouse over,  scroll and drag


UndefVarError: ProfileView not defined

In [35]:
# implementation that factorizes X only once
function mc_fact(β, σ², T, runs)
    
    # simulate X once, deterministically
    X = hcat(ones(T), rand(Normal(5, 2), T))
    
    # variables with mc results    
    β_mc    = zeros(runs, length(β))
    β_var_mc= zeros(runs) #only keep σ̂²T = dot(μ̂, μ̂) = σ̂²*(T-K) per run

    # pre-allocate
         = X * β
    μ_dist = Normal(0, σ²)
    x_fact = factorize(X)
    XtXinvd= diag(inv(X'*X))
    
    for run = 1:runs
        y =   + rand(μ_dist, T)
        β̂ = x_fact \ y #factorization already done now
        μ̂ = y - X * β̂
        σ̂²T = dot(μ̂, μ̂) #put factor /(T-K) outside of loop
        
        β_mc[run, :]    = β̂
        β_var_mc[run,:] = σ̂²T
    end
    se_true = std(β_mc, 1)
    se_mc   = sqrt(mean(β_var_mc) / (T - length(β)) * XtXinvd)
    return β, mean(β_mc, 1), se_true, se_mc
end


Out[35]:
mc_fact (generic function with 1 method)

In [36]:
# runs in about 2s on my laptop
mc_fact(β, σ², 25, 1) # first run includes JIT compilation
@time mc_fact(β, σ², T, 100_000)


  5.325076 seconds (6.80 M allocations: 10.332 GB, 29.86% gc time)
Out[36]:
([10,1],
[9.99992 1.00003],

[0.0841076 0.0155401],

[0.0837594,0.0154727])