In [1]:
using Distributions: TDist, ccdf
using LinearAlgebra
struct FE_results
    beta
    SE
    tstat
    pval
end
function FE(Y, X, time_ind; time_effects=false)
    T = Int(maximum(time_ind) - minimum(time_ind) + 1) #number of time periods; assumes a balanced panel sorted by individual, then time
    n, k = size(X) #total number of observations and number of regressors
    N = Int(n / T) #number of individuals
    D = kron(Matrix{Float64}(I, N, N), ones(T, 1)) #individual dummies
    P_D = D * inv(D' * D) * D'
    M_D = I - P_D #annihilator matrix: subtracts each individual's mean
    y_dev = M_D * Y #deviation from the individual mean for y
    x_dev = M_D * X #deviation from the individual mean for x
    if time_effects #add time effects
        time_effect = kron(ones(N, 1), Matrix{Float64}(I, T, T)) #one dummy per time period, common to all individuals
        #demean the dummies and drop one: all T together sum to a constant, which the individual effects already absorb
        x_dev = hcat(x_dev, M_D * time_effect[:, 2:end])
        k = size(x_dev, 2)
    end
    dof = n - N - k #degrees of freedom left after the N individual effects and the k estimated coefficients
    XtX_inv = inv(x_dev' * x_dev)
    β̂ = XtX_inv * x_dev' * y_dev
    residuals = y_dev - x_dev * β̂
    σ̂² = (residuals' * residuals) / dof
    SE = sqrt.(diag(σ̂²[1] * XtX_inv))
    tstat = β̂ ./ SE
    pval = 2 .* ccdf.(Ref(TDist(dof)), abs.(tstat))
    FE_results(β̂, SE, tstat, pval)
end
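Forming `P_D` and `M_D` explicitly creates an nT × nT matrix. For a balanced panel sorted by individual and then by time (the ordering `FE` already assumes), multiplying by `M_D` amounts to subtracting each individual's mean, and the corresponding two-way transformation with time effects subtracts the individual mean and the period mean and adds back the overall mean. The sketch below checks both against explicit projection matrices on random data; `within_oneway` and `within_twoway` are hypothetical helper names, not part of the notebook.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: one-way within transformation.
# Assumes a balanced panel whose rows are sorted by individual, then by time,
# so each individual's T observations are consecutive.
function within_oneway(v::AbstractVector, N::Int, T::Int)
    V = reshape(v, T, N)                  # column j holds individual j's T observations
    return vec(V .- mean(V, dims=1))      # subtract each individual's mean
end

# Hypothetical helper: two-way within transformation (individual and time effects).
function within_twoway(v::AbstractVector, N::Int, T::Int)
    V = reshape(v, T, N)
    # subtract individual means and period means, then add back the overall mean
    return vec(V .- mean(V, dims=1) .- mean(V, dims=2) .+ mean(V))
end

# Compare with the explicit projection matrices on a small random panel.
let N = 4, T = 3
    y = randn(N * T)
    D  = kron(Matrix{Float64}(I, N, N), ones(T, 1))   # individual dummies
    Dt = kron(ones(N, 1), Matrix{Float64}(I, T, T))   # time dummies
    Z  = hcat(D, Dt[:, 2:end])                        # drop one time dummy to avoid collinearity
    M_D = I - D * inv(D' * D) * D'
    M_Z = I - Z * inv(Z' * Z) * Z'
    println(M_D * y ≈ within_oneway(y, N, T))        # true
    println(M_Z * y ≈ within_twoway(y, N, T))        # true
end
```

Applying `within_oneway` to `Y` and to each column of `X` should give the same `y_dev` and `x_dev` as `M_D * Y` and `M_D * X` without building the nT × nT matrices, which starts to matter as N·T grows.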
In [2]:
using DelimitedFiles
data, header = readdlm("Data_Baltagi.csv", ',', header=true)
println.(header);
In [3]:
Y_temp = data[:, 10]
X_temp = data[:, 11:13]
year_temp = data[:,2]
state_temp = data[:,1]
observations = size(Y_temp)[1]
T = maximum(year_temp) - minimum(year_temp) + 1 #number of time periods
N = Int(observations / T) #number of states
#make empty arrays to fill
lagged_obs = observations - N #dropping each state's first year leaves N*(T-1) observations
Y = zeros(lagged_obs, 1)
Y_lag = zeros(lagged_obs,1)
X = zeros(lagged_obs, size(X_temp)[2])
year = zeros(lagged_obs)
state = zeros(lagged_obs);
In [4]:
#fill the arrays: Y, X, year and state drop each state's first year, Y_lag drops each state's last year
#(assumes the rows are sorted by state, then by year)
first_year, last_year = minimum(year_temp), maximum(year_temp)
new_i = 1
lag_i = 1
for i = 1:observations
    if year_temp[i] != first_year
        Y[new_i] = Y_temp[i]
        X[new_i, :] = X_temp[i, :]
        year[new_i] = year_temp[i]
        state[new_i] = state_temp[i]
        new_i += 1
    end
    if year_temp[i] != last_year
        Y_lag[lag_i] = Y_temp[i]
        lag_i += 1
    end
end
X = hcat(Y_lag,X);
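The loop relies on the rows being sorted by state and then by year. Under that same assumption, the current and lagged series can also be obtained by reshaping a variable into a T × N matrix (one column per state) and dropping the first or the last row. A minimal sketch with a hypothetical helper `lag_panel`:

```julia
# Hypothetical helper: split a balanced-panel variable into its current and lagged parts.
# Assumes rows are sorted by individual, then by time, with T periods per individual.
function lag_panel(v::AbstractVector, T::Int)
    V = reshape(v, T, :)            # one column per individual
    current = vec(V[2:end, :])      # drop each individual's first period
    lagged  = vec(V[1:end-1, :])    # drop each individual's last period
    return current, lagged
end

# Example: 2 individuals observed for 3 periods.
y = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
cur, lag = lag_panel(y, 3)
println(cur)   # [11.0, 12.0, 21.0, 22.0]
println(lag)   # [10.0, 11.0, 20.0, 21.0]
```

For this data set that would be, e.g., `Y_cur, Y_lagged = lag_panel(Y_temp, Int(T))` (hypothetical names), with `T` the number of years before dropping; the rows of `X_temp` can be selected with the same index pattern.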
In [6]:
FE(Y,X,year; time_effects= false).beta
Out[6]:
In [5]:
FE(Y,X,year; time_effects= true).beta
Out[5]:
We see that the estimates are an exact match with those reported by Baltagi.
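Here we have a lagged dependent variable as a regressor. For large N and fixed T, this makes the within estimates inconsistent. The inconsistency decreases as T grows; this is known as the Nickell bias. It arises because the within transformation subtracts the individual mean of the error term, $\bar{\varepsilon}_i$, which contains the lagged error $\varepsilon_{i,t-1}$, and $y_{i,t-1}$ depends on $\varepsilon_{i,t-1}$. The demeaned lagged dependent variable is therefore correlated with the demeaned error, even when the original errors are serially uncorrelated, and the resulting bias is of order $1/T$.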
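A small simulation can illustrate the Nickell bias. It assumes a pure AR(1) panel, $y_{it} = \alpha_i + \rho y_{i,t-1} + \varepsilon_{it}$, with standard normal $\alpha_i$ and i.i.d. standard normal errors (illustrative choices, not the cigarette data), and compares the within estimate of ρ for several T with the usual large-T approximation $-(1+\rho)/(T-1)$, which is only rough for small T. `fe_rho` is a hypothetical helper.

```julia
using Random, Statistics

# Hypothetical helper: simulate y_it = α_i + ρ*y_i,t-1 + ε_it (assumed DGP) and return
# the within (FE) estimate of ρ from regressing y_it on y_i,t-1.
function fe_rho(N::Int, T::Int, ρ::Float64; burnin::Int = 50, rng = Random.default_rng())
    α = randn(rng, N)
    Y = zeros(T + 1, N)                       # periods 0..T, one column per individual
    for i in 1:N
        y = α[i] / (1 - ρ)                    # start at the individual's long-run mean
        for _ in 1:burnin                     # burn-in so the start-up value is forgotten
            y = α[i] + ρ * y + randn(rng)
        end
        Y[1, i] = y
        for t in 2:T+1
            Y[t, i] = α[i] + ρ * Y[t-1, i] + randn(rng)
        end
    end
    y_cur = Y[2:end, :]                       # T current observations per individual
    y_lag = Y[1:end-1, :]                     # their lags
    yd = y_cur .- mean(y_cur, dims=1)         # within (individual-demeaning) transformation
    xd = y_lag .- mean(y_lag, dims=1)
    return sum(xd .* yd) / sum(xd .^ 2)       # OLS slope on the demeaned data
end

rng = MersenneTwister(2024)
ρ = 0.5
for periods in (5, 10, 50)
    bias = fe_rho(20_000, periods, ρ; rng = rng) - ρ
    println("T = ", periods, ": simulated bias ", round(bias, digits=3),
            ", approximation ", round(-(1 + ρ) / (periods - 1), digits=3))
end
```

The simulated bias is negative and shrinks roughly like 1/T, matching the large-N, fixed-T argument above.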
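The within estimator is the best choice of the three here. Pooled OLS and RE leave the individual effect in the error term, and because the lagged dependent variable is correlated with that effect, both are inconsistent for large N and fixed T. Pooled OLS stays inconsistent even as T grows, whereas the Nickell bias of the within estimator vanishes as T → ∞.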
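The same kind of simulation shows the contrast with pooled OLS. Again this assumes an AR(1) panel with standard normal $\alpha_i$ and errors and a stationary start-up value (illustrative assumptions, not the notebook's data); `pooled_vs_within` is a hypothetical helper. With T = 50, pooled OLS should remain clearly biased upward because $y_{i,t-1}$ contains $\alpha_i$, while the within estimate should already be close to the true ρ.

```julia
using Random, Statistics

# Hypothetical helper: simulate y_it = α_i + ρ*y_i,t-1 + ε_it (assumed DGP)
# and return (pooled OLS estimate, within estimate) of ρ.
function pooled_vs_within(N::Int, T::Int, ρ::Float64; rng = Random.default_rng())
    α = randn(rng, N)
    Y = zeros(T + 1, N)
    for i in 1:N
        # start-up value drawn from the stationary distribution of y_i0 given α_i
        Y[1, i] = α[i] / (1 - ρ) + randn(rng) / sqrt(1 - ρ^2)
        for t in 2:T+1
            Y[t, i] = α[i] + ρ * Y[t-1, i] + randn(rng)
        end
    end
    # pooled OLS with an intercept: α_i stays in the error term
    y, x = vec(Y[2:end, :]), vec(Y[1:end-1, :])
    pooled = sum((x .- mean(x)) .* (y .- mean(y))) / sum((x .- mean(x)) .^ 2)
    # within estimator: subtract each individual's mean first
    Yc, Xl = Y[2:end, :], Y[1:end-1, :]
    yd, xd = Yc .- mean(Yc, dims=1), Xl .- mean(Xl, dims=1)
    within = sum(xd .* yd) / sum(xd .^ 2)
    return pooled, within
end

pooled, within = pooled_vs_within(5_000, 50, 0.5; rng = MersenneTwister(1))
println("pooled OLS: ", round(pooled, digits=3), ", within: ", round(within, digits=3), ", true ρ: 0.5")
```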
This is not necessary, because the first-difference transformation does the same job as the within transformation: both remove any individual-specific, time-constant heterogeneity. The former does so by subtracting the previous period's value from a variable, the latter by subtracting the individual mean. For T = 2 the two estimators are numerically identical; for T > 2 they differ, but if the assumptions underlying both estimators hold (in particular strict exogeneity of the regressors, which a lagged dependent variable violates), they give approximately the same estimates.
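A small simulated check of this, assuming a static panel $y_{it} = \alpha_i + 2x_{it} + \varepsilon_{it}$ with $x_{it}$ correlated with $\alpha_i$ (an illustrative data-generating process; `fd_and_fe` is a hypothetical helper). Both estimators remove $\alpha_i$; for T = 2 they coincide up to floating-point rounding, and for larger T they differ slightly while both stay close to the true slope.

```julia
using Random, Statistics

# Hypothetical helper: simulate an assumed static panel and return (FD estimate, FE estimate).
function fd_and_fe(N::Int, T::Int; rng = Random.default_rng())
    α = randn(rng, N)
    X = randn(rng, T, N) .+ α'                # regressor correlated with α_i (one column per individual)
    Y = α' .+ 2.0 .* X .+ randn(rng, T, N)
    # first differences: regress Δy on Δx without an intercept
    dY, dX = diff(Y, dims=1), diff(X, dims=1)
    fd = sum(dX .* dY) / sum(dX .^ 2)
    # within transformation: regress demeaned y on demeaned x
    yd, xd = Y .- mean(Y, dims=1), X .- mean(X, dims=1)
    fe = sum(xd .* yd) / sum(xd .^ 2)
    return fd, fe
end

rng = MersenneTwister(42)
println(fd_and_fe(1_000, 2; rng = rng))    # equal up to rounding for T = 2
println(fd_and_fe(1_000, 10; rng = rng))   # close to each other and to 2, but not identical
```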