This notebook loads some data, reports some simple descriptive statistics (means, standard deviations etc) and shows a number of useful plots (scatter plots, histograms, time series plots).
Most of the descriptive stats use the standard package Statistics. The plots rely on the Plots package and the pdf and quantiles are from the Distributions package.
For more stat functions, see the StatsBase package. (Not used here.)
In [1]:
using Statistics, Dates, LinearAlgebra, DelimitedFiles, Distributions
include("printmat.jl")
include("printTable.jl") #a function for prettier table printing
Out[1]:
In [2]:
using Plots
#pyplot(size=(600,400))
gr(size=(480,320))
default(fmt = :svg)
In [3]:
x = readdlm("Data/MyData.csv",',',skipstart=1) #reading the csv file
#skip 1st line
println("\nfirst four lines of x:")
printmat(x[1:4,:])
In [4]:
ym = round.(Int,x[:,1]) #yearmonth, like 200712
(Rme,Rf,R) = (x[:,2],x[:,3],x[:,4]) #creating variables from columns of x
Re = R - Rf #do R .- Rf if R has several columns
println("first 4 obs of Rme and Re")
printmat([Rme[1:4,:] Re[1:4,:]])
dN = Date.(string.(ym),"yyyymm") #convert to string and then to Julia Date
printmat(dN[1:4])
In [5]:
μ = mean([Rme Re],dims=1) #,dims=1 to calculate average along a column
σ = std([Rme Re],dims=1) #do \sigma[Tab] to get σ
printTable([μ;σ],["Rme","Re"],["mean","std"])
In [6]:
println("\n","cov([Rme Re]): ")
printmat(cov([Rme Re]))
println("\n","cor([Rme Re]): ")
printmat(cor([Rme Re]))
A linear regression $ y_t = x_t'b + u_t $, where $x_t=[1;R^e_{m,t}]$.
Clearly, the first element of $b$ is the intercept and the second element is the slope coefficient.
The code below creates a $Tx2$ matrix x with $x'_t$ on row $t$. $y$ is a T-element vector (or $Tx1$).
The GLM package (not used here) has powerful regression methods. See https://github.com/JuliaStats/GLM.jl.
In [7]:
c = ones(size(Rme,1)) #a vector with ones, no. rows from variable
x = [c Rme] #x is a Tx2 matrix
y = copy(Re) #to get standard OLS notation
b2 = inv(x'x)*x'y #OLS according to a textbook
b = x\y #also OLS, quicker and numerically more stable
u = y - x*b #OLS residuals
R2a = 1 - var(u)/var(y) #R2, but that name is already taken
println("OLS coefficients, regressing Re on constant and Rme")
printTable([b b2],["Calc1","Calc2"],["c","Rme"])
printlnPs("R2: ",R2a)
printlnPs("no. of observations: ",size(Re,1))
In [8]:
Covb = inv(x'x)*var(u) #covariance matrix of b estimates
Stdb = sqrt.(diag(Covb)) #std of b estimates
tstat = (b .- 0)./Stdb #t-stats, replace 0 with your null hypothesis
println("\ncoeffs and t-stats")
printTable([b tstat],["Coef","t-stat"],["c","Rme"])
In [9]:
T = 100
x = randn(T,2) #T x 2 matrix, N(0,1) distribution
println("\n","mean and std of random draws: ")
μ = mean(x,dims=1)
σ = std(x,dims=1)
printTable([μ;σ],["series 1","series 2"],["mean","std"])
println("covariance matrix:")
printmat(cov(x))
println("correlation matrix:")
printmat(cor(x))
In [10]:
μ = [-1,10] #vector of means
Σ = [1 0.5; #covariance matrix
0.5 2]
T = 100
x = rand(MvNormal(μ,Σ),T)' #random numbers, T x 2, drawn from bivariate N(μ,Σ)
println("covariance matrix")
printmat(cov(x))
println("correlation matrix:")
printmat(cor(x))
In [11]:
N05 = quantile(Normal(0,1),0.05) #from the Distributions package
Chisq05 = quantile(Chisq(5),0.95)
println("\n","5th percentile of N(0,1) and 95th of Chisquare(5)") #lots of statistics functions
printmat([N05 Chisq05])
In [12]:
xTicksLoc = Dates.value.([Date(1980);Date(1990);Date(2000);Date(2010)])
xTicksLab = ["1980";"1990";"2000";"2010"] #crude way of getting the tick marks right
plot( dN,Rme,
linecolor = :blue,
legend = false,
xticks = (xTicksLoc,xTicksLab),
ylim = (-25,25),
title = "Time series plot: monthly US equity market excess return",
titlefont = font(10),
ylabel = "%" )
Out[12]:
In [13]:
scatter( Rme,Re,
fillcolor = :blue,
legend = false,
xlim = (-40,40),
ylim = (-40,60),
title = "Scatter plot: two monthly return series (and 45 degree line)",
titlefont = font(10),
xlabel = "Market excess return, %",
ylabel = "Excess returns on small growth stocks, %",
guidefont = font(8) )
plot!([-40;60],[-40;60],color=:black,linewidth=0.5) #easier to keep this outside plot()
Out[13]:
In [14]:
xGrid = -25:0.1:15
pdfX = pdf.(Normal(mean(Rme),std(Rme)),xGrid) #"Distributions" wants σ, not σ^2
histogram( Rme,bins = -25:1:15,
normalized = true, #normalized to have area=1
label = "histogram",
title = "Histogram: monthly US equity market excess return",
titlefont = font(10),
xlabel = "Market excess return, %",
legend = :topleft,
guidefont = font(8) ) #font size of axis labels
plot!(xGrid,pdfX,linewidth=3,label="fitted N()")
Out[14]:
In [ ]: