The initial code was checked into the GitHub repository.
There are several update steps that are expressed as functions. One is to take a matrix of linear predictor values and convert it to a matrix of probabilities.
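Concretely, if η = Xβ is the matrix of linear predictors, the probability of category j for observation i is the row-wise softmax

p[i, j] = exp(η[i, j]) / Σₖ exp(η[i, k])

which is what the function below computes.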
In [1]:
# computes predicted probabilities for multinomial regression, given X*beta
function multProbPrecompute(Xbeta)
    numerator = exp.(Xbeta)
    numerator ./ sum(numerator, dims = 2) # normalize each row to sum to 1
end
Out[1]:
This function copies the matrix Xbeta in the call to exp.(Xbeta), and creates another copy in the division that normalizes the rows. This is how one would approach the task in R. In Julia you can do this in place.
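A minimal sketch of the in-place idiom, assuming Xbeta is a Float64 matrix we are free to overwrite:

Xbeta .= exp.(Xbeta)           # exponentiate in place; no new matrix is allocated
Xbeta ./= sum(Xbeta, dims = 2) # divide each row by its sum, also in place

The broadcasted assignments .= and ./= reuse the existing storage rather than allocating a result.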
It is a good practice to write tests as you go along so that you can verify the results even for very simple functions.
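For instance, with the Test standard library (a quick sketch; the random input is purely illustrative):

using Test
pp = multProbPrecompute(randn(5, 3))
@test all(sum(pp, dims = 2) .≈ 1) # every row should sum to 1
@test all(0 .< pp .< 1)           # probabilities lie strictly in (0, 1)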
In [2]:
const n = 100
const num = 200
using Random
Random.seed!(1234321) # set the random number seed for reproducibility
const X = rand(n, num)
X[:, 1] .= 1
X
Out[2]:
In [3]:
using Distributions
const k = 4
const β = zeros(num, k)
rand!(Uniform(-1.5, 1.5), view(β, 1:10, :))
β[1:11, :]
Out[3]:
Notice that the const
declaration doesn't mean that the contents of the array must be constant. It just means that the type, size and location of the array cannot be changed.
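For example (with a hypothetical name a):

const a = zeros(2)
a[1] = 3.0     # allowed: the contents of a const-bound array may be mutated
# a = zeros(3) # not allowed: rebinding a const name raises a warning or an error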
In [4]:
const Xβ = X * β # linear predictor
probs = multProbPrecompute(Xβ)
size(probs)
Out[4]:
In [5]:
probs[1:10, :]
Out[5]:
The array of probabilities is oriented so that each row adds to 1.
In [6]:
extrema(sum(probs, dims=2))
Out[6]:
In languages like R and Julia, where matrices are stored in column-major order, it is an advantage to work with the transpose of this matrix.
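To see the storage order concretely:

A = [1 3 5; 2 4 6] # a 2×3 matrix
vec(A)             # returns [1, 2, 3, 4, 5, 6]: each column is stored contiguously

Iterating down a column therefore touches adjacent memory locations, while iterating across a row does not.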
Exponentiating and normalizing a vector can be done in place in two steps: exponentiate the elements and accumulate their sum, then divide each element by that sum. In Julia it is perfectly acceptable to write loops when that is the natural way to express an operation.
In [7]:
function expnorm!(v)
    ss = sum(map!(exp, v, v)) # overwrite v with exp.(v), then sum the result
    v ./= ss                  # normalize in place so the elements sum to 1
end
Out[7]:
In [8]:
vv = vec(Xβ[1, :])
Out[8]:
In [9]:
pr = expnorm!(vv)
Out[9]:
In [10]:
sum(pr)
Out[10]:
In [11]:
pr ≈ vec(probs[1,:])
Out[11]:
In [12]:
struct BCD{T<:AbstractFloat}
    X::Matrix{T}     # n × num model matrix
    β::Matrix{T}     # num × k coefficient matrix
    probs::Matrix{T} # k × n category probabilities (stored transposed)
    z::Matrix{T}     # k × n multinomial indicators (stored transposed)
end
The operation of updating the probabilities is done in-place.
In [13]:
using LinearAlgebra
function updateprobs!(bcd::BCD)
    probs = bcd.probs
    LinearAlgebra.mul!(probs, bcd.β', bcd.X') # in-place multiply: probs = β'X'
    for j in 1:size(probs, 2)
        expnorm!(view(probs, :, j)) # softmax each column in place
    end
    bcd
end
Out[13]:
The constructor for the type only requires X and z; the sizes of probs and β can be inferred from them.
In [14]:
function BCD(X::Matrix{T}, z::Matrix{T}) where {T<:AbstractFloat} # constructor
    n, num = size(X)
    k, r = size(z) # z is also transposed
    if r ≠ n
        throw(DimensionMismatch())
    end
    res = BCD(X, zeros(T, (num, k)), similar(z), z)
    updateprobs!(res)
end
Out[14]:
To create such an object we need to simulate the data. Recall that the array probs
should be transposed to fit our new schema.
In [15]:
pr = probs'
Out[15]:
In [16]:
z = similar(pr)
Out[16]:
In [17]:
for j in 1:size(z, 2)
    rand!(Multinomial(1, view(pr, :, j)), view(z, :, j))
end
Unfortunately, this doesn't work because the vector of probabilities for the Multinomial must be a Vector; a SubArray won't do.
In [18]:
for j in 1:size(z, 2)
    rand!(Multinomial(1, vec(pr[:, j])), view(z, :, j))
end
In [19]:
z[:,1:4] # the probabilities for the first category are large
Out[19]:
In [20]:
sum(z, dims=2)
Out[20]:
In [21]:
bcd = BCD(X, z);
In [22]:
bcd.probs
Out[22]:
In [23]:
bcd.z
Out[23]:
Currently the function to evaluate the log-likelihood is
# computes the multinomial log-likelihood given X, beta, z
function loglikelihood(X, beta, z)
    p = size(X, 2)
    k = size(z, 2)
    beta = reshape(beta, p, k)
    probs = multProb(X, beta, k)
    val = 0.0
    for i in 1:size(X, 1)
        val = val + log(probs[i, findfirst(z[i, :] .== 1)])
    end
    beta = vec(beta)
    return val
end
The findfirst(z[i, :] .== 1)
call checks which element of each row is non-zero every time the log-likelihood is evaluated. But that never changes, so we can evaluate it once and be done. The best option here would be to store the result in the BCD type via its constructor but, for illustration, I will simply create a vector in the global workspace. (To change the type I would need to restart the session.)
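For reference, the modified type might look like this sketch (the y field is the addition; it is not the code used below):

struct BCD{T<:AbstractFloat}
    X::Matrix{T}
    β::Matrix{T}
    probs::Matrix{T}
    z::Matrix{T}
    y::Vector{Int} # y[j] is the index of the non-zero element in column j of z
end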
In [42]:
const y = Int[]
for j in 1:size(z, 2)
    push!(y, findfirst(isone, view(z, :, j))) # index of the non-zero element in column j
end
In [41]:
length(y)
Out[41]:
In [26]:
using StatsBase
In [27]:
counts(y, 1:4)
Out[27]:
There is already a loglikelihood
function in the StatsBase
package, so we will add a method to it rather than overwriting the name.
In [28]:
function StatsBase.loglikelihood(bcd::BCD{T}) where {T}
    ss = zero(T)
    probs = bcd.probs # in a production version we would store y as bcd.y
    for j in 1:length(y)
        ss += log(probs[y[j], j])
    end
    ss
end
In [29]:
loglikelihood(bcd)
Out[29]:
In [30]:
100 * log(0.25)
Out[30]:
Note that, because updateprobs!
returns its argument, you can compose updating the probabilities with evaluating the log-likelihood, as is done in the existing function.
In [31]:
updateprobs!(bcd) |> loglikelihood
Out[31]:
which is equivalent to
In [32]:
loglikelihood(updateprobs!(bcd))
Out[32]:
In [33]:
@time loglikelihood(updateprobs!(bcd));
In [34]:
function multProb(X, beta, k)
    p = size(X, 2)
    n = size(X, 1)
    beta = reshape(beta, p, k)
    denominator = zeros(n) # allocated here, then immediately overwritten below
    numerator = exp.(X * beta)
    denominator = sum(numerator, dims = 2)
    prob = numerator ./ denominator
    beta = vec(beta)
    return prob
end
function ll(X, beta, z)
    p = size(X, 2)
    k = size(z, 2)
    beta = reshape(beta, p, k)
    probs = multProb(X, beta, k)
    val = 0.0
    for i in 1:size(X, 1)
        val = val + log(probs[i, findfirst(z[i, :] .== 1)])
    end
    beta = vec(beta)
    return val
end
Out[34]:
In [35]:
size(X), size(β)
Out[35]:
In [36]:
const zz = z'
Out[36]:
In [37]:
fill!(β, 0.25);
In [38]:
ll(X, β, zz)
In [39]:
@time ll(X, β, zz)