This Julia code is designed to exactly mimic the code written in cov.h, so that the output may be tested and verified. The following functionality should be tested
In [11]:
using PyPlot
A stationary gaussian with a 'squared exponential' taper
$$k(x, x^\prime | h, a, \mu, \sigma) = \exp \left ( \frac{-( x - x^\prime)^2 }{2 h^2} \right ) \frac{a^2}{2 \pi \sigma} \exp \left ( - \frac{[(x - \mu)^2 + (x^\prime - \mu)^2]}{2 \sigma^2}\right )$$here $h$ is a "bandwidth" that controls the power of the oscillations. If $h$ is small, then there will be high-frequency structure. If $h$ is large, then only low-frequency structure will remain.
The Haan window is
$$ w(r) = 0.5 + 0.5 \cos \left( \frac{\pi r}{r_0} \right) $$both the Matern kernel and the stationary Gaussian are tapered by the Haan window.
In [34]:
c_kms = 2.99792458e5 #km s^-1
function k_3_2(r::Array{Float64, 1}, a::Float64, l::Float64, r0::Float64)
taper = (0.5 .+ 0.5 * cos(pi * r/r0))
return taper .* a^2 .* (1 .+ sqrt(3) * r/l) .* exp(-sqrt(3) * r/l)
end
function k_3_2(r::Float64, a::Float64, l::Float64, r0::Float64)
taper = (0.5 + 0.5 * cos(pi * r/r0))
return taper * a^2 * (1 + sqrt(3) * r/l) * exp(-sqrt(3) * r/l)
end
function k_region(x0::Float64, x1::Float64, a::Float64, mu::Float64, sigma::Float64)
r_mu = c_kms/mu * sqrt((x0 - mu)^2 + (x1 - mu)^2)
r0 = 4.0 * sigma
r = c_kms/mu * abs(x0 - x1)
taper = (0.5 + 0.5cos(pi * r_mu/r0))
taper * a^2 /(2. * pi * sigma^2) * exp(-0.5 * (c_kms/mu)^2 * ((x0 - mu)^2 + (x1 - mu)^2)/sigma^2);
end
function k_region(x0::Array{Float64, 1}, x1::Array{Float64, 1}, a::Float64, mu::Float64, sigma::Float64)
r_mu = c_kms/mu * sqrt((x0 .- mu).^2 .+ (x1 .- mu).^2)
r0 = 4.0 * sigma
r = c_kms/mu * abs(x0 .- x1)
taper = (0.5 .+ 0.5cos(pi * r_mu/r0))
taper .* a^2 ./(2. * pi * sigma^2) .* exp(-0.5 * (c_kms/mu)^2 * ((x0 .- mu).^2 .+ (x1 .- mu).^2)/sigma^2);
end
Out[34]:
In [32]:
r = linspace(0., 5., 100);
#ys = Float64[k_3_2(r[i], 1., 1., 3.) for i = 1:length(r)]
ys = k_3_2(r, 1., 1., 3.)
plot(r, ys)
title("Matern k-3-2")
xlabel("r")
ylabel("k(r)")
Out[32]:
In [36]:
r = linspace(0., 5., 100);
#ys = Float64[k_region(2.5, r[i], 1., 1., 2.5, 3.) for i = 1:length(r)]
ys = k_region(2.5 * ones(Float64, (100,)), r, 1., 1., 2.5, 3.)
plot(r, ys)
title("Gaussian Region")
xlabel("r")
ylabel("k(r)")
Out[36]:
In [2]:
using NPZ
In [4]:
wls = npzread("../../data/WASP14/WASP-14_2009-06-15_04h13m57s_cb.spec.flux.wls.npy");
wl23 = wls[23,:]
N23 = length(wl23)
wl23 = reshape(wl23, (2298,))
Out[4]:
In [16]:
#Data arrays
N = 3000;
wl = linspace(5100., 5200., N);
We can also use the Julia function spdiagm(B, d[, m, n]), which functions nearly the same way as python's sparse.diags(). This is probably the quickest way to initialize a matrix with the Matern kernel.
However, in order to match the C code, we should try to initialize the matrix in the same way. First, this means determining the max_offset. Or, we could go through and find the max 1-index offset, then compute
max_offset = r0 * indexes
or something
In [12]:
max_sep = maximum(diff(wl))
max_offset = iceil(r0/max_sep)
#Compute what M will be
M = N + sum(Int64[2*(N - i) for i=1:max_offset])
Out[12]:
In [8]:
function create_sparse(wl::Array{Float64, 1}, a::Float64, l::Float64)
N = length(wl)
offset = 0
r0 = 6. * l
diags = {}
while offset < N
#Pairwise calculate rs
if offset == 0
rs = zeros(Float64, N)
else
rs = abs(wl[offset + 1:end] - wl[1:end-offset]) * c_kms ./ wl[offset + 1:end] #divide by the first number
end
k = similar(rs)
if minimum(rs) >= r0
break
end
k = k_3_2(rs, a, l, r0)
#k = (0.5 + 0.5cos(pi .* rs/r0)) .* amp .* (1 + sqrt(3) .* rs/l) .* exp(-sqrt(3) .* rs/l)
k[rs .>= r0] = 0
push!(diags, k)
offset += 1
end
#Mirror the diagonals
front = diags[end:-1:2]
diags = vcat(front, diags)
offsets = [i for i=(-offset + 1):(offset-1)]
return spdiagm(diags, offsets)
end
Out[8]:
In [35]:
function create_sparse_region(wl::Array{Float64, 1}, a::Float64, mu::Float64, sigma::Float64)
#First loop to determine the non-zero elements
r0 = 4.0 * sigma
M = 0
N = length(wl)
for i=1:N
for j=1:N
r_mu = c_kms/mu * sqrt((wl[i] - mu)^2 + (wl[j] - mu)^2)
if (r_mu < r0)
M += 1
end
end
end
#Empty arrays that will be filled with Indexes and Values
II = Array(Int, M)
JJ = Array(Int, M)
SS = Array(Float64, M)
ind = 1
for i=1:N
for j=1:N
r_mu = c_kms/mu * sqrt((wl[i] - mu)^2 + (wl[j] - mu)^2)
if (r_mu < r0)
II[ind] = i
JJ[ind] = j
SS[ind] = k_region(wl[i], wl[j], a, mu, sigma)
ind += 1
end
end
end
return sparse(II, JJ, SS, N, N)
end
Out[35]:
In [42]:
S = create_sparse(wl23, 1e-14, 8.) # + create_sparse_region(wl, 1., 2., 5150., 1.);
In [43]:
L = cholfact(S)
logdet(L)
Out[43]:
In [44]:
#Plot the results
imshow(S, origin="upper")
Out[44]:
In [27]:
#Show the 'sparsity' of the matrix
spy(S, markersize=1, marker=".")
Out[27]:
In [20]:
#For some reason ``is_ll`` will work correctly the first time I run the code after restarting the kernel,
#however after doing some runs with ``is_ll=true`` it will never go back.
#Data arrays
N = 30;
wl = linspace(5100., 5200., N);
ys = ones((N,));
S = create_sparse(wl, 1., 100.);
L = cholfact(S, false)
println(logdet(L));
print("is_ll=$(L.c.is_ll), is_super=$(L.c.is_super)")
In [21]:
#Data arrays
N = 30;
wl = linspace(5100., 5200., N);
ys = ones((N,));
S = create_sparse(wl, 1., 100.);
L = cholfact(S, true)
println(logdet(L));
print("is_ll=$(L.c.is_ll), is_super=$(L.c.is_super)")
In [22]:
#Data arrays
N = 3000;
wl = linspace(5100., 5200., N);
ys = ones((N,));
S = create_sparse(wl, 1., 100.);
L = cholfact(S, false)
println(logdet(L));
print("is_ll=$(L.c.is_ll), is_super=$(L.c.is_super)")
In [5]:
#Data arrays
N = 3000;
wl = linspace(5100., 5200., N);
ys = ones((N,));
S = create_sparse(wl, 1., 1.);
L = cholfact(S, true)
println(logdet(L));
print("is_ll=$(L.c.is_ll), is_super=$(L.c.is_super)")
We have methods for cholfact(A[, ll]) → CholmodFactor and solve and logdet. We can also specify the boolean argument ll to the cholfact method to describe whether we want in LL or LDL form. This should be helpful for comparing to the cov.h functions. Basically, we can select for is_super by choosing the size of the array and for is_ll=true by specifying it to the cholfact(S, true)
In [ ]:
N = 3000;
wl = linspace(5100., 5200., N);
S = create_sparse_region(wl, 5., 5150., 30.);
In [37]:
imshow(S, origin="upper")
Out[37]:
In [38]:
N = 3000;
wl = linspace(5100., 5200., N);
S = create_sparse(wl, 1., 100.) + create_sparse_region(wl, 5., 5150., 30.);
L = cholfact(S)
logdet(L)
Out[38]:
Being more proper about things, the multidimensional Gaussian is defined as
$$p(\textbf{y}) = \frac{1}{\sqrt{(2 \pi)^N \det(C)}} \exp\left ( -\frac{1}{2} \textbf{r}^T C^{-1} \textbf{r} \right ) $$Taking the log, we have
$$ \ln(p) = \ln \left ( \frac{1}{\sqrt{(2 \pi)^N \det(C)}} \right ) -\frac{1}{2} \textbf{r}^T C^{-1} \textbf{r} $$$$ \ln(p) = -\frac{1}{2} \textbf{r}^T C^{-1} \textbf{r} - \ln \sqrt{(2 \pi)^N \det(C) } $$$$ \ln(p) = -\frac{1}{2} \textbf{r}^T C^{-1} \textbf{r} - \ln (2 \pi)^N - \ln \det(C)^{1/2} $$$$ \ln(p) = -\frac{1}{2} \textbf{r}^T C^{-1} \textbf{r} - \frac{1}{2} \ln \det C - \frac{N}{2} \ln 2 \pi $$
In [40]:
#Create some fake residuals
ys = ones((N,));
S = create_sparse(wl, 1., 100.);
In [41]:
function chi2()
L = cholfact(S)
out = L \ ys;
lnp = dot(ys, out[:,1])
end
Out[41]:
In [42]:
chi2()
Out[42]:
In [35]:
function lnprob()
L = cholfact(S)
out = L \ ys;
lnp = -0.5 * (logdet(L) + dot(ys, out[:,1]) + N/2. * log(2pi))
end
Out[35]:
In [33]:
lnprob()
Out[33]:
In [27]:
function create_sparse_region_old(h, amp, mu, sigma, var=1, size=2300)
#For the given mu and sigma, find all of the indexes in the array that will be non-zero. A square aronud mup.
low = iceil(mu - 4sigma)
high = ifloor(mu + 4sigma)
len = high - low + 1
I = Int[i for i = low:high]
#Empty arrays that will be filled with Indexes and Values
II = Array(Int, len^2)
JJ = Array(Int, len^2)
SS = Array(Float64, len^2)
ind = 1
for i = low:high, j = low:high
II[ind] = i
JJ[ind] = j
SS[ind] = k(i, j, h, amp, mu, sigma)
ind += 1
end
return sparse(II, JJ, SS, size, size) + speye(Float64, size)
end
Out[27]: