We will store the data in data frames to aid with exporting to *.csv files
In [2]:
using DataFrames
using CSV
using Gadfly
To demonstrate the calculation of standard errors for non-parametric rate estimators we will generate Makeham casualty events with Gompertz mortality.
From previous work we know that the distribution of lengths of stay in the health system can be reasonably modelled with an Erlang phase-type (gamma) distribution, usually with 2 to 4 phases, each 16 to 32 time steps long. Samples can be generated easily by summing the appropriate number of exponential variates. To fully illustrate occupancy and census we need a model that incorporates services that are unavailable. To do this we will use three distributions:
- occupied stays: 4 exponential phases, each averaging 32 time steps
- unavailable services: 3 exponential phases, each averaging 16 time steps
- vacant services: 2 exponential phases, each averaging 8 time steps
The three models will be mixed in the ratios ${\frac{4}{7},\frac{2}{7},\frac{1}{7}}$ for occupied, unavailable, and vacant, corresponding to an average $\frac{4\times4\times32}{4\times4\times32+1\times2\times8}\approx 97\%$ occupancy. As is typical for health systems, the end time of the stay is included in the length of stay, so that $duration = 1+stop-start$
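As a quick check of the occupancy arithmetic above, we can compute the ratio directly (a sketch; the variable names are ours, not part of the model):

```julia
# Weighted mean durations implied by the mixture: weight × phases × mean phase length
occupied = (4 / 7) * 4 * 32   # occupied stays
vacant   = (1 / 7) * 2 * 8    # vacant services
occupancy = occupied / (occupied + vacant)   # ≈ 0.9697, i.e. roughly 97%
```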
In [3]:
n = 8192;
stays = DataFrame(
    event = collect(1:n),
    consumer = rand([4, 4, 4, 4, 3, 3, 2], n),  # phase counts mixed 4/7, 2/7, 1/7
    producer = rand(1:16, n),
    operation = ones(Int64, n)
);
stays.duration = ceil.(
    Int64,
    (2 .^ (1 .+ stays.consumer)) .*
        (cumsum(-log.(1 .- rand(n, 4)), dims = 2)[CartesianIndex.(1:n, stays.consumer)])
)
sort!(stays, [:producer, :event]);
stays.start = [1; 1 .+ cumsum(stays.duration)[1:end-1]];
stays.stop = cumsum(stays.duration);
# logical index of the first row of each producer block
firstofproducer = [true; stays.producer[2:end] .!= stays.producer[1:end-1]];
# shift each producer's timeline back so that every producer starts at time 1
stays.delta = zeros(Int64, n);
stays[firstofproducer, :delta] = [0; diff(stays.start[firstofproducer])];
stays.delta = cumsum(stays.delta);
stays.start = stays.start .- stays.delta;
stays.stop = stays.stop .- stays.delta;
stays.time = stays.start;
stays
Out[3]:
Quickly plot the durations to qualitatively confirm the model
In [4]:
vstack(
    plot(x = stays.duration[stays.consumer .== 2], Geom.histogram),
    plot(x = stays.duration[stays.consumer .== 3], Geom.histogram),
    plot(x = stays.duration[stays.consumer .== 4], Geom.histogram)
)
Out[4]:
Copy the data to create the end events
In [17]:
temp = copy(stays);
temp.operation = -temp.operation;
temp.time = temp.stop;
stays = [stays; temp];
sort!(stays, [:producer, :time]);
stays.record = collect(1:(2n));
stays
Out[17]:
Store the data off in the current working directory
In [18]:
CSV.write("stays.csv", stays)
Out[18]:
To illustrate linear-smoothing density estimation in Tableau we will generate $1024$ samples from a multimodal distribution composed of a mixture of binomial distributions on $\left\lbrace 0,\dots,128 \right\rbrace$, centred at $\left\lbrace 16,32,64\right\rbrace$, with weights $\left\lbrace \frac{2}{7},\frac{4}{7},\frac{1}{7} \right\rbrace$
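As a sanity check (the names below are just illustrative), the mean of the mixture is the weighted average of the binomial means:

```julia
centres = [16, 32, 64]
weights = [2 / 7, 4 / 7, 1 / 7]
mixturemean = sum(weights .* centres)   # (2·16 + 4·32 + 1·64) / 7 = 32
```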
In [16]:
samples = rand(1024, 128);
binomial = [64/128, 16/128, 16/128, 32/128, 32/128, 32/128, 32/128];  # 64/128 with weight 1/7, 16/128 with 2/7, 32/128 with 4/7
latent = rand(binomial, 1024);
density = DataFrame(
    observation = collect(1:1024),
    probability = latent,
    value = sum(samples .<= latent, dims = 2)[:, 1]
)
Out[16]:
Store the data off in the current working directory.
In [68]:
CSV.write("density.csv", density)
Out[68]:
We begin with a small helper function to generate the midpoints of a randomly chosen partition of the unit interval.
In [5]:
function randmid(samples::Int64, partitions::Int64)
    temp = [zeros(Float64, samples) sort(rand(samples, partitions - 1), dims = 2) ones(Float64, samples)]
    return (temp[:, 1:partitions] + temp[:, 2:(1 + partitions)]) / 2
end
Out[5]:
Quick call to test the generation of the matrix
In [76]:
randmid(8,3)
Out[76]:
We use a random set of midpoints for the transition probabilities so that we can use the index of the minimum distance (`argmin`) to find the next state. Rather than recording the full transition matrix, we use a matrix that represents either a step forward, no change, or a step back.
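For a single state the nearest-midpoint rule reduces to one `argmin` call. A sketch of just the selection step, with hypothetical midpoint values:

```julia
# Midpoints of the partition [0, 0.3, 0.6, 1] for one state (hypothetical values)
midpoints = [0.15, 0.45, 0.85]
p = 0.52                                   # a uniform draw
step = argmin(abs.(midpoints .- p)) - 2    # -1 = step back, 0 = stay, +1 = step forward
# here step == 0, since 0.45 is the midpoint nearest to 0.52
```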
In [73]:
function cyclicmarkov(steps::Int64, states::Int64)
    df = DataFrame(
        step = 1:steps,
        state = zeros(Int64, steps),
        transition = zeros(Int64, steps),
        probability = rand(steps),
        map = zeros(Float64, steps)
    )
    transitions = randmid(states, 3)
    for i = 1:(steps - 1)
        # pick the transition whose midpoint is nearest the uniform draw:
        # -1 = step back, 0 = stay, +1 = step forward
        df.transition[i] = argmin(abs.(transitions[df.state[i] + 1, :] .- df.probability[i])) - 2
        df.map[i] = transitions[df.state[i] + 1, df.transition[i] + 2]
        df.state[i + 1] = (states + df.transition[i] + df.state[i]) % states
    end
    return df
end
Out[73]:
Using the simulation function we generate 1024 samples and store them in a dataframe.
In [74]:
leadlag = cyclicmarkov(1024,8)
Out[74]:
Store the data in the current working directory
In [75]:
CSV.write("leadlag.csv", leadlag)
Out[75]:
We initialize the data frame with 3 columns and 1024 samples:

- subject, randomly selected from 64 identifiers
- time, the cumulative sum of random increments from 1 to 8
- event, randomly selected from a central distribution on the letters A to E
In [45]:
longitudinal = DataFrame(
subject = sort(rand(1:64, 1024)),
time = cumsum(rand(1:8, 1024)),
event = rand(['A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'E'], 1024)
);
longitudinal
Out[45]:
Notice that the time is increasing across all subjects. Instead we want time to be the cumulative sum strictly within each subject. To accomplish this we use logical indexing to pull the last time from each subject and subtract it from the times of the next subject.
First find the logical index of the row before the subject changes.
In [46]:
longitudinal.before = [longitudinal.subject[1:end-1] .!= longitudinal.subject[2:end]; false];
longitudinal
Out[46]:
Similarly find the logical index of the row after the subject changed.
In [47]:
longitudinal.after = [false; longitudinal.before[1:end-1]];
longitudinal
Out[47]:
Next initialize a column to contain the shift of baseline time for each subject.
In [48]:
longitudinal.shift = zeros(Int, 1024);
longitudinal
Out[48]:
Into the baseline shift of each row following a change in subject, place the difference between the successive start times of the subjects.
In [49]:
longitudinal.shift[longitudinal.after] =
    longitudinal.time[longitudinal.before] -
    [0; longitudinal.time[longitudinal.before][1:end-1]];
longitudinal
Out[49]:
Fill in the zeroes by running a cumulative sum over the shift column.
In [50]:
longitudinal.shift = cumsum(longitudinal.shift);
longitudinal
Out[50]:
Rectify the time for each subject by subtracting the baseline shift.
In [20]:
longitudinal.time = longitudinal.time - longitudinal.shift;
longitudinal[1:8, :]
Out[20]:
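As an aside, the same rectification can be sketched with grouped operations (assuming DataFrames 1.x; the miniature table here is illustrative): recover the per-row increments globally, then restart the cumulative sum within each subject.

```julia
using DataFrames

# Miniature illustration: global cumulative times for two subjects
df = DataFrame(subject = [1, 1, 2, 2, 2], time = [3, 5, 9, 10, 14])

# Recover the per-row increments, then cumulate them per subject
df.increment = [df.time[1]; diff(df.time)]
df = transform(groupby(df, :subject), :increment => cumsum => :local)
# df.local == [3, 5, 4, 5, 9]
```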
We can now export the data, choosing the first 3 columns.
In [21]:
CSV.write("longitudinal.csv", longitudinal[:, [:subject, :time, :event]])
In [52]:
distributionone = [
["A" -1],
["A" -1],
["A" -1],
["A" 0],
["A" 0],
["A" 1],
["B" -1],
["B" 0],
["B" 0],
["B" 1],
["B" 1],
["B" 1]
];
distributiontwo = [
["C" -2],
["C" -1],
["C" -1],
["C" 0],
["C" 0],
["C" 0],
["C" 1],
["C" 1],
["C" 1],
["C" 1],
["C" 2],
["C" 2],
["C" 2],
["C" 2],
["C" 2],
["D" -2],
["D" -1],
["D" -1],
["D" 0],
["D" 0],
["D" 0],
["D" 1],
["D" 1],
["D" 2],
["E" -2],
["E" -2],
["E" -2],
["E" -1],
["E" -1],
["E" 0],
["E" 1],
["E" 1],
["E" 2],
["E" 2],
["E" 2],
["F" 2],
["F" 1],
["F" 1],
["F" 0],
["F" 0],
["F" 0],
["F" -1],
["F" -1],
["F" -1],
["F" -1],
["F" -2],
["F" -2],
["F" -2],
["F" -2],
["F" -2]
];
We then generate another 1024 samples to populate a data frame.
In [53]:
contingency = DataFrame(
sample = 1:1024,
preonetwo = rand(distributionone, 1024),
prethreefour = rand(distributiontwo, 1024)
);
contingency[1:8, :]
Out[53]:
Next we extract the individual column elements into dimension columns.
In [54]:
contingency.dimensionone = vcat(contingency.preonetwo...)[:, 1];
contingency.dimensiontwo = vcat(contingency.preonetwo...)[:, 2];
contingency.dimensionthree = vcat(contingency.prethreefour...)[:, 1];
contingency.dimensionfour = vcat(contingency.prethreefour...)[:, 2];
contingency[1:8, :]
Out[54]:
Finally we export our contingency table data.
In [55]:
CSV.write("contingency.csv", contingency[:, [:sample, :dimensionone, :dimensiontwo, :dimensionthree, :dimensionfour]])