Simple Data Generation for Tableau

We will store the data in data frames to simplify exporting to *.csv files.


In [2]:
using DataFrames
using CSV
using Gadfly



Hazard Rate Data

To demonstrate the calculation of standard errors for non-parametric rate estimators, we will generate Makeham casualty events with Gompertz mortality.
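No generation cell for this section survives in the export, so the following is a minimal sketch rather than the notebook's own code. It uses the competing-risks reading of the Gompertz-Makeham hazard $A + Be^{bt}$: the constant Makeham term $A$ yields exponential casualty times, the Gompertz term yields mortality times by inverse transform, and the observed event is the earlier of the two. The parameter values are illustrative assumptions.

n = 8192
A, B, b = 0.001, 0.0005, 0.1                                # illustrative (assumed) hazard parameters
casualty  = -log.(1 .- rand(n)) ./ A                        # exponential casualty times, constant hazard A
mortality = log.(1 .- (b / B) .* log.(1 .- rand(n))) ./ b   # Gompertz mortality times by inverse transform
hazard = DataFrame(subject = 1:n, time = min.(casualty, mortality))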

Columnar Event Store Data

From previous work we know that the distribution of lengths of stay in the health system can be reasonably modelled with an Erlang phase-type (gamma) distribution, usually with 2 to 4 phases, each 16 to 32 time steps long. Samples can easily be generated by summing the appropriate number of exponential distributions. To fully illustrate occupancy and census we need a model that also incorporates services that are unavailable. To do this we will use three distributions:

  1. Occupied is an Erlang 4 phase with a phase size of 32
  2. Unavailable is an Erlang 3 phase with a phase size of 16
  3. Vacant is an Erlang 2 phase with a phase size of 8

The three models will be mixed with ratios $\left\lbrace \frac{4}{7},\frac{2}{7},\frac{1}{7} \right\rbrace$ for occupied, unavailable, and vacant, corresponding to an average $\frac{4\times4\times32}{4\times4\times32+1\times2\times8}\approx 97\%$ occupancy, where unavailable time is excluded from the calculation. As is typical for health systems, the end time of the stay is included in the length of stay, so that $duration = 1+stop-start$.
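The quoted figure can be checked directly, and the Erlang sampling trick previewed, before the full generation below:

# Arithmetic check of the quoted occupancy: mean time in a state is
# weight × phases × phase size, and unavailable time is excluded.
occupied = (4/7) * 4 * 32    # Erlang with 4 phases of size 32
vacant   = (1/7) * 2 * 8     # Erlang with 2 phases of size 8
occupied / (occupied + vacant)    # 0.9696..., the quoted 97%
# One Erlang sample with 4 phases of size 32, as a sum of exponentials:
32 * sum(-log.(1 .- rand(4)))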


In [3]:
n = 8192;
stays = DataFrame(
    event = collect(1:n),
    consumer = rand([4,4,4,4,3,3,2], n),   # phases: 4 (occupied), 3 (unavailable), 2 (vacant), mixed 4:2:1
    producer = rand(1:16, n),              # 16 service points
    operation = ones(Int64, n)             # +1 marks a start event
);
# Erlang sample: phase size 2^(1+consumer) times a sum of `consumer` exponentials,
# read off the row-wise cumulative sums of -log(1-U) at column `consumer`.
stays[:duration] = ceil.(
    Int64,
    (2 .^(1 .+ stays[:consumer])) .* (cumsum(-log.(1 .- rand(n,4)), dims = 2)[CartesianIndex.(1:n, stays[:consumer])])
)
sort!(stays, [:producer, :event]);
# Stack the stays end to end across all producers:
stays[:start] = [1; 1 .+ cumsum(stays[:duration])[1:end-1]];
stays[:stop] = cumsum(stays[:duration]);
stays[:delta] = zeros(Int64, n);
# At each producer's first row, record the jump from the previous producer's first start ...
stays[[true; stays[2:end,:producer] .!= stays[1:(end-1),:producer]], :delta] = [
    0; 
    (
        stays[[true; stays[2:end,:producer] .!= stays[1:(end-1),:producer]],:start][2:end] -
        stays[[true; stays[2:end,:producer] .!= stays[1:(end-1),:producer]],:start][1:(end-1)]
    )
];
# ... accumulate it into a per-producer offset, and shift each producer back to start at time 1.
stays[:delta] = cumsum(stays[:delta]);
stays[:start] = stays[:start] - stays[:delta];
stays[:stop] = stays[:stop] - stays[:delta];
stays[:time] = stays[:start];
stays


Out[3]:
Row │ event │ consumer │ producer │ operation │ duration │ start │ stop │ delta │ time
(all columns Int64)
1 │ 9 │ 3 │ 1 │ 1 │ 79 │ 1 │ 79 │ 0 │ 1
2 │ 24 │ 4 │ 1 │ 1 │ 134 │ 80 │ 213 │ 0 │ 80
3 │ 26 │ 3 │ 1 │ 1 │ 55 │ 214 │ 268 │ 0 │ 214
4 │ 29 │ 3 │ 1 │ 1 │ 25 │ 269 │ 293 │ 0 │ 269
5 │ 53 │ 3 │ 1 │ 1 │ 27 │ 294 │ 320 │ 0 │ 294
6 │ 61 │ 4 │ 1 │ 1 │ 88 │ 321 │ 408 │ 0 │ 321
7 │ 99 │ 2 │ 1 │ 1 │ 17 │ 409 │ 425 │ 0 │ 409
8 │ 100 │ 4 │ 1 │ 1 │ 272 │ 426 │ 697 │ 0 │ 426
9 │ 126 │ 4 │ 1 │ 1 │ 112 │ 698 │ 809 │ 0 │ 698
10 │ 127 │ 4 │ 1 │ 1 │ 218 │ 810 │ 1027 │ 0 │ 810
11 │ 154 │ 2 │ 1 │ 1 │ 12 │ 1028 │ 1039 │ 0 │ 1028
12 │ 183 │ 4 │ 1 │ 1 │ 105 │ 1040 │ 1144 │ 0 │ 1040
13 │ 187 │ 4 │ 1 │ 1 │ 76 │ 1145 │ 1220 │ 0 │ 1145
14 │ 190 │ 3 │ 1 │ 1 │ 56 │ 1221 │ 1276 │ 0 │ 1221
15 │ 211 │ 4 │ 1 │ 1 │ 114 │ 1277 │ 1390 │ 0 │ 1277
16 │ 232 │ 3 │ 1 │ 1 │ 91 │ 1391 │ 1481 │ 0 │ 1391
17 │ 251 │ 4 │ 1 │ 1 │ 112 │ 1482 │ 1593 │ 0 │ 1482
18 │ 260 │ 3 │ 1 │ 1 │ 29 │ 1594 │ 1622 │ 0 │ 1594
19 │ 279 │ 4 │ 1 │ 1 │ 124 │ 1623 │ 1746 │ 0 │ 1623
20 │ 285 │ 3 │ 1 │ 1 │ 23 │ 1747 │ 1769 │ 0 │ 1747
21 │ 325 │ 3 │ 1 │ 1 │ 53 │ 1770 │ 1822 │ 0 │ 1770
22 │ 348 │ 2 │ 1 │ 1 │ 4 │ 1823 │ 1826 │ 0 │ 1823
23 │ 377 │ 4 │ 1 │ 1 │ 133 │ 1827 │ 1959 │ 0 │ 1827
24 │ 397 │ 4 │ 1 │ 1 │ 176 │ 1960 │ 2135 │ 0 │ 1960
25 │ 439 │ 4 │ 1 │ 1 │ 115 │ 2136 │ 2250 │ 0 │ 2136
26 │ 449 │ 2 │ 1 │ 1 │ 12 │ 2251 │ 2262 │ 0 │ 2251
27 │ 453 │ 4 │ 1 │ 1 │ 134 │ 2263 │ 2396 │ 0 │ 2263
28 │ 473 │ 4 │ 1 │ 1 │ 110 │ 2397 │ 2506 │ 0 │ 2397
29 │ 479 │ 4 │ 1 │ 1 │ 44 │ 2507 │ 2550 │ 0 │ 2507
30 │ 482 │ 4 │ 1 │ 1 │ 101 │ 2551 │ 2651 │ 0 │ 2551
⋮

Quickly plot the durations to qualitatively confirm the model


In [4]:
vstack(
    plot(x = stays[:duration][stays[:consumer] .== 2], Geom.histogram),
    plot(x = stays[:duration][stays[:consumer] .== 3], Geom.histogram),
    plot(x = stays[:duration][stays[:consumer] .== 4], Geom.histogram)
)


Out[4]:
[Three stacked Gadfly histograms of the durations, one panel each for consumer = 2, 3, and 4; x axis: duration.]

Copy the data to create the end events


In [17]:
temp = copy(stays);
temp[:operation] = -temp[:operation];   # -1 marks an end event
temp[:time] = temp[:stop];              # end events occur at the stop time
stays = [stays; temp];
sort!(stays, [:producer, :time]);
stays[:record] = 1:(2*n);
stays


Out[17]:
Row │ event │ consumer │ producer │ operation │ duration │ start │ stop │ delta │ time │ record
1 │ 1 │ 4 │ 1 │ 1 │ 208 │ 1 │ 208 │ 0 │ 1 │ 1
2 │ 1 │ 4 │ 1 │ -1 │ 208 │ 1 │ 208 │ 0 │ 208 │ 2
3 │ 8 │ 4 │ 1 │ 1 │ 92 │ 209 │ 300 │ 0 │ 209 │ 3
4 │ 8 │ 4 │ 1 │ -1 │ 92 │ 209 │ 300 │ 0 │ 300 │ 4
5 │ 15 │ 2 │ 1 │ 1 │ 6 │ 301 │ 306 │ 0 │ 301 │ 5
6 │ 15 │ 2 │ 1 │ -1 │ 6 │ 301 │ 306 │ 0 │ 306 │ 6
7 │ 23 │ 3 │ 1 │ 1 │ 69 │ 307 │ 375 │ 0 │ 307 │ 7
8 │ 23 │ 3 │ 1 │ -1 │ 69 │ 307 │ 375 │ 0 │ 375 │ 8
9 │ 29 │ 2 │ 1 │ 1 │ 23 │ 376 │ 398 │ 0 │ 376 │ 9
10 │ 29 │ 2 │ 1 │ -1 │ 23 │ 376 │ 398 │ 0 │ 398 │ 10
11 │ 44 │ 4 │ 1 │ 1 │ 75 │ 399 │ 473 │ 0 │ 399 │ 11
12 │ 44 │ 4 │ 1 │ -1 │ 75 │ 399 │ 473 │ 0 │ 473 │ 12
13 │ 48 │ 4 │ 1 │ 1 │ 119 │ 474 │ 592 │ 0 │ 474 │ 13
14 │ 48 │ 4 │ 1 │ -1 │ 119 │ 474 │ 592 │ 0 │ 592 │ 14
15 │ 52 │ 4 │ 1 │ 1 │ 90 │ 593 │ 682 │ 0 │ 593 │ 15
16 │ 52 │ 4 │ 1 │ -1 │ 90 │ 593 │ 682 │ 0 │ 682 │ 16
17 │ 79 │ 3 │ 1 │ 1 │ 65 │ 683 │ 747 │ 0 │ 683 │ 17
18 │ 79 │ 3 │ 1 │ -1 │ 65 │ 683 │ 747 │ 0 │ 747 │ 18
19 │ 85 │ 4 │ 1 │ 1 │ 184 │ 748 │ 931 │ 0 │ 748 │ 19
20 │ 85 │ 4 │ 1 │ -1 │ 184 │ 748 │ 931 │ 0 │ 931 │ 20
21 │ 86 │ 4 │ 1 │ 1 │ 155 │ 932 │ 1086 │ 0 │ 932 │ 21
22 │ 86 │ 4 │ 1 │ -1 │ 155 │ 932 │ 1086 │ 0 │ 1086 │ 22
23 │ 95 │ 4 │ 1 │ 1 │ 36 │ 1087 │ 1122 │ 0 │ 1087 │ 23
24 │ 95 │ 4 │ 1 │ -1 │ 36 │ 1087 │ 1122 │ 0 │ 1122 │ 24
25 │ 134 │ 4 │ 1 │ 1 │ 111 │ 1123 │ 1233 │ 0 │ 1123 │ 25
26 │ 134 │ 4 │ 1 │ -1 │ 111 │ 1123 │ 1233 │ 0 │ 1233 │ 26
27 │ 145 │ 4 │ 1 │ 1 │ 114 │ 1234 │ 1347 │ 0 │ 1234 │ 27
28 │ 145 │ 4 │ 1 │ -1 │ 114 │ 1234 │ 1347 │ 0 │ 1347 │ 28
29 │ 165 │ 3 │ 1 │ 1 │ 20 │ 1348 │ 1367 │ 0 │ 1348 │ 29
30 │ 165 │ 3 │ 1 │ -1 │ 20 │ 1348 │ 1367 │ 0 │ 1367 │ 30
⋮

Store the data in the current working directory.


In [18]:
CSV.write("stays.csv", stays)


Out[18]:
CSV.Sink{Void,DataType}(    CSV.Options:
        delim: ','
        quotechar: '"'
        escapechar: '\\'
        missingstring: ""
        dateformat: nothing
        decimal: '.'
        truestring: 'true'
        falsestring: 'false'
        internstrings: true, IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1), "stays.csv", 72, true, String["event", "consumer", "producer", "operation", "duration", "start", "stop", "delta", "time", "record"], 10, false, Val{false})

Density Estimation Data

To illustrate linear smoothing density estimation in Tableau we will generate $1024$ samples from a multimodal distribution composed of a mixture of binomial distributions on $\left\lbrace 0,\dots,128 \right\rbrace$, centred at $\left\lbrace 16,32,64\right\rbrace$, with weights $\left\lbrace \frac{2}{7},\frac{4}{7},\frac{1}{7} \right\rbrace$.
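The generation below realises each draw by counting uniforms; a quick sanity check of that trick (the sample size of 4096 here is arbitrary):

# Counting how many of 128 uniforms fall below p is a Binomial(128, p)
# draw, so the sample mean should sit near the centre 128p.
p = 32/128
draws = [sum(rand(128) .<= p) for _ in 1:4096]
sum(draws) / length(draws)    # ≈ 32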


In [16]:
samples = rand(1024, 128);
# Latent success probabilities, drawn with weights 1/7 (p = 0.5), 2/7 (p = 0.125), 4/7 (p = 0.25).
binomial = [64/128, 16/128, 16/128, 32/128, 32/128, 32/128, 32/128];
latent = binomial[convert.(Int64, ceil.(7*rand(1024, 1)))][:, 1];
density = DataFrame(
  observation = collect(1:1024),
  probability = latent,
  value = sum(samples .<= latent, dims = 2)[:, 1]   # one Binomial(128, p) draw per row
)


Out[16]:
Row │ observation │ probability │ value
(Int64, Float64, Int64)
1 │ 1 │ 0.125 │ 10
2 │ 2 │ 0.25 │ 36
3 │ 3 │ 0.25 │ 38
4 │ 4 │ 0.25 │ 35
5 │ 5 │ 0.25 │ 39
6 │ 6 │ 0.25 │ 42
7 │ 7 │ 0.25 │ 28
8 │ 8 │ 0.125 │ 19
9 │ 9 │ 0.25 │ 32
10 │ 10 │ 0.125 │ 15
11 │ 11 │ 0.125 │ 17
12 │ 12 │ 0.125 │ 18
13 │ 13 │ 0.5 │ 62
14 │ 14 │ 0.25 │ 32
15 │ 15 │ 0.5 │ 64
16 │ 16 │ 0.125 │ 21
17 │ 17 │ 0.25 │ 20
18 │ 18 │ 0.25 │ 32
19 │ 19 │ 0.5 │ 69
20 │ 20 │ 0.5 │ 59
21 │ 21 │ 0.5 │ 55
22 │ 22 │ 0.25 │ 38
23 │ 23 │ 0.25 │ 37
24 │ 24 │ 0.25 │ 36
25 │ 25 │ 0.25 │ 40
26 │ 26 │ 0.5 │ 63
27 │ 27 │ 0.25 │ 27
28 │ 28 │ 0.25 │ 37
29 │ 29 │ 0.25 │ 38
30 │ 30 │ 0.25 │ 32
⋮

Store the data in the current working directory.


In [68]:
CSV.write("density.csv", density)


Out[68]:
CSV.Sink{Void,DataType}(    CSV.Options:
        delim: ','
        quotechar: '"'
        escapechar: '\\'
        missingstring: ""
        dateformat: nothing
        decimal: '.'
        truestring: 'true'
        falsestring: 'false'
        internstrings: true, IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1), "density.csv", 30, true, String["observation", "probability", "value"], 3, false, Val{false})

Lead and Lag Data

To illustrate lead and lag level of detail calculations we will generate $1024$ steps from an $8$ state cyclic Markov process, with random transition probabilities.

We begin with a small helper function to generate the midpoints of a randomly chosen partition of unity.


In [5]:
function randmid(samples::Int64, partitions::Int64)
    # Midpoints of the cells of a random partition of [0, 1] into `partitions` pieces.
    temp = [zeros(Float64, samples) sort(rand(samples, partitions-1), dims = 2) ones(Float64, samples)]
    return (temp[:,1:partitions] + temp[:,2:(1 + partitions)])/2
end


Out[5]:
randmid (generic function with 1 method)

Quick call to test the generation of the matrix


In [76]:
randmid(8,3)


Out[76]:
8×3 Array{Float64,2}:
 0.146474   0.433838  0.787363
 0.234415   0.514168  0.779753
 0.158424   0.519226  0.860802
 0.191161   0.536966  0.845804
 0.120408   0.396235  0.775827
 0.0387037  0.282678  0.743975
 0.0892761  0.484659  0.895383
 0.0582046  0.204756  0.646551

We use a random set of midpoints for the transition probabilities so that we can use the minimum-index function (argmin) to find the next state. Rather than recording the full transition matrix, we use a matrix that represents either a step forward, no change, or a step back.
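In isolation, one transition choice looks like this (the midpoint values here are illustrative, not drawn from randmid):

# Compare a uniform draw against one state's row of midpoints; the
# nearest column (1, 2, or 3) maps to a step of -1, 0, or +1.
midpoints = [0.15, 0.50, 0.85]
argmin(abs.(midpoints .- 0.62)) - 2    # 0.62 is nearest 0.50, so the step is 0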


In [73]:
function cyclicmarkov(steps::Int64, states::Int64)
    df = DataFrame(
        step = 1:steps,
        state = zeros(Int64, steps),
        transition = zeros(Int64, steps),
        probability = rand(steps),
        map = zeros(Float64, steps)
    )
    transitions = randmid(states, 3)
    for i = 1:(steps-1)
        # Nearest midpoint in the current state's row picks the transition: -1, 0, or +1.
        df[:transition][i] = argmin(abs.(transitions[df[:state][i]+1, :] .- df[:probability][i])) - 2
        df[:map][i] = transitions[df[:state][i]+1, df[:transition][i] + 2]
        df[:state][i+1] = (states + df[:transition][i] + df[:state][i]) % states
    end
    return df
end


Out[73]:
cyclicmarkov (generic function with 1 method)

Using the simulation function we generate 1024 samples and store them in a data frame.


In [74]:
leadlag = cyclicmarkov(1024,8)


Out[74]:
Row │ step │ state │ transition │ probability │ map
1 │ 1 │ 0 │ 0 │ 0.623395 │ 0.636166
2 │ 2 │ 0 │ 1 │ 0.97692 │ 0.97755
3 │ 3 │ 1 │ 0 │ 0.619994 │ 0.565522
4 │ 4 │ 1 │ -1 │ 0.303467 │ 0.278459
5 │ 5 │ 0 │ 0 │ 0.407829 │ 0.636166
6 │ 6 │ 0 │ 0 │ 0.447096 │ 0.636166
7 │ 7 │ 0 │ 1 │ 0.864404 │ 0.97755
8 │ 8 │ 1 │ 1 │ 0.789972 │ 0.787063
9 │ 9 │ 2 │ 1 │ 0.760125 │ 0.938649
10 │ 10 │ 3 │ 0 │ 0.875706 │ 0.784389
11 │ 11 │ 3 │ 0 │ 0.61457 │ 0.784389
12 │ 12 │ 3 │ -1 │ 0.239361 │ 0.286732
13 │ 13 │ 2 │ 1 │ 0.941506 │ 0.938649
14 │ 14 │ 3 │ 0 │ 0.540156 │ 0.784389
15 │ 15 │ 3 │ 0 │ 0.882257 │ 0.784389
16 │ 16 │ 3 │ 0 │ 0.718374 │ 0.784389
17 │ 17 │ 3 │ -1 │ 0.128192 │ 0.286732
18 │ 18 │ 2 │ 1 │ 0.80553 │ 0.938649
19 │ 19 │ 3 │ -1 │ 0.23348 │ 0.286732
20 │ 20 │ 2 │ 0 │ 0.359516 │ 0.468492
21 │ 21 │ 2 │ 0 │ 0.598331 │ 0.468492
22 │ 22 │ 2 │ 0 │ 0.387296 │ 0.468492
23 │ 23 │ 2 │ 0 │ 0.439125 │ 0.468492
24 │ 24 │ 2 │ 1 │ 0.929739 │ 0.938649
25 │ 25 │ 3 │ -1 │ 0.143037 │ 0.286732
26 │ 26 │ 2 │ 0 │ 0.43244 │ 0.468492
27 │ 27 │ 2 │ -1 │ 0.0428806 │ 0.0298436
28 │ 28 │ 1 │ 1 │ 0.893467 │ 0.787063
29 │ 29 │ 2 │ -1 │ 0.00800579 │ 0.0298436
30 │ 30 │ 1 │ 1 │ 0.933119 │ 0.787063
⋮

Store the data in the current working directory


In [75]:
CSV.write("leadlag.csv", leadlag)


Out[75]:
CSV.Sink{Void,DataType}(    CSV.Options:
        delim: ','
        quotechar: '"'
        escapechar: '\\'
        missingstring: ""
        dateformat: nothing
        decimal: '.'
        truestring: 'true'
        falsestring: 'false'
        internstrings: true, IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1), "leadlag.csv", 38, true, String["step", "state", "transition", "probability", "map"], 5, false, Val{false})

Time to First Event Data

We initialize the data frame with 3 columns and 1024 samples:

  1. The first column contains the subject identifier, randomly selected from 64 identifiers.
  2. The second column contains the precursor to time: the cumulative sum of random increments from 1 to 8.
  3. The third column contains the event, randomly selected from a centre-weighted distribution on the letters A to E.

In [45]:
longitudinal = DataFrame(
    subject = sort(rand(1:64, 1024)),
    time = cumsum(rand(1:8, 1024)),
    event = rand(['A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'E'], 1024)
);
longitudinal


Out[45]:
Row │ subject │ time │ event
1 │ 1 │ 4 │ 'B'
2 │ 1 │ 9 │ 'B'
3 │ 1 │ 10 │ 'C'
4 │ 1 │ 17 │ 'C'
5 │ 1 │ 21 │ 'B'
6 │ 1 │ 27 │ 'B'
7 │ 1 │ 35 │ 'A'
8 │ 1 │ 38 │ 'C'
9 │ 1 │ 41 │ 'B'
10 │ 1 │ 47 │ 'B'
11 │ 1 │ 53 │ 'C'
12 │ 1 │ 58 │ 'B'
13 │ 1 │ 59 │ 'C'
14 │ 1 │ 66 │ 'C'
15 │ 1 │ 71 │ 'B'
16 │ 1 │ 78 │ 'B'
17 │ 1 │ 81 │ 'A'
18 │ 1 │ 84 │ 'D'
19 │ 1 │ 86 │ 'C'
20 │ 1 │ 93 │ 'B'
21 │ 1 │ 96 │ 'B'
22 │ 1 │ 99 │ 'D'
23 │ 1 │ 100 │ 'C'
24 │ 1 │ 102 │ 'D'
25 │ 1 │ 106 │ 'A'
26 │ 2 │ 109 │ 'C'
27 │ 2 │ 110 │ 'E'
28 │ 2 │ 118 │ 'D'
29 │ 2 │ 122 │ 'D'
30 │ 2 │ 125 │ 'B'
⋮

Notice that the time is increasing across all subjects. Instead we want the cumulative sum to restart within each subject. To accomplish this we use logical indexing to pull the last time of each subject and subtract it from the times of the next subject.
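The next few cells do this with logical indexing and a cumulative sum; for comparison, here is a minimal one-pass sketch (not the notebook's method, and the function name is ours) that produces the same reset:

# Walk the rows, remembering the previous subject's final raw time and
# subtracting it while the subject is unchanged.
function resettimes(subject::Vector{Int}, time::Vector{Int})
    out = similar(time)
    offset = 0
    for i in eachindex(time)
        if i > 1 && subject[i] != subject[i-1]
            offset = time[i-1]    # last raw time of the previous subject
        end
        out[i] = time[i] - offset
    end
    return out
end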

First find the logical index of the row before the subject changes.


In [46]:
longitudinal[:before] = [longitudinal[:subject][1:end-1] .!= longitudinal[:subject][2:end];false];
longitudinal


Out[46]:
Row │ subject │ time │ event │ before
1 │ 1 │ 4 │ 'B' │ false
2 │ 1 │ 9 │ 'B' │ false
3 │ 1 │ 10 │ 'C' │ false
4 │ 1 │ 17 │ 'C' │ false
5 │ 1 │ 21 │ 'B' │ false
6 │ 1 │ 27 │ 'B' │ false
7 │ 1 │ 35 │ 'A' │ false
8 │ 1 │ 38 │ 'C' │ false
9 │ 1 │ 41 │ 'B' │ false
10 │ 1 │ 47 │ 'B' │ false
11 │ 1 │ 53 │ 'C' │ false
12 │ 1 │ 58 │ 'B' │ false
13 │ 1 │ 59 │ 'C' │ false
14 │ 1 │ 66 │ 'C' │ false
15 │ 1 │ 71 │ 'B' │ false
16 │ 1 │ 78 │ 'B' │ false
17 │ 1 │ 81 │ 'A' │ false
18 │ 1 │ 84 │ 'D' │ false
19 │ 1 │ 86 │ 'C' │ false
20 │ 1 │ 93 │ 'B' │ false
21 │ 1 │ 96 │ 'B' │ false
22 │ 1 │ 99 │ 'D' │ false
23 │ 1 │ 100 │ 'C' │ false
24 │ 1 │ 102 │ 'D' │ false
25 │ 1 │ 106 │ 'A' │ true
26 │ 2 │ 109 │ 'C' │ false
27 │ 2 │ 110 │ 'E' │ false
28 │ 2 │ 118 │ 'D' │ false
29 │ 2 │ 122 │ 'D' │ false
30 │ 2 │ 125 │ 'B' │ false
⋮

Similarly find the logical index of the row after the subject changed.


In [47]:
longitudinal[:after] = [false; longitudinal[:before][1:end-1]];
longitudinal


Out[47]:
Row │ subject │ time │ event │ before │ after
1 │ 1 │ 4 │ 'B' │ false │ false
2 │ 1 │ 9 │ 'B' │ false │ false
3 │ 1 │ 10 │ 'C' │ false │ false
4 │ 1 │ 17 │ 'C' │ false │ false
5 │ 1 │ 21 │ 'B' │ false │ false
6 │ 1 │ 27 │ 'B' │ false │ false
7 │ 1 │ 35 │ 'A' │ false │ false
8 │ 1 │ 38 │ 'C' │ false │ false
9 │ 1 │ 41 │ 'B' │ false │ false
10 │ 1 │ 47 │ 'B' │ false │ false
11 │ 1 │ 53 │ 'C' │ false │ false
12 │ 1 │ 58 │ 'B' │ false │ false
13 │ 1 │ 59 │ 'C' │ false │ false
14 │ 1 │ 66 │ 'C' │ false │ false
15 │ 1 │ 71 │ 'B' │ false │ false
16 │ 1 │ 78 │ 'B' │ false │ false
17 │ 1 │ 81 │ 'A' │ false │ false
18 │ 1 │ 84 │ 'D' │ false │ false
19 │ 1 │ 86 │ 'C' │ false │ false
20 │ 1 │ 93 │ 'B' │ false │ false
21 │ 1 │ 96 │ 'B' │ false │ false
22 │ 1 │ 99 │ 'D' │ false │ false
23 │ 1 │ 100 │ 'C' │ false │ false
24 │ 1 │ 102 │ 'D' │ false │ false
25 │ 1 │ 106 │ 'A' │ true │ false
26 │ 2 │ 109 │ 'C' │ false │ true
27 │ 2 │ 110 │ 'E' │ false │ false
28 │ 2 │ 118 │ 'D' │ false │ false
29 │ 2 │ 122 │ 'D' │ false │ false
30 │ 2 │ 125 │ 'B' │ false │ false
⋮

Next initialize a column to contain the shift of baseline time for each subject.


In [48]:
longitudinal[:shift] = zeros(Int, 1024);
longitudinal


Out[48]:
Row │ subject │ time │ event │ before │ after │ shift
1 │ 1 │ 4 │ 'B' │ false │ false │ 0
2 │ 1 │ 9 │ 'B' │ false │ false │ 0
3 │ 1 │ 10 │ 'C' │ false │ false │ 0
4 │ 1 │ 17 │ 'C' │ false │ false │ 0
5 │ 1 │ 21 │ 'B' │ false │ false │ 0
6 │ 1 │ 27 │ 'B' │ false │ false │ 0
7 │ 1 │ 35 │ 'A' │ false │ false │ 0
8 │ 1 │ 38 │ 'C' │ false │ false │ 0
9 │ 1 │ 41 │ 'B' │ false │ false │ 0
10 │ 1 │ 47 │ 'B' │ false │ false │ 0
11 │ 1 │ 53 │ 'C' │ false │ false │ 0
12 │ 1 │ 58 │ 'B' │ false │ false │ 0
13 │ 1 │ 59 │ 'C' │ false │ false │ 0
14 │ 1 │ 66 │ 'C' │ false │ false │ 0
15 │ 1 │ 71 │ 'B' │ false │ false │ 0
16 │ 1 │ 78 │ 'B' │ false │ false │ 0
17 │ 1 │ 81 │ 'A' │ false │ false │ 0
18 │ 1 │ 84 │ 'D' │ false │ false │ 0
19 │ 1 │ 86 │ 'C' │ false │ false │ 0
20 │ 1 │ 93 │ 'B' │ false │ false │ 0
21 │ 1 │ 96 │ 'B' │ false │ false │ 0
22 │ 1 │ 99 │ 'D' │ false │ false │ 0
23 │ 1 │ 100 │ 'C' │ false │ false │ 0
24 │ 1 │ 102 │ 'D' │ false │ false │ 0
25 │ 1 │ 106 │ 'A' │ true │ false │ 0
26 │ 2 │ 109 │ 'C' │ false │ true │ 0
27 │ 2 │ 110 │ 'E' │ false │ false │ 0
28 │ 2 │ 118 │ 'D' │ false │ false │ 0
29 │ 2 │ 122 │ 'D' │ false │ false │ 0
30 │ 2 │ 125 │ 'B' │ false │ false │ 0
⋮

Into the baseline shift of each row that follows a change in subject, place the difference between the successive final times of the subjects.
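A toy illustration of the assignment below, with hypothetical final times for three consecutive subjects:

lasttimes = [106, 230, 351]                  # :time at the :before rows
shift = lasttimes - [0; lasttimes[1:end-1]]  # [106, 124, 121], placed at the :after rows
cumsum(shift)                                # [106, 230, 351]: the offset for each following subject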


In [49]:
longitudinal[:shift][longitudinal[:after]] = 
    longitudinal[:time][longitudinal[:before]] - 
    [0; longitudinal[:time][longitudinal[:before]][1:end-1]];
longitudinal


Out[49]:
Row │ subject │ time │ event │ before │ after │ shift
1 │ 1 │ 4 │ 'B' │ false │ false │ 0
2 │ 1 │ 9 │ 'B' │ false │ false │ 0
3 │ 1 │ 10 │ 'C' │ false │ false │ 0
4 │ 1 │ 17 │ 'C' │ false │ false │ 0
5 │ 1 │ 21 │ 'B' │ false │ false │ 0
6 │ 1 │ 27 │ 'B' │ false │ false │ 0
7 │ 1 │ 35 │ 'A' │ false │ false │ 0
8 │ 1 │ 38 │ 'C' │ false │ false │ 0
9 │ 1 │ 41 │ 'B' │ false │ false │ 0
10 │ 1 │ 47 │ 'B' │ false │ false │ 0
11 │ 1 │ 53 │ 'C' │ false │ false │ 0
12 │ 1 │ 58 │ 'B' │ false │ false │ 0
13 │ 1 │ 59 │ 'C' │ false │ false │ 0
14 │ 1 │ 66 │ 'C' │ false │ false │ 0
15 │ 1 │ 71 │ 'B' │ false │ false │ 0
16 │ 1 │ 78 │ 'B' │ false │ false │ 0
17 │ 1 │ 81 │ 'A' │ false │ false │ 0
18 │ 1 │ 84 │ 'D' │ false │ false │ 0
19 │ 1 │ 86 │ 'C' │ false │ false │ 0
20 │ 1 │ 93 │ 'B' │ false │ false │ 0
21 │ 1 │ 96 │ 'B' │ false │ false │ 0
22 │ 1 │ 99 │ 'D' │ false │ false │ 0
23 │ 1 │ 100 │ 'C' │ false │ false │ 0
24 │ 1 │ 102 │ 'D' │ false │ false │ 0
25 │ 1 │ 106 │ 'A' │ true │ false │ 0
26 │ 2 │ 109 │ 'C' │ false │ true │ 106
27 │ 2 │ 110 │ 'E' │ false │ false │ 0
28 │ 2 │ 118 │ 'D' │ false │ false │ 0
29 │ 2 │ 122 │ 'D' │ false │ false │ 0
30 │ 2 │ 125 │ 'B' │ false │ false │ 0
⋮

Fill in the zeroes by running a cumulative sum over the shift column.


In [50]:
longitudinal[:shift] = cumsum(longitudinal[:shift]);
longitudinal


Out[50]:
Row │ subject │ time │ event │ before │ after │ shift
1 │ 1 │ 4 │ 'B' │ false │ false │ 0
2 │ 1 │ 9 │ 'B' │ false │ false │ 0
3 │ 1 │ 10 │ 'C' │ false │ false │ 0
4 │ 1 │ 17 │ 'C' │ false │ false │ 0
5 │ 1 │ 21 │ 'B' │ false │ false │ 0
6 │ 1 │ 27 │ 'B' │ false │ false │ 0
7 │ 1 │ 35 │ 'A' │ false │ false │ 0
8 │ 1 │ 38 │ 'C' │ false │ false │ 0
9 │ 1 │ 41 │ 'B' │ false │ false │ 0
10 │ 1 │ 47 │ 'B' │ false │ false │ 0
11 │ 1 │ 53 │ 'C' │ false │ false │ 0
12 │ 1 │ 58 │ 'B' │ false │ false │ 0
13 │ 1 │ 59 │ 'C' │ false │ false │ 0
14 │ 1 │ 66 │ 'C' │ false │ false │ 0
15 │ 1 │ 71 │ 'B' │ false │ false │ 0
16 │ 1 │ 78 │ 'B' │ false │ false │ 0
17 │ 1 │ 81 │ 'A' │ false │ false │ 0
18 │ 1 │ 84 │ 'D' │ false │ false │ 0
19 │ 1 │ 86 │ 'C' │ false │ false │ 0
20 │ 1 │ 93 │ 'B' │ false │ false │ 0
21 │ 1 │ 96 │ 'B' │ false │ false │ 0
22 │ 1 │ 99 │ 'D' │ false │ false │ 0
23 │ 1 │ 100 │ 'C' │ false │ false │ 0
24 │ 1 │ 102 │ 'D' │ false │ false │ 0
25 │ 1 │ 106 │ 'A' │ true │ false │ 0
26 │ 2 │ 109 │ 'C' │ false │ true │ 106
27 │ 2 │ 110 │ 'E' │ false │ false │ 106
28 │ 2 │ 118 │ 'D' │ false │ false │ 106
29 │ 2 │ 122 │ 'D' │ false │ false │ 106
30 │ 2 │ 125 │ 'B' │ false │ false │ 106
⋮

Rectify the time for each subject by subtracting the baseline shift.


In [20]:
longitudinal[:time] = longitudinal[:time] - longitudinal[:shift];
longitudinal[1:8, :]


Out[20]:
Row │ subject │ time │ event │ before │ after │ shift
1 │ 1 │ 5 │ C │ false │ false │ 0
2 │ 1 │ 7 │ B │ false │ false │ 0
3 │ 1 │ 15 │ D │ false │ false │ 0
4 │ 1 │ 20 │ B │ false │ false │ 0
5 │ 1 │ 24 │ C │ false │ false │ 0
6 │ 1 │ 32 │ C │ false │ false │ 0
7 │ 1 │ 40 │ C │ false │ false │ 0
8 │ 1 │ 45 │ C │ false │ false │ 0

We can now export the data, choosing the first 3 columns.


In [21]:
CSV.write("longitudinal.csv", longitudinal[[:subject, :time, :event]]);

Contingency Data

The goal is to generate four columns of dimensional classifiers, as two pairs of dependent dimensions. We start by initializing the two distributions to sample:


In [52]:
distributionone = [
    ["A" -1],
    ["A" -1],
    ["A" -1],
    ["A" 0],
    ["A" 0],
    ["A" 1],
    ["B" -1],
    ["B" 0],
    ["B" 0],
    ["B" 1],
    ["B" 1],
    ["B" 1]
];
distributiontwo = [
    ["C" -2],
    ["C" -1],
    ["C" -1],
    ["C" 0],
    ["C" 0],
    ["C" 0],
    ["C" 1],
    ["C" 1],
    ["C" 1],
    ["C" 1],
    ["C" 2],
    ["C" 2],
    ["C" 2],
    ["C" 2],
    ["C" 2],
    ["D" -2],
    ["D" -1],
    ["D" -1],
    ["D" 0],
    ["D" 0],
    ["D" 0],
    ["D" 1],
    ["D" 1],
    ["D" 2],
    ["E" -2],
    ["E" -2],
    ["E" -2],
    ["E" -1],
    ["E" -1],
    ["E" 0],
    ["E" 1],
    ["E" 1],
    ["E" 2],
    ["E" 2],
    ["E" 2],
    ["F" 2],
    ["F" 1],
    ["F" 1],
    ["F" 0],
    ["F" 0],
    ["F" 0],
    ["F" -1],
    ["F" -1],
    ["F" -1],
    ["F" -1],
    ["F" -2],
    ["F" -2],
    ["F" -2],
    ["F" -2],
    ["F" -2]
];

We then generate another 1024 samples to populate a data frame.


In [53]:
contingency = DataFrame(
    sample = 1:1024,
    preonetwo = rand(distributionone, 1024),
    prethreefour = rand(distributiontwo, 1024)
);
contingency[1:8, :]


Out[53]:
Row │ sample │ preonetwo │ prethreefour
1 │ 1 │ Any["B" 0] │ Any["C" 0]
2 │ 2 │ Any["B" 0] │ Any["C" 0]
3 │ 3 │ Any["B" 1] │ Any["C" -1]
4 │ 4 │ Any["B" -1] │ Any["D" -1]
5 │ 5 │ Any["B" 1] │ Any["F" 0]
6 │ 6 │ Any["B" 0] │ Any["F" -2]
7 │ 7 │ Any["B" 1] │ Any["C" -1]
8 │ 8 │ Any["B" -1] │ Any["C" 1]

Next we extract the individual column elements into dimension columns.


In [54]:
# vcat splats the 1024 one-row matrices into a 1024×2 matrix; each column becomes a dimension.
contingency[:dimensionone] = vcat(contingency[:, :preonetwo]...)[:, 1];
contingency[:dimensiontwo] = vcat(contingency[:, :preonetwo]...)[:, 2];
contingency[:dimensionthree] = vcat(contingency[:, :prethreefour]...)[:, 1];
contingency[:dimensionfour] = vcat(contingency[:, :prethreefour]...)[:, 2];
contingency[1:8, :]


Out[54]:
Row │ sample │ preonetwo │ prethreefour │ dimensionone │ dimensiontwo │ dimensionthree │ dimensionfour
1 │ 1 │ Any["B" 0] │ Any["C" 0] │ B │ 0 │ C │ 0
2 │ 2 │ Any["B" 0] │ Any["C" 0] │ B │ 0 │ C │ 0
3 │ 3 │ Any["B" 1] │ Any["C" -1] │ B │ 1 │ C │ -1
4 │ 4 │ Any["B" -1] │ Any["D" -1] │ B │ -1 │ D │ -1
5 │ 5 │ Any["B" 1] │ Any["F" 0] │ B │ 1 │ F │ 0
6 │ 6 │ Any["B" 0] │ Any["F" -2] │ B │ 0 │ F │ -2
7 │ 7 │ Any["B" 1] │ Any["C" -1] │ B │ 1 │ C │ -1
8 │ 8 │ Any["B" -1] │ Any["C" 1] │ B │ -1 │ C │ 1
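As a quick check that the engineered dependence survived sampling, the first pair can be cross-tabulated; from distributionone, "A" should lean toward $-1$ and "B" toward $1$. A minimal count in base Julia:

# Counts for each (dimensionone, dimensiontwo) cell, as a 2×3 array of
# tuples; "A" rows should lean toward -1 and "B" rows toward 1.
[(a, v, sum((contingency[:dimensionone] .== a) .& (contingency[:dimensiontwo] .== v)))
    for a in ["A", "B"], v in [-1, 0, 1]]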

Finally we export our contingency table data.


In [55]:
CSV.write("contingency.csv", contingency[[:sample, :dimensionone, :dimensiontwo, :dimensionthree, :dimensionfour]]);