Stochastically Generated Clinical Visits

A Multi-Graph Micro-Simulation Example

Motivation

This notebook demonstrates a micro-simulation of patients visiting clinical staff. We were motivated to develop this micro-simulation in response to a challenge during privacy and anonymity software course to provide an example of a Graph that would be expected to occur in the health sciences and be analogous to the Graphs that occur social media. Specifically, we where challenged to provide an example of self-referential or recursive, dataset of relationships between pairs of people. To this end, when clinical staff are allowed to be patients as well, and patients are allowed to have multiple visits over a time span, the record of patient visits with clinical staff is a self-referential data set. This dataset turns out to be a Multi-Graph with the Edge sets parameterized by time, under the constraint that at any moment in time a patient vertex is unique in the set of all edges that exist at that moment. This constraint amounts to requiring that every patient is involved in only one visit at a time.

Motivation

In developing the micro-simulation we also fortuitously generated a dataset that is at higher risk of re-identification then would be naively calculated based solely on the uniform probability assumptions used in the standard risk calculations. This arose from an effort to make the micro-simulation more realistic by organizing patients and clinical staff into clinics, representing the exclusive employment of the clinical staff at the clinic and the exclusive registration of the patient at the clinic. We further stipulated that patients can only visit the clinical staff of the clinic to which the patient is registered, and that every clinical staff must be registered as a patient at a clinic that is different from the one at which they are employed. In effect for risk calculations, this reduces the population size to that of the average size of a single clinic, because each clinic is a distinct and independent population of patients.

Mathematical Model

Consider a fixed population, where any member can be a patient at any time. The population forms the Vertexes of the Graph.

$$ \begin{align} V & = \left\lbrace \text{person} : \text{person} \in \text{population} \right\rbrace \end{align} $$

A subset of this population are also clinical staff.

$$ \begin{align} V_\text{staff} & = \left\lbrace \text{person} : \text{person} \in V \text{, person is clinical staff} \right\rbrace\\ \end{align} $$

The patients visiting clinical staff on each day forms a set of edges parameterized by time.

$$ \begin{align} E_\text{day} & = \left\lbrace \left( \text{day}, \text{patient}, \text{staff} \right) : \text{patient} \in V \text{, staff} \in V_\text{staff} \text{, patient visits staff on day} \right\rbrace \end{align} $$

Mathematical Model

The number of possible realizations of clinical visit event datasets, regardless of the assignments of clinics, or entailing sampling probabilities, is fixed.

$$ \begin{align} N & = \left( \sum_{d = 0}^{\#\text{days}} \binom{\#\text{days}}{d} \left( \#V_\text{staff} \right)^d \right)^{\#V - \#V_\text{staff}} \times \left( \sum_{d = 0}^{\#\text{days}} \binom{\#\text{days}}{d} \left( \#V_\text{staff} - 1\right)^d \right)^{\#V_\text{staff}}\\ & = \left( \#V_\text{staff} + 1 \right)^{\#\text{days} \cdot \left( \#V - \#V_\text{staff} \right)} \times \left( \#V_\text{staff} \right)^{\#\text{days} \cdot \#V_\text{staff}} \end{align} $$

The left hand term in the product is the number of combinations of visits among patients who are not clinical staff. Conversely, the right hand term in the product is the number of combinations of visits among patients who are clinical staff

Mathematical Model

Even the most rudimentary assumption that there is an average number of expected visits in a time frame,

$$ \begin{align} \mathbb{P}\left[\text{visit} \middle\| \text{patient} \right] & = \frac{\text{expected visits}}{\#\text{days}} \end{align} $$

will cause the dataset of all clinical visit events to be non-uniformly sampled from the all possible realizations of datasets.

$$ \begin{align} \mathbb{P}\left[E\right] & = \left( \prod_{v \notin V_\text{staff}} \left( \frac{1}{\#V_\text{staff}} \right)^{d_v} \left( \frac{\text{expected visits}}{\#\text{days}} \right)^{d_v} \left( 1 - \frac{\text{expected visits}}{\#\text{days}} \right)^{\#\text{days} - d_v} \right)\\ & \quad \times \left( \prod_{v \in V_\text{staff}} \left( \frac{1}{\#V_\text{staff} - 1} \right)^{d_v} \left( \frac{\text{expected visits}}{\#\text{days}} \right)^{d_v} \left( 1 - \frac{\text{expected visits}}{\#\text{days}} \right)^{\#\text{days} - d_v} \right) \end{align} $$

In the product $d_v$ is the number of events of the person represented by edge $v$.

Mathematical Model

This construction naively assumes that any patient can visit any clinician on any given day, with uniform probability, provided no patient has two visits, Edges, on the same day.

$$ \begin{align} \mathbb{P} \left[ \text{staff} \middle\| \text{day, patient} \notin V_\text{staff} \right] & = \frac{1}{\#V_\text{staff}}\\ \mathbb{P} \left[ \text{staff} \middle\| \text{day, patient} \in V_\text{staff} \right] & = \frac{1}{\#V_\text{staff} - 1} \end{align} $$

Without altering the assumption of an average number of expected visits in a time frame, we can alter the probability of a patient visiting a particular clinical staff, from the uniform probability. We accomplish this by organizing patients and clinical staff into clinics, and staff into shifts.

Mathematical Model

The clinics and shift schedules are assigned randomly following a set of rules.

  1. Every person, including clinical staff, is assigned to be a patient at exactly one clinic.
  2. Additionally, every clinical staff is also assigned to be employed at exactly one clinic.
  3. For every staff, there clinic of employment must be different from there clinic of care.
  4. Each clinic must employ at least one clinical staff.
  5. On any day the number of staff scheduled to work must be proportional to expected number of daily visits

These rules constrain both the number of possible clinical staff a patient can visit, and the number of clinical staff seeing patients on any day, thus the decrease the denominator in the probability.

Mathematical Model

Aside from the total population, the number of clinic staff, the number of clinics, and the date range, the model has two parameters that can be set from the empirical statistics:

  1. The expected number of visits a patient will have in the date range.
  2. The expected daily workload of patients and single clinical staff will see.

For each clinic the model calculates the number of staff to schedule on every day, using the number of patients registered at the clinic, the number of days simulated, the expected number of visits, and the expected workload.

$$ \begin{align} \text{daily scheduled staff} & = \frac{\text{number of patients} \times \text{expected visits per patient}}{\text{days simulated} \times \text{expected daily client load per staff}} \end{align} $$

This number is truncated to stay between at least one and the total number of clinical staff employed at the clinic.

Simulation

The engine of the simulation has been wrapped in a module called MicroSimulation, which can be found on GitHub here.


In [1]:
include("MicroSimulation.jl")


LoadError: syntax: extra token "Registration" after end of expression
while loading /home/aaron/Development/tableau-examples/MicroSimulation.jl, in expression starting on line 90

 in include_from_node1(::String) at ./loading.jl:488
 in include_string(::String, ::String) at ./loading.jl:441

Simulation

Before running the stochastic event simulation a population must be created, randomly assigning clinics to patients and staff. We do this using the Registration constructor.

Simulation


In [2]:
?MicroSimulation.Registrations


Out[2]:
Registrations(personcount, staffcount, cliniccount)

A list of persons generated by incrementally assigning a person identifier, randomly assigning a clinic identifier from 1:cliniccount to the first staffcount persons, and randomly assigning a second clinic identifier to every person. The clinic a staff is employed at will never be the same as the clinic at which the staff is a patient.

Fields

  • personidentifiers: identifiers of the people.
  • staffclinics: identifiers of the clinic the person provides care.
  • patientclinics: identifiers of the clinic at which the person is a patient.
  • personcount: number of people.
  • staffcount: number of clinical staff.
  • cliniccount: number of clinics employing staff, and caring for patients.

Example

julia> rs = MicroSimulation.Registrations(10, 4, 2)
MicroSimulation.Registrations([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 1, 1, 2, 0, 0, 0, 0, 0, 0], [2, 2, 2, 1, 2, 2, 1, 1, 1, 1], 10, 4, 2)

Simulation

We will bootstrap the simulation from a rough aggregation of the fiscal 2016/17 visits statistics.

Statistic Value
Population $4067175$
Patients $3578579$
Patients Prescribed $2569015$
Providers $12486$
Providers Prescribing $10675$
Clinics $2687$
Patient Days $24438296$
Patient Prescribed Days $8030584$
Provider Days $1423060$
Visits $26360227$
Visits Prescribed $8140983$

Simulation

Simulation

The theoretical single day probability of revisit conditioned on the ordinal of the previous visit is approximately quadratic.

$$ \begin{align} \mathbb{P} \left[ v \middle\| v - 1 \right] & = 23\% \times \frac{ v \left( 365 - v \right) }{182.5^2} \end{align} $$

Simulation

The simulation will consist of $4100000$ patients, of which $13000$ are also clinical staff, distributed across $2700$ clinics.


In [3]:
rs = MicroSimulation.Registrations(4100000, 13000, 2700)


Out[3]:
MicroSimulation.Registrations([1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  4099991, 4099992, 4099993, 4099994, 4099995, 4099996, 4099997, 4099998, 4099999, 4100000], [1, 1, 1, 2, 2, 3, 3, 3, 3, 4  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [154, 117, 2665, 1203, 288, 646, 1160, 94, 1412, 2372  …  255, 148, 1502, 779, 223, 174, 1352, 2034, 1732, 1457], 4100000, 13000, 2700)

Simulation

Given the randomly generated assignments of clinics to patients and staff we can now stochastically generate visit events by sequentially sampling the conditional probabilities. This is done by calling the Events constructor.

Simulation


In [4]:
?MicroSimulation.Events


Out[4]:
Events(daterange, registrations, expectedvisits, expectedload)

A list of events stochastically generated by the microsimulation at the time of instantiation of the object. The process is parametertized by expectedvisits, the mean expected number of clinic visits a single person will have in the daterange, and by expectedload the expected number of patients a single clinician can attend to in a single day. These parameters are used to calculate the number of staff that are rostered on a single day.

Fields

  • days: days of the patient visits.
  • patientidentifiers: identifiers of the patients.
  • staffidentifiers: identifiers of the staff caring for the patients.

Example

julia> es = MicroSimulation.Events(Date(2014, 01, 01):Date(2014, 12, 31), MicroSimulation.Registrations(400, 10, 2), 4, 28)
MicroSimulation.Events(Date[2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-01, 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-03  …  2014-12-29, 2014-12-30, 2014-12-30, 2014-12-30, 2014-12-30, 2014-12-31, 2014-12-31, 2014-12-31, 2014-12-31, 2014-12-31], [47, 78, 188, 267, 298, 286, 302, 387, 395, 13  …  365, 3, 56, 121, 232, 5, 35, 187, 224, 252], [5, 5, 6, 6, 5, 7, 7, 7, 7, 6  …  9, 7, 7, 7, 1, 6, 8, 6, 8, 6])

Simulation

We simulate a single year's events, where each person has $6$ visits on average, and each clinical staff can attend to $19$ patients in a single day on average.


In [6]:
evs = MicroSimulation.Events(Date(2016, 4, 1):Date(2017, 3, 31), rs, 6, 19)


Out[6]:
MicroSimulation.Events(Date[2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01, 2016-04-01  …  2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31, 2017-03-31], [93, 108, 150, 182, 205, 259, 265, 316, 319, 493  …  4099438, 4099508, 4099554, 4099592, 4099649, 4099650, 4099675, 4099729, 4099763, 4099869], [446, 134, 2656, 2761, 3516, 72, 8843, 12187, 10953, 3814  …  371, 7477, 12411, 4224, 11316, 6242, 4010, 10537, 7040, 6267])

Simulation

Write the Registrations and the Events to *.csv files.


In [14]:
f = open("registrations.csv", "w")
writecsv(f, ["personidentifier" "patientclinic" "staffclinic"])
writecsv(f, rs)
close(f)

In [16]:
f = open("events.csv", "w")
writecsv(f, ["personidentifier" "patientidentifier" "staffidentifier"])
writecsv(f, evs)
close(f)

Appendix

The observed empirical estimation of the conditional probability of revisit is not an artifact of the sampling process.


In [21]:
function uniformvisitsimulate(pop::Int64,days::Int64,visits::Int64)
    prob = visits/days
    ps = Vector{Int64}()
    ds = Vector{Int64}()
    cs = Vector{Int64}()
    bs = Vector{Int64}()
    as = Vector{Int64}()
    for p = 1:pop
        c = 0
        push!(ps, p)
        push!(ds, 0)
        push!(cs, c)
        push!(bs, 0)
        push!(as, 365)
        for d = 1:days
            if prob > rand()
                c += 1
                b = d - ds[end]
                as[end] = b
                push!(ps, p)
                push!(ds, d)
                push!(cs, c)
                push!(bs, b)
                push!(as, 365 - d)
            end
        end
    end
    return [ps cs ds bs as]
end


Out[21]:
uniformvisitsimulate (generic function with 2 methods)

In [22]:
a = uniformvisitsimulate(4000000,365,6)


Out[22]:
27995098×5 Array{Int64,2}:
       1  0    0    0   14
       1  1   14   14   22
       1  2   36   22   17
       1  3   53   17   26
       1  4   79   26   43
       1  5  122   43   25
       1  6  147   25   56
       1  7  203   56   98
       1  8  301   98   58
       1  9  359   58    6
       2  0    0    0   32
       2  1   32   32  156
       2  2  188  156   99
       ⋮                  
 3999998  4  295  175   70
 3999999  0    0    0    7
 3999999  1    7    7   24
 3999999  2   31   24   35
 3999999  3   66   35   52
 3999999  4  118   52  196
 3999999  5  314  196   51
 4000000  0    0    0   76
 4000000  1   76   76  189
 4000000  2  265  189   23
 4000000  3  288   23    1
 4000000  4  289    1   76

In [23]:
f = open("uniformvisits.csv", "w")
writecsv(f, ["personidentifier" "visitorder" "visitday" "daysbefore" "daysafter"])
writecsv(f, a)
close(f)

In [1]:
using PlotlyJS


INFO: Recompiling stale cache file C:\Users\aaronsheldon\.julia\lib\v0.6\FixedPointNumbers.ji for module FixedPointNumbers.
INFO: Recompiling stale cache file C:\Users\aaronsheldon\.julia\lib\v0.6\ColorTypes.ji for module ColorTypes.
INFO: Recompiling stale cache file C:\Users\aaronsheldon\.julia\lib\v0.6\PlotlyJS.ji for module PlotlyJS.

Plotly javascript loaded.

To load again call

init_notebook(true)


In [66]:
ps = (1 .- exp.(-0.03125 .* collect(1:365)));
ts = diagm(1 .- [ps; 0]) .+ diagm(ps, 1);
sum((ts^365)[1, :])


Out[66]:
1.0000000000000013

In [67]:
plot((ts^365)[1, :])


Out[67]:

In [70]:
det([0.0 0.5 0.5;0.5 0.0 0.5; 0.5 0.5 0.0])


Out[70]:
0.25