Simulated Process

Patient Follow-up

Process Description

HIV patients are visiting a health center for regular appointments. At each visit, they are given an appointment for their next visit. The date for the $v^{th}$ visit of patient $i$ is noted $V_v^i$, and the date of the appointment that was given for this visit is noted $A_v^i$. Thus, if $V_v^i > A_v^i$, the patient came late to his appointment, if $V_v^i < A_v^i$, the patient came early. We note $l_v^i = A_v^i - V_v^i$, the lag between scheduled appointment and actual visit.

The time elapsed between a visit and the next scheduled appointment is set by the national norms regarding the frequency at which HIV patients should be evaluated, depending on their condition and medical history. Patients recently enrolled in care will be seen more frequently than patients with a longer follow-up and no complications. We note ($f_v^i$) the visit frequency regimen of patient $i$ at visit $v$. This unit is usually around a multiple of 28 days, as patients are likely to have a favorite week day for visit.

We can finally express the time between two visits as : $$V_{v+1}^i - V_v^i = f_v^i + l_v^i $$

Simulation Strategy

Patient $i$ comes for visit $V_0^i$
Patient $i$ is given an appointment at date $A_i$
For each time step $T$ after $V_0^i$ :
1. The probability $p^i_T$ that patient $i$ drops out is drawn, with $p^i_T \propto f(T)$ and $\frac{d}{dt}f < 0$
2. If $p^i_T < a$, patients drops out
3. If $p^i_T \geq a$ patients stays in care for the next time step
4. The probability $q^i_T$ that patient $i$ comes to the facility is drawn, with $q^i_T \propto g(T)$, ensuring a varying probability that the patient comes to a visit, with $\frac{d^2}{dt}g < 0$ and $\frac{d}{dt}g(A_i) = 0$.
5. If $q^i_T < b$ the patient does not come for a visit at this time step
6. If $q^i_T \geq b$ the patient comes for visit $V_1^i$ at this time step

Data Entry

Process Description

The date at which a visit is recorded in an EMR is $R(V_v^i)$. By definition, $R(V_v^i) \geq V_v^i$, and the delay in data entry is noted : $$R(V_v^i) - V_v^i = \delta_v^i \geq 0$$

$\delta$ may vary in a facility, depending on the workload, staffing or other factors. In some cases, the visit has not and will never be recorded. I will note this situation as $\delta \rightarrow \infty$.

Finally, data entry is interrupted at date $T_{close}$ before the data is used for analysis. The time elapsed between patient $i$'s last visit and the closing date is noted as $G_i = T_{close} - \max_v(A_v^i)$. For simplicity, we will equate the date of database closure with the date of analysis in a first step, and will relax this assumption when we will be measuring data maturity.

Metrics of interest

Loss to Follow Up definition

A central piece of the LTFU definition is the \textit{grace period} during which a patient, even if he did not return to a facility, is considered actively followed. This \textit{grace period} is denoted $G_0$.

A patient $i$ is considered LTFU if he is late to his latest scheduled appointment for more than $G_0$ days.

$$l_{v^{*}}^i > G_0$$

Looking closer at this definition, we can see it regroups three different situations :

$l_{v^{*}}^i \rightarrow \infty$ : The patient is LTFU and will never come back to the facility.
$\infty > l_{v^{*},i} > G_0$ : The patient is late to his appointment but will come back in the future.
$\delta_{v^*+1} > G_0$ : The patient came for his visit $v^{*} + 1$ but data entry took longer than the grace period and the visit was not entered at the time of database closure.

Using this definition, we can express the probability that a patient is identified as LTFU based on the data at hand. Let's $X = 1$ be the event that a patient is actively in care, and $X = 0$ the event that the patient is LTFU. We can get $p(X = 0 | l_{v^{*}}^i \leq G_0)$ as the combination of elements we can measure :

$$p(X = 0 | l_{v^{*}}^i > G_0) = 1 - p(\infty > l_{v^{*}}^i > G_0) - p(\delta_{v^*+1} > G_0) $$

We can understand $\infty > l_{v^{*}}^i \leq G_0$ as an intrinsic myopia of the health system, who can not predict the future, and $\delta_{v^*+1} > G_0$ as a data quality measure. Differentiating between these two terms is important in order to understand uncertainty in the LTFU rate and better measure retention in the cohort.

Quantities of interest

This simulated data will then be used to estimate our elements of interest

Measuring data quality impact : From the cohort simulations, I will measure the LTFU rate using different distributions of $\delta$. Different scenarios will be considered for data quality, varying both the mean and variance of $\delta$. Perfect data quality will be compared to situations with long delays of data entry, and situations with important data loss (high variance of $\delta$). The resulting observed variation in the LTFU rate will be described as the impact of data quality on the measure of retention.
Data maturity : As data is being entered in the EMR, or as missed visits are finally being made, the data for a given period will get completed, and patients actively on care are more and more considered so. As data maturity grows in the EMR, the data quality induced error is lowered. Varying $T_{close}$ can thus have an impact on the measure of retention of a patient on a given date. I will carry out the measure of retention using different closing dates for the database, and only using the data recorded before the closing date. These measures will allow me to define and test a Data Maturity metric, based on a combination of $f$, $l$ and $\delta$ that will allow us to identify the optimal minimum date of analysis to estimate retention rates in a program, and the optimal grace period $G_0$ to use for different levels of maturity.
Robust measures of retention Finally, we will consider more robust metrics that can be considered good proxies for retention. These metrics will include :
- The ratio of the corrected average number of registered visits on the expected number of followed patients
- The ratio of new to returning patients in the facility
- The probability that the rate of LTFU is higher than a given threshold

For each of these metrics, I will evaluate their capacity to measure retention in the cohort, by comparing with the reference measure of LTFU measured with perfect data. I will also evaluate the sensibility of these metrics to data quality and data maturity.



In [1]:

    
import numpy as np, pandas as pd, matplotlib.pyplot as plt
%matplotlib inline

#import ceam_public_health.components as cphc,\
#    ceam_public_health.components.base_population
import ceam_tests.util as ctu

from ceam.framework.event import listens_for
from ceam.framework.population import uses_columns
from ceam import config



In [2]:

    
np.random.seed(0)
n_simulants = 1000
n_days = 365
t_timestep = 1 # days
t_start = pd.Timestamp('2010-01-01')



In [3]:

    
@listens_for('initialize_simulants')
@uses_columns(['age', 'sex', 'hiv_in_care','next_appointment_date'  , 'initial_visit_date'])
def my_generate_base_population(event):
    population = pd.DataFrame(index=event.index)
    population['age'] = 0
    population['sex'] = '-'
    population['hiv_in_care'] = True
    population['next_appointment_date'] = pd.Timestamp('2010-01-01').date()
    population['initial_visit_date'] = pd.Timestamp('2010-01-01').date()

    event.population_view.update(population)



In [12]:

    
class HIVFollowUp:
    def __init__(self, drop_out_rate):
        self.drop_out_rate = drop_out_rate


    @listens_for('time_step' , priority=1)
    @uses_columns(['hiv_in_care'] , 'hiv_in_care == True')
    def drop_out(self, event):
        n = len(event.index)
        print(n)
        drop_out_prob = 1 - np.exp(-self.drop_out_rate / 365)
        drop_out_indicator = np.random.uniform(size=n) < drop_out_prob

        drop_out_index = event.index[drop_out_indicator] & event.index[event.population.hiv_in_care == True]

        event.population.hiv_in_care[drop_out_index] = False

        event.population_view.update(event.population)


    @listens_for('time_step' , priority=2)
    @uses_columns(['next_appointment_date' , 'hiv_in_care'] , 'hiv_in_care == True')
    def appointment(self , event):
        #TODO This will take a value for appointment in the __init__
        today = event.time.date()

        visit_index = (event.population.next_appointment_date == today) #& (event.population.hiv_in_care == True)

        event.population.next_appointment_date[visit_index] = today + pd.Timedelta(days=30.5)
        event.population.next_appointment_date[visit_index] = event.population.next_appointment_date[visit_index]
        event.population_view.update(event.population)
   
        
mu_drop_out_rate = 1.72
sigma_drop_out_rate = .12
drop_out_rate = np.random.normal(mu_drop_out_rate, sigma_drop_out_rate)




cases = {}
components = [my_generate_base_population,
              HIVFollowUp(drop_out_rate)]
simulation = ctu.setup_simulation(components, population_size=n_simulants, start=t_start.date())
ctu.pump_simulation(simulation , time_step_days=2 , iterations = 365)









    



---------------------------------------------------------------------------
FloatingPointError                        Traceback (most recent call last)
<ipython-input-12-f04fc676d283> in <module>()
     43               HIVFollowUp(drop_out_rate)]
     44 simulation = ctu.setup_simulation(components, population_size=n_simulants, start=t_start.date())
---> 45 ctu.pump_simulation(simulation , time_step_days=2 , iterations = 365)

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam_tests/util.py in pump_simulation(simulation, time_step_days, duration, iterations)
     53     while not should_stop():
     54         iteration_count += 1
---> 55         _step(simulation, time_step)
     56 
     57     return iteration_count

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/util.py in inner(*args, **kwargs)
     43             def inner(*args, **kwargs):
     44                 args, kwargs = injector[0](func, args, kwargs, *injector_args, **injector_kwargs)
---> 45                 return func(*args, **kwargs)
     46             return inner
     47         return wrapper

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/util.py in inner(*args, **kwargs)
     43             def inner(*args, **kwargs):
     44                 args, kwargs = injector[0](func, args, kwargs, *injector_args, **injector_kwargs)
---> 45                 return func(*args, **kwargs)
     46             return inner
     47         return wrapper

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/util.py in inner(*args, **kwargs)
     43             def inner(*args, **kwargs):
     44                 args, kwargs = injector[0](func, args, kwargs, *injector_args, **injector_kwargs)
---> 45                 return func(*args, **kwargs)
     46             return inner
     47         return wrapper

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/engine.py in _step(simulation, time_step, time_step_emitter, time_step__prepare_emitter, time_step__cleanup_emitter)
    110     _log.debug(simulation.current_time)
    111     time_step__prepare_emitter(Event(simulation.population.population.index))
--> 112     time_step_emitter(Event(simulation.population.population.index))
    113     time_step__cleanup_emitter(Event(simulation.population.population.index))
    114     simulation.current_time += time_step

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/event.py in emit(self, event)
     57         for priority_bucket in self.listeners:
     58             for listener in sorted(priority_bucket, key=lambda x: x.__name__):
---> 59                 listener(event)
     60 
     61     def __repr__(self):

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/util.py in inner(*args, **kwargs)
     42             @wraps(func)
     43             def inner(*args, **kwargs):
---> 44                 args, kwargs = injector[0](func, args, kwargs, *injector_args, **injector_kwargs)
     45                 return func(*args, **kwargs)
     46             return inner

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/population.py in _population_view_injector(self, func, args, kwargs, columns, query)
    270             for arg in args:
    271                 if isinstance(arg, Event):
--> 272                     arg = PopulationEvent.from_event(arg, view)
    273                     found = True
    274                 new_args.append(arg)

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/population.py in from_event(event, population_view)
    200     def from_event(event, population_view):
    201         if not population_view.manager.growing:
--> 202             population = population_view.get(event.index)
    203             return PopulationEvent(population.index, population, population_view, event.user_data, time=event.time)
    204         else:

/Users/gregoire/anaconda/lib/python3.5/site-packages/ceam/framework/population.py in get(self, index, omit_missing_columns)
     95 
     96         if self._query:
---> 97             pop = pop.query(self._query)
     98 
     99         if self._columns is None:

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
   2138         kwargs['level'] = kwargs.pop('level', 0) + 1
   2139         kwargs['target'] = None
-> 2140         res = self.eval(expr, **kwargs)
   2141 
   2142         try:

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   2207             kwargs['target'] = self
   2208         kwargs['resolvers'] = kwargs.get('resolvers', ()) + resolvers
-> 2209         return _eval(expr, inplace=inplace, **kwargs)
   2210 
   2211     def select_dtypes(self, include=None, exclude=None):

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    248         eng = _engines[engine]
    249         eng_inst = eng(parsed_expr)
--> 250         ret = eng_inst.evaluate()
    251 
    252         if parsed_expr.assigner is None and multi_line:

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/computation/engines.py in evaluate(self)
     70         """
     71         if not self._is_aligned:
---> 72             self.result_type, self.aligned_axes = _align(self.expr.terms)
     73 
     74         # make sure no names in resolvers and locals/globals clash

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/computation/align.py in _align(terms)
    135 
    136     # perform the main alignment
--> 137     typ, axes = _align_core(terms)
    138     return typ, axes
    139 

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/computation/align.py in wrapper(terms)
     54             return _result_type_many(*term_values), None
     55 
---> 56         return f(terms)
     57     return wrapper
     58 

/Users/gregoire/anaconda/lib/python3.5/site-packages/pandas/computation/align.py in _align_core(terms)
     96                 reindexer_size = len(reindexer)
     97 
---> 98                 ordm = np.log10(abs(reindexer_size - term_axis_size))
     99                 if ordm >= 1 and reindexer_size >= 10000:
    100                     warnings.warn('Alignment difference on axis {0} is larger '

FloatingPointError: divide by zero encountered in log10



In [13]:

    
ctu.pump_simulation??



In [20]:

    
time_step_emitter(Event(simulation.population.population.index))









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-20-665adc8eddd2> in <module>()
----> 1 time_step_emitter(Event(simulation.population.population.index))

NameError: name 'time_step_emitter' is not defined



In [16]:









    Out[16]:





PopulationManager(_population=     next_appointment_date  hiv_in_care  age sex initial_visit_date
0                     NaT          0.0    0   -                NaT
1                     NaT          0.0    0   -                NaT
2                     NaT          0.0    0   -                NaT
3                     NaT          0.0    0   -                NaT
4                     NaT          0.0    0   -                NaT
5                     NaT          0.0    0   -                NaT
6                     NaT          0.0    0   -                NaT
7                     NaT          0.0    0   -                NaT
8                     NaT          0.0    0   -                NaT
9                     NaT          0.0    0   -                NaT
10                    NaT          0.0    0   -                NaT
11                    NaT          0.0    0   -                NaT
12                    NaT          0.0    0   -                NaT
13                    NaT          0.0    0   -                NaT
14                    NaT          0.0    0   -                NaT
15                    NaT          0.0    0   -                NaT
16                    NaT          0.0    0   -                NaT
17                    NaT          0.0    0   -                NaT
18                    NaT          0.0    0   -                NaT
19                    NaT          0.0    0   -                NaT
20                    NaT          0.0    0   -                NaT
21                    NaT          0.0    0   -                NaT
22                    NaT          0.0    0   -                NaT
23                    NaT          0.0    0   -                NaT
24                    NaT          0.0    0   -                NaT
25                    NaT          0.0    0   -                NaT
26                    NaT          0.0    0   -                NaT
27                    NaT          0.0    0   -                NaT
28                    NaT          0.0    0   -                NaT
29                    NaT          0.0    0   -                NaT
..                    ...          ...  ...  ..                ...
970                   NaT          0.0    0   -                NaT
971                   NaT          0.0    0   -                NaT
972                   NaT          0.0    0   -                NaT
973                   NaT          0.0    0   -                NaT
974                   NaT          0.0    0   -                NaT
975                   NaT          0.0    0   -                NaT
976                   NaT          0.0    0   -                NaT
977                   NaT          0.0    0   -                NaT
978                   NaT          0.0    0   -                NaT
979                   NaT          0.0    0   -                NaT
980                   NaT          0.0    0   -                NaT
981                   NaT          0.0    0   -                NaT
982                   NaT          0.0    0   -                NaT
983                   NaT          0.0    0   -                NaT
984                   NaT          0.0    0   -                NaT
985                   NaT          0.0    0   -                NaT
986                   NaT          0.0    0   -                NaT
987                   NaT          0.0    0   -                NaT
988                   NaT          0.0    0   -                NaT
989                   NaT          0.0    0   -                NaT
990                   NaT          0.0    0   -                NaT
991                   NaT          0.0    0   -                NaT
992                   NaT          0.0    0   -                NaT
993                   NaT          0.0    0   -                NaT
994                   NaT          0.0    0   -                NaT
995                   NaT          0.0    0   -                NaT
996                   NaT          0.0    0   -                NaT
997                   NaT          0.0    0   -                NaT
998                   NaT          0.0    0   -                NaT
999                   NaT          0.0    0   -                NaT

[1000 rows x 5 columns], growing= False, observers= defaultdict(<class 'set'>, {'next_appointment_date': set(), 'hiv_in_care': set(), 'age': set(), 'sex': set(), 'initial_visit_date': set()}))



In [11]:

    
simulation.population.clock()









    Out[11]:





Timestamp('2011-01-01 00:00:00')



In [ ]: