HIV patients visit a health center for regular appointments. At each visit, they are given an appointment for their next visit. The date of the $v^{th}$ visit of patient $i$ is noted $V_v^i$, and the date of the appointment that was given for this visit is noted $A_v^i$. Thus, if $V_v^i > A_v^i$, the patient came late to the appointment; if $V_v^i < A_v^i$, the patient came early. We note $l_v^i = V_v^i - A_v^i$ the lag between scheduled appointment and actual visit, so that a positive lag means a late visit.
The time elapsed between a visit and the next scheduled appointment is set by national norms on the frequency at which HIV patients should be evaluated, depending on their condition and medical history. Patients recently enrolled in care are seen more frequently than patients with a longer follow-up and no complications. We note $f_v^i$ the visit frequency regimen of patient $i$ at visit $v$, so that $A_{v+1}^i = V_v^i + f_v^i$. This interval is usually close to a multiple of 28 days, as patients are likely to have a preferred weekday for their visits.
We can finally express the time between two visits as: $$V_{v+1}^i - V_v^i = f_v^i + l_{v+1}^i $$
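As a quick check of this bookkeeping, here is a minimal sketch on a hypothetical visit history for a single patient. The column names and dates are illustrative assumptions, and the lag is taken as visit date minus appointment date (positive when the patient is late).

```python
import pandas as pd

# Hypothetical visit history for one patient (illustrative values only).
visits = pd.DataFrame({
    "visit_date": pd.to_datetime(
        ["2010-01-04", "2010-02-03", "2010-03-01", "2010-04-02"]),
    "regimen_days": [28, 28, 28, 28],  # f_v, the visit frequency regimen
})

# Appointment given at visit v for visit v+1: A_{v+1} = V_v + f_v.
visits["next_appointment"] = (
    visits["visit_date"] + pd.to_timedelta(visits["regimen_days"], unit="D"))

# Appointment that applied to each visit (undefined for the first visit).
visits["appointment_date"] = visits["next_appointment"].shift(1)

# Lag in days: positive when the patient is late, negative when early.
visits["lag_days"] = (visits["visit_date"] - visits["appointment_date"]).dt.days

# Time between consecutive visits.
visits["gap_days"] = visits["visit_date"].diff().dt.days

print(visits[["visit_date", "appointment_date", "lag_days", "gap_days"]])
```

In this toy history, each gap between visits equals the regimen in force at the previous visit plus the lag at the current visit.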
The date at which a visit is recorded in the electronic medical record (EMR) is $R(V_v^i)$. By definition, $R(V_v^i) \geq V_v^i$, and the delay in data entry is noted: $$R(V_v^i) - V_v^i = \delta_v^i \geq 0$$
$\delta$ may vary within a facility, depending on workload, staffing or other factors. In some cases, the visit has not been and never will be recorded. I will note this situation as $\delta \rightarrow \infty$.
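A minimal sketch of how $\delta$ could be computed from visit and recording dates. The table layout is an assumption: a missing recording date (`NaT`) stands for a visit that was never entered, and is mapped to an infinite delay.

```python
import numpy as np
import pandas as pd

# Hypothetical visits with their recording dates (illustrative values only).
visits = pd.DataFrame({
    "visit_date": pd.to_datetime(["2010-01-04", "2010-02-03", "2010-03-01"]),
    # None becomes NaT: the visit was never entered in the EMR.
    "recorded_date": pd.to_datetime(["2010-01-04", "2010-02-10", None]),
})

# Data entry delay in days; NaN where the visit was never recorded.
delay = (visits["recorded_date"] - visits["visit_date"]).dt.days

# Represent "never recorded" as an infinite delay.
visits["delta_days"] = delay.fillna(np.inf)

print(visits)
```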
Finally, data entry is interrupted at date $T_{close}$ before the data is used for analysis. The time elapsed between patient $i$'s latest scheduled appointment and the closing date is noted $G_i = T_{close} - \max_v(A_v^i)$. For simplicity, we will first equate the date of database closure with the date of analysis, and will relax this assumption when measuring data maturity.
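The quantity $G_i$ can be sketched as a group-wise maximum over scheduled appointment dates, following the formula above. The closing date and the appointment table below are hypothetical.

```python
import pandas as pd

T_CLOSE = pd.Timestamp("2010-12-31")  # hypothetical closing date

# Hypothetical scheduled appointments, one row per appointment.
appointments = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 2],
    "appointment_date": pd.to_datetime(
        ["2010-10-01", "2010-12-15", "2010-06-04", "2010-07-02", "2010-08-01"]),
})

# Latest scheduled appointment per patient: max_v(A_v^i).
latest = appointments.groupby("patient_id")["appointment_date"].max()

# G_i in days: time between the latest appointment and database closure.
G = (T_CLOSE - latest).dt.days

print(G)
```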
A central piece of the definition of loss to follow-up (LTFU) is the \textit{grace period} during which a patient, even without returning to the facility, is still considered actively followed. This \textit{grace period} is denoted $G_0$.
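A patient $i$ is considered LTFU if he is more than $G_0$ days late to his latest scheduled appointment, indexed by $v^{*}$:
$$l_{v^{*}}^i > G_0$$
Looking closer at this definition, we can see that it groups three different situations: the patient has truly left care; the patient is late but will eventually return ($\infty > l_{v^{*}}^i > G_0$); or the patient has returned but the visit has not been recorded in time ($\delta_{v^*+1} > G_0$).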
Using this definition, we can express the probability that a patient identified as LTFU from the data at hand is truly LTFU. Let $X = 1$ be the event that a patient is actively in care, and $X = 0$ the event that the patient is LTFU. We can express $p(X = 0 | l_{v^{*}}^i > G_0)$ as a combination of elements we can measure:
$$p(X = 0 | l_{v^{*}}^i > G_0) = 1 - p(\infty > l_{v^{*}}^i > G_0) - p(\delta_{v^*+1} > G_0) $$
We can understand $\infty > l_{v^{*}}^i > G_0$ as an intrinsic myopia of the health system, which cannot predict the future, and $\delta_{v^*+1} > G_0$ as a data quality measure. Differentiating between these two terms is important in order to understand the uncertainty in the LTFU rate and to better measure retention in the cohort.
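To make the decomposition concrete, here is a small sketch that classifies flagged patients using a simulated "ground truth" that a real EMR would not contain. This is one possible operationalisation, not a definitive one. The column names, the grace period value, and the rule that a return visit is visible at closure only if the true lag plus the entry delay fits within $G_i$ are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

GRACE_DAYS = 90  # hypothetical G_0

# Hypothetical ground truth, one row per patient (illustrative values only).
patients = pd.DataFrame({
    "days_since_appointment": [200, 200, 200, 200, 60],       # G_i at closure
    "true_lag_days": [np.inf, 250, 30, 30, np.inf],           # inf: never returns
    "entry_delay_days": [np.inf, 0, 300, 10, np.inf],         # delta of the return visit
}, index=pd.Index([1, 2, 3, 4, 5], name="patient_id"))

G = patients["days_since_appointment"]
lag = patients["true_lag_days"]
delta = patients["entry_delay_days"]

# The return visit is visible at closure only if it happened and was entered in time.
visible = (lag + delta) <= G

# Flagged as LTFU: no visible return visit and grace period exceeded.
flagged = ~visible & (G > GRACE_DAYS)

# Split flagged patients into the three situations.
situation = pd.Series("not flagged", index=patients.index)
situation[flagged & np.isinf(lag)] = "truly LTFU"
situation[flagged & np.isfinite(lag) & (lag > G)] = "late, not yet returned"
situation[flagged & (lag <= G)] = "returned, visit not recorded"

# Shares among flagged patients.
print(situation[flagged].value_counts(normalize=True))
```

Patient 5 has truly left care but is not flagged yet, because the grace period has not elapsed at closure.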
These simulated data will then be used to estimate our quantities of interest.
Measuring data quality impact: From the cohort simulations, I will measure the LTFU rate under different distributions of $\delta$. Different data quality scenarios will be considered, varying both the mean and the variance of $\delta$. Perfect data quality will be compared with situations with long data entry delays, and with situations with substantial data loss (high variance of $\delta$). The resulting variation in the observed LTFU rate will be described as the impact of data quality on the measure of retention.
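A minimal, self-contained sketch of this comparison, independent of the simulation framework used later in this notebook. The assumptions are all illustrative: 20% of patients never return, the others return with an exponential lag, $\delta$ follows a gamma distribution parameterised by its mean and standard deviation, and every patient's latest appointment falls `DAYS_TO_CLOSE` days before closure, beyond the grace period.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PATIENTS = 5_000
DAYS_TO_CLOSE = 180  # hypothetical G_i, assumed larger than the grace period

# Hypothetical truth: who never returns, and the lag of those who do.
never_returns = rng.random(N_PATIENTS) < 0.20
true_lag = np.where(never_returns, np.inf, rng.exponential(20, N_PATIENTS))

def gamma_delay(mean, sd):
    """Draw entry delays from a gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return rng.gamma(shape, scale, N_PATIENTS)

def observed_ltfu_rate(entry_delay):
    """Share of patients with no return visit visible in the EMR at closure."""
    visible = (true_lag + entry_delay) <= DAYS_TO_CLOSE
    return float(np.mean(~visible))

scenarios = {
    "perfect": np.zeros(N_PATIENTS),
    "short delays (mean 15, sd 10)": gamma_delay(15, 10),
    "long delays (mean 120, sd 60)": gamma_delay(120, 60),
    "erratic delays (mean 120, sd 240)": gamma_delay(120, 240),
}
rates = {name: observed_ltfu_rate(delay) for name, delay in scenarios.items()}

for name, rate in rates.items():
    print(f"{name}: observed LTFU rate = {rate:.3f}")
print(f"true LTFU rate = {never_returns.mean():.3f}")
```

Because delays are non-negative, the observed rate under any delay scenario cannot fall below the rate measured with perfect data, which itself cannot fall below the true rate.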
Data maturity: As data is entered in the EMR, and as missed visits are eventually made, the data for a given period become more complete, and patients who are actively in care are increasingly recognized as such. As data maturity grows, the error induced by data quality decreases. Varying $T_{close}$ can thus change the measured retention status of a patient on a given date. I will measure retention using different closing dates for the database, using only the data recorded before each closing date. These measures will allow me to define and test a Data Maturity metric, based on a combination of $f$, $l$ and $\delta$, that will help identify the earliest date of analysis at which retention rates in a program can be reliably estimated, and the optimal grace period $G_0$ to use for different levels of maturity.
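The effect of the closing date can be sketched by sweeping the time between the latest appointment and closure while holding the cohort fixed. As before, this is only an illustration under assumed distributions (exponential return lags and exponential entry delays). It is not the Data Maturity metric itself, which remains to be defined.

```python
import numpy as np

rng = np.random.default_rng(1)
N_PATIENTS = 5_000
GRACE_DAYS = 90  # hypothetical G_0

# Hypothetical truth and data entry delays (illustrative distributions).
never_returns = rng.random(N_PATIENTS) < 0.20
true_lag = np.where(never_returns, np.inf, rng.exponential(20, N_PATIENTS))
entry_delay = rng.exponential(60, N_PATIENTS)

# Candidate closing dates, in days after the latest appointment.
# Only offsets beyond the grace period are considered.
closing_offsets = np.arange(GRACE_DAYS + 1, 541, 30)

# Observed LTFU rate: no return visit visible in the EMR at closure.
observed = np.array([
    np.mean((true_lag + entry_delay) > offset) for offset in closing_offsets
])

# Excess over the true LTFU rate, attributable to immature data.
true_rate = never_returns.mean()
excess = observed - true_rate

for offset, rate, gap in zip(closing_offsets, observed, excess):
    print(f"closure at +{offset} days: observed = {rate:.3f}, excess = {gap:.3f}")
```

In this setup the observed rate can only decrease as the closing date moves later, since every visit visible at one closing date remains visible at any later one.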
Robust measures of retention: Finally, we will consider more robust metrics that can serve as good proxies for retention. These metrics will include :
For each of these metrics, I will evaluate its capacity to measure retention in the cohort, by comparison with the reference LTFU measure obtained with perfect data. I will also evaluate the sensitivity of these metrics to data quality and data maturity.
In [1]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt
%matplotlib inline
#import ceam_public_health.components as cphc,\
# ceam_public_health.components.base_population
import ceam_tests.util as ctu
from ceam.framework.event import listens_for
from ceam.framework.population import uses_columns
from ceam import config
In [2]:
np.random.seed(0)
n_simulants = 1000
n_days = 365
t_timestep = 1 # days
t_start = pd.Timestamp('2010-01-01')
In [3]:
@listens_for('initialize_simulants')
@uses_columns(['age', 'sex', 'hiv_in_care', 'next_appointment_date', 'initial_visit_date'])
def my_generate_base_population(event):
    population = pd.DataFrame(index=event.index)
    population['age'] = 0
    population['sex'] = '-'
    population['hiv_in_care'] = True
    population['next_appointment_date'] = pd.Timestamp('2010-01-01').date()
    population['initial_visit_date'] = pd.Timestamp('2010-01-01').date()
    event.population_view.update(population)
In [12]:
class HIVFollowUp:
    def __init__(self, drop_out_rate):
        # Drop-out rate per year, converted to a per-step probability below
        self.drop_out_rate = drop_out_rate

    @listens_for('time_step', priority=1)
    @uses_columns(['hiv_in_care'], 'hiv_in_care == True')
    def drop_out(self, event):
        # The view is already restricted to patients still in care
        n = len(event.population)
        # Probability of dropping out during one time step of t_timestep days
        drop_out_prob = 1 - np.exp(-self.drop_out_rate * t_timestep / 365)
        drop_out_indicator = np.random.uniform(size=n) < drop_out_prob
        drop_out_index = event.population.index[drop_out_indicator]
        # Use .loc to avoid chained assignment
        event.population.loc[drop_out_index, 'hiv_in_care'] = False
        event.population_view.update(event.population)

    @listens_for('time_step', priority=2)
    @uses_columns(['next_appointment_date', 'hiv_in_care'], 'hiv_in_care == True')
    def appointment(self, event):
        # TODO: take the appointment interval as a parameter in __init__
        today = event.time.date()
        visit_index = event.population.next_appointment_date == today
        # Adding a timedelta to a date ignores the fractional day
        event.population.loc[visit_index, 'next_appointment_date'] = today + pd.Timedelta(days=30.5)
        event.population_view.update(event.population)

mu_drop_out_rate = 1.72
sigma_drop_out_rate = .12
drop_out_rate = np.random.normal(mu_drop_out_rate, sigma_drop_out_rate)

cases = {}
components = [my_generate_base_population,
              HIVFollowUp(drop_out_rate)]
simulation = ctu.setup_simulation(components, population_size=n_simulants, start=t_start.date())
# Use the step size and duration defined above, consistent with drop_out_prob
ctu.pump_simulation(simulation, time_step_days=t_timestep, iterations=n_days)
In [13]:
ctu.pump_simulation??
In [20]:
time_step_emitter(Event(simulation.population.population.index))
In [11]:
simulation.population.clock()