I want to look at whether a patient's Total Modified Hopkins Dementia Score (TMHDS) is altered by whether the patient tested positive on a drug test at the visit. There has been some concern that patients who test positive for cocaine (or anything else) are still high (or recovering), and that this could artificially lower their TMHDS. The doctors in the clinic should be sending inebriated people home instead of testing them, but this is a nice second check.
To look for this effect, I am finding the set of patients who tested positive at one or more visits, tested negative at one or more visits, and had a TMHDS recorded at the corresponding visits. Using a Wilcoxon signed-rank test (since the scores are non-normal), I'll check whether scores at visits with a positive drug test are lower than scores at visits with a negative drug test.
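As a rough illustration of the paired comparison described above, here is a minimal sketch on made-up numbers. The scores and patient labels are invented for the example and are not study data. The `alternative="greater"` argument assumes SciPy 1.3 or later; it asks the one-sided question of whether negative-visit scores tend to be higher than positive-visit scores for the same patient.

```python
import pandas as pd
from scipy.stats import wilcoxon

# Hypothetical per-patient mean scores (invented numbers, not study data).
toy = pd.DataFrame(
    {
        "HIVDI_neg": [11.0, 9.5, 12.0, 8.0, 10.5, 7.0],
        "HIVDI_pos": [10.0, 9.0, 10.5, 8.25, 8.0, 4.0],
    },
    index=["P1", "P2", "P3", "P4", "P5", "P6"],
)

# Two-sided: do the paired scores differ in either direction?
stat_two, p_two = wilcoxon(toy["HIVDI_neg"], toy["HIVDI_pos"])

# One-sided: are negative-visit scores greater than positive-visit scores?
# (the `alternative` keyword assumes SciPy >= 1.3)
stat_one, p_one = wilcoxon(
    toy["HIVDI_neg"], toy["HIVDI_pos"], alternative="greater"
)

print("two-sided p:", p_two)
print("one-sided p:", p_one)
```

Because each row is one patient, the test works on within-patient differences, so differences in baseline score between patients do not enter the comparison.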
In [1]:
import os, os.path
from pandas import *
from matplotlib import pyplot as plt
In [2]:
from pandas import HDFStore #This is a file storage format for large collections of data
store = HDFStore('/home/will/HIVReportGen/Data/BaseRedcap/HIVAIDSGeneticAnalys_DATA_LABELS_2013-01-16_1211.hdf')
redcap_data = store['redcap']
store.close()
In [3]:
drug_cols = [
'Amphetamines',
'Barbiturates',
'Benzodiazepines',
#'Cannabinoid', # removed cannabis since the test can be positive for weeks after use.
'Cocaine + metabolite',
'Opiates',
'Phencyclidine'
]
data = redcap_data[drug_cols + ['Patient ID', 'Patient visit number', 'Total Modified Hopkins Dementia Score']]
data = data.rename(columns = {
'Patient visit number':'VisitNum',
'Total Modified Hopkins Dementia Score':'HIVDI'}).dropna()
In [4]:
pos_mask = data[drug_cols].any(axis = 1)
pos_scores = data[['Patient ID', 'VisitNum', 'HIVDI']][pos_mask]
neg_scores = data[['Patient ID', 'VisitNum', 'HIVDI']][~pos_mask]
pat_neg = neg_scores.groupby('Patient ID').mean()
pat_pos = pos_scores.groupby('Patient ID').mean()
merged_data = merge(pat_neg, pat_pos,
left_index = True,
right_index = True,
suffixes = ('_neg', '_pos'))
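The merge above relies on the default inner join, so only patients who appear in both the negative and the positive group survive. A minimal sketch on invented visits (the patient labels, scores, and the `positive` column are hypothetical) shows the effect:

```python
import pandas as pd

# Hypothetical visits (invented): patient A has both kinds of visit,
# B has only negative visits, C has only a positive visit.
visits = pd.DataFrame({
    "Patient ID": ["A", "A", "A", "B", "B", "C"],
    "HIVDI": [10.0, 12.0, 8.0, 11.0, 9.0, 7.0],
    "positive": [False, False, True, False, False, True],
})

# Per-patient mean score within each kind of visit.
pat_neg = visits[~visits["positive"]].groupby("Patient ID")["HIVDI"].mean()
pat_pos = visits[visits["positive"]].groupby("Patient ID")["HIVDI"].mean()

# Inner join on the patient index: only patients present on both sides remain.
paired = pd.merge(
    pat_neg.to_frame(), pat_pos.to_frame(),
    left_index=True, right_index=True,
    suffixes=("_neg", "_pos"),
)
print(paired)
```

Only patient A remains in `paired`, with one averaged score per condition, which is the shape the paired test needs.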
In [5]:
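# Plot only the score columns; the merged frame also carries averaged visit numbers.
merged_data[['HIVDI_neg', 'HIVDI_pos']].boxplot();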
plt.ylabel('TMHDS');
In [6]:
from scipy.stats import wilcoxon
_, pval = wilcoxon(merged_data['HIVDI_neg'], merged_data['HIVDI_pos'])
print('P-value:', pval)
print('Num Patients:', len(merged_data.index))
In [7]:
data['Was-pos'] = pos_mask
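# Restrict the plot to the score column so the y-axis label matches what is drawn.
data.boxplot(column = 'HIVDI', by = 'Was-pos');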
plt.ylabel('TMHDS');
In [8]:
from scipy.stats import ks_2samp
_, pval = ks_2samp(pos_scores['HIVDI'], neg_scores['HIVDI'])
print('P-value:', pval)
Nope, I don't think the positive test alters the TMHDS. When I look at the same patient I don't see an effect of the positive test.