In [1]:
import reader
data = reader.Data()


Local data read/write folder path:
	Default path: /Users/Dan/2017 spring/MATH 497/code and data/data/

Data: all_person_data 
File: all_person_data_Dan_20170406.pickle
File already exists.

Data: systemic_disease_list 
File: systemic_disease_list.pickle
File already exists.

Data: refractive_index 
File: 2017_03_30_refractive_index_columns.pickle
File already exists.

Data: demographics 
File: demographics_Dan_20170304.pickle
File already exists.

Data: visual_accuity 
File: 2017_03_30_visual_acuity_columns.pickle
File already exists.

Data: SNOMED_problem_list 
File: SNOMED_problem_list.pickle
File already exists.

Data: family_hist_for_Enc 
File: family_hist_for_Enc.pickle
File already exists.

Data: family_hist_list 
File: family_hist_list.pickle
File already exists.

Data: person_profile 
File: person_profile_df.pickle
File already exists.

Data: baseline_missingHandled 
File: baseline_missingHandled_Dan_20170406.pickle
File already exists.

Data: SL_Lens_for_Enc 
File: SL_Lens_for_Enc.pickle
File already exists.

Data: all_encounter_data 
File: all_encounter_data_Dan_20170330.pickle
File already exists.

Data: macula_findings_for_Enc 
File: macula_findings_for_Enc.pickle
File already exists.

Data: baseline_raw 
File: baseline_raw_Dan_20170406.pickle
File already exists.

Data: encounters 
File: encounters.pickle
File already exists.

Data: systemic_disease_for_Enc 
File: systemic_disease_for_Enc.pickle
File already exists.

Data: ICD_for_Enc 
File: ICD_for_Enc_Dan_20170304.pickle
File already exists.

In [2]:
data['all_person_data'].recent_DR.value_counts()


Out[2]:
no_DR    12009
mNPDR     2214
PDR        964
MNPDR      654
SNPDR      198
Name: recent_DR, dtype: int64

Try merging SNOMED_problem_list with all_encounter_data, matching each problem-list entry to an encounter of the same person on the same calendar date.
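Both tables are keyed on a calendar date with the time of day removed, so an encounter and a problem-list entry from the same day compare equal. Below is a minimal sketch of that step on hypothetical timestamps. It uses pandas' `Series.dt.normalize`, which keeps a datetime dtype. The notebook's `strftime` produces strings instead, and either works as a join key.

```python
import pandas as pd

# Hypothetical encounter and problem-list timestamps for one person
enc = pd.DataFrame({
    "Person_Nbr": [33, 33],
    "Enc_Date": pd.to_datetime(["2014-12-18 09:30:00", "2015-01-05 14:00:00"]),
})
sno = pd.DataFrame({
    "Person_Nbr": [33],
    "Date_Created": pd.to_datetime(["2014-12-18 15:51:19.607"]),
})

# Truncate both timestamps to midnight so same-day events share a key
enc["Date"] = enc["Enc_Date"].dt.normalize()
sno["Date"] = sno["Date_Created"].dt.normalize()

merged = enc.merge(sno, on=["Person_Nbr", "Date"], how="outer")
print(merged[["Person_Nbr", "Date", "Enc_Date", "Date_Created"]])
```

The 09:30 encounter and the 15:51 problem-list entry end up on the same row. The 2015-01-05 encounter has no partner, so its `Date_Created` is NaT.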


In [3]:
import datetime

In [4]:
d = data['all_encounter_data'][['Enc_Date', 'Person_Nbr']].copy()
# Calendar-date key (time of day dropped) for matching against SNOMED_problem_list
d['Date'] = d.Enc_Date.dt.strftime('%Y-%m-%d')
d['Enc_Nbr'] = d.index
d.head()


Out[4]:
Enc_Date Person_Nbr Date Enc_Nbr
Enc_Nbr
1043 2016-03-08 06:15:00 544674 2016-03-08 1043
1802 2016-05-13 03:45:00 605657 2016-05-13 1802
2698 2014-06-08 10:15:00 514762 2014-06-08 2698
2966 2016-06-24 03:15:00 552364 2016-06-24 2966
4091 2015-10-29 19:45:00 931187 2015-10-29 4091

In [5]:
d.shape[0]


Out[5]:
61862

In [7]:
import pandas as pd
# Map each SNOMED code to its class, keeping the first Class listed for that code
sno = pd.read_csv('snocodeTally.csv')
snocode = {k:list(v.Class)[0] for k,v in sno.groupby('SNOMED code')}

In [8]:
# Copy so the cached SNOMED_problem_list table is not modified in place
d1 = data['SNOMED_problem_list'].copy()
d1['Date'] = d1.Date_Created.dt.strftime('%Y-%m-%d')
d1.head()


Out[8]:
Person_ID Person_Nbr Date_Created Concept_ID Description Date
69610 80d3df88-dddf-5ad3-7cc1-b7b1ac6151fa 33 2014-12-18 15:51:19.607 41256004 Presbyopia 2014-12-18
69608 80d3df88-dddf-5ad3-7cc1-b7b1ac6151fa 33 2014-12-18 15:51:28.043 41446000 Blepharitis 2014-12-18
69609 80d3df88-dddf-5ad3-7cc1-b7b1ac6151fa 33 2014-12-18 16:36:28.083 313436004 Type 2 diabetes mellitus without complication 2014-12-18
46510 adca6fa4-e7d4-d7f8-cf41-27056662d84b 89 2014-08-12 03:04:55.010 81416004 Open angle with borderline findings 2014-08-12
46511 adca6fa4-e7d4-d7f8-cf41-27056662d84b 89 2014-08-12 03:04:55.010 28998008 Retinal hemorrhage 2014-08-12

In [9]:
d1['sno_diagnosis'] = d1.Concept_ID.map(lambda x: snocode.get(x, float('nan')))
d1 = d1[d1.sno_diagnosis.notnull()]
d1.shape[0]


Out[9]:
5525

The catch is that, with only person and date as the key, not every record in the SNOMED table links to an encounter number.
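One way to see which rows failed to match is the `indicator=True` option of `DataFrame.merge`. It adds a `_merge` column whose values are `left_only`, `right_only`, or `both`. Here is a sketch on toy frames. The person numbers and dates are made up, apart from reusing a few values visible in the outputs.

```python
import pandas as pd

# Toy encounter table: one row per encounter, keyed by person and calendar day
enc = pd.DataFrame({
    "Enc_Nbr": [1043, 1802, 2966],
    "Person_Nbr": [544674, 605657, 552364],
    "Date": ["2016-03-08", "2016-05-13", "2016-06-24"],
})
# Toy SNOMED problem list: the second entry has no encounter on that day
sno = pd.DataFrame({
    "Person_Nbr": [552364, 112],
    "Date": ["2016-06-24", "2014-08-11"],
    "sno_diagnosis": [1.0, 1.0],
})

d2 = enc.merge(sno, on=["Person_Nbr", "Date"], how="outer", indicator=True)
counts = d2["_merge"].value_counts()
print(counts)

# Share of SNOMED rows (right side) that found no same-day encounter
sno_rows = d2[d2["_merge"] != "left_only"]
unmatched_share = (sno_rows["_merge"] == "right_only").mean()
print(unmatched_share)  # 0.5
```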


In [10]:
d2 = d.merge(d1, on=['Person_Nbr', 'Date'], how='outer')
d2.head()


Out[10]:
Enc_Date Person_Nbr Date Enc_Nbr Person_ID Date_Created Concept_ID Description sno_diagnosis
0 2016-03-08 06:15:00 544674 2016-03-08 1043.0 NaN NaT NaN NaN NaN
1 2016-05-13 03:45:00 605657 2016-05-13 1802.0 NaN NaT NaN NaN NaN
2 2014-06-08 10:15:00 514762 2014-06-08 2698.0 NaN NaT NaN NaN NaN
3 2016-06-24 03:15:00 552364 2016-06-24 2966.0 b5f2e4f6-89ba-b4d7-25b6-950421f87122 2016-06-24 05:45:06.230 422034002.0 Diabetic retinopathy associated with type 2 di... 1.0
4 2015-10-29 19:45:00 931187 2015-10-29 4091.0 NaN NaT NaN NaN NaN

In [11]:
d2.shape[0]


Out[11]:
65595

In [33]:
d2['sno_diagnosis'] = d2.Concept_ID.map(lambda x: snocode.get(x, float('nan')))
temp = d2[d2.sno_diagnosis.notnull()]
temp[temp.Enc_Nbr.isnull()].head()


Out[33]:
Enc_Date Person_Nbr Date Enc_Nbr Person_ID Date_Created Concept_ID Description sno_diagnosis
61948 NaT 112 2014-08-11 NaN adbfae8e-adbe-c019-839c-e9c58b2692dc 2014-08-11 05:19:55.010 312904009.0 Moderate nonproliferative diabetic retinopathy 1.0
61949 NaT 567 2014-08-11 NaN 87122fbc-f1b0-71d9-f040-c9e1b05adaae 2014-08-11 17:34:55.010 312903003.0 Mild non-proliferative diabetic retinopathy 1.0
61950 NaT 844 2014-08-08 NaN d561da3d-65dd-c244-14a4-a66652a36416 2014-08-08 22:19:55.010 312903003.0 Mild non-proliferative diabetic retinopathy 1.0
61951 NaT 1138 2014-08-08 NaN c1da4059-7cd1-f839-76df-7c98e0f69521 2014-08-08 06:04:55.010 312903003.0 Mild non-proliferative diabetic retinopathy 1.0
61952 NaT 1218 2014-08-12 NaN 643fd77c-bd6b-aea6-8d99-de7e81e874f3 2014-08-12 00:04:55.010 312904009.0 Moderate nonproliferative diabetic retinopathy 1.0

About 66% of the SNOMED-diagnosed rows (3,647 of 5,532) have no encounter for the same person on the same date.
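The denominator is 5,532 rather than the 5,525 rows in Out[9]. An outer merge pairs a SNOMED entry with every encounter the same person had on that date. An entry matched by two same-day encounters is therefore counted twice. The toy illustration below shows this. `sno_row` is a hypothetical column added only to track the original SNOMED rows.

```python
import pandas as pd

# Hypothetical: person 552364 had two encounters on 2016-06-24
enc = pd.DataFrame({
    "Enc_Nbr": [2966, 2967],
    "Person_Nbr": [552364, 552364],
    "Date": ["2016-06-24", "2016-06-24"],
})
# ...one SNOMED entry that day, plus one entry with no encounter at all
sno = pd.DataFrame({
    "Person_Nbr": [552364, 112],
    "Date": ["2016-06-24", "2014-08-11"],
    "sno_diagnosis": [1.0, 1.0],
})
sno["sno_row"] = range(len(sno))  # hypothetical id of the original SNOMED row

d2 = enc.merge(sno, on=["Person_Nbr", "Date"], how="outer")
temp = d2[d2.sno_diagnosis.notnull()]

# 2 SNOMED entries become 3 merged rows: the same-day entry appears once per encounter
print(len(sno), len(temp), temp["sno_row"].nunique())  # 2 3 2
```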


In [34]:
print(temp[temp.Enc_Nbr.isnull()].shape[0]/float(temp.shape[0]))
print(temp[temp.Enc_Nbr.isnull()].shape[0])
print(temp.shape[0])


0.659255242227
3647
5532

Extract the encounters that carry a SNOMED-based diagnosis class (no_DR, non-vision-threatening, or vision-threatening).


In [14]:
d3 = d2[d2.sno_diagnosis.notnull()][['Person_Nbr', 'Enc_Nbr', 'sno_diagnosis']].copy()
d3.head()


Out[14]:
Person_Nbr Enc_Nbr sno_diagnosis
3 552364 2966.0 1.0
5 1048528 4267.0 1.0
66 415217 21680.0 2.0
75 702431 22822.0 1.0
80 994735 23521.0 2.0

In [15]:
d4 = data['all_encounter_data'][['Person_Nbr', 'DR_diagnosis', 'Enc_Date']].copy()
d4['Enc_Nbr'] = d4.index
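# Collapse the DR grades into three classes:
# 0 = no DR, 1 = non-vision-threatening (mNPDR, MNPDR), 2 = vision-threatening (SNPDR, PDR)
mapping = {
    'no_DR': 0,
    'mNPDR': 1,
    'MNPDR': 1,
    'SNPDR': 2,
    'PDR': 2
}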
d4['vision_class'] = d4.DR_diagnosis.map(lambda x: mapping.get(x))
d4.head()


Out[15]:
Person_Nbr DR_diagnosis Enc_Date Enc_Nbr vision_class
Enc_Nbr
1043 544674 no_DR 2016-03-08 06:15:00 1043 0
1802 605657 no_DR 2016-05-13 03:45:00 1802 0
2698 514762 no_DR 2014-06-08 10:15:00 2698 0
2966 552364 mNPDR 2016-06-24 03:15:00 2966 1
4091 931187 no_DR 2015-10-29 19:45:00 4091 0

Merge the SNOMED-diagnosed encounters into the original encounters on Person_Nbr and Enc_Nbr.
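A left merge keeps every encounter and fills `sno_diagnosis` with NaN where no SNOMED record was linked. In the notebook, `d4` carries `Enc_Nbr` both as its index name and as a column (Out[15]). Recent pandas versions raise a ValueError when asked to merge on a label that is ambiguous in this way. The sketch below therefore clears the index name first, and it uses toy values only.

```python
import pandas as pd

# Toy encounter table with Enc_Nbr as both index name and column, as in In [15]
d4 = pd.DataFrame(
    {"Person_Nbr": [544674, 552364], "DR_diagnosis": ["no_DR", "mNPDR"]},
    index=pd.Index([1043, 2966], name="Enc_Nbr"),
)
d4["Enc_Nbr"] = d4.index
d4["vision_class"] = d4["DR_diagnosis"].map({"no_DR": 0, "mNPDR": 1})

# Toy SNOMED-linked encounters; Enc_Nbr is float after the earlier outer merge
d3 = pd.DataFrame({"Person_Nbr": [552364], "Enc_Nbr": [2966.0], "sno_diagnosis": [1.0]})

# Clear the index name so "Enc_Nbr" refers unambiguously to the column
merged = d4.rename_axis(None).merge(d3, on=["Person_Nbr", "Enc_Nbr"], how="left")
print(merged)
```

Encounter 1043 keeps its row with `sno_diagnosis` NaN, and encounter 2966 picks up class 1.0.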


In [16]:
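# rename_axis(None) clears the index name, which is also "Enc_Nbr"; recent pandas
# versions refuse to merge on a label that is both an index level and a column
d4 = d4.rename_axis(None).merge(d3, on=['Person_Nbr', 'Enc_Nbr'], how='left')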
d4.head()


Out[16]:
Person_Nbr DR_diagnosis Enc_Date Enc_Nbr vision_class sno_diagnosis
0 544674 no_DR 2016-03-08 06:15:00 1043 0 NaN
1 605657 no_DR 2016-05-13 03:45:00 1802 0 NaN
2 514762 no_DR 2014-06-08 10:15:00 2698 0 NaN
3 552364 mNPDR 2016-06-24 03:15:00 2966 1 1.0
4 931187 no_DR 2015-10-29 19:45:00 4091 0 NaN

About 14% of the encounters that have both a DR_diagnosis class and a SNOMED class (265 of 1,885) disagree on the class.
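The rate is 265 / 1,885 ≈ 0.1406. A cross-tabulation of the two class columns shows which classes the disagreements fall between. The disagreement count is the sum of the cells whose row label differs from the column label. Here is a sketch on made-up class values, using `pd.crosstab`.

```python
import pandas as pd

# Hypothetical encounter-level classes, using the 0/1/2 scheme of `mapping` in In [15]
temp = pd.DataFrame({
    "vision_class":  [0, 1, 1, 2, 2, 1, 0, 2],
    "sno_diagnosis": [1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0],
})

disagree = temp["vision_class"] != temp["sno_diagnosis"]
print(disagree.sum(), len(temp), disagree.mean())  # 4 8 0.5

# Rows: DR_diagnosis class; columns: SNOMED class
table = pd.crosstab(temp["vision_class"], temp["sno_diagnosis"])
print(table)
```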


In [17]:
temp = d4[d4.sno_diagnosis.notnull()]
print(temp[temp.vision_class!=temp.sno_diagnosis].shape[0]/float(temp.shape[0]))
print(temp[temp.vision_class!=temp.sno_diagnosis].shape[0])
print(temp.shape[0])


0.140583554377
265
1885

Convert the encounter-level profile into a person-level profile.

First, map every missing (NaN) SNOMED class in the encounter profile to 0.
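In [20] and In [21] collapse encounters to one row per person. The worst SNOMED class is the maximum over the person's encounters. The most recent class is taken from the latest encounter with a non-zero value. Once NaN is mapped to 0 (In [18]), an encounter without a SNOMED record is skipped rather than read as no_DR. The same two columns can be computed without a Python-level `apply`, using `sort_values` plus `groupby().first()`. The sketch below assumes toy data.

```python
import pandas as pd

# Toy encounters: person 33 has SNOMED class 2, then 1, then none; person 89 has none
d4 = pd.DataFrame({
    "Person_Nbr": [33, 33, 33, 89],
    "Enc_Date": pd.to_datetime(["2014-01-01", "2015-01-01", "2016-01-01", "2015-06-01"]),
    "sno_diagnosis": [2.0, 1.0, None, None],
})
d4["sno_diagnosis"] = d4["sno_diagnosis"].fillna(0)

# Worst class: maximum over all of a person's encounters
worst = d4.groupby("Person_Nbr")["sno_diagnosis"].max()

# Most recent non-zero class: drop zeros, newest first, keep the first row per person
recent = (
    d4[d4["sno_diagnosis"] != 0]
    .sort_values("Enc_Date", ascending=False)
    .groupby("Person_Nbr")["sno_diagnosis"]
    .first()
    .reindex(worst.index, fill_value=0)  # persons with no SNOMED class get 0
)
print(pd.DataFrame({"worst_vis_sno": worst, "recent_vis_sno": recent}))
```

Person 33 gets worst 2.0 and most recent 1.0, because the 2016 encounter has no SNOMED class. Person 89 gets 0 in both columns.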


In [18]:
d4['sno_diagnosis'] = d4['sno_diagnosis'].fillna(0)

In [19]:
d5 = data['all_person_data'][['worst_DR','recent_DR']].copy()
d5['worst_vis_icd'] = d5.worst_DR.map(lambda x: mapping[x])
d5['recent_vis_icd'] = d5.recent_DR.map(lambda x: mapping[x])

In [20]:
d5['worst_vis_sno'] = d4.groupby('Person_Nbr')['sno_diagnosis'].max()

In [21]:
import numpy as np

def recent_DR(groupbyblock):
    """Return the SNOMED class of a person's latest encounter with a non-zero class, else 0."""
    # Newest encounter first
    templist = groupbyblock.sort_values(['Enc_Date'], ascending=False)['sno_diagnosis'].values
    nonzero = np.where(templist != 0)[0]
    if len(nonzero) > 0:
        return templist[nonzero[0]]
    return 0

d5['recent_vis_sno'] = d4.groupby('Person_Nbr').apply(recent_DR)

In [22]:
d5.head()


Out[22]:
worst_DR recent_DR worst_vis_icd recent_vis_icd worst_vis_sno recent_vis_sno
Person_Nbr
33 no_DR no_DR 0 0 0.0 0.0
89 no_DR no_DR 0 0 0.0 0.0
146 no_DR no_DR 0 0 0.0 0.0
196 no_DR no_DR 0 0 0.0 0.0
327 no_DR no_DR 0 0 0.0 0.0

In [23]:
d5.shape[0]


Out[23]:
16039

101 patients diagnosed as no_DR by ICD have a non-zero SNOMED class, for both the worst and the most recent diagnosis.


In [24]:
temp = d5[d5.worst_vis_icd==0]
temp[temp.worst_vis_icd!=temp.worst_vis_sno].shape[0]


Out[24]:
101

In [25]:
temp = d5[d5.recent_vis_icd==0]
temp[temp.recent_vis_icd!=temp.recent_vis_sno].shape[0]


Out[25]:
101

Fewer than 100 patients have a non-zero class under both ICD and SNOMED but disagree on which class: 87 for the worst diagnosis and 96 for the most recent one.
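The counts in In [24] to In [27] can be read off one person-level cross-tabulation of the ICD class against the SNOMED class. The row for ICD class 0, outside its first column, gives the "no_DR by ICD, non-zero by SNOMED" count. Among cells where both classes are non-zero, those with different row and column labels give the disagreements. Here is a sketch on toy data.

```python
import pandas as pd

# Hypothetical person-level classes (0 = no DR, 1 = non-vision-threatening, 2 = vision-threatening)
d5 = pd.DataFrame({
    "worst_vis_icd": [0, 0, 0, 1, 1, 2, 2, 1],
    "worst_vis_sno": [0, 1, 2, 1, 2, 2, 0, 0],
})
table = pd.crosstab(d5["worst_vis_icd"], d5["worst_vis_sno"])

# ICD says no DR while SNOMED says some DR (the quantity In [24] counts)
icd_none_sno_dr = table.loc[0, table.columns != 0].sum()

# Both sources report DR but disagree on the class (the quantity In [26] counts)
both = table.loc[table.index != 0, table.columns != 0]
diagonal = sum(both.at[c, c] for c in both.index if c in both.columns)
both_disagree = both.values.sum() - diagonal

print(table)
print(icd_none_sno_dr, both_disagree)  # 2 1
```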


In [26]:
temp = d5[d5.worst_vis_icd!=0]
temp = temp[temp.worst_vis_sno!=0]
temp[temp.worst_vis_icd!=temp.worst_vis_sno].shape[0]


Out[26]:
87

In [27]:
temp = d5[d5.recent_vis_icd!=0]
temp = temp[temp.recent_vis_sno!=0]
temp[temp.recent_vis_icd!=temp.recent_vis_sno].shape[0]


Out[27]:
96

ICD identifies 4,030 patients with DR in total, which matches the non-no_DR counts in Out[2] (2,214 + 964 + 654 + 198). SNOMED identifies only 1,053.


In [28]:
d5[d5.worst_vis_icd!=0].shape[0]


Out[28]:
4030

In [29]:
d5[d5.recent_vis_icd!=0].shape[0]


Out[29]:
4030

In [30]:
d5[d5.worst_vis_sno!=0].shape[0]


Out[30]:
1053

In [31]:
d5[d5.recent_vis_sno!=0].shape[0]


Out[31]:
1053
