In [1]:

    
import reader
data = reader.Data()









    



Local data read/write folder path:
	Default path: /Users/Dan/2017 spring/MATH 497/code and data/data/

Data: all_person_data 
File: all_person_data_Dan_20170406.pickle
File already exists.

Data: systemic_disease_list 
File: systemic_disease_list.pickle
File already exists.

Data: refractive_index 
File: 2017_03_30_refractive_index_columns.pickle
File already exists.

Data: demographics 
File: demographics_Dan_20170304.pickle
File already exists.

Data: visual_accuity 
File: 2017_03_30_visual_acuity_columns.pickle
File already exists.

Data: SNOMED_problem_list 
File: SNOMED_problem_list.pickle
File already exists.

Data: family_hist_for_Enc 
File: family_hist_for_Enc.pickle
File already exists.

Data: family_hist_list 
File: family_hist_list.pickle
File already exists.

Data: person_profile 
File: person_profile_df.pickle
File already exists.

Data: baseline_missingHandled 
File: baseline_missingHandled_Dan_20170406.pickle
File already exists.

Data: SL_Lens_for_Enc 
File: SL_Lens_for_Enc.pickle
File already exists.

Data: all_encounter_data 
File: all_encounter_data_Dan_20170330.pickle
File already exists.

Data: macula_findings_for_Enc 
File: macula_findings_for_Enc.pickle
File already exists.

Data: baseline_raw 
File: baseline_raw_Dan_20170406.pickle
File already exists.

Data: encounters 
File: encounters.pickle
File already exists.

Data: systemic_disease_for_Enc 
File: systemic_disease_for_Enc.pickle
File already exists.

Data: ICD_for_Enc 
File: ICD_for_Enc_Dan_20170304.pickle
File already exists.



In [2]:

    
data['all_person_data'].recent_DR.value_counts()









    Out[2]:





no_DR    12009
mNPDR     2214
PDR        964
MNPDR      654
SNPDR      198
Name: recent_DR, dtype: int64

First attempt: merge SNOMED_problem_list with all_encounter_data, using the patient number and the calendar date as the key.



In [3]:

    
import datetime



In [4]:

    
# Keep the encounter timestamp and patient number; Enc_Nbr is the index.
d = data['all_encounter_data'][['Enc_Date', 'Person_Nbr']].copy()
# Date-only string (YYYY-MM-DD) used as the merge key.
d['Date'] = d.Enc_Date.dt.strftime('%Y-%m-%d')
d['Enc_Nbr'] = d.index
d.head()









    Out[4]:

                    Enc_Date  Person_Nbr        Date  Enc_Nbr
Enc_Nbr
1043     2016-03-08 06:15:00      544674  2016-03-08     1043
1802     2016-05-13 03:45:00      605657  2016-05-13     1802
2698     2014-06-08 10:15:00      514762  2014-06-08     2698
2966     2016-06-24 03:15:00      552364  2016-06-24     2966
4091     2015-10-29 19:45:00      931187  2015-10-29     4091
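
The date-only key can also be built with the vectorized datetime accessor instead of a per-row lambda. A minimal sketch on made-up rows (only the column names follow the notebook; the values are illustrative):

```python
import pandas as pd

# Toy encounter table; values are made up for illustration.
enc = pd.DataFrame(
    {
        "Enc_Date": pd.to_datetime(["2016-03-08 06:15:00", "2016-05-13 03:45:00"]),
        "Person_Nbr": [1, 2],
    },
    index=pd.Index([10, 20], name="Enc_Nbr"),
)

# .dt.strftime formats every timestamp at once and drops the time of day.
enc["Date"] = enc["Enc_Date"].dt.strftime("%Y-%m-%d")
date_keys = enc["Date"].tolist()
print(date_keys)
```

Two events on the same day for the same patient then share one key, which is what the merge by date relies on.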



In [5]:

    
d.shape[0]









    Out[5]:





61862



In [7]:

    
import pandas as pd
# Lookup table: SNOMED code -> vision class (the first Class listed for each code).
sno = pd.read_csv('snocodeTally.csv')
snocode = {k: list(v.Class)[0] for k, v in sno.groupby('SNOMED code')}
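
The same code-to-class lookup can be sketched without a groupby. The frame below is a hypothetical stand-in for snocodeTally.csv (the real file is not shown here, so the rows are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for snocodeTally.csv: one class label per SNOMED code.
sno_demo = pd.DataFrame(
    {
        "SNOMED code": [312903003, 312904009, 312903003],
        "Class": [1, 1, 1],
    }
)

# Keep the first row per code, then turn the two columns into a dict.
code_to_class = (
    sno_demo.drop_duplicates("SNOMED code")
    .set_index("SNOMED code")["Class"]
    .to_dict()
)
print(code_to_class)
```

Codes that are absent from the lookup map to NaN when the dict is passed to Series.map, which is how the notebook later filters for DR-related records.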



In [8]:

    
# Copy so the cached SNOMED problem list is not modified in place.
d1 = data['SNOMED_problem_list'].copy()
# Date-only string (YYYY-MM-DD), matching the key built for the encounters.
d1['Date'] = d1.Date_Created.dt.strftime('%Y-%m-%d')
d1.head()









    Out[8]:

       Person_ID                             Person_Nbr  Date_Created             Concept_ID  Description                                    Date
69610  80d3df88-dddf-5ad3-7cc1-b7b1ac6151fa          33  2014-12-18 15:51:19.607    41256004  Presbyopia                                     2014-12-18
69608  80d3df88-dddf-5ad3-7cc1-b7b1ac6151fa          33  2014-12-18 15:51:28.043    41446000  Blepharitis                                    2014-12-18
69609  80d3df88-dddf-5ad3-7cc1-b7b1ac6151fa          33  2014-12-18 16:36:28.083   313436004  Type 2 diabetes mellitus without complication  2014-12-18
46510  adca6fa4-e7d4-d7f8-cf41-27056662d84b          89  2014-08-12 03:04:55.010    81416004  Open angle with borderline findings            2014-08-12
46511  adca6fa4-e7d4-d7f8-cf41-27056662d84b          89  2014-08-12 03:04:55.010    28998008  Retinal hemorrhage                             2014-08-12



In [9]:

    
d1['sno_diagnosis'] = d1.Concept_ID.map(lambda x: snocode.get(x, float('nan')))
d1 = d1[d1.sno_diagnosis.notnull()]
d1.shape[0]









    Out[9]:





5525

The problem is that, with only the date as the key, not every record in the SNOMED table can be linked to an encounter number.



In [10]:

    
# Outer join keeps encounters without a SNOMED record and SNOMED records without an encounter.
d2 = d.merge(d1, on=['Person_Nbr', 'Date'], how='outer')
d2.head()









    Out[10]:

   Enc_Date             Person_Nbr  Date        Enc_Nbr  Person_ID                             Date_Created              Concept_ID  Description                                        sno_diagnosis
0  2016-03-08 06:15:00      544674  2016-03-08   1043.0  NaN                                   NaT                              NaN  NaN                                                          NaN
1  2016-05-13 03:45:00      605657  2016-05-13   1802.0  NaN                                   NaT                              NaN  NaN                                                          NaN
2  2014-06-08 10:15:00      514762  2014-06-08   2698.0  NaN                                   NaT                              NaN  NaN                                                          NaN
3  2016-06-24 03:15:00      552364  2016-06-24   2966.0  b5f2e4f6-89ba-b4d7-25b6-950421f87122  2016-06-24 05:45:06.230  422034002.0  Diabetic retinopathy associated with type 2 di...            1.0
4  2015-10-29 19:45:00      931187  2015-10-29   4091.0  NaN                                   NaT                              NaN  NaN                                                          NaN



In [11]:

    
d2.shape[0]









    Out[11]:





65595



In [33]:

    
d2['sno_diagnosis'] = d2.Concept_ID.map(lambda x: snocode.get(x, float('nan')))
temp = d2[d2.sno_diagnosis.notnull()]
temp[temp.Enc_Nbr.isnull()].head()









    Out[33]:

       Enc_Date  Person_Nbr  Date        Enc_Nbr  Person_ID                             Date_Created              Concept_ID  Description                                     sno_diagnosis
61948  NaT              112  2014-08-11      NaN  adbfae8e-adbe-c019-839c-e9c58b2692dc  2014-08-11 05:19:55.010  312904009.0  Moderate nonproliferative diabetic retinopathy            1.0
61949  NaT              567  2014-08-11      NaN  87122fbc-f1b0-71d9-f040-c9e1b05adaae  2014-08-11 17:34:55.010  312903003.0  Mild non-proliferative diabetic retinopathy               1.0
61950  NaT              844  2014-08-08      NaN  d561da3d-65dd-c244-14a4-a66652a36416  2014-08-08 22:19:55.010  312903003.0  Mild non-proliferative diabetic retinopathy               1.0
61951  NaT             1138  2014-08-08      NaN  c1da4059-7cd1-f839-76df-7c98e0f69521  2014-08-08 06:04:55.010  312903003.0  Mild non-proliferative diabetic retinopathy               1.0
61952  NaT             1218  2014-08-12      NaN  643fd77c-bd6b-aea6-8d99-de7e81e874f3  2014-08-12 00:04:55.010  312904009.0  Moderate nonproliferative diabetic retinopathy            1.0

About 66% of the SNOMED-diagnosed records (3647 of 5532) have no corresponding encounter data.



In [34]:

    
print(temp[temp.Enc_Nbr.isnull()].shape[0]/float(temp.shape[0]))
print(temp[temp.Enc_Nbr.isnull()].shape[0])
print(temp.shape[0])









    



0.659255242227
3647
5532
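
One way to count unmatched records directly is the indicator option of DataFrame.merge, which labels each row by the side it came from. A minimal sketch on toy tables (only the column names follow the notebook; the rows are made up):

```python
import pandas as pd

# Toy tables; values are illustrative, not taken from the data set.
enc = pd.DataFrame(
    {"Person_Nbr": [1, 2], "Date": ["2016-01-01", "2016-01-02"], "Enc_Nbr": [10, 20]}
)
prob = pd.DataFrame(
    {
        "Person_Nbr": [1, 3],
        "Date": ["2016-01-01", "2016-01-05"],
        "Concept_ID": [312903003, 312904009],
    }
)

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'.
merged = enc.merge(prob, on=["Person_Nbr", "Date"], how="outer", indicator=True)

# Problem-list rows with no encounter for the same patient on the same day.
unmatched = int((merged["_merge"] == "right_only").sum())
matched = int((merged["_merge"] == "both").sum())
print(unmatched, matched)
```

This avoids inferring the match status from a null Enc_Nbr, and it makes the three cases (matched, encounter only, SNOMED only) explicit.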

Extract the encounters that the SNOMED codes diagnose as no_DR, no_vision_threatening, or vision_threatening.



In [14]:

    
d3 = d2[d2.sno_diagnosis.notnull()][['Person_Nbr', 'Enc_Nbr', 'sno_diagnosis']].copy()
d3.head()









    Out[14]:

    Person_Nbr  Enc_Nbr  sno_diagnosis
3       552364   2966.0            1.0
5      1048528   4267.0            1.0
66      415217  21680.0            2.0
75      702431  22822.0            1.0
80      994735  23521.0            2.0


Map the original DR diagnosis to a vision-related class.



In [15]:

    
d4 = data['all_encounter_data'][['Person_Nbr', 'DR_diagnosis', 'Enc_Date']].copy()
d4['Enc_Nbr'] = d4.index
mapping = {
    'no_DR': 0,
    'mNPDR': 1,
    'MNPDR': 1,
    'SNPDR': 2,
    'PDR':2
}
d4['vision_class'] = d4.DR_diagnosis.map(lambda x: mapping.get(x))
d4.head()









    Out[15]:

         Person_Nbr  DR_diagnosis  Enc_Date             Enc_Nbr  vision_class
Enc_Nbr
1043         544674  no_DR         2016-03-08 06:15:00     1043             0
1802         605657  no_DR         2016-05-13 03:45:00     1802             0
2698         514762  no_DR         2014-06-08 10:15:00     2698             0
2966         552364  mNPDR         2016-06-24 03:15:00     2966             1
4091         931187  no_DR         2015-10-29 19:45:00     4091             0
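
A dict can be passed straight to Series.map, and labels missing from the dict then come back as NaN, which makes unmapped diagnoses easy to count. A small sketch, where the label "unknown" is a hypothetical value added only to show that behaviour:

```python
import pandas as pd

# Same five-label-to-three-class mapping as in the cell above.
mapping = {"no_DR": 0, "mNPDR": 1, "MNPDR": 1, "SNPDR": 2, "PDR": 2}

# "unknown" is a made-up label that is not in the mapping.
labels = pd.Series(["no_DR", "mNPDR", "PDR", "unknown"])

# Labels without an entry in the dict become NaN.
classes = labels.map(mapping)
n_unmapped = int(classes.isna().sum())
print(classes.tolist(), n_unmapped)
```

Checking n_unmapped before the comparison guards against a silent mismatch caused by a label that was never assigned a class.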

Merge the SNOMED-diagnosed encounters into the original encounters by Person_Nbr and Enc_Nbr.



In [16]:

    
# Left join: keep every encounter and attach the SNOMED class where one exists.
d4 = d4.merge(d3, on=['Person_Nbr', 'Enc_Nbr'], how='left')
d4.head()









    Out[16]:

   Person_Nbr  DR_diagnosis  Enc_Date             Enc_Nbr  vision_class  sno_diagnosis
0      544674  no_DR         2016-03-08 06:15:00     1043             0            NaN
1      605657  no_DR         2016-05-13 03:45:00     1802             0            NaN
2      514762  no_DR         2014-06-08 10:15:00     2698             0            NaN
3      552364  mNPDR         2016-06-24 03:15:00     2966             1            1.0
4      931187  no_DR         2015-10-29 19:45:00     4091             0            NaN

About 14% of the encounters that have both records (265 of 1885) received different diagnoses from the two sources.



In [17]:

    
temp = d4[d4.sno_diagnosis.notnull()]
print(temp[temp.vision_class!=temp.sno_diagnosis].shape[0]/float(temp.shape[0]))
print(temp[temp.vision_class!=temp.sno_diagnosis].shape[0])
print(temp.shape[0])









    



0.140583554377
265
1885
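
The mismatch rate is a single number; a cross-tabulation additionally shows which pairs of classes disagree. A minimal sketch on made-up labels (the column names follow the notebook, the values do not come from the data):

```python
import pandas as pd

# Toy encounters that carry both an ICD-based and a SNOMED-based class.
both = pd.DataFrame(
    {
        "vision_class": [0, 1, 1, 2, 2],
        "sno_diagnosis": [1.0, 1.0, 2.0, 2.0, 2.0],
    }
)

# Rows: ICD-based class; columns: SNOMED-based class; cells: encounter counts.
table = pd.crosstab(both["vision_class"], both["sno_diagnosis"])

# Share of encounters where the two classes differ.
mismatch_rate = float((both["vision_class"] != both["sno_diagnosis"]).mean())
print(table)
print(mismatch_rate)
```

Off-diagonal cells in the table are the disagreements, so it is possible to see, for example, whether they cluster between adjacent classes.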

Convert the encounter-level profile to a person-level profile.

Map all NaN values of the SNOMED diagnosis in the encounter profile to 0.



In [18]:

    
# Encounters without a SNOMED diagnosis are treated as class 0 (no_DR).
d4.sno_diagnosis = d4.sno_diagnosis.fillna(0)



In [19]:

    
d5 = data['all_person_data'][['worst_DR','recent_DR']].copy()
d5['worst_vis_icd'] = d5.worst_DR.map(lambda x: mapping[x])
d5['recent_vis_icd'] = d5.recent_DR.map(lambda x: mapping[x])



In [20]:

    
d5['worst_vis_sno'] = d4.groupby('Person_Nbr')['sno_diagnosis'].max()



In [21]:

    
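import numpy as np

def recent_DR(groupbyblock):
    """Return the most recent non-zero SNOMED class for one patient, or 0 if none."""
    # Sort this patient's encounters from newest to oldest.
    templist = groupbyblock.sort_values(['Enc_Date'], ascending=False)['sno_diagnosis'].values
    # Positions of the encounters that carry a DR class.
    temp = np.where(templist != 0)[0]
    if len(temp) > 0:
        return templist[temp[0]]
    return 0

d5['recent_vis_sno'] = d4.groupby('Person_Nbr').apply(recent_DR)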
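
The two person-level summaries (worst class and most recent non-zero class) can also be sketched without a per-group Python function. This is an alternative formulation on toy data, not the code used for the results in this notebook:

```python
import pandas as pd

# Toy encounter-level table; values are made up for illustration.
enc = pd.DataFrame(
    {
        "Person_Nbr": [1, 1, 1, 2, 2],
        "Enc_Date": pd.to_datetime(
            ["2015-01-01", "2016-01-01", "2017-01-01", "2015-06-01", "2016-06-01"]
        ),
        "sno_diagnosis": [2.0, 1.0, 0.0, 0.0, 0.0],
    }
)

# Worst class per patient: the maximum over all encounters.
worst = enc.groupby("Person_Nbr")["sno_diagnosis"].max()

# Most recent non-zero class: drop class-0 rows, sort by date, take the last
# row per patient; patients with no non-zero row are filled with 0.
nonzero = enc[enc["sno_diagnosis"] != 0].sort_values("Enc_Date")
recent = (
    nonzero.groupby("Person_Nbr")["sno_diagnosis"]
    .last()
    .reindex(worst.index, fill_value=0.0)
)
print(worst.tolist(), recent.tolist())
```

In the toy data, patient 1 has a worst class of 2 but a most recent non-zero class of 1, which is the distinction the worst and recent columns are meant to capture.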



In [22]:

    
d5.head()









    Out[22]:

            worst_DR  recent_DR  worst_vis_icd  recent_vis_icd  worst_vis_sno  recent_vis_sno
Person_Nbr
33          no_DR     no_DR                  0               0            0.0             0.0
89          no_DR     no_DR                  0               0            0.0             0.0
146         no_DR     no_DR                  0               0            0.0             0.0
196         no_DR     no_DR                  0               0            0.0             0.0
327         no_DR     no_DR                  0               0            0.0             0.0



In [23]:

    
d5.shape[0]









    Out[23]:





16039

101 patients diagnosed as no_DR by ICD received a different diagnosis from SNOMED (in both the worst and the most recent case).



In [24]:

    
temp = d5[d5.worst_vis_icd==0]
temp[temp.worst_vis_icd!=temp.worst_vis_sno].shape[0]









    Out[24]:





101



In [25]:

    
temp = d5[d5.recent_vis_icd==0]
temp[temp.recent_vis_icd!=temp.recent_vis_sno].shape[0]









    Out[25]:





101

Fewer than 100 patients who have a DR diagnosis from both sources were assigned a different vision class by SNOMED (87 in the worst case, 96 in the most recent case).



In [26]:

    
temp = d5[d5.worst_vis_icd!=0]
temp = temp[temp.worst_vis_sno!=0]
temp[temp.worst_vis_icd!=temp.worst_vis_sno].shape[0]









    Out[26]:





87



In [27]:

    
temp = d5[d5.recent_vis_icd!=0]
temp = temp[temp.recent_vis_sno!=0]
temp[temp.recent_vis_icd!=temp.recent_vis_sno].shape[0]









    Out[27]:





96

With ICD we had 4030 patients with DR in total; with SNOMED we had 1053.



In [28]:

    
d5[d5.worst_vis_icd!=0].shape[0]









    Out[28]:





4030



In [29]:

    
d5[d5.recent_vis_icd!=0].shape[0]









    Out[29]:





4030



In [30]:

    
d5[d5.worst_vis_sno!=0].shape[0]









    Out[30]:





1053



In [31]:

    
d5[d5.recent_vis_sno!=0].shape[0]









    Out[31]:





1053
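
To put the two totals in proportion, the counts reported in this notebook can be turned into shares of the 16039 patients in the person-level table. This is plain arithmetic on those counts; it does not show whether the SNOMED patients are a subset of the ICD patients:

```python
# Counts reported in this notebook.
n_patients = 16039  # rows in the person-level table
n_dr_icd = 4030     # patients with a non-zero ICD-based class
n_dr_sno = 1053     # patients with a non-zero SNOMED-based class

icd_share = n_dr_icd / n_patients
sno_share = n_dr_sno / n_patients
sno_vs_icd = n_dr_sno / n_dr_icd

print(f"ICD-based DR share:    {icd_share:.1%}")
print(f"SNOMED-based DR share: {sno_share:.1%}")
print(f"SNOMED count / ICD count: {sno_vs_icd:.1%}")
```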


