/snip calculate percentages of PPIs that are lansoprazole vs omeprazole vs pantoprazole in the GP prescribing dataset? I am involved in a project using GPRD data. We'd like to know how much PPI prescibing varies by PCT/CCG and also whether there is a difference between GP prescribing and prison prescribing. /


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('datas/T201501PDPI+BNFT.CSV') #frpm http://datagov.ic.nhs.uk/presentation/2015_01_January/T201501PDPI+BNFT.CSV
df.columns = [x.strip() for x in df.columns] #clean column whitespace

In [3]:
gpdetail = pd.read_csv('datas/epraccur.csv') #from http://systems.hscic.gov.uk/data/ods/datadownloads/data-files/epraccur.zip
gpdetail = gpdetail.icol([1,2,10,26,15]) #throw away columns we don't care about
gpdetail.columns = ['PRACTICE', 'NAME', 'POSTCODE', 'Prescribing Setting', 'CCG']
setting_lookup =  {0:'Other' , 1:'WIC Practice' , 2:'OOH Practice' , 3:'WIC + OOH Practice' , 4:'GP Practice' , 
              8:'Public Health Service' , 9:'Community Health Service' , 10:'Hospital Service' , 11:'Optometry Service' , 
              12:'Urgent & Emergency Care' , 13:'Hospice' , 14:'Care Home / Nursing Home' , 15:'Border Force' , 
              16:'Young Offender Institution' , 17:'Secure Training Centre' , 18:'Secure Children\'s Home' , 19:'Immigration Removal Centre' , 
              20:'Court' , 21:'Police Custody' , 22:'Sexual Assault Referral Centre\'(SARC)' , 24:'Other – Justice Estate' , 
              25:'Prison'}

In [4]:
lans = 'Lansoprazole|Zoton'
omep = 'Omeprzole|Losec'
pant = 'Pantoprazole|Protium'
ppi = lans + '|' + omep + '|' + pant

In [7]:
df = df[df['BNF NAME'].str.contains(ppi)] #throw away everything that's not about a ppi

In [8]:
df = pd.merge(df, gpdetail, on='PRACTICE') #add gp details

In [9]:
df['Prescribing Setting'] = df['Prescribing Setting'].map(setting_lookup.get)

In [10]:
df1 = df[df['BNF NAME'].str.contains(lans)]
df2 = df[df['BNF NAME'].str.contains(omep)]
df3 = df[df['BNF NAME'].str.contains(pant)]

In [11]:
df1['ppi'] = 'lans'
df2['ppi'] = 'omep'
df3['ppi'] = 'pant'


/home/sam/.virtualenvs/pa-research/lib/python2.7/site-packages/IPython/kernel/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
/home/sam/.virtualenvs/pa-research/lib/python2.7/site-packages/IPython/kernel/__main__.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from IPython.kernel.zmq import kernelapp as app
/home/sam/.virtualenvs/pa-research/lib/python2.7/site-packages/IPython/kernel/__main__.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()

In [12]:
df_list = [df1[['ppi','ITEMS','CCG','Prescribing Setting']], df2[['ppi','ITEMS','CCG','Prescribing Setting']],
df3[['ppi','ITEMS','CCG','Prescribing Setting']]]

In [13]:
df4 = pd.concat(df_list)

In [14]:
df5 = df4.groupby(['ppi']).ITEMS.sum()

In [15]:
df5


Out[15]:
ppi
lans    1849958
omep      10251
pant      88701
Name: ITEMS, dtype: int64

In [16]:
pd.DataFrame(df5).apply(lambda x: 100*x/float(x.sum()))


Out[16]:
ITEMS
ppi
lans 94.922700
omep 0.525986
pant 4.551313

In [17]:
df6 = df4.groupby(['Prescribing Setting', 'ppi']).ITEMS.sum()

In [25]:
df6


Out[25]:
Prescribing Setting       ppi 
Care Home / Nursing Home  lans        193
                          omep          3
                          pant          3
Community Health Service  lans        534
                          omep         18
                          pant          9
GP Practice               lans    1846517
                          omep      10194
                          pant      88633
Hospice                   lans         84
                          omep          4
                          pant          4
Hospital Service          lans        101
                          omep         15
                          pant          1
OOH Practice              lans       1660
                          omep         11
                          pant         33
Other                     lans        310
                          omep          4
                          pant         10
Prison                    lans          3
Public Health Service     lans         69
                          pant          1
Urgent & Emergency Care   lans        136
                          pant          1
WIC + OOH Practice        lans         39
                          omep          1
WIC Practice              lans        312
                          omep          1
                          pant          6
Name: ITEMS, dtype: int64

In [20]:
pd.DataFrame(df6).to_csv('settingbyppi.csv')

In [21]:
df7 = df4.groupby(['CCG', 'ppi']).ITEMS.sum()

In [26]:
df7


Out[26]:
CCG  ppi 
00C  lans     4550
     omep       23
     pant       98
00D  lans    20786
     omep       52
     pant      348
00J  lans    12227
     omep       31
     pant      194
00K  lans    14608
     omep       33
     pant      291
00L  lans    21720
     omep       95
     pant      347
00M  lans    18581
     omep       57
     pant      262
00N  lans    13992
     omep       15
     pant      186
00P  lans    32016
     omep       43
     pant      262
00Q  lans     4031
     omep       38
     pant      207
00R  lans     5170
     omep       46
     pant      283
             ...  
99E  lans    11948
     omep       40
     pant      178
99F  lans     5780
     omep       35
     pant      102
99G  lans     5081
     omep       22
     pant      120
99H  lans     6109
     omep       50
     pant      366
99J  lans    11592
     omep       90
     pant     1010
99K  lans     5519
     omep       35
     pant      848
99M  lans     2827
     omep       32
     pant      495
99N  lans    11327
     omep       74
     pant      722
99P  lans    20023
     omep      158
     pant      625
99Q  lans     7615
     omep       67
     pant      217
Name: ITEMS, dtype: int64

In [24]:
pd.DataFrame(df7).to_csv('ccgbyppi.csv')

In [ ]: