In [1]:

    
# Disabling autosave to allow for notebook pairing via https://github.com/mwouts/jupytext.
%autosave 0









    














    



Autosave disabled

Measure calculations

This notebook describes how we perform measure calculations.

A measure is the compuation of a ratio between two values (a numerator and a denominator).

For instance:

The proportion of prescribing (either items or quantity) for a chemical that is not prescribed generically (eg Desogestrel prescribed as a branded product)
The amount of prescribing per population (eg High dose opioids per 1000 patients)

Measures are computed monthly for each practice, and are also aggregated at CCG and national level, and we call the results of the computations "measure values".

For each month, we compute the deciles in the distribution of measure values for practices and CCGs.

Certain measures are identified as cost-saving. For these measures, we want to know how much a practice or CCG would have saved if its prescribing had been in line with the practice or CCG at the 10th percentile.

For the measures on the website, these calculations are done by performing a series of queries against data in BigQuery. The code is here, and some of the queries are quite gnarly.

(Note: In the OpenPrescribing source, we store measure values for practices and CCGs in MeasureValue objects, and national measure values in MeasureGlobal objects. Deciles for practices and CCGs are also stored in MeasureGlobal objects. Cost-saving calculations are stored in both kinds of objects.)

This notebook works through these calculations for the Cerazette/Desogestrel measure using one month of relevant prescribing data, using Pandas.

The OpenPrescribing implementation metadata for the measure is here. Note that the measure value is computed as the ratio of the quantity of prescriptions for presentations whose BNF code starts with 0703021Q0B (ie branded prescriptions for Desogestrel) to the quantity of prescriptions for presentations whose BNF code starts with 0703021Q0 (ie all prescriptions for Desogestrel). Also note that the is_cost_based and low_is_good fields are true.

Setup

We need pandas for the computation, and requests to validate the computation against the OpenPrescribing implementation, via the OpenPrescribing API.



In [2]:

    
import pandas as pd
import requests

We can get the prescribing data from BigQuery. We're only interested in prescribing from GPs.



In [3]:

    
sql = '''
SELECT
    prescriptions.pct AS ccg,
    prescriptions.practice,
    prescriptions.bnf_code,
    prescriptions.bnf_name,
    prescriptions.quantity,
    prescriptions.actual_cost AS cost
FROM
    hscic.normalised_prescribing_standard AS prescriptions
    INNER JOIN hscic.practices
        ON prescriptions.practice = practices.code
WHERE
    prescriptions.month = TIMESTAMP('2018-08-01')
    AND prescriptions.bnf_code LIKE '0703021Q0%'
    AND practices.setting = 4
'''

prescriptions = pd.read_gbq(sql, 'ebmdatalab', dialect='standard')
prescriptions.head()









    Out[3]:







  
    
      
      ccg
      practice
      bnf_code
      bnf_name
      quantity
      cost
    
  
  
    
      0
      07G
      F81010
      0703021Q0AAAAAA
      Desogestrel_Tab 75mcg
      3136
      95.98536
    
    
      1
      06V
      D82001
      0703021Q0AAAAAA
      Desogestrel_Tab 75mcg
      5250
      160.02845
    
    
      2
      04H
      C84019
      0703021Q0AAAAAA
      Desogestrel_Tab 75mcg
      9212
      281.13790
    
    
      3
      10L
      J84013
      0703021Q0AAAAAA
      Desogestrel_Tab 75mcg
      5950
      181.61845
    
    
      4
      09H
      H82050
      0703021Q0AAAAAA
      Desogestrel_Tab 75mcg
      3640
      111.97010

We filter prescribing to find the prescriptions that contribute to the measure's numerator (that is, prescriptions for branded Desogestrel), and the prescriptions that contribute to the measure's denominator (that is, all prescriptions for Desogestrel).



In [4]:

    
numerators = prescriptions[prescriptions['bnf_code'].str.startswith('0703021Q0B')]
denominators = prescriptions[prescriptions['bnf_code'].str.startswith('0703021Q0')]

We can look at the numerators and deonominators for one particular practice.



In [5]:

    
numerators[numerators['practice'] == 'A81001']









    Out[5]:







  
    
      
      ccg
      practice
      bnf_code
      bnf_name
      quantity
      cost
    
  
  
    
      9628
      00K
      A81001
      0703021Q0BDAAAA
      Cerelle_Tab 75mcg
      336
      13.02744
    
    
      13309
      00K
      A81001
      0703021Q0BBAAAA
      Cerazette_Tab 75mcg
      672
      71.01925
    
    
      16352
      00K
      A81001
      0703021Q0BHAAAA
      Zelleta_Tab 75mcg
      700
      23.24791



In [6]:

    
denominators[denominators['practice'] == 'A81001']









    Out[6]:







  
    
      
      ccg
      practice
      bnf_code
      bnf_name
      quantity
      cost
    
  
  
    
      9628
      00K
      A81001
      0703021Q0BDAAAA
      Cerelle_Tab 75mcg
      336
      13.02744
    
    
      13309
      00K
      A81001
      0703021Q0BBAAAA
      Cerazette_Tab 75mcg
      672
      71.01925
    
    
      16352
      00K
      A81001
      0703021Q0BHAAAA
      Zelleta_Tab 75mcg
      700
      23.24791
    
    
      17966
      00K
      A81001
      0703021Q0AAAAAA
      Desogestrel_Tab 75mcg
      252
      7.68707

Note the branded prescriptions (Cerelle, Cerazette, Zelleta) are present in both the numerator and the denominator. Generic prescriptions are only present in the denominator.

Practices

We'll start with the calculations for practices.

Not every practice will have prescribed Desogestrel in January 2018, so we also want a list of all practices.



In [7]:

    
sql = 'SELECT code, name, ccg_id AS ccg FROM hscic.practices WHERE setting = 4 ORDER BY code'
practices = pd.read_gbq(sql, 'ebmdatalab', dialect='standard').set_index('code')
practices.head()









    Out[7]:







  
    
      
      name
      ccg
    
    
      code
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      00K
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      00K
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      00K
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      00M
    
    
      A81005
      SPRINGWOOD SURGERY
      00M

practices is a DataFrame indexed by practice code. We will add columns to practices as we compute the measure values, percentiles, and cost savings.

For each practice, we want to know the total quantity and total cost for Desogestrel prescriptions, as well as the quantity and cost for branded and for generic prescriptions.

We can get the total quantity and cost from the denominators, the quantity and cost of branded prescriptions from the numerators, and the quantity and cost of generic prescriptions by subtracting one from the other.



In [8]:

    
practices['quantity_total'] = denominators.groupby('practice')['quantity'].sum()
practices['cost_total'] = denominators.groupby('practice')['cost'].sum()
practices['quantity_branded'] = numerators.groupby('practice')['quantity'].sum()
practices['cost_branded'] = numerators.groupby('practice')['cost'].sum()

# Practices with no prescribing will have NaN values for the new columns.  This
# also means that the new columns will have dtypes of float64, but quantities
# should always be ints.
practices = practices.fillna(0)
practices['quantity_total'] = practices['quantity_total'].astype('int')
practices['quantity_branded'] = practices['quantity_branded'].astype('int')

practices['quantity_generic'] = practices['quantity_total'] - practices['quantity_branded']
practices['cost_generic'] = practices['cost_total'] - practices['cost_branded']

practices.head()









    Out[8]:







  
    
      
      name
      ccg
      quantity_total
      cost_total
      quantity_branded
      cost_branded
      quantity_generic
      cost_generic
    
    
      code
      
      
      
      
      
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      00K
      1960
      114.98167
      1708
      107.29460
      252
      7.68707
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      00K
      10021
      391.92138
      7676
      319.01973
      2345
      72.90165
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      00K
      0
      0.00000
      0
      0.00000
      0
      0.00000
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      00M
      6090
      198.78799
      1505
      58.71303
      4585
      140.07496
    
    
      A81005
      SPRINGWOOD SURGERY
      00M
      5376
      176.86098
      168
      17.75171
      5208
      159.10927

The measure value for this measure is the ratio of the quantity of branded prescriptions to the quantity of total prescriptions.



In [9]:

    
practices['quantity_ratio'] = practices['quantity_branded'] / practices['quantity_total']
practices.head()









    Out[9]:







  
    
      
      name
      ccg
      quantity_total
      cost_total
      quantity_branded
      cost_branded
      quantity_generic
      cost_generic
      quantity_ratio
    
    
      code
      
      
      
      
      
      
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      00K
      1960
      114.98167
      1708
      107.29460
      252
      7.68707
      0.871429
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      00K
      10021
      391.92138
      7676
      319.01973
      2345
      72.90165
      0.765991
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      00K
      0
      0.00000
      0
      0.00000
      0
      0.00000
      NaN
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      00M
      6090
      198.78799
      1505
      58.71303
      4585
      140.07496
      0.247126
    
    
      A81005
      SPRINGWOOD SURGERY
      00M
      5376
      176.86098
      168
      17.75171
      5208
      159.10927
      0.031250

Note the quantity_ratio for a practice with no prescribing is NaN. The OpenPrescribing implementation stores such values as database NULLs, or Python Nones.

We can compute each practice's percentile, according to their quantity_ratio.



In [10]:

    
# In order to match the percentiles calculated by the OpenPrescribing
# implementation (via BigQuery) we need to adjust the results of .rank() to
# ignore any NaNs.
#
# I can't see why this isn't the same as:
#
#   practices['quantity_ratio'].dropna().rank(method='min', pct=True) * 100
#
# But it's not, quite.
#
# Thanks Dave for working out what to do here!

ranks = practices['quantity_ratio'].rank(method='min')
num_non_nans = practices['quantity_ratio'].count()
practices['quantity_ratio_percentile'] = (ranks - 1) / ((num_non_nans - 1) / 100)
practices.head()









    Out[10]:







  
    
      
      name
      ccg
      quantity_total
      cost_total
      quantity_branded
      cost_branded
      quantity_generic
      cost_generic
      quantity_ratio
      quantity_ratio_percentile
    
    
      code
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      00K
      1960
      114.98167
      1708
      107.29460
      252
      7.68707
      0.871429
      87.542857
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      00K
      10021
      391.92138
      7676
      319.01973
      2345
      72.90165
      0.765991
      81.257143
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      00K
      0
      0.00000
      0
      0.00000
      0
      0.00000
      NaN
      NaN
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      00M
      6090
      198.78799
      1505
      58.71303
      4585
      140.07496
      0.247126
      42.857143
    
    
      A81005
      SPRINGWOOD SURGERY
      00M
      5376
      176.86098
      168
      17.75171
      5208
      159.10927
      0.031250
      11.928571

We can compute the value of the ratio of the practice at the 10th percentile:



In [11]:

    
practice_quantity_ratio_10 = practices['quantity_ratio'].quantile(0.1)
practice_quantity_ratio_10









    Out[11]:





0.017241379310344827

That is, for the practice at the 10th percentile, the ratio of the quantity of branded prescriptions to total prescriptions is 0.0172.

To find out the how much a practice would have saved if its prescribing had matched that of the practice at the 10th percentile, we first need to compute the unit cost of branded and generic prescriptions.



In [12]:

    
practices['unit_cost_branded'] = practices['cost_branded'] / practices['quantity_branded']
practices['unit_cost_generic'] = practices['cost_generic'] / practices['quantity_generic']
practices[['name', 'unit_cost_branded', 'unit_cost_generic']].head()









    Out[12]:







  
    
      
      name
      unit_cost_branded
      unit_cost_generic
    
    
      code
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      0.062819
      0.030504
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      0.041561
      0.031088
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      NaN
      NaN
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      0.039012
      0.030551
    
    
      A81005
      SPRINGWOOD SURGERY
      0.105665
      0.030551

When a practice hasn't prescribed either branded or generic or both, we use global values.



In [13]:

    
global_unit_cost_branded = practices['cost_branded'].sum() / practices['quantity_branded'].sum()
global_unit_cost_generic = practices['cost_generic'].sum() / practices['quantity_generic'].sum()
global_unit_cost_branded, global_unit_cost_generic









    Out[13]:





(0.05334542068605227, 0.0305928758937223)



In [14]:

    
practices['unit_cost_branded'] = practices['unit_cost_branded'].fillna(global_unit_cost_branded)
practices['unit_cost_generic'] = practices['unit_cost_generic'].fillna(global_unit_cost_generic)
practices[['name', 'unit_cost_branded', 'unit_cost_generic']].head()









    Out[14]:







  
    
      
      name
      unit_cost_branded
      unit_cost_generic
    
    
      code
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      0.062819
      0.030504
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      0.041561
      0.031088
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      0.053345
      0.030593
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      0.039012
      0.030551
    
    
      A81005
      SPRINGWOOD SURGERY
      0.105665
      0.030551

We also need the quantity of branded and generic prescriptions that would have been prescribed had its prescribing matched that of the practice at the 10th percentile, assuming the total prescribing remains the same.



In [15]:

    
practices['quantity_branded_10'] = practices['quantity_total'] * practice_quantity_ratio_10
practices['quantity_generic_10'] = practices['quantity_total'] - practices['quantity_branded_10']
practices[['name', 'quantity_branded', 'quantity_branded_10', 'quantity_generic', 'quantity_generic_10']].head()









    Out[15]:







  
    
      
      name
      quantity_branded
      quantity_branded_10
      quantity_generic
      quantity_generic_10
    
    
      code
      
      
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      1708
      33.793103
      252
      1926.206897
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      7676
      172.775862
      2345
      9848.224138
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      0
      0.000000
      0
      0.000000
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      1505
      105.000000
      4585
      5985.000000
    
    
      A81005
      SPRINGWOOD SURGERY
      168
      92.689655
      5208
      5283.310345

We can then work out a target cost for each practice, and the amount that would be saved if the practice could make that target.



In [16]:

    
practices['target_cost_10'] = practices['unit_cost_branded'] * practices['quantity_branded_10'] + practices['unit_cost_generic'] * practices['quantity_generic_10']
practices['cost_saving_10'] = practices['cost_total'] - practices['target_cost_10']
practices.head()









    Out[16]:







  
    
      
      name
      ccg
      quantity_total
      cost_total
      quantity_branded
      cost_branded
      quantity_generic
      cost_generic
      quantity_ratio
      quantity_ratio_percentile
      unit_cost_branded
      unit_cost_generic
      quantity_branded_10
      quantity_generic_10
      target_cost_10
      cost_saving_10
    
    
      code
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      A81001
      THE DENSHAM SURGERY
      00K
      1960
      114.98167
      1708
      107.29460
      252
      7.68707
      0.871429
      87.542857
      0.062819
      0.030504
      33.793103
      1926.206897
      60.880333
      54.101337
    
    
      A81002
      QUEENS PARK MEDICAL CENTRE
      00K
      10021
      391.92138
      7676
      319.01973
      2345
      72.90165
      0.765991
      81.257143
      0.041561
      0.031088
      172.775862
      9848.224138
      313.343491
      78.577889
    
    
      A81003
      VICTORIA MEDICAL PRACTICE
      00K
      0
      0.00000
      0
      0.00000
      0
      0.00000
      NaN
      NaN
      0.053345
      0.030593
      0.000000
      0.000000
      0.000000
      0.000000
    
    
      A81004
      BLUEBELL MEDICAL CENTRE
      00M
      6090
      198.78799
      1505
      58.71303
      4585
      140.07496
      0.247126
      42.857143
      0.039012
      0.030551
      105.000000
      5985.000000
      186.942198
      11.845792
    
    
      A81005
      SPRINGWOOD SURGERY
      00M
      5376
      176.86098
      168
      17.75171
      5208
      159.10927
      0.031250
      11.928571
      0.105665
      0.030551
      92.689655
      5283.310345
      171.204118
      5.656862

Here are the practices that would save the least.



In [17]:

    
practices.sort_values('cost_saving_10').head()[['name', 'cost_saving_10']]









    Out[17]:







  
    
      
      name
      cost_saving_10
    
    
      code
      
      
    
  
  
    
      A84014
      MARINE MEDICAL GROUP
      -32.501748
    
    
      L81063
      KINGSWOOD HEALTH CENTRE
      -17.928911
    
    
      K83051
      WEAVERS MEDICAL
      -13.596447
    
    
      K83021
      ROTHWELL MEDICAL CENTRE
      -12.396787
    
    
      M81057
      BEWDLEY MEDICAL CENTRE
      -10.592165

And the practices that would save the most.



In [18]:

    
practices.sort_values('cost_saving_10').tail()[['name', 'cost_saving_10']]









    Out[18]:







  
    
      
      name
      cost_saving_10
    
    
      code
      
      
    
  
  
    
      N83028
      ASTON HEALTHCARE LIMITED
      431.809296
    
    
      N81057
      TOWER HOUSE PRACTICE
      438.062705
    
    
      B86064
      LEIGH VIEW MEDICAL PRACTICE
      440.273693
    
    
      H82042
      CATHEDRAL MEDICAL GROUP
      449.205846
    
    
      P81018
      ST FILLAN'S MEDICAL CTRE
      453.080751

We can verify our calculations by checking against data returned by the API, for each of the following kinds of practice:

No prescribing
All prescribing is branded
All prescribing is generic
Prescribing is mixture of branded and generic



In [19]:

    
def get_practice_data_from_api(practice_id):
    url = 'https://openprescribing.net/api/1.0/measure_by_practice/'
    params = {
        'format': 'json',
        'measure': 'desogestrel',
        'org': practice_id,
    }
    rsp = requests.get(url, params)
    for record in rsp.json()['measures'][0]['data']:
        if record['date'] == '2018-08-01':
            return record
    assert False



In [20]:

    
for partition in [
    practices[practices['quantity_total'] == 0],  # no prescribing
    practices[(practices['quantity_total'] > 0) & (practices['quantity_generic'] == 0)],  # all branded
    practices[(practices['quantity_total'] > 0) & (practices['quantity_branded'] == 0)],  # all generic
    practices[(practices['quantity_branded'] > 0) & (practices['quantity_generic'] > 0)],  # a mixture
]:
    practice_id = partition.sample().index[0]
    data = get_practice_data_from_api(practice_id)
    practice = practices.loc[practice_id]
    print('-' * 80)
    print(practice)
    print(data)
    assert data['numerator'] == practice['quantity_branded']
    assert data['denominator'] == practice['quantity_total']
    if data['percentile'] is not None:
        assert abs(data['percentile'] - practice['quantity_ratio_percentile']) < 0.001
    assert abs(data['cost_savings']['10'] - practice['cost_saving_10']) < 0.001









    



--------------------------------------------------------------------------------
name                         BRACKENDALE SURGERY
ccg                                          08J
quantity_total                                 0
cost_total                                     0
quantity_branded                               0
cost_branded                                   0
quantity_generic                               0
cost_generic                                   0
quantity_ratio                               NaN
quantity_ratio_percentile                    NaN
unit_cost_branded                      0.0533454
unit_cost_generic                      0.0305929
quantity_branded_10                            0
quantity_generic_10                            0
target_cost_10                                 0
cost_saving_10                                 0
Name: H84609, dtype: object
{'cost_savings': {'10': 0.0, '20': 0.0, '30': 0.0, '50': 0.0, '40': 0.0, '60': 0.0, '70': 0.0, '90': 0.0, '80': 0.0}, 'denominator': 0.0, 'numerator': 0.0, 'practice_name': 'BRACKENDALE SURGERY', 'percentile': None, 'practice_id': 'H84609', 'calc_value': None, 'date': '2018-08-01'}
--------------------------------------------------------------------------------
name                         HADDENHAM SURGERY
ccg                                        06H
quantity_total                            4732
cost_total                             189.683
quantity_branded                          4732
cost_branded                           189.683
quantity_generic                             0
cost_generic                                 0
quantity_ratio                               1
quantity_ratio_percentile              95.9857
unit_cost_branded                    0.0400851
unit_cost_generic                    0.0305929
quantity_branded_10                    81.5862
quantity_generic_10                    4650.41
target_cost_10                          145.54
cost_saving_10                         44.1429
Name: D81062, dtype: object
{'cost_savings': {'10': 44.14285521451177, '20': 41.5058514275467, '30': 38.598898630454556, '50': 30.71763144978138, '40': 34.99566570092432, '60': 25.453131720180494, '70': 19.28801331044818, '90': 3.905851414861445, '80': 11.462296320169145}, 'denominator': 4732.0, 'numerator': 4732.0, 'practice_name': 'HADDENHAM SURGERY', 'percentile': 95.9857142857143, 'practice_id': 'D81062', 'calc_value': 1.0, 'date': '2018-08-01'}
--------------------------------------------------------------------------------
name                         ANSDELL MEDICAL CENTRE
ccg                                             02M
quantity_total                                 4144
cost_total                                  126.452
quantity_branded                                  0
cost_branded                                      0
quantity_generic                               4144
cost_generic                                126.452
quantity_ratio                                    0
quantity_ratio_percentile                         0
unit_cost_branded                         0.0533454
unit_cost_generic                         0.0305145
quantity_branded_10                         71.4483
quantity_generic_10                         4072.55
target_cost_10                              128.083
cost_saving_10                             -1.63123
Name: P81037, dtype: object
{'cost_savings': {'10': -1.6312304021206785, '20': -7.1856731637721225, '30': -13.308721982315518, '50': -29.909398727915942, '40': -20.898377773825587, '60': -40.99825743996641, '70': -53.984130837240826, '90': -86.38428825143424, '80': -70.46779965343734}, 'denominator': 4144.0, 'numerator': 0.0, 'practice_name': 'ANSDELL MEDICAL CENTRE', 'percentile': 0.0, 'practice_id': 'P81037', 'calc_value': 0.0, 'date': '2018-08-01'}
--------------------------------------------------------------------------------
name                         HASTINGS MED P & WALKIN
ccg                                              09P
quantity_total                                   672
cost_total                                   21.2161
quantity_branded                                  84
cost_branded                                 3.26306
quantity_generic                                 588
cost_generic                                  17.953
quantity_ratio                                 0.125
quantity_ratio_percentile                    27.6857
unit_cost_branded                           0.038846
unit_cost_generic                          0.0305324
quantity_branded_10                          11.5862
quantity_generic_10                          660.414
target_cost_10                               20.6141
cost_saving_10                              0.602018
Name: Y03051, dtype: object
{'cost_savings': {'10': 0.6020184729064049, '20': 0.27403271247739625, '30': -0.08752881746747221, '50': -1.0677865714285701, '40': -0.5356924106472007, '60': -1.722575523809521, '70': -2.489381798319325, '90': -4.402587267080744, '80': -3.4627303200948454}, 'denominator': 672.0, 'numerator': 84.0, 'practice_name': 'HASTINGS MED P & WALKIN', 'percentile': 27.6857142857143, 'practice_id': 'Y03051', 'calc_value': 0.125, 'date': '2018-08-01'}

CCGs

We'll move onto the calculations for CCGs. The calculations are very similar to those for practices.

We're interested in CCGs with active GPs.



In [21]:

    
sql = '''
SELECT
    ccgs.code, ccgs.name
FROM
    hscic.ccgs
    INNER JOIN (
        SELECT DISTINCT(ccg_id)
        FROM hscic.practices
        WHERE practices.setting = 4 AND practices.status_code = 'A'
    ) AS practices
    ON ccgs.code = practices.ccg_id
WHERE
    ccgs.org_type = 'CCG'
ORDER BY ccgs.code
'''
ccgs = pd.read_gbq(sql, 'ebmdatalab', dialect='standard').set_index('code')
ccgs.head()









    Out[21]:







  
    
      
      name
    
    
      code
      
    
  
  
    
      00C
      NHS DARLINGTON CCG
    
    
      00D
      NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
    
    
      00J
      NHS NORTH DURHAM CCG
    
    
      00K
      NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
    
    
      00L
      NHS NORTHUMBERLAND CCG



In [22]:

    
ccgs['quantity_total'] = denominators.groupby('ccg')['quantity'].sum()
ccgs['cost_total'] = denominators.groupby('ccg')['cost'].sum()
ccgs['quantity_branded'] = numerators.groupby('ccg')['quantity'].sum()
ccgs['cost_branded'] = numerators.groupby('ccg')['cost'].sum()
ccgs = ccgs.fillna(0)
ccgs['quantity_total'] = ccgs['quantity_total'].astype('int')
ccgs['quantity_branded'] = ccgs['quantity_branded'].astype('int')
ccgs['quantity_generic'] = ccgs['quantity_total'] - ccgs['quantity_branded']
ccgs['cost_generic'] = ccgs['cost_total'] - ccgs['cost_branded']
ccgs['quantity_ratio'] = ccgs['quantity_branded'] / ccgs['quantity_total']
ranks = ccgs['quantity_ratio'].rank(method='min')
num_non_nans = ccgs['quantity_ratio'].count()
ccgs['quantity_ratio_percentile'] = (ranks - 1) / ((num_non_nans - 1) / 100)
ccgs.head()









    Out[22]:







  
    
      
      name
      quantity_total
      cost_total
      quantity_branded
      cost_branded
      quantity_generic
      cost_generic
      quantity_ratio
      quantity_ratio_percentile
    
    
      code
      
      
      
      
      
      
      
      
      
    
  
  
    
      00C
      NHS DARLINGTON CCG
      61557
      2413.70215
      35326
      1613.15451
      26231
      800.54764
      0.573875
      73.711340
    
    
      00D
      NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
      197050
      8375.82889
      144378
      6766.94830
      52672
      1608.88059
      0.732697
      85.051546
    
    
      00J
      NHS NORTH DURHAM CCG
      141258
      5851.53094
      82803
      4063.82131
      58455
      1787.70963
      0.586183
      74.742268
    
    
      00K
      NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
      208715
      8660.04179
      108112
      5584.50268
      100603
      3075.53911
      0.517989
      69.072165
    
    
      00L
      NHS NORTHUMBERLAND CCG
      179532
      7981.52114
      162783
      7468.36390
      16749
      513.15724
      0.906707
      94.845361



In [23]:

    
ccg_quantity_ratio_10 = ccgs['quantity_ratio'].quantile(0.1)
ccg_quantity_ratio_10









    Out[23]:





0.11073908218256545



In [24]:

    
ccgs['unit_cost_branded'] = ccgs['cost_branded'] / ccgs['quantity_branded']
ccgs['unit_cost_generic'] = ccgs['cost_generic'] / ccgs['quantity_generic']
ccgs[['name', 'unit_cost_branded', 'unit_cost_generic']].head()









    Out[24]:







  
    
      
      name
      unit_cost_branded
      unit_cost_generic
    
    
      code
      
      
      
    
  
  
    
      00C
      NHS DARLINGTON CCG
      0.045665
      0.030519
    
    
      00D
      NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
      0.046870
      0.030545
    
    
      00J
      NHS NORTH DURHAM CCG
      0.049078
      0.030583
    
    
      00K
      NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
      0.051655
      0.030571
    
    
      00L
      NHS NORTHUMBERLAND CCG
      0.045879
      0.030638



In [25]:

    
global_unit_cost_branded = ccgs['cost_branded'].sum() / ccgs['quantity_branded'].sum()
global_unit_cost_generic = ccgs['cost_generic'].sum() / ccgs['quantity_generic'].sum()
global_unit_cost_branded, global_unit_cost_generic









    Out[25]:





(0.05334488604245437, 0.03059282698733445)



In [26]:

    
ccgs['unit_cost_branded'] = ccgs['unit_cost_branded'].fillna(global_unit_cost_branded)
ccgs['unit_cost_generic'] = ccgs['unit_cost_generic'].fillna(global_unit_cost_generic)
ccgs[['name', 'unit_cost_branded', 'unit_cost_generic']].head()









    Out[26]:







  
    
      
      name
      unit_cost_branded
      unit_cost_generic
    
    
      code
      
      
      
    
  
  
    
      00C
      NHS DARLINGTON CCG
      0.045665
      0.030519
    
    
      00D
      NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
      0.046870
      0.030545
    
    
      00J
      NHS NORTH DURHAM CCG
      0.049078
      0.030583
    
    
      00K
      NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
      0.051655
      0.030571
    
    
      00L
      NHS NORTHUMBERLAND CCG
      0.045879
      0.030638



In [27]:

    
ccgs['quantity_branded_10'] = ccgs['quantity_total'] * ccg_quantity_ratio_10
ccgs['quantity_generic_10'] = ccgs['quantity_total'] - ccgs['quantity_branded_10']
ccgs[['name', 'quantity_branded', 'quantity_branded_10', 'quantity_generic', 'quantity_generic_10']].head()









    Out[27]:







  
    
      
      name
      quantity_branded
      quantity_branded_10
      quantity_generic
      quantity_generic_10
    
    
      code
      
      
      
      
      
    
  
  
    
      00C
      NHS DARLINGTON CCG
      35326
      6816.765682
      26231
      54740.234318
    
    
      00D
      NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
      144378
      21821.136144
      52672
      175228.863856
    
    
      00J
      NHS NORTH DURHAM CCG
      82803
      15642.781271
      58455
      125615.218729
    
    
      00K
      NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
      108112
      23112.907538
      100603
      185602.092462
    
    
      00L
      NHS NORTHUMBERLAND CCG
      162783
      19881.208902
      16749
      159650.791098



In [28]:

    
ccgs['target_cost_10'] = ccgs['unit_cost_branded'] * ccgs['quantity_branded_10'] + ccgs['unit_cost_generic'] * ccgs['quantity_generic_10']
ccgs['cost_saving_10'] = ccgs['cost_total'] - ccgs['target_cost_10']
ccgs.head()









    Out[28]:







  
    
      
      name
      quantity_total
      cost_total
      quantity_branded
      cost_branded
      quantity_generic
      cost_generic
      quantity_ratio
      quantity_ratio_percentile
      unit_cost_branded
      unit_cost_generic
      quantity_branded_10
      quantity_generic_10
      target_cost_10
      cost_saving_10
    
    
      code
      
      
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      00C
      NHS DARLINGTON CCG
      61557
      2413.70215
      35326
      1613.15451
      26231
      800.54764
      0.573875
      73.711340
      0.045665
      0.030519
      6816.765682
      54740.234318
      1981.911239
      431.790911
    
    
      00D
      NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
      197050
      8375.82889
      144378
      6766.94830
      52672
      1608.88059
      0.732697
      85.051546
      0.046870
      0.030545
      21821.136144
      175228.863856
      6375.162698
      2000.666192
    
    
      00J
      NHS NORTH DURHAM CCG
      141258
      5851.53094
      82803
      4063.82131
      58455
      1787.70963
      0.586183
      74.742268
      0.049078
      0.030583
      15642.781271
      125615.218729
      4609.367421
      1242.163519
    
    
      00K
      NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
      208715
      8660.04179
      108112
      5584.50268
      100603
      3075.53911
      0.517989
      69.072165
      0.051655
      0.030571
      23112.907538
      185602.092462
      6867.942809
      1792.098981
    
    
      00L
      NHS NORTHUMBERLAND CCG
      179532
      7981.52114
      162783
      7468.36390
      16749
      513.15724
      0.906707
      94.845361
      0.045879
      0.030638
      19881.208902
      159650.791098
      5803.529264
      2177.991876

Here are the CCGs that would save the least.



In [29]:

    
ccgs.sort_values('cost_saving_10').head()[['name', 'cost_saving_10']]









    Out[29]:







  
    
      
      name
      cost_saving_10
    
    
      code
      
      
    
  
  
    
      99P
      NHS NORTHERN, EASTERN AND WESTERN DEVON CCG
      -1082.240967
    
    
      11J
      NHS DORSET CCG
      -1007.204447
    
    
      04G
      NHS NENE CCG
      -806.683200
    
    
      04F
      NHS MILTON KEYNES CCG
      -434.863498
    
    
      07V
      NHS CROYDON CCG
      -235.435153

And the CCGs that would save the most.



In [30]:

    
ccgs.sort_values('cost_saving_10').tail()[['name', 'cost_saving_10']]









    Out[30]:







  
    
      
      name
      cost_saving_10
    
    
      code
      
      
    
  
  
    
      11A
      NHS WEST HAMPSHIRE CCG
      3366.164895
    
    
      99A
      NHS LIVERPOOL CCG
      3404.648279
    
    
      06H
      NHS CAMBRIDGESHIRE AND PETERBOROUGH CCG
      3756.611719
    
    
      14L
      NHS MANCHESTER CCG
      3969.551534
    
    
      13T
      NHS NEWCASTLE GATESHEAD CCG
      4686.268764

Again, we can verify our calculations by checking against data returned by the API.



In [31]:

    
def get_ccg_data_from_api(ccg_id):
    url = 'https://openprescribing.net/api/1.0/measure_by_ccg/'
    params = {
        'format': 'json',
        'measure': 'desogestrel',
        'org': ccg_id,
    }
    rsp = requests.get(url, params)
    for record in rsp.json()['measures'][0]['data']:
        if record['date'] == '2018-08-01':
            return record
    assert False



In [32]:

    
for ccg_id, ccg in ccgs.sample(4).iterrows():
    data = get_ccg_data_from_api(ccg_id)
    print('-' * 80)
    print(ccg)
    assert data['numerator'] == ccg['quantity_branded']
    assert data['denominator'] == ccg['quantity_total']
    assert abs(data['percentile'] - ccg['quantity_ratio_percentile']) < 0.001
    assert abs(data['cost_savings']['10'] - ccg['cost_saving_10']) < 0.001









    



--------------------------------------------------------------------------------
name                         NHS BROMLEY CCG
quantity_total                        132931
cost_total                           5264.57
quantity_branded                       23359
cost_branded                         1914.59
quantity_generic                      109572
cost_generic                         3349.98
quantity_ratio                      0.175723
quantity_ratio_percentile            24.7423
unit_cost_branded                  0.0819636
unit_cost_generic                  0.0305733
quantity_branded_10                  14720.7
quantity_generic_10                   118210
target_cost_10                       4820.64
cost_saving_10                       443.927
Name: 07Q, dtype: object
--------------------------------------------------------------------------------
name                         NHS DARLINGTON CCG
quantity_total                            61557
cost_total                               2413.7
quantity_branded                          35326
cost_branded                            1613.15
quantity_generic                          26231
cost_generic                            800.548
quantity_ratio                         0.573875
quantity_ratio_percentile               73.7113
unit_cost_branded                     0.0456648
unit_cost_generic                     0.0305191
quantity_branded_10                     6816.77
quantity_generic_10                     54740.2
target_cost_10                          1981.91
cost_saving_10                          431.791
Name: 00C, dtype: object
--------------------------------------------------------------------------------
name                         NHS EASTERN CHESHIRE CCG
quantity_total                                  94129
cost_total                                    4147.11
quantity_branded                                87213
cost_branded                                   3935.2
quantity_generic                                 6916
cost_generic                                  211.905
quantity_ratio                               0.926526
quantity_ratio_percentile                     96.3918
unit_cost_branded                           0.0451217
unit_cost_generic                           0.0306398
quantity_branded_10                           10423.8
quantity_generic_10                           83705.2
target_cost_10                                3035.05
cost_saving_10                                1112.05
Name: 01C, dtype: object
--------------------------------------------------------------------------------
name                         NHS WYRE FOREST CCG
quantity_total                             72912
cost_total                               2407.45
quantity_branded                            5061
cost_branded                             335.126
quantity_generic                           67851
cost_generic                             2072.32
quantity_ratio                         0.0694124
quantity_ratio_percentile                2.06186
unit_cost_branded                      0.0662173
unit_cost_generic                      0.0305423
quantity_branded_10                      8074.21
quantity_generic_10                      64837.8
target_cost_10                           2514.95
cost_saving_10                          -107.496
Name: 06D, dtype: object

	ccg	practice	bnf_code	bnf_name	quantity	cost
0	07G	F81010	0703021Q0AAAAAA	Desogestrel_Tab 75mcg	3136	95.98536
1	06V	D82001	0703021Q0AAAAAA	Desogestrel_Tab 75mcg	5250	160.02845
2	04H	C84019	0703021Q0AAAAAA	Desogestrel_Tab 75mcg	9212	281.13790
3	10L	J84013	0703021Q0AAAAAA	Desogestrel_Tab 75mcg	5950	181.61845
4	09H	H82050	0703021Q0AAAAAA	Desogestrel_Tab 75mcg	3640	111.97010

	ccg	practice	bnf_code	bnf_name	quantity	cost
9628	00K	A81001	0703021Q0BDAAAA	Cerelle_Tab 75mcg	336	13.02744
13309	00K	A81001	0703021Q0BBAAAA	Cerazette_Tab 75mcg	672	71.01925
16352	00K	A81001	0703021Q0BHAAAA	Zelleta_Tab 75mcg	700	23.24791

	name	ccg
code
A81001	THE DENSHAM SURGERY	00K
A81002	QUEENS PARK MEDICAL CENTRE	00K
A81003	VICTORIA MEDICAL PRACTICE	00K
A81004	BLUEBELL MEDICAL CENTRE	00M
A81005	SPRINGWOOD SURGERY	00M

	name	cost_saving_10
code
A84014	MARINE MEDICAL GROUP	-32.501748
L81063	KINGSWOOD HEALTH CENTRE	-17.928911
K83051	WEAVERS MEDICAL	-13.596447
K83021	ROTHWELL MEDICAL CENTRE	-12.396787
M81057	BEWDLEY MEDICAL CENTRE	-10.592165

	name	cost_saving_10
code
N83028	ASTON HEALTHCARE LIMITED	431.809296
N81057	TOWER HOUSE PRACTICE	438.062705
B86064	LEIGH VIEW MEDICAL PRACTICE	440.273693
H82042	CATHEDRAL MEDICAL GROUP	449.205846
P81018	ST FILLAN'S MEDICAL CTRE	453.080751

	name
code
00C	NHS DARLINGTON CCG
00D	NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
00J	NHS NORTH DURHAM CCG
00K	NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
00L	NHS NORTHUMBERLAND CCG

	name	quantity_total	cost_total	quantity_branded	cost_branded	quantity_generic	cost_generic	quantity_ratio	quantity_ratio_percentile
code
00C	NHS DARLINGTON CCG	61557	2413.70215	35326	1613.15451	26231	800.54764	0.573875	73.711340
00D	NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG	197050	8375.82889	144378	6766.94830	52672	1608.88059	0.732697	85.051546
00J	NHS NORTH DURHAM CCG	141258	5851.53094	82803	4063.82131	58455	1787.70963	0.586183	74.742268
00K	NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG	208715	8660.04179	108112	5584.50268	100603	3075.53911	0.517989	69.072165
00L	NHS NORTHUMBERLAND CCG	179532	7981.52114	162783	7468.36390	16749	513.15724	0.906707	94.845361

	name	unit_cost_branded	unit_cost_generic
code
00C	NHS DARLINGTON CCG	0.045665	0.030519
00D	NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG	0.046870	0.030545
00J	NHS NORTH DURHAM CCG	0.049078	0.030583
00K	NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG	0.051655	0.030571
00L	NHS NORTHUMBERLAND CCG	0.045879	0.030638

	name	cost_saving_10
code
99P	NHS NORTHERN, EASTERN AND WESTERN DEVON CCG	-1082.240967
11J	NHS DORSET CCG	-1007.204447
04G	NHS NENE CCG	-806.683200
04F	NHS MILTON KEYNES CCG	-434.863498
07V	NHS CROYDON CCG	-235.435153

	name	cost_saving_10
code
11A	NHS WEST HAMPSHIRE CCG	3366.164895
99A	NHS LIVERPOOL CCG	3404.648279
06H	NHS CAMBRIDGESHIRE AND PETERBOROUGH CCG	3756.611719
14L	NHS MANCHESTER CCG	3969.551534
13T	NHS NEWCASTLE GATESHEAD CCG	4686.268764