In [1]:
# Disabling autosave to allow for notebook pairing via https://github.com/mwouts/jupytext.
%autosave 0


Autosave disabled

Measure calculations

This notebook describes how we perform measure calculations.

A measure is the compuation of a ratio between two values (a numerator and a denominator).

For instance:

Measures are computed monthly for each practice, and are also aggregated at CCG and national level, and we call the results of the computations "measure values".

For each month, we compute the deciles in the distribution of measure values for practices and CCGs.

Certain measures are identified as cost-saving. For these measures, we want to know how much a practice or CCG would have saved if its prescribing had been in line with the practice or CCG at the 10th percentile.

For the measures on the website, these calculations are done by performing a series of queries against data in BigQuery. The code is here, and some of the queries are quite gnarly.

(Note: In the OpenPrescribing source, we store measure values for practices and CCGs in MeasureValue objects, and national measure values in MeasureGlobal objects. Deciles for practices and CCGs are also stored in MeasureGlobal objects. Cost-saving calculations are stored in both kinds of objects.)


This notebook works through these calculations for the Cerazette/Desogestrel measure using one month of relevant prescribing data, using Pandas.

The OpenPrescribing implementation metadata for the measure is here. Note that the measure value is computed as the ratio of the quantity of prescriptions for presentations whose BNF code starts with 0703021Q0B (ie branded prescriptions for Desogestrel) to the quantity of prescriptions for presentations whose BNF code starts with 0703021Q0 (ie all prescriptions for Desogestrel). Also note that the is_cost_based and low_is_good fields are true.


Setup

We need pandas for the computation, and requests to validate the computation against the OpenPrescribing implementation, via the OpenPrescribing API.


In [2]:
import pandas as pd
import requests

We can get the prescribing data from BigQuery. We're only interested in prescribing from GPs.


In [3]:
sql = '''
SELECT
    prescriptions.pct AS ccg,
    prescriptions.practice,
    prescriptions.bnf_code,
    prescriptions.bnf_name,
    prescriptions.quantity,
    prescriptions.actual_cost AS cost
FROM
    hscic.normalised_prescribing_standard AS prescriptions
    INNER JOIN hscic.practices
        ON prescriptions.practice = practices.code
WHERE
    prescriptions.month = TIMESTAMP('2018-08-01')
    AND prescriptions.bnf_code LIKE '0703021Q0%'
    AND practices.setting = 4
'''

prescriptions = pd.read_gbq(sql, 'ebmdatalab', dialect='standard')
prescriptions.head()


Out[3]:
ccg practice bnf_code bnf_name quantity cost
0 07G F81010 0703021Q0AAAAAA Desogestrel_Tab 75mcg 3136 95.98536
1 06V D82001 0703021Q0AAAAAA Desogestrel_Tab 75mcg 5250 160.02845
2 04H C84019 0703021Q0AAAAAA Desogestrel_Tab 75mcg 9212 281.13790
3 10L J84013 0703021Q0AAAAAA Desogestrel_Tab 75mcg 5950 181.61845
4 09H H82050 0703021Q0AAAAAA Desogestrel_Tab 75mcg 3640 111.97010

We filter prescribing to find the prescriptions that contribute to the measure's numerator (that is, prescriptions for branded Desogestrel), and the prescriptions that contribute to the measure's denominator (that is, all prescriptions for Desogestrel).


In [4]:
numerators = prescriptions[prescriptions['bnf_code'].str.startswith('0703021Q0B')]
denominators = prescriptions[prescriptions['bnf_code'].str.startswith('0703021Q0')]

We can look at the numerators and deonominators for one particular practice.


In [5]:
numerators[numerators['practice'] == 'A81001']


Out[5]:
ccg practice bnf_code bnf_name quantity cost
9628 00K A81001 0703021Q0BDAAAA Cerelle_Tab 75mcg 336 13.02744
13309 00K A81001 0703021Q0BBAAAA Cerazette_Tab 75mcg 672 71.01925
16352 00K A81001 0703021Q0BHAAAA Zelleta_Tab 75mcg 700 23.24791

In [6]:
denominators[denominators['practice'] == 'A81001']


Out[6]:
ccg practice bnf_code bnf_name quantity cost
9628 00K A81001 0703021Q0BDAAAA Cerelle_Tab 75mcg 336 13.02744
13309 00K A81001 0703021Q0BBAAAA Cerazette_Tab 75mcg 672 71.01925
16352 00K A81001 0703021Q0BHAAAA Zelleta_Tab 75mcg 700 23.24791
17966 00K A81001 0703021Q0AAAAAA Desogestrel_Tab 75mcg 252 7.68707

Note the branded prescriptions (Cerelle, Cerazette, Zelleta) are present in both the numerator and the denominator. Generic prescriptions are only present in the denominator.

Practices

We'll start with the calculations for practices.

Not every practice will have prescribed Desogestrel in January 2018, so we also want a list of all practices.


In [7]:
sql = 'SELECT code, name, ccg_id AS ccg FROM hscic.practices WHERE setting = 4 ORDER BY code'
practices = pd.read_gbq(sql, 'ebmdatalab', dialect='standard').set_index('code')
practices.head()


Out[7]:
name ccg
code
A81001 THE DENSHAM SURGERY 00K
A81002 QUEENS PARK MEDICAL CENTRE 00K
A81003 VICTORIA MEDICAL PRACTICE 00K
A81004 BLUEBELL MEDICAL CENTRE 00M
A81005 SPRINGWOOD SURGERY 00M

practices is a DataFrame indexed by practice code. We will add columns to practices as we compute the measure values, percentiles, and cost savings.

For each practice, we want to know the total quantity and total cost for Desogestrel prescriptions, as well as the quantity and cost for branded and for generic prescriptions.

We can get the total quantity and cost from the denominators, the quantity and cost of branded prescriptions from the numerators, and the quantity and cost of generic prescriptions by subtracting one from the other.


In [8]:
practices['quantity_total'] = denominators.groupby('practice')['quantity'].sum()
practices['cost_total'] = denominators.groupby('practice')['cost'].sum()
practices['quantity_branded'] = numerators.groupby('practice')['quantity'].sum()
practices['cost_branded'] = numerators.groupby('practice')['cost'].sum()

# Practices with no prescribing will have NaN values for the new columns.  This
# also means that the new columns will have dtypes of float64, but quantities
# should always be ints.
practices = practices.fillna(0)
practices['quantity_total'] = practices['quantity_total'].astype('int')
practices['quantity_branded'] = practices['quantity_branded'].astype('int')

practices['quantity_generic'] = practices['quantity_total'] - practices['quantity_branded']
practices['cost_generic'] = practices['cost_total'] - practices['cost_branded']

practices.head()


Out[8]:
name ccg quantity_total cost_total quantity_branded cost_branded quantity_generic cost_generic
code
A81001 THE DENSHAM SURGERY 00K 1960 114.98167 1708 107.29460 252 7.68707
A81002 QUEENS PARK MEDICAL CENTRE 00K 10021 391.92138 7676 319.01973 2345 72.90165
A81003 VICTORIA MEDICAL PRACTICE 00K 0 0.00000 0 0.00000 0 0.00000
A81004 BLUEBELL MEDICAL CENTRE 00M 6090 198.78799 1505 58.71303 4585 140.07496
A81005 SPRINGWOOD SURGERY 00M 5376 176.86098 168 17.75171 5208 159.10927

The measure value for this measure is the ratio of the quantity of branded prescriptions to the quantity of total prescriptions.


In [9]:
practices['quantity_ratio'] = practices['quantity_branded'] / practices['quantity_total']
practices.head()


Out[9]:
name ccg quantity_total cost_total quantity_branded cost_branded quantity_generic cost_generic quantity_ratio
code
A81001 THE DENSHAM SURGERY 00K 1960 114.98167 1708 107.29460 252 7.68707 0.871429
A81002 QUEENS PARK MEDICAL CENTRE 00K 10021 391.92138 7676 319.01973 2345 72.90165 0.765991
A81003 VICTORIA MEDICAL PRACTICE 00K 0 0.00000 0 0.00000 0 0.00000 NaN
A81004 BLUEBELL MEDICAL CENTRE 00M 6090 198.78799 1505 58.71303 4585 140.07496 0.247126
A81005 SPRINGWOOD SURGERY 00M 5376 176.86098 168 17.75171 5208 159.10927 0.031250

Note the quantity_ratio for a practice with no prescribing is NaN. The OpenPrescribing implementation stores such values as database NULLs, or Python Nones.

We can compute each practice's percentile, according to their quantity_ratio.


In [10]:
# In order to match the percentiles calculated by the OpenPrescribing
# implementation (via BigQuery) we need to adjust the results of .rank() to
# ignore any NaNs.
#
# I can't see why this isn't the same as:
#
#   practices['quantity_ratio'].dropna().rank(method='min', pct=True) * 100
#
# But it's not, quite.
#
# Thanks Dave for working out what to do here!

ranks = practices['quantity_ratio'].rank(method='min')
num_non_nans = practices['quantity_ratio'].count()
practices['quantity_ratio_percentile'] = (ranks - 1) / ((num_non_nans - 1) / 100)
practices.head()


Out[10]:
name ccg quantity_total cost_total quantity_branded cost_branded quantity_generic cost_generic quantity_ratio quantity_ratio_percentile
code
A81001 THE DENSHAM SURGERY 00K 1960 114.98167 1708 107.29460 252 7.68707 0.871429 87.542857
A81002 QUEENS PARK MEDICAL CENTRE 00K 10021 391.92138 7676 319.01973 2345 72.90165 0.765991 81.257143
A81003 VICTORIA MEDICAL PRACTICE 00K 0 0.00000 0 0.00000 0 0.00000 NaN NaN
A81004 BLUEBELL MEDICAL CENTRE 00M 6090 198.78799 1505 58.71303 4585 140.07496 0.247126 42.857143
A81005 SPRINGWOOD SURGERY 00M 5376 176.86098 168 17.75171 5208 159.10927 0.031250 11.928571

We can compute the value of the ratio of the practice at the 10th percentile:


In [11]:
practice_quantity_ratio_10 = practices['quantity_ratio'].quantile(0.1)
practice_quantity_ratio_10


Out[11]:
0.017241379310344827

That is, for the practice at the 10th percentile, the ratio of the quantity of branded prescriptions to total prescriptions is 0.0172.

To find out the how much a practice would have saved if its prescribing had matched that of the practice at the 10th percentile, we first need to compute the unit cost of branded and generic prescriptions.


In [12]:
practices['unit_cost_branded'] = practices['cost_branded'] / practices['quantity_branded']
practices['unit_cost_generic'] = practices['cost_generic'] / practices['quantity_generic']
practices[['name', 'unit_cost_branded', 'unit_cost_generic']].head()


Out[12]:
name unit_cost_branded unit_cost_generic
code
A81001 THE DENSHAM SURGERY 0.062819 0.030504
A81002 QUEENS PARK MEDICAL CENTRE 0.041561 0.031088
A81003 VICTORIA MEDICAL PRACTICE NaN NaN
A81004 BLUEBELL MEDICAL CENTRE 0.039012 0.030551
A81005 SPRINGWOOD SURGERY 0.105665 0.030551

When a practice hasn't prescribed either branded or generic or both, we use global values.


In [13]:
global_unit_cost_branded = practices['cost_branded'].sum() / practices['quantity_branded'].sum()
global_unit_cost_generic = practices['cost_generic'].sum() / practices['quantity_generic'].sum()
global_unit_cost_branded, global_unit_cost_generic


Out[13]:
(0.05334542068605227, 0.0305928758937223)

In [14]:
practices['unit_cost_branded'] = practices['unit_cost_branded'].fillna(global_unit_cost_branded)
practices['unit_cost_generic'] = practices['unit_cost_generic'].fillna(global_unit_cost_generic)
practices[['name', 'unit_cost_branded', 'unit_cost_generic']].head()


Out[14]:
name unit_cost_branded unit_cost_generic
code
A81001 THE DENSHAM SURGERY 0.062819 0.030504
A81002 QUEENS PARK MEDICAL CENTRE 0.041561 0.031088
A81003 VICTORIA MEDICAL PRACTICE 0.053345 0.030593
A81004 BLUEBELL MEDICAL CENTRE 0.039012 0.030551
A81005 SPRINGWOOD SURGERY 0.105665 0.030551

We also need the quantity of branded and generic prescriptions that would have been prescribed had its prescribing matched that of the practice at the 10th percentile, assuming the total prescribing remains the same.


In [15]:
practices['quantity_branded_10'] = practices['quantity_total'] * practice_quantity_ratio_10
practices['quantity_generic_10'] = practices['quantity_total'] - practices['quantity_branded_10']
practices[['name', 'quantity_branded', 'quantity_branded_10', 'quantity_generic', 'quantity_generic_10']].head()


Out[15]:
name quantity_branded quantity_branded_10 quantity_generic quantity_generic_10
code
A81001 THE DENSHAM SURGERY 1708 33.793103 252 1926.206897
A81002 QUEENS PARK MEDICAL CENTRE 7676 172.775862 2345 9848.224138
A81003 VICTORIA MEDICAL PRACTICE 0 0.000000 0 0.000000
A81004 BLUEBELL MEDICAL CENTRE 1505 105.000000 4585 5985.000000
A81005 SPRINGWOOD SURGERY 168 92.689655 5208 5283.310345

We can then work out a target cost for each practice, and the amount that would be saved if the practice could make that target.


In [16]:
practices['target_cost_10'] = practices['unit_cost_branded'] * practices['quantity_branded_10'] + practices['unit_cost_generic'] * practices['quantity_generic_10']
practices['cost_saving_10'] = practices['cost_total'] - practices['target_cost_10']
practices.head()


Out[16]:
name ccg quantity_total cost_total quantity_branded cost_branded quantity_generic cost_generic quantity_ratio quantity_ratio_percentile unit_cost_branded unit_cost_generic quantity_branded_10 quantity_generic_10 target_cost_10 cost_saving_10
code
A81001 THE DENSHAM SURGERY 00K 1960 114.98167 1708 107.29460 252 7.68707 0.871429 87.542857 0.062819 0.030504 33.793103 1926.206897 60.880333 54.101337
A81002 QUEENS PARK MEDICAL CENTRE 00K 10021 391.92138 7676 319.01973 2345 72.90165 0.765991 81.257143 0.041561 0.031088 172.775862 9848.224138 313.343491 78.577889
A81003 VICTORIA MEDICAL PRACTICE 00K 0 0.00000 0 0.00000 0 0.00000 NaN NaN 0.053345 0.030593 0.000000 0.000000 0.000000 0.000000
A81004 BLUEBELL MEDICAL CENTRE 00M 6090 198.78799 1505 58.71303 4585 140.07496 0.247126 42.857143 0.039012 0.030551 105.000000 5985.000000 186.942198 11.845792
A81005 SPRINGWOOD SURGERY 00M 5376 176.86098 168 17.75171 5208 159.10927 0.031250 11.928571 0.105665 0.030551 92.689655 5283.310345 171.204118 5.656862

Here are the practices that would save the least.


In [17]:
practices.sort_values('cost_saving_10').head()[['name', 'cost_saving_10']]


Out[17]:
name cost_saving_10
code
A84014 MARINE MEDICAL GROUP -32.501748
L81063 KINGSWOOD HEALTH CENTRE -17.928911
K83051 WEAVERS MEDICAL -13.596447
K83021 ROTHWELL MEDICAL CENTRE -12.396787
M81057 BEWDLEY MEDICAL CENTRE -10.592165

And the practices that would save the most.


In [18]:
practices.sort_values('cost_saving_10').tail()[['name', 'cost_saving_10']]


Out[18]:
name cost_saving_10
code
N83028 ASTON HEALTHCARE LIMITED 431.809296
N81057 TOWER HOUSE PRACTICE 438.062705
B86064 LEIGH VIEW MEDICAL PRACTICE 440.273693
H82042 CATHEDRAL MEDICAL GROUP 449.205846
P81018 ST FILLAN'S MEDICAL CTRE 453.080751

We can verify our calculations by checking against data returned by the API, for each of the following kinds of practice:

  • No prescribing
  • All prescribing is branded
  • All prescribing is generic
  • Prescribing is mixture of branded and generic

In [19]:
def get_practice_data_from_api(practice_id):
    url = 'https://openprescribing.net/api/1.0/measure_by_practice/'
    params = {
        'format': 'json',
        'measure': 'desogestrel',
        'org': practice_id,
    }
    rsp = requests.get(url, params)
    for record in rsp.json()['measures'][0]['data']:
        if record['date'] == '2018-08-01':
            return record
    assert False

In [20]:
for partition in [
    practices[practices['quantity_total'] == 0],  # no prescribing
    practices[(practices['quantity_total'] > 0) & (practices['quantity_generic'] == 0)],  # all branded
    practices[(practices['quantity_total'] > 0) & (practices['quantity_branded'] == 0)],  # all generic
    practices[(practices['quantity_branded'] > 0) & (practices['quantity_generic'] > 0)],  # a mixture
]:
    practice_id = partition.sample().index[0]
    data = get_practice_data_from_api(practice_id)
    practice = practices.loc[practice_id]
    print('-' * 80)
    print(practice)
    print(data)
    assert data['numerator'] == practice['quantity_branded']
    assert data['denominator'] == practice['quantity_total']
    if data['percentile'] is not None:
        assert abs(data['percentile'] - practice['quantity_ratio_percentile']) < 0.001
    assert abs(data['cost_savings']['10'] - practice['cost_saving_10']) < 0.001


--------------------------------------------------------------------------------
name                         BRACKENDALE SURGERY
ccg                                          08J
quantity_total                                 0
cost_total                                     0
quantity_branded                               0
cost_branded                                   0
quantity_generic                               0
cost_generic                                   0
quantity_ratio                               NaN
quantity_ratio_percentile                    NaN
unit_cost_branded                      0.0533454
unit_cost_generic                      0.0305929
quantity_branded_10                            0
quantity_generic_10                            0
target_cost_10                                 0
cost_saving_10                                 0
Name: H84609, dtype: object
{'cost_savings': {'10': 0.0, '20': 0.0, '30': 0.0, '50': 0.0, '40': 0.0, '60': 0.0, '70': 0.0, '90': 0.0, '80': 0.0}, 'denominator': 0.0, 'numerator': 0.0, 'practice_name': 'BRACKENDALE SURGERY', 'percentile': None, 'practice_id': 'H84609', 'calc_value': None, 'date': '2018-08-01'}
--------------------------------------------------------------------------------
name                         HADDENHAM SURGERY
ccg                                        06H
quantity_total                            4732
cost_total                             189.683
quantity_branded                          4732
cost_branded                           189.683
quantity_generic                             0
cost_generic                                 0
quantity_ratio                               1
quantity_ratio_percentile              95.9857
unit_cost_branded                    0.0400851
unit_cost_generic                    0.0305929
quantity_branded_10                    81.5862
quantity_generic_10                    4650.41
target_cost_10                          145.54
cost_saving_10                         44.1429
Name: D81062, dtype: object
{'cost_savings': {'10': 44.14285521451177, '20': 41.5058514275467, '30': 38.598898630454556, '50': 30.71763144978138, '40': 34.99566570092432, '60': 25.453131720180494, '70': 19.28801331044818, '90': 3.905851414861445, '80': 11.462296320169145}, 'denominator': 4732.0, 'numerator': 4732.0, 'practice_name': 'HADDENHAM SURGERY', 'percentile': 95.9857142857143, 'practice_id': 'D81062', 'calc_value': 1.0, 'date': '2018-08-01'}
--------------------------------------------------------------------------------
name                         ANSDELL MEDICAL CENTRE
ccg                                             02M
quantity_total                                 4144
cost_total                                  126.452
quantity_branded                                  0
cost_branded                                      0
quantity_generic                               4144
cost_generic                                126.452
quantity_ratio                                    0
quantity_ratio_percentile                         0
unit_cost_branded                         0.0533454
unit_cost_generic                         0.0305145
quantity_branded_10                         71.4483
quantity_generic_10                         4072.55
target_cost_10                              128.083
cost_saving_10                             -1.63123
Name: P81037, dtype: object
{'cost_savings': {'10': -1.6312304021206785, '20': -7.1856731637721225, '30': -13.308721982315518, '50': -29.909398727915942, '40': -20.898377773825587, '60': -40.99825743996641, '70': -53.984130837240826, '90': -86.38428825143424, '80': -70.46779965343734}, 'denominator': 4144.0, 'numerator': 0.0, 'practice_name': 'ANSDELL MEDICAL CENTRE', 'percentile': 0.0, 'practice_id': 'P81037', 'calc_value': 0.0, 'date': '2018-08-01'}
--------------------------------------------------------------------------------
name                         HASTINGS MED P & WALKIN
ccg                                              09P
quantity_total                                   672
cost_total                                   21.2161
quantity_branded                                  84
cost_branded                                 3.26306
quantity_generic                                 588
cost_generic                                  17.953
quantity_ratio                                 0.125
quantity_ratio_percentile                    27.6857
unit_cost_branded                           0.038846
unit_cost_generic                          0.0305324
quantity_branded_10                          11.5862
quantity_generic_10                          660.414
target_cost_10                               20.6141
cost_saving_10                              0.602018
Name: Y03051, dtype: object
{'cost_savings': {'10': 0.6020184729064049, '20': 0.27403271247739625, '30': -0.08752881746747221, '50': -1.0677865714285701, '40': -0.5356924106472007, '60': -1.722575523809521, '70': -2.489381798319325, '90': -4.402587267080744, '80': -3.4627303200948454}, 'denominator': 672.0, 'numerator': 84.0, 'practice_name': 'HASTINGS MED P & WALKIN', 'percentile': 27.6857142857143, 'practice_id': 'Y03051', 'calc_value': 0.125, 'date': '2018-08-01'}

CCGs

We'll move onto the calculations for CCGs. The calculations are very similar to those for practices.

We're interested in CCGs with active GPs.


In [21]:
sql = '''
SELECT
    ccgs.code, ccgs.name
FROM
    hscic.ccgs
    INNER JOIN (
        SELECT DISTINCT(ccg_id)
        FROM hscic.practices
        WHERE practices.setting = 4 AND practices.status_code = 'A'
    ) AS practices
    ON ccgs.code = practices.ccg_id
WHERE
    ccgs.org_type = 'CCG'
ORDER BY ccgs.code
'''
ccgs = pd.read_gbq(sql, 'ebmdatalab', dialect='standard').set_index('code')
ccgs.head()


Out[21]:
name
code
00C NHS DARLINGTON CCG
00D NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG
00J NHS NORTH DURHAM CCG
00K NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG
00L NHS NORTHUMBERLAND CCG

In [22]:
ccgs['quantity_total'] = denominators.groupby('ccg')['quantity'].sum()
ccgs['cost_total'] = denominators.groupby('ccg')['cost'].sum()
ccgs['quantity_branded'] = numerators.groupby('ccg')['quantity'].sum()
ccgs['cost_branded'] = numerators.groupby('ccg')['cost'].sum()
ccgs = ccgs.fillna(0)
ccgs['quantity_total'] = ccgs['quantity_total'].astype('int')
ccgs['quantity_branded'] = ccgs['quantity_branded'].astype('int')
ccgs['quantity_generic'] = ccgs['quantity_total'] - ccgs['quantity_branded']
ccgs['cost_generic'] = ccgs['cost_total'] - ccgs['cost_branded']
ccgs['quantity_ratio'] = ccgs['quantity_branded'] / ccgs['quantity_total']
ranks = ccgs['quantity_ratio'].rank(method='min')
num_non_nans = ccgs['quantity_ratio'].count()
ccgs['quantity_ratio_percentile'] = (ranks - 1) / ((num_non_nans - 1) / 100)
ccgs.head()


Out[22]:
name quantity_total cost_total quantity_branded cost_branded quantity_generic cost_generic quantity_ratio quantity_ratio_percentile
code
00C NHS DARLINGTON CCG 61557 2413.70215 35326 1613.15451 26231 800.54764 0.573875 73.711340
00D NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG 197050 8375.82889 144378 6766.94830 52672 1608.88059 0.732697 85.051546
00J NHS NORTH DURHAM CCG 141258 5851.53094 82803 4063.82131 58455 1787.70963 0.586183 74.742268
00K NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG 208715 8660.04179 108112 5584.50268 100603 3075.53911 0.517989 69.072165
00L NHS NORTHUMBERLAND CCG 179532 7981.52114 162783 7468.36390 16749 513.15724 0.906707 94.845361

In [23]:
ccg_quantity_ratio_10 = ccgs['quantity_ratio'].quantile(0.1)
ccg_quantity_ratio_10


Out[23]:
0.11073908218256545

In [24]:
ccgs['unit_cost_branded'] = ccgs['cost_branded'] / ccgs['quantity_branded']
ccgs['unit_cost_generic'] = ccgs['cost_generic'] / ccgs['quantity_generic']
ccgs[['name', 'unit_cost_branded', 'unit_cost_generic']].head()


Out[24]:
name unit_cost_branded unit_cost_generic
code
00C NHS DARLINGTON CCG 0.045665 0.030519
00D NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG 0.046870 0.030545
00J NHS NORTH DURHAM CCG 0.049078 0.030583
00K NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG 0.051655 0.030571
00L NHS NORTHUMBERLAND CCG 0.045879 0.030638

In [25]:
global_unit_cost_branded = ccgs['cost_branded'].sum() / ccgs['quantity_branded'].sum()
global_unit_cost_generic = ccgs['cost_generic'].sum() / ccgs['quantity_generic'].sum()
global_unit_cost_branded, global_unit_cost_generic


Out[25]:
(0.05334488604245437, 0.03059282698733445)

In [26]:
ccgs['unit_cost_branded'] = ccgs['unit_cost_branded'].fillna(global_unit_cost_branded)
ccgs['unit_cost_generic'] = ccgs['unit_cost_generic'].fillna(global_unit_cost_generic)
ccgs[['name', 'unit_cost_branded', 'unit_cost_generic']].head()


Out[26]:
name unit_cost_branded unit_cost_generic
code
00C NHS DARLINGTON CCG 0.045665 0.030519
00D NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG 0.046870 0.030545
00J NHS NORTH DURHAM CCG 0.049078 0.030583
00K NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG 0.051655 0.030571
00L NHS NORTHUMBERLAND CCG 0.045879 0.030638

In [27]:
ccgs['quantity_branded_10'] = ccgs['quantity_total'] * ccg_quantity_ratio_10
ccgs['quantity_generic_10'] = ccgs['quantity_total'] - ccgs['quantity_branded_10']
ccgs[['name', 'quantity_branded', 'quantity_branded_10', 'quantity_generic', 'quantity_generic_10']].head()


Out[27]:
name quantity_branded quantity_branded_10 quantity_generic quantity_generic_10
code
00C NHS DARLINGTON CCG 35326 6816.765682 26231 54740.234318
00D NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG 144378 21821.136144 52672 175228.863856
00J NHS NORTH DURHAM CCG 82803 15642.781271 58455 125615.218729
00K NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG 108112 23112.907538 100603 185602.092462
00L NHS NORTHUMBERLAND CCG 162783 19881.208902 16749 159650.791098

In [28]:
ccgs['target_cost_10'] = ccgs['unit_cost_branded'] * ccgs['quantity_branded_10'] + ccgs['unit_cost_generic'] * ccgs['quantity_generic_10']
ccgs['cost_saving_10'] = ccgs['cost_total'] - ccgs['target_cost_10']
ccgs.head()


Out[28]:
name quantity_total cost_total quantity_branded cost_branded quantity_generic cost_generic quantity_ratio quantity_ratio_percentile unit_cost_branded unit_cost_generic quantity_branded_10 quantity_generic_10 target_cost_10 cost_saving_10
code
00C NHS DARLINGTON CCG 61557 2413.70215 35326 1613.15451 26231 800.54764 0.573875 73.711340 0.045665 0.030519 6816.765682 54740.234318 1981.911239 431.790911
00D NHS DURHAM DALES, EASINGTON AND SEDGEFIELD CCG 197050 8375.82889 144378 6766.94830 52672 1608.88059 0.732697 85.051546 0.046870 0.030545 21821.136144 175228.863856 6375.162698 2000.666192
00J NHS NORTH DURHAM CCG 141258 5851.53094 82803 4063.82131 58455 1787.70963 0.586183 74.742268 0.049078 0.030583 15642.781271 125615.218729 4609.367421 1242.163519
00K NHS HARTLEPOOL AND STOCKTON-ON-TEES CCG 208715 8660.04179 108112 5584.50268 100603 3075.53911 0.517989 69.072165 0.051655 0.030571 23112.907538 185602.092462 6867.942809 1792.098981
00L NHS NORTHUMBERLAND CCG 179532 7981.52114 162783 7468.36390 16749 513.15724 0.906707 94.845361 0.045879 0.030638 19881.208902 159650.791098 5803.529264 2177.991876

Here are the CCGs that would save the least.


In [29]:
ccgs.sort_values('cost_saving_10').head()[['name', 'cost_saving_10']]


Out[29]:
name cost_saving_10
code
99P NHS NORTHERN, EASTERN AND WESTERN DEVON CCG -1082.240967
11J NHS DORSET CCG -1007.204447
04G NHS NENE CCG -806.683200
04F NHS MILTON KEYNES CCG -434.863498
07V NHS CROYDON CCG -235.435153

And the CCGs that would save the most.


In [30]:
ccgs.sort_values('cost_saving_10').tail()[['name', 'cost_saving_10']]


Out[30]:
name cost_saving_10
code
11A NHS WEST HAMPSHIRE CCG 3366.164895
99A NHS LIVERPOOL CCG 3404.648279
06H NHS CAMBRIDGESHIRE AND PETERBOROUGH CCG 3756.611719
14L NHS MANCHESTER CCG 3969.551534
13T NHS NEWCASTLE GATESHEAD CCG 4686.268764

Again, we can verify our calculations by checking against data returned by the API.


In [31]:
def get_ccg_data_from_api(ccg_id):
    url = 'https://openprescribing.net/api/1.0/measure_by_ccg/'
    params = {
        'format': 'json',
        'measure': 'desogestrel',
        'org': ccg_id,
    }
    rsp = requests.get(url, params)
    for record in rsp.json()['measures'][0]['data']:
        if record['date'] == '2018-08-01':
            return record
    assert False

In [32]:
for ccg_id, ccg in ccgs.sample(4).iterrows():
    data = get_ccg_data_from_api(ccg_id)
    print('-' * 80)
    print(ccg)
    assert data['numerator'] == ccg['quantity_branded']
    assert data['denominator'] == ccg['quantity_total']
    assert abs(data['percentile'] - ccg['quantity_ratio_percentile']) < 0.001
    assert abs(data['cost_savings']['10'] - ccg['cost_saving_10']) < 0.001


--------------------------------------------------------------------------------
name                         NHS BROMLEY CCG
quantity_total                        132931
cost_total                           5264.57
quantity_branded                       23359
cost_branded                         1914.59
quantity_generic                      109572
cost_generic                         3349.98
quantity_ratio                      0.175723
quantity_ratio_percentile            24.7423
unit_cost_branded                  0.0819636
unit_cost_generic                  0.0305733
quantity_branded_10                  14720.7
quantity_generic_10                   118210
target_cost_10                       4820.64
cost_saving_10                       443.927
Name: 07Q, dtype: object
--------------------------------------------------------------------------------
name                         NHS DARLINGTON CCG
quantity_total                            61557
cost_total                               2413.7
quantity_branded                          35326
cost_branded                            1613.15
quantity_generic                          26231
cost_generic                            800.548
quantity_ratio                         0.573875
quantity_ratio_percentile               73.7113
unit_cost_branded                     0.0456648
unit_cost_generic                     0.0305191
quantity_branded_10                     6816.77
quantity_generic_10                     54740.2
target_cost_10                          1981.91
cost_saving_10                          431.791
Name: 00C, dtype: object
--------------------------------------------------------------------------------
name                         NHS EASTERN CHESHIRE CCG
quantity_total                                  94129
cost_total                                    4147.11
quantity_branded                                87213
cost_branded                                   3935.2
quantity_generic                                 6916
cost_generic                                  211.905
quantity_ratio                               0.926526
quantity_ratio_percentile                     96.3918
unit_cost_branded                           0.0451217
unit_cost_generic                           0.0306398
quantity_branded_10                           10423.8
quantity_generic_10                           83705.2
target_cost_10                                3035.05
cost_saving_10                                1112.05
Name: 01C, dtype: object
--------------------------------------------------------------------------------
name                         NHS WYRE FOREST CCG
quantity_total                             72912
cost_total                               2407.45
quantity_branded                            5061
cost_branded                             335.126
quantity_generic                           67851
cost_generic                             2072.32
quantity_ratio                         0.0694124
quantity_ratio_percentile                2.06186
unit_cost_branded                      0.0662173
unit_cost_generic                      0.0305423
quantity_branded_10                      8074.21
quantity_generic_10                      64837.8
target_cost_10                           2514.95
cost_saving_10                          -107.496
Name: 06D, dtype: object