Hospital Readmissions Data Analysis and Recommendations for Reduction

Background

In October 2012, the US government's Centers for Medicare & Medicaid Services (CMS) began reducing Medicare payments for Inpatient Prospective Payment System hospitals with excess readmissions. Excess readmissions are measured by a ratio: a hospital’s number of “predicted” 30-day readmissions for heart attack, heart failure, and pneumonia divided by the number that would be “expected” for an average hospital with similar patients. A ratio greater than 1 indicates excess readmissions.
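The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not CMS's risk-adjustment model: the helper name `excess_readmission_ratio` and the example counts are hypothetical, and the predicted and expected values are simply taken as given.

```python
def excess_readmission_ratio(predicted: float, expected: float) -> float:
    """Return predicted / expected readmissions; a value above 1 means excess readmissions."""
    if expected <= 0:
        raise ValueError("expected readmissions must be positive")
    return predicted / expected


# Hypothetical hospital: 22 predicted vs. 20 expected 30-day readmissions
ratio = excess_readmission_ratio(22, 20)
print(round(ratio, 2), ratio > 1)  # 1.1 True
```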

Exercise Directions

In this exercise, you will:

critique a preliminary analysis of readmissions data and recommendations (provided below) for reducing the readmissions rate
construct a statistically sound analysis and make recommendations of your own

More instructions are provided below. Include your work in this notebook and submit it to your GitHub account.

Resources

Data source: https://data.medicare.gov/Hospital-Compare/Hospital-Readmission-Reduction/9n3s-kdb3
More information: http://www.cms.gov/Medicare/medicare-fee-for-service-payment/acuteinpatientPPS/readmissions-reduction-program.html
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet



In [34]:

    
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import bokeh.plotting as bkp
import seaborn as sns
from mpl_toolkits.axes_grid1 import make_axes_locatable



In [35]:

    
# read in readmissions data provided
hospital_read_df = pd.read_csv('data/cms_hospital_readmissions.csv')

Preliminary Analysis



In [36]:

    
# deal with missing and inconvenient portions of data:
# drop rows whose number of discharges is 'Not Available', then convert the column to int.
# .copy() makes the filtered frame independent, which avoids pandas' SettingWithCopyWarning.
clean_hospital_read_df = hospital_read_df[hospital_read_df['Number of Discharges'] != 'Not Available'].copy()
clean_hospital_read_df['Number of Discharges'] = clean_hospital_read_df['Number of Discharges'].astype(int)
clean_hospital_read_df = clean_hospital_read_df.sort_values('Number of Discharges')
clean_hospital_read_df.head()

    Out[36]:

       Hospital Name                         Provider Number  State  Measure Name            Number of Discharges  Footnote  Excess Readmission Ratio  Predicted Readmission Rate  Expected Readmission Rate  Number of Readmissions  Start Date  End Date
16857  THREE RIVERS MEDICAL CENTER           180128           KY     READM-30-HIP-KNEE-HRRP  0                     7.0       NaN                       NaN                         NaN                        NaN                     07/01/2010  06/30/2013
14582  SELLS INDIAN HEALTH SERVICE HOSPITAL  30074            AZ     READM-30-COPD-HRRP      0                     7.0       NaN                       NaN                         NaN                        NaN                     07/01/2010  06/30/2013
15606  PHS INDIAN HOSPITAL AT PINE RIDGE     430081           SD     READM-30-AMI-HRRP       0                     7.0       NaN                       NaN                         NaN                        NaN                     07/01/2010  06/30/2013
15615  FLORIDA STATE HOSPITAL UNIT 31 MED    100298           FL     READM-30-COPD-HRRP      0                     7.0       NaN                       NaN                         NaN                        NaN                     07/01/2010  06/30/2013
14551  GREENE COUNTY HOSPITAL                10051            AL     READM-30-AMI-HRRP       0                     7.0       NaN                       NaN                         NaN                        NaN                     07/01/2010  06/30/2013



In [37]:

    
# generate a scatterplot for number of discharges vs. excess rate of readmissions
# lists work better with matplotlib scatterplot function
# .iloc selects rows by position, so the slice does not depend on the index labels
x = list(clean_hospital_read_df['Number of Discharges'].iloc[81:-3])
y = list(clean_hospital_read_df['Excess Readmission Ratio'].iloc[81:-3])

fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(x, y,alpha=0.2)

ax.fill_between([0,350], 1.15, 2, facecolor='red', alpha = .15, interpolate=True)
ax.fill_between([800,2500], .5, .95, facecolor='green', alpha = .15, interpolate=True)

ax.set_xlim([0, max(x)])
ax.set_xlabel('Number of discharges', fontsize=12)
ax.set_ylabel('Excess rate of readmissions', fontsize=12)
ax.set_title('Scatterplot of number of discharges vs. excess rate of readmissions', fontsize=14)

ax.grid(True)
fig.tight_layout()

Preliminary Report

Read the following results/report. While you are reading it, think about whether the conclusions are correct, incorrect, misleading or unfounded. Think about what you would change or what additional analyses you would perform.

A. Initial observations based on the plot above

Overall, rate of readmissions is trending down with increasing number of discharges
With lower number of discharges, there is a greater incidence of excess rate of readmissions (area shaded red)
With higher number of discharges, there is a greater incidence of lower rates of readmissions (area shaded green)
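Observation A is read off the scatterplot by eye. One way to check the claimed downward trend is to average the ratio within discharge bins. The sketch below assumes a reasonably recent pandas (for the `observed=` argument); the helper `mean_ratio_by_discharge_bin`, its bin edges and the demo rows are illustrative assumptions, not part of the report.

```python
import pandas as pd


def mean_ratio_by_discharge_bin(df, bins=(0, 100, 300, 1000, float("inf"))):
    """Average 'Excess Readmission Ratio' within bins of 'Number of Discharges'.

    Rows with a missing ratio are dropped first. Bins are right-closed,
    e.g. (0, 100], and the default edges are an arbitrary illustrative choice.
    """
    data = df[["Number of Discharges", "Excess Readmission Ratio"]].dropna()
    bin_labels = pd.cut(data["Number of Discharges"], bins=list(bins))
    # observed=False keeps empty bins in the output (with a count of 0)
    return data.groupby(bin_labels, observed=False)["Excess Readmission Ratio"].agg(["mean", "count"])


# Made-up example rows; in the notebook this would be clean_hospital_read_df
demo = pd.DataFrame({
    "Number of Discharges": [50, 80, 250, 1200, 1500],
    "Excess Readmission Ratio": [1.05, 1.01, 1.00, 0.97, 0.98],
})
print(mean_ratio_by_discharge_bin(demo))
```

Reporting the count next to each mean shows how many hospitals support each bin, which the shaded regions of the scatterplot do not.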

B. Statistics

In hospitals/facilities with number of discharges < 100, mean excess readmission rate is 1.023 and 63% have excess readmission rate greater than 1
In hospitals/facilities with number of discharges > 1000, mean excess readmission rate is 0.978 and 44% have excess readmission rate greater than 1
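The figures in section B can be recomputed from the cleaned data. A minimal sketch, with a hypothetical helper `summarize_group`, returns the count, the mean ratio and the share of hospitals with a ratio above 1 for one group; the demo values are made up.

```python
import pandas as pd


def summarize_group(ratios: pd.Series) -> dict:
    """Count, mean ratio and share of hospitals with an excess readmission ratio above 1."""
    ratios = ratios.dropna()
    return {
        "n": int(ratios.size),
        "mean": float(ratios.mean()),
        "share_above_1": float((ratios > 1).mean()),  # mean of a boolean Series = fraction True
    }


demo = pd.Series([1.10, 0.90, 1.05, None])  # made-up ratios; the missing value is dropped
print(summarize_group(demo))
```

For the report's groups it would be called on `clean_hospital_read_df.loc[clean_hospital_read_df['Number of Discharges'] < 100, 'Excess Readmission Ratio']` and on the corresponding `> 1000` selection.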

C. Conclusions

There is a significant correlation between hospital capacity (number of discharges) and readmission rates.
Smaller hospitals/facilities may be lacking necessary resources to ensure quality care and prevent complications that lead to readmissions.

D. Regulatory policy recommendations

Hospitals/facilities with small capacity (< 300) should be required to demonstrate upgraded resource allocation for quality care to continue operation.
Directives and incentives should be provided for consolidation of hospitals and facilities to have a smaller number of them with higher capacity and number of discharges.

Exercise

Include your work on the following in this notebook and submit it to your GitHub account.

A. Do you agree with the above analysis and recommendations? Why or why not?

B. Provide support for your arguments and your own recommendations with a statistically sound analysis:

Setup an appropriate hypothesis test.
Compute and report the observed significance value (or p-value).
Report statistical significance for $\alpha$ = .01.
Discuss statistical significance and practical significance. Do they differ here? How does this change your recommendation to the client?
Look at the scatterplot above.
- What are the advantages and disadvantages of using this plot to convey information?
- Construct another plot that conveys the same information in a more direct manner.

You can compose in notebook cells using Markdown:

In the control panel at the top, choose Cell > Cell Type > Markdown
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

Do you agree with the above analysis and recommendations? Why or why not?

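I don't agree with the analysis, because it never tests whether the difference in mean excess readmission ratio between small and large hospitals is statistically significant. It also claims a "significant correlation" without reporting a correlation coefficient or a p-value, and it attributes the pattern to a lack of resources, which the data does not measure. Without that support, it is hard to say whether the recommendations are justified.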
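The report's claim of a "significant correlation" can be checked directly. The sketch below is one way to do it, assuming SciPy is available; `discharge_ratio_correlation` is a hypothetical helper name, and Pearson's r only measures a linear association.

```python
import numpy as np
from scipy import stats


def discharge_ratio_correlation(discharges, ratios):
    """Pearson correlation between discharges and excess readmission ratio.

    Pairs where either value is missing are dropped. Returns (r, two-sided p-value).
    """
    x = np.asarray(discharges, dtype=float)
    y = np.asarray(ratios, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    r, p = stats.pearsonr(x[keep], y[keep])
    return float(r), float(p)


r, p = discharge_ratio_correlation([1, 2, 3, 4, 5], [5, 3, 4, 1, 2])  # made-up values
print(round(r, 2))  # -0.8
```

Applied to the notebook's data this would be `discharge_ratio_correlation(clean_hospital_read_df['Number of Discharges'], clean_hospital_read_df['Excess Readmission Ratio'])`. Even a small p-value would only show an association, not that a lack of resources causes readmissions.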

Setup an appropriate hypothesis test

The null hypothesis states that there is no difference in the mean excess readmission ratio between hospitals with fewer than 100 discharges and hospitals with more than 1000 discharges. The alternative hypothesis states that there is a difference, in either direction (a two-tailed test).
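In symbols, with $\mu$ denoting the population mean excess readmission ratio of each group, the hypotheses and the large-sample z statistic computed in the next cells can be written as follows. This assumes independent groups and uses each group's own sample variance $s^2$, matching `low_discharges.var()` and `high_discharges.var()`:

```latex
H_0:\ \mu_{\text{low}} = \mu_{\text{high}}
\qquad
H_1:\ \mu_{\text{low}} \neq \mu_{\text{high}}

z = \frac{\bar{x}_{\text{low}} - \bar{x}_{\text{high}}}
         {\sqrt{\dfrac{s_{\text{low}}^2}{n_{\text{low}}} + \dfrac{s_{\text{high}}^2}{n_{\text{high}}}}}
```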

Compute and report the observed significance value (or p-value).



In [38]:

    
df = clean_hospital_read_df
low_discharges = df[df['Number of Discharges'] < 100]['Excess Readmission Ratio'].dropna()
high_discharges = df[df['Number of Discharges'] > 1000]['Excess Readmission Ratio'].dropna()
(len(low_discharges.index), len(high_discharges.index))

    Out[38]:
(1188, 463)



In [39]:

    
low_mean = low_discharges.mean()
high_mean = high_discharges.mean()
# standard error of the difference between the two means (each group uses its own variance)
se = np.sqrt(low_discharges.var() / len(low_discharges) + high_discharges.var() / len(high_discharges))
z = (low_mean - high_mean) / se
(low_mean, high_mean, se, low_mean - high_mean, z)

    Out[39]:
(1.0226183501683506,
 0.9783354211663071,
 0.017549891338289974,
 0.04428292900204345,
 7.6017424185004856)

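The z-score is 7.6, which gives a two-tailed p-value far below 0.001%. In other words, if the null hypothesis were true, there would be less than a 0.001% chance of observing a difference between the sample means at least this large in either direction.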
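The p-value can be computed from the z-score rather than looked up. A minimal sketch using only the standard library: for a standard normal variable, the two-tailed probability P(|Z| ≥ |z|) equals erfc(|z|/√2). The helper name `two_tailed_p_value` is hypothetical.

```python
import math


def two_tailed_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. 2 * (1 - Phi(|z|)).

    erfc is used instead of 1 - cdf so that very small tail
    probabilities are not lost to floating-point cancellation.
    """
    return math.erfc(abs(z) / math.sqrt(2))


z = 7.6017424185004856  # z-score from the cell above
p = two_tailed_p_value(z)
print(f"p = {p:.1e}", p < 1e-5)  # compare against 0.001% = 1e-5
```

With groups of 1188 and 463 hospitals, a Welch t-test (`scipy.stats.ttest_ind(low_discharges, high_discharges, equal_var=False)`) should lead to essentially the same conclusion, since the t distribution is very close to normal at these sample sizes.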

Report statistical significance for α = .01.

To report statistical significance for α = .01, I looked up the critical z-value for a two-tailed test, i.e. the 0.995 quantile of the standard normal distribution, which is about 2.58. The z-score computed above is 7.6, much greater than 2.58, so the difference between the means is statistically significant at α = .01.
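The critical value does not need to be looked up in a table; `statistics.NormalDist` (Python 3.8+) provides the inverse CDF. This sketch hard-codes the z-score reported above.

```python
from statistics import NormalDist

alpha = 0.01
# Two-tailed test: alpha / 2 in each tail, so the critical value is the 0.995 quantile
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
z_observed = 7.6017424185004856  # z-score from the cell above

print(round(z_critical, 2), abs(z_observed) > z_critical)  # 2.58 True
```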

Discuss statistical significance and practical significance. Do they differ here? How does this change your recommendation to the client?

A general caveat of classical significance testing is that with large enough samples, almost any difference or correlation becomes statistically significant, because the standard error of the difference shrinks roughly in proportion to 1/√n.
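The effect of sample size can be seen with a small deterministic calculation: hold a hypothetical difference of 0.01 and a standard deviation of 0.1 fixed, and let the group size grow. The helper `z_for_fixed_difference` is hypothetical and assumes two equal-size groups with the same standard deviation.

```python
import math


def z_for_fixed_difference(diff: float, sd: float, n_per_group: int) -> float:
    """z statistic for two equal-size groups that share the same standard deviation."""
    standard_error = sd * math.sqrt(2 / n_per_group)  # sqrt(sd^2/n + sd^2/n)
    return diff / standard_error


# Hypothetical numbers: a difference of 0.01 with a standard deviation of 0.1
for n in (10, 100, 1000, 10000):
    z = z_for_fixed_difference(0.01, 0.1, n)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
    print(f"n = {n:>5}: z = {z:5.2f}, p = {p:.2g}")
```

The difference never changes, yet quadrupling n doubles z, so the same tiny gap moves from clearly non-significant to highly significant.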

Practical significance, on the other hand, asks whether the difference is large enough to matter in practice.

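In this analysis, the difference between the two sample means is statistically significant but small: the mean excess readmission ratio is 1.023 for hospitals with fewer than 100 discharges and 0.978 for hospitals with more than 1000, a gap of about 0.044. A gap of that size has little practical value, so on its own it does not seem to justify drastic measures such as forcing small hospitals to upgrade their resources or to consolidate.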
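One common way to quantify practical significance is an effect size such as Cohen's d, the difference in means divided by the pooled standard deviation; by Cohen's rule of thumb, |d| around 0.2 counts as small. The notebook does not compute it, so the standard-library sketch below is only one possible approach, and `cohens_d` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


print(cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # -1.0 (made-up values)
```

`cohens_d(low_discharges.tolist(), high_discharges.tolist())` would apply it to the two groups defined earlier.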

Look at the scatterplot above.

What are the advantages and disadvantages of using this plot to convey information?
Construct another plot that conveys the same information in a more direct manner.

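The advantage is that the plot shows how the excess readmission ratio relates to the number of discharges across all hospitals. The disadvantage is that it is hard to compare the two groups we are mainly interested in: the data is plotted for the whole range of discharges, many points overlap, and the shaded regions are drawn at hand-picked coordinates rather than derived from the data.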
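A complementary option is to put the two groups side by side as box plots, so the comparison the hypothesis test is about is visible at a glance. The helper `plot_ratio_by_group` is hypothetical; the dashed line at 1 marks the no-excess threshold.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_ratio_by_group(low: pd.Series, high: pd.Series):
    """Side-by-side box plots of the excess readmission ratio for two hospital groups."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.boxplot([low.dropna().values, high.dropna().values])
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["< 100 discharges", "> 1000 discharges"])
    ax.axhline(1.0, color="red", linestyle="--", linewidth=1)  # ratio of 1 = no excess readmissions
    ax.set_ylabel("Excess readmission ratio")
    ax.set_title("Excess readmission ratio by hospital size")
    fig.tight_layout()
    return fig, ax
```

In the notebook it would be called as `plot_ratio_by_group(low_discharges, high_discharges)`.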



In [40]:

    
pd.DataFrame({'low': low_discharges, 'high': high_discharges}).plot.hist(alpha=0.5, bins=20)