Hospital Readmissions Data Analysis and Recommendations for Reduction

Background

In October 2012, the US government's Centers for Medicare & Medicaid Services (CMS) began reducing Medicare payments for Inpatient Prospective Payment System hospitals with excess readmissions. Excess readmissions are measured by a ratio: a hospital’s number of “predicted” 30-day readmissions for heart attack, heart failure, and pneumonia divided by the number that would be “expected” for an average hospital with similar patients. A ratio greater than 1 indicates excess readmissions.
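As a rough illustration of the ratio, the sketch below divides the predicted rate by the expected rate for the first row of the sample data shown later in this notebook. The variable names are illustrative only. The result will not match the published ratio (1.9095) exactly, presumably because the rates in the table are rounded.

```python
# Values taken from the first row of the sample data (Froedtert Memorial Lutheran Hospital).
predicted_rate = 10.8  # Predicted Readmission Rate, in percent
expected_rate = 5.6    # Expected Readmission Rate, in percent

# Approximate excess readmission ratio from the (rounded) rates.
approx_ratio = predicted_rate / expected_rate

# A ratio greater than 1 indicates excess readmissions.
has_excess = approx_ratio > 1

print(f"approximate excess readmission ratio: {approx_ratio:.3f}")
print(f"excess readmissions: {has_excess}")
```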

Exercise Directions

In this exercise, you will:

critique a preliminary analysis of readmissions data and recommendations (provided below) for reducing the readmissions rate
construct a statistically sound analysis and make recommendations of your own

More instructions are provided below. Include your work in this notebook and submit it to your GitHub account.

Resources

Data source: https://data.medicare.gov/Hospital-Compare/Hospital-Readmission-Reduction/9n3s-kdb3
More information: http://www.cms.gov/Medicare/medicare-fee-for-service-payment/acuteinpatientPPS/readmissions-reduction-program.html
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet



In [132]:

    
%matplotlib inline

import pandas as pd
import numpy as np
import scikits.bootstrap as bt
import scipy
from scipy import stats  # needed for stats.probplot, stats.mannwhitneyu and stats.spearmanr below
import math
import pylab
import matplotlib.pyplot as plt
import bokeh.plotting as bkp
from mpl_toolkits.axes_grid1 import make_axes_locatable



In [4]:

    
# read in readmissions data provided
hospital_read_df = pd.read_csv('data/cms_hospital_readmissions.csv')
hospital_read_df.head(2)









    Out[4]:

| | Hospital Name | Provider Number | State | Measure Name | Number of Discharges | Footnote | Excess Readmission Ratio | Predicted Readmission Rate | Expected Readmission Rate | Number of Readmissions | Start Date | End Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FROEDTERT MEMORIAL LUTHERAN HOSPITAL | 520177 | WI | READM-30-HIP-KNEE-HRRP | 242 | NaN | 1.9095 | 10.8 | 5.6 | 38.0 | 07/01/2010 | 06/30/2013 |
| 1 | PROVIDENCE HOSPITAL | 90006 | DC | READM-30-HIP-KNEE-HRRP | 247 | NaN | 1.7521 | 9.2 | 5.3 | 33.0 | 07/01/2010 | 06/30/2013 |

Preliminary Analysis



In [151]:

    
# deal with missing and inconvenient portions of data
# .copy() creates an independent DataFrame, so the column assignment below
# does not raise pandas' SettingWithCopyWarning
clean_hospital_read_df = hospital_read_df[hospital_read_df['Number of Discharges'] != 'Not Available'].copy()
clean_hospital_read_df['Number of Discharges'] = clean_hospital_read_df['Number of Discharges'].astype(int)
clean_hospital_read_df = clean_hospital_read_df.sort_values('Number of Discharges')



In [6]:

    
# generate a scatterplot for number of discharges vs. excess rate of readmissions
# lists work better with matplotlib scatterplot function
x = [a for a in clean_hospital_read_df['Number of Discharges'][81:-3]]
y = list(clean_hospital_read_df['Excess Readmission Ratio'][81:-3])

fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(x, y,alpha=0.2)

ax.fill_between([0,350], 1.15, 2, facecolor='red', alpha = .15, interpolate=True)
ax.fill_between([800,2500], .5, .95, facecolor='green', alpha = .15, interpolate=True)

ax.set_xlim([0, max(x)])
ax.set_xlabel('Number of discharges', fontsize=12)
ax.set_ylabel('Excess rate of readmissions', fontsize=12)
ax.set_title('Scatterplot of number of discharges vs. excess rate of readmissions', fontsize=14)

ax.grid(True)
fig.tight_layout()

Preliminary Report

Read the following results/report. While you are reading it, consider whether the conclusions are correct, incorrect, misleading, or unfounded. Think about what you would change or what additional analyses you would perform.

A. Initial observations based on the plot above

Overall, rate of readmissions is trending down with increasing number of discharges
With lower number of discharges, there is a greater incidence of excess rate of readmissions (area shaded red)
With higher number of discharges, there is a greater incidence of lower rates of readmissions (area shaded green)

B. Statistics

In hospitals/facilities with number of discharges < 100, mean excess readmission rate is 1.023 and 63% have excess readmission rate greater than 1
In hospitals/facilities with number of discharges > 1000, mean excess readmission rate is 0.978 and 44% have excess readmission rate greater than 1

C. Conclusions

There is a significant correlation between hospital capacity (number of discharges) and readmission rates.
Smaller hospitals/facilities may be lacking necessary resources to ensure quality care and prevent complications that lead to readmissions.

D. Regulatory policy recommendations

Hospitals/facilities with small capacity (< 300) should be required to demonstrate upgraded resource allocation for quality care to continue operation.
Directives and incentives should be provided for consolidation of hospitals and facilities to have a smaller number of them with higher capacity and number of discharges.

Exercise

Include your work on the following in this notebook and submit it to your GitHub account.

A. Do you agree with the above analysis and recommendations? Why or why not?

B. Provide support for your arguments and your own recommendations with a statistically sound analysis:

Set up an appropriate hypothesis test.
Compute and report the observed significance value (or p-value).
Report statistical significance for $\alpha$ = .01.
Discuss statistical significance and practical significance. Do they differ here? How does this change your recommendation to the client?
Look at the scatterplot above.
- What are the advantages and disadvantages of using this plot to convey information?
- Construct another plot that conveys the same information in a more direct manner.

You can compose in notebook cells using Markdown:

In the control panel at the top, choose Cell > Cell Type > Markdown
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet



In [123]:

    
#Collect the entries with non-zero 'Number of Discharges' and not-null 'Excess Readmission Ratio' 
df = clean_hospital_read_df.loc[clean_hospital_read_df ['Number of Discharges']!=0,]
df1 = df.loc[df['Excess Readmission Ratio'].notnull(),]

(1) Examine the distributions of the excess readmission ratio for small and large hospitals using histograms and normal probability plots:

The excess readmission ratio for large hospitals is approximately normally distributed, whereas the distribution for small hospitals deviates from normality, with a long upper tail.
The distribution of the excess readmission ratio for large hospitals is more spread out than that for small hospitals. The mean excess readmission ratio for large hospitals seems to be somewhat lower than that for small hospitals.



In [147]:

    
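# small/large use the same split as the Mann-Whitney cell below; defined here so this cell runs on its own
small = df1[df1['Number of Discharges']<100]
large = df1[df1['Number of Discharges']>=1000]
# density=True replaces the `normed` argument, which was removed from matplotlib
plt.hist(small['Excess Readmission Ratio'], 50, density=True, alpha = 0.7, label = 'Small Hospitals')
plt.hist(large['Excess Readmission Ratio'], 50, density=True, alpha = 0.7, label = 'Large Hospitals')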
plt.legend(loc='upper right')
plt.title('Normalized histograms of excess readmission ratio for small and large hospitals')
plt.show()

stats.probplot(small['Excess Readmission Ratio'], dist="norm", plot=pylab)
pylab.title('Probability plot for the excess readmission ratio of small hospitals')
pylab.show()

stats.probplot(large['Excess Readmission Ratio'], dist="norm", plot=pylab)
pylab.title('Probability plot for the excess readmission ratio of large hospitals')
pylab.show()

#data = [small['Excess Readmission Ratio'], large['Excess Readmission Ratio']]
#plt.boxplot(data)

(2) Since the distribution of the excess readmission ratio for small hospitals deviates from a normal distribution, a Mann–Whitney U test is used here to test whether the excess readmission ratio $R$ differs significantly between small (number of discharges < 100) and large (number of discharges >= 1000) hospitals:

$H_0: R_s = R_l$
$H_A: R_s \neq R_l$

In this case, the mean excess readmission ratio for small hospitals is 0.044 higher than that for large hospitals, with a p-value of 3.50e-14, well below the required $\alpha$ = .01. We can therefore reject the null hypothesis that the excess readmission ratio $R$ is the same for small and large hospitals. The 90% confidence interval for the mean excess readmission ratio is [1.020, 1.026] for small hospitals and [0.969, 0.987] for large hospitals. These confidence intervals were calculated using the bootstrap.
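The test setup can be sketched on synthetic data as follows. This is a minimal illustration, not the notebook's actual computation: the two arrays are randomly generated stand-ins (only the group sizes 1188 and 464 are taken from the output below), and `alternative='two-sided'` is passed explicitly so that the p-value matches the two-sided $H_A$ regardless of the SciPy version's default.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two groups (NOT the CMS data).
# The means and spreads are assumptions chosen only for illustration.
small_ratio = rng.normal(loc=1.02, scale=0.06, size=1188)
large_ratio = rng.normal(loc=0.98, scale=0.12, size=464)

# Two-sided Mann-Whitney U test, matching H_A: R_s != R_l.
u_stat, p_value = stats.mannwhitneyu(small_ratio, large_ratio,
                                     alternative='two-sided')

alpha = 0.01
reject_null = p_value < alpha

print(f"U = {u_stat:.1f}, p = {p_value:.3g}")
print(f"reject H_0 at alpha = {alpha}: {reject_null}")
```

Note that the Mann–Whitney U test compares the two distributions as a whole rather than their means, so the difference in means is best read as a descriptive effect size reported alongside the test.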



In [152]:

    
large = df1[df1['Number of Discharges']>=1000]
small = df1[df1['Number of Discharges']<100]

n_l = large['Number of Discharges'].count()
n_s = small['Number of Discharges'].count()
print('number of large hospitals: ' + str(n_l))
print('number of small hospitals: ' + str(n_s))

mean_l = np.mean(large['Excess Readmission Ratio'])
mean_s = np.mean(small['Excess Readmission Ratio'])

diff = mean_l - mean_s
# NOTE: the default `alternative` of mannwhitneyu differs between SciPy versions;
# pass alternative='two-sided' explicitly to match the two-sided H_A above.
md = stats.mannwhitneyu(small['Excess Readmission Ratio'],large['Excess Readmission Ratio'])
p_mw = md[1]

print('The excess readmission ratio of large hospitals is ' + str(abs(diff)) + ' lower compared to that of small hospitals with p value ' + str(abs(p_mw)) )



number of large hospitals: 464
number of small hospitals: 1188
The excess readmission ratio of large hospitals is 0.044284082926970836 lower compared to that of small hospitals with p value 3.50027718963e-14



In [135]:

    
# np.mean is used as the statistic; scipy.mean is no longer available in current SciPy releases
CI_s = bt.ci(data=small['Excess Readmission Ratio'], statfunction=np.mean, alpha=0.1)
CI_l = bt.ci(data=large['Excess Readmission Ratio'], statfunction=np.mean, alpha=0.1)
print('90% confidence interval of excess readmission ratio for small hospitals:')
print(CI_s)
print('90% confidence interval of excess readmission ratio for large hospitals:')
print(CI_l)









    



90% confidence interval of excess readmission ratio for small hospitals:
[ 1.0199771   1.02551313]
90% confidence interval of excess readmission ratio for large hospitals:
[ 0.96925     0.98747457]
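For readers without `scikits.bootstrap`, the idea behind a bootstrap confidence interval for the mean can be sketched with NumPy alone. This is a simple percentile bootstrap on synthetic data. The helper `bootstrap_mean_ci` is a hypothetical name, and `scikits.bootstrap` may use a different interval method, so its results need not match this sketch exactly.

```python
import numpy as np

def bootstrap_mean_ci(values, alpha=0.1, n_resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    # For alpha = 0.1 this takes the 5th and 95th percentiles of the resampled means.
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Synthetic sample (NOT the CMS data), for demonstration only.
sample = np.random.default_rng(1).normal(loc=1.0, scale=0.08, size=500)
low, high = bootstrap_mean_ci(sample, alpha=0.1)

print(f"sample mean: {sample.mean():.4f}")
print(f"90% bootstrap CI for the mean: [{low:.4f}, {high:.4f}]")
```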

(3) Calculate the Correlation Coefficient

The distribution of the excess readmission ratio deviates somewhat from a normal distribution in its upper tail, and the distribution of the number of discharges is strongly right-skewed.
The Spearman correlation coefficient shows a weak negative correlation (r = -0.077) between these two variables, and this weak correlation is unlikely to be due to sampling error (p = 1.22e-16).
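To see how a very weak correlation can still produce a tiny p-value when the sample is large, consider the sketch below. It uses synthetic data with an assumed sample size of 10,000 and an assumed slope of -0.08. None of these numbers come from the CMS data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 10_000  # assumed sample size, for illustration only

# Synthetic variables with a deliberately weak negative relationship.
discharges = rng.normal(size=n)
ratio = -0.08 * discharges + rng.normal(size=n)

rho, p_value = stats.spearmanr(discharges, ratio)

# Squared correlation: a rough measure of how much (rank) variation is shared.
r_squared = rho ** 2

print(f"Spearman rho = {rho:.3f}, p = {p_value:.3g}")
print(f"rho squared  = {r_squared:.4f}")
```

With a correlation this small, the squared coefficient is well under 1%, which is why statistical significance alone says little about practical significance.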



In [150]:

    
#Examine the distributions of 'Excess Readmission Ratio' and 'Number of Discharges'
excess_ratio = df1['Excess Readmission Ratio']
discharged = df1['Number of Discharges']

plt.hist(excess_ratio, 50)
plt.title('Histogram of excess readmission ratio')
plt.show()

stats.probplot(excess_ratio, dist="norm", plot=pylab)
pylab.title('Probability plot for the excess readmission ratio')
pylab.show()

plt.hist(discharged, 50)
plt.title('Histogram of number of discharges')
plt.show()

stats.probplot(discharged, dist="norm", plot=pylab)
pylab.title('Probability plot for number of discharges')
pylab.show()

#Since the two distributions are not both normal, the Spearman rank correlation coefficient is more appropriate here.
#.to_numpy() replaces .as_matrix(), which was removed from pandas
sp = stats.spearmanr(discharged.to_numpy(), excess_ratio.to_numpy())
print('Spearman correlation coefficient between number of discharges and excess readmission ratio: ')
print('correlation=' + str(sp[0]) + ' pvalue=' + str(sp[1]))



Spearman correlation coefficient between number of discharges and excess readmission ratio: 
correlation=-0.0771294053707 pvalue=1.21564596596e-16

Summary:

The mean excess readmission ratio for small hospitals (number of discharges < 100) is slightly higher (by 0.044) than that for large hospitals (number of discharges >= 1000). Overall, there is a weak negative correlation (r = -0.077) between the excess readmission ratio and hospital size. Both the small difference and the weak correlation are statistically significant. However, further investigation is needed to evaluate whether they are also clinically significant.
The distribution of the excess readmission ratio for large hospitals is more spread out than that for small hospitals. This may be due to differences in the settings of large hospitals (e.g. county hospitals, university medical centers, or private medical centers).
The recommendation that hospitals/facilities with capacity < 300 be required to demonstrate upgraded resource allocation for quality care to continue operation rests on a fairly arbitrary threshold. In addition, confounding factors may correlate with both hospital size and readmission rate, so recommending consolidation of hospitals to increase capacity may not address the actual problem (reducing the readmission rate).
We should further investigate potential confounding factors to identify the true cause of the difference in readmission rates between small and large hospitals.
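The exercise also asks for a plot that conveys the relationship more directly than the raw scatterplot. One possible approach, sketched here on synthetic data, is to group rows into discharge-size bins and summarise the excess readmission ratio per bin. The bin edges, labels, and the `toy` DataFrame are assumptions made for illustration only, and the synthetic ratios are generated independently of size.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in for the cleaned data (NOT the CMS data).
toy = pd.DataFrame({
    'Number of Discharges': rng.integers(25, 3000, size=n),
    'Excess Readmission Ratio': rng.normal(1.0, 0.09, size=n),
})
toy['above_1'] = toy['Excess Readmission Ratio'] > 1

# Assumed size bins; right=False makes each bin include its lower edge.
bins = [0, 100, 300, 1000, np.inf]
labels = ['<100', '100-299', '300-999', '>=1000']
toy['Size group'] = pd.cut(toy['Number of Discharges'], bins=bins,
                           labels=labels, right=False)

# Per-bin row count, mean ratio, and share of rows with ratio above 1.
summary = toy.groupby('Size group', observed=False).agg(
    n_rows=('Excess Readmission Ratio', 'count'),
    mean_ratio=('Excess Readmission Ratio', 'mean'),
    share_above_1=('above_1', 'mean'),
)
print(summary)
```

A bar chart of `mean_ratio` per bin, or a boxplot of the ratio per bin, would then show any size effect more directly than thousands of overlapping scatter points.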
