In October 2012, the US government's Center for Medicare and Medicaid Services (CMS) began reducing Medicare payments for Inpatient Prospective Payment System hospitals with excess readmissions. Excess readmissions are measured by a ratio, by dividing a hospital’s number of “predicted” 30-day readmissions for heart attack, heart failure, and pneumonia by the number that would be “expected,” based on an average hospital with similar patients. A ratio greater than 1 indicates excess readmissions.
In this exercise, you will:
More instructions provided below. Include your work in this notebook and submit to your Github account.
In [132]:
%matplotlib inline
import pandas as pd
import numpy as np
import scikits.bootstrap as bt
import scipy
import math
import pylab
import matplotlib.pyplot as plt
import bokeh.plotting as bkp
from mpl_toolkits.axes_grid1 import make_axes_locatable
In [4]:
# read in readmissions data provided
hospital_read_df = pd.read_csv('data/cms_hospital_readmissions.csv')
hospital_read_df.head(2)
Out[4]:
In [151]:
# deal with missing and inconvenient portions of data
clean_hospital_read_df = hospital_read_df[hospital_read_df['Number of Discharges'] != 'Not Available']
clean_hospital_read_df.loc[:, 'Number of Discharges'] = clean_hospital_read_df['Number of Discharges'].astype(int)
clean_hospital_read_df = clean_hospital_read_df.sort_values('Number of Discharges')
In [6]:
# generate a scatterplot for number of discharges vs. excess rate of readmissions
# lists work better with matplotlib scatterplot function
x = [a for a in clean_hospital_read_df['Number of Discharges'][81:-3]]
y = list(clean_hospital_read_df['Excess Readmission Ratio'][81:-3])
fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(x, y,alpha=0.2)
ax.fill_between([0,350], 1.15, 2, facecolor='red', alpha = .15, interpolate=True)
ax.fill_between([800,2500], .5, .95, facecolor='green', alpha = .15, interpolate=True)
ax.set_xlim([0, max(x)])
ax.set_xlabel('Number of discharges', fontsize=12)
ax.set_ylabel('Excess rate of readmissions', fontsize=12)
ax.set_title('Scatterplot of number of discharges vs. excess rate of readmissions', fontsize=14)
ax.grid(True)
fig.tight_layout()
Read the following results/report. While you are reading it, think about if the conclusions are correct, incorrect, misleading or unfounded. Think about what you would change or what additional analyses you would perform.
A. Initial observations based on the plot above
B. Statistics
C. Conclusions
D. Regulatory policy recommendations
Include your work on the following in this notebook and submit to your Github account.
A. Do you agree with the above analysis and recommendations? Why or why not?
B. Provide support for your arguments and your own recommendations with a statistically sound analysis:
You can compose in notebook cells using Markdown:
In [123]:
#Collect the entries with non-zero 'Number of Discharges' and not-null 'Excess Readmission Ratio'
df = clean_hospital_read_df.loc[clean_hospital_read_df ['Number of Discharges']!=0,]
df1 = df.loc[df['Excess Readmission Ratio'].notnull(),]
(1) Examine the distributions of excess readmission ratio for small and large hospitals using histograms and normal probability plots:
In [147]:
plt.hist(small['Excess Readmission Ratio'], 50, normed=1, alpha = 0.7, label = 'Small Hospitals')
plt.hist(large['Excess Readmission Ratio'], 50, normed=1, alpha = 0.7, label = 'Large Hospitals')
plt.legend(loc='upper right')
plt.title('Normalized histograms of excess readmission ratio for small and large hospitals')
plt.show()
stats.probplot(small['Excess Readmission Ratio'], dist="norm", plot=pylab)
pylab.title('Probability plot for the excess readmission ratio of small hospitals')
pylab.show()
stats.probplot(large['Excess Readmission Ratio'], dist="norm", plot=pylab)
pylab.title('Probability plot for the excess readmission ratio of large hospitals')
pylab.show()
#data = [small['Excess Readmission Ratio'], large['Excess Readmission Ratio']]
#plt.boxplot(data)
(2) Since the distribution of the excess readmission ratio for small hospital deviates from normal distribution, a Mann–Whitney U test is performed here to test if there is a significant difference in excess readmission ratio $R$ between small (number of discharges < 100) and large (number of discharges > 1000) hospitals
$H_0: R_s = R_l$
$H_A: R_s \neq R_l$
In this case, the excess readmission ratio for small hospital is 0.044 higher than that for large hospital, with p value 3.50e-14 (<0.1). Therefore, we are able to reject the Hypothesis that the excess readmission ratio $R$ between small and large hospitals are the same. The 90% confidence interval of mean excess readmission ratio for small hospitals is [1.020, 1.026], and the 90% confidence interval of mean excess readmission ratio for large hospitals is [0.969, 0.987]. These confidence intervals were calculated using bootstrap.
In [152]:
large = df1[df1['Number of Discharges']>=1000]
small = df1[df1['Number of Discharges']<100]
n_l = large['Number of Discharges'].count()
n_s = small['Number of Discharges'].count()
print('number of large hospitals: ' + str(n_l))
print('number of large hospitals: ' + str(n_s))
mean_l = np.mean(large['Excess Readmission Ratio'])
mean_s = np.mean(small['Excess Readmission Ratio'])
diff = mean_l - mean_s
md = stats.mannwhitneyu(small['Excess Readmission Ratio'],large['Excess Readmission Ratio'])
p_mw = md[1]
print('The excess readmission ratio of large hospitals is ' + str(abs(diff)) + ' lower compared to that of small hospitals with p value ' + str(abs(p_mw)) )
In [135]:
CI_s = bt.ci(data=small['Excess Readmission Ratio'], statfunction=scipy.mean, alpha=0.1)
CI_l = bt.ci(data=large['Excess Readmission Ratio'], statfunction=scipy.mean, alpha=0.1)
print('90% confidence interval of excess readmission ratio for small hospitals:')
print(CI_s)
print('90% confidence interval of excess readmission ratio for large hospitals:')
print(CI_l)
(3) Calculate the Correlation Coefficient
In [150]:
#Examine the distributions of 'Excess Readmission Ratio' and 'Number of Discharges'
excess_ratio = df1['Excess Readmission Ratio']
discharged = df1['Number of Discharges']
plt.hist(excess_ratio, 50)
plt.title('Histogram of excess readmission ratio')
plt.show()
stats.probplot(excess_ratio, dist="norm", plot=pylab)
pylab.title('Probability plot for the excess readmission ratio')
pylab.show()
plt.hist(discharged, 50)
plt.title('Histogram of number of discharges')
plt.show()
stats.probplot(discharged, dist="norm", plot=pylab)
pylab.title('Probability plot for number of discharges')
pylab.show()
#Since not both distributions are normal distributions, it is more appropriate to use Spearman rank correlation coefficient here.
sp = stats.spearmanr(discharged.as_matrix(), excess_ratio.as_matrix())
print('Spearman correlation coeffcient between number of discharges and excess readmission ratio: ')
print('correlation=' + str(sp[0]) + ' pvalue=' + str(sp[1]))
Summary:
The mean of the excess readmission ratio for small hospitals (number of discharges < 100) is slightly higher (0.044) than that for large hospitals (number of discharges > 1000). In general, there is a weak negative correlation (r=-0.077) between the excess readmission ratio and hospital size. These very small difference and correlation are statistically significant. However, further investigations should be carried out to evaluate if they are also clinically significant.
The distribution of the excess readmission ratio for large hospitals is more spread-out compared to that for small hospitals. This may due to the difference in the settings of large hospitals (e.g. county hospitals, university medical centers, or private medical centers).
It is quite arbitrary to make the assumption that hospitals/facilities with capacity < 300 should be required to demonstrate upgraded resource allocation for quality care to continue operation. In addition, there may be some confounding factors that correlate with both hospital size and readmission rate, so making the recommendation of consolidation of hospitals to increase capacity may not really address the problem (reduce the readmission rate).
We should further investigate potential confounding factors to identify the true cause of readmission rate between small and large hospitals.