In October 2012, the US government's Centers for Medicare & Medicaid Services (CMS) began reducing Medicare payments for Inpatient Prospective Payment System hospitals with excess readmissions. Excess readmissions are measured by a ratio: a hospital’s number of “predicted” 30-day readmissions for heart attack, heart failure, and pneumonia divided by the number that would be “expected,” based on an average hospital with similar patients. A ratio greater than 1 indicates excess readmissions.
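As a minimal sketch of the ratio described above, the snippet below divides a predicted readmission count by an expected one and flags a value above 1. The helper name `excess_readmission_ratio` and the sample numbers are hypothetical, chosen only for illustration; they are not CMS data.

```python
def excess_readmission_ratio(predicted, expected):
    """Return predicted / expected readmissions (hypothetical helper)."""
    if expected <= 0:
        raise ValueError("expected readmissions must be positive")
    return predicted / expected

# Made-up numbers for illustration only, not taken from the CMS data set
ratio = excess_readmission_ratio(predicted=23.0, expected=20.0)
has_excess = ratio > 1  # a ratio greater than 1 indicates excess readmissions
print(f"ratio = {ratio:.2f}, excess readmissions: {has_excess}")
```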
In this exercise, you will:
More instructions are provided below. Include your work in this notebook and submit it to your GitHub account.
In [31]:
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import bokeh.plotting as bkp
from mpl_toolkits.axes_grid1 import make_axes_locatable
In [32]:
# read in readmissions data provided
hospital_read_df = pd.read_csv('data/cms_hospital_readmissions.csv')
In [33]:
# deal with missing and inconvenient portions of data
# take an explicit copy so the dtype conversion below modifies this frame, not a view of the original
clean_hospital_read_df = hospital_read_df[hospital_read_df['Number of Discharges'] != 'Not Available'].copy()
clean_hospital_read_df['Number of Discharges'] = clean_hospital_read_df['Number of Discharges'].astype(int)
clean_hospital_read_df = clean_hospital_read_df.sort_values('Number of Discharges')
clean_hospital_read_df.head(5)
Out[33]:
In [34]:
# generate a scatterplot for number of discharges vs. excess rate of readmissions
# lists work better with matplotlib scatterplot function
x = [a for a in clean_hospital_read_df['Number of Discharges'][81:-3]]
y = list(clean_hospital_read_df['Excess Readmission Ratio'][81:-3])
fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(x, y,alpha=0.2)
ax.fill_between([0,350], 1.15, 2, facecolor='red', alpha = .15, interpolate=True)
ax.fill_between([800,2500], .5, .95, facecolor='green', alpha = .15, interpolate=True)
ax.set_xlim([0, max(x)])
ax.set_xlabel('Number of discharges', fontsize=12)
ax.set_ylabel('Excess rate of readmissions', fontsize=12)
ax.set_title('Scatterplot of number of discharges vs. excess rate of readmissions', fontsize=14)
ax.grid(True)
fig.tight_layout()
Read the following results/report. While you are reading it, consider whether the conclusions are correct, incorrect, misleading, or unfounded. Think about what you would change or what additional analyses you would perform.
A. Initial observations based on the plot above
B. Statistics
C. Conclusions
D. Regulatory policy recommendations
Include your work on the following in this notebook and submit it to your GitHub account.
A. Do you agree with the above analysis and recommendations? Why or why not?
B. Provide support for your arguments and your own recommendations with a statistically sound analysis:
You can compose in notebook cells using Markdown:
In [35]:
# Your turn
In [36]:
# select only the columns of interest
df = clean_hospital_read_df[['Number of Discharges','Excess Readmission Ratio', \
'Predicted Readmission Rate','Expected Readmission Rate','Number of Readmissions']]
# drop rows with a missing value (NaN) in any of the selected columns
df = df.dropna(how='any')
In [37]:
from scipy import stats
# define high as at least 1000 discharges and low as at most 300
df_hi_dsch = df[df['Number of Discharges'] >= 1000]
df_lo_dsch = df[df['Number of Discharges'] <= 300]
# compute the mean for checking
print('Mean Excess Readmission (high discharges): ',df_hi_dsch['Excess Readmission Ratio'].mean())
print('Mean Excess Readmission (low discharges): ',df_lo_dsch['Excess Readmission Ratio'].mean())
# percentage of hospitals in each group whose excess readmission ratio exceeds 1
print('Percent of high-discharge hospitals with excess readmission ratio > 1 = ',
      (df_hi_dsch['Excess Readmission Ratio'] > 1.0).mean() * 100)
print('Percent of low-discharge hospitals with excess readmission ratio > 1 = ',
      (df_lo_dsch['Excess Readmission Ratio'] > 1.0).mean() * 100)
# Welch's t-test (unequal variances); prints both the test statistic and the p-value
print('Compare number of discharges, t-test result = ',
      stats.ttest_ind(df_hi_dsch['Excess Readmission Ratio'], df_lo_dsch['Excess Readmission Ratio'], equal_var=False))
# construct scatterplot
# keyword arguments and plt.title are used because sns.plt is not available in current seaborn releases
sns.lmplot(x='Number of Discharges', y='Excess Readmission Ratio', data=df_hi_dsch)
plt.title('Scatterplot of High number of discharges vs. excess rate of readmissions', fontsize=14)
sns.lmplot(x='Number of Discharges', y='Excess Readmission Ratio', data=df_lo_dsch)
plt.title('Scatterplot of Low number of discharges vs. excess rate of readmissions', fontsize=14)
Out[37]:
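The `stats.ttest_ind(..., equal_var = False)` call in the cell above is Welch's unequal-variance t-test. As a rough illustration of what that statistic measures, here is a standard-library sketch of the Welch t statistic and its Welch-Satterthwaite degrees of freedom. The helper name `welch_t` and the two samples are hypothetical and unrelated to the hospital data; the p-value step is left out.

```python
import math
import statistics

def welch_t(x, y):
    """Return (t, dof) for Welch's unequal-variance t-test (hypothetical helper)."""
    # per-group squared standard errors, using sample variances (n - 1 denominator)
    se2_x = statistics.variance(x) / len(x)
    se2_y = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1))
    return t, dof

# Made-up samples for illustration only
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t, dof = welch_t(sample_a, sample_b)
print(t, dof)
```

With these samples each group has variance 2.5 and size 5, so the combined standard error is 1, giving t = -1 with 8 degrees of freedom.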
In [38]:
# construct correlation matrix
corrmat = df.corr()
print(corrmat)
# Draw the heatmap using seaborn
sns.heatmap(corrmat, vmax=.8, square=True)
Out[38]:
In [39]:
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation of x and y."""
    lx = len(x) - 1
    ly = len(y) - 1
    md = np.abs(x.mean() - y.mean())  # mean difference (numerator)
    csd = lx * x.var() + ly * y.var()
    csd = csd / (lx + ly)
    csd = np.sqrt(csd)  # pooled standard deviation
    return md / csd  # Cohen's d

def printCohen(x):
    """Print a conventional effect-size label for x and return x."""
    if x >= .80:
        print("large effect")
    elif x >= .50:
        print("medium effect")
    elif x >= .20:
        print("small effect")
    else:
        print("negligible effect")
    return x

# applied to the two DataFrames, cohens_d returns one value per column
cd = cohens_d(df_hi_dsch, df_lo_dsch)
col = 'Excess Readmission Ratio'
print(col, "Cohen's d = ", printCohen(cd[col]))
print('test Pearson r: ', stats.pearsonr(df['Number of Discharges'], df['Excess Readmission Ratio']))
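To sanity-check the pooled-standard-deviation formula used in `cohens_d`, the following standard-library sketch applies the same computation to two small made-up samples whose answer can be worked out by hand. The helper name `cohens_d_plain` and the samples are hypothetical and are not part of the hospital data.

```python
import math
import statistics

def cohens_d_plain(x, y):
    """Cohen's d with a pooled standard deviation (hypothetical helper)."""
    dof_x, dof_y = len(x) - 1, len(y) - 1
    # pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = (dof_x * statistics.variance(x) + dof_y * statistics.variance(y)) / (dof_x + dof_y)
    return abs(statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)

# Made-up samples for illustration only
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 3.0, 4.0, 5.0, 6.0]
d = cohens_d_plain(sample_a, sample_b)
print(round(d, 3))
```

Both samples have variance 2.5 and their means differ by 1, so d = 1/sqrt(2.5), roughly 0.63, which falls in the "medium effect" band of `printCohen`.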