Ch7 Figure3


In [1]:
import random as rd

import pandas as pd

# I once worked with a graduate school that was trying to increase enrollment
# by looking at past data. It turned out that the best idea came from a project
# manager who was also an avid scuba diver. He looked at the demographic data
# and suggested that a buddy system, a common practice in scuba training, might
# increase the number of students who completed the whole program.

buddy = ['yes', 'no']

data = []

for i in range(1000):
    has_buddy = buddy[rd.randint(0,1)]
    if has_buddy == 'yes':
        if rd.random() >= .3:
            dur = rd.randint(4,12)
        else:
            dur = rd.randint(0,12)
    else:
        if rd.random() <= .6:
            dur = rd.randint(0,7)
        else:
            dur = rd.randint(0,12)
    
    data.append([i, has_buddy, dur])
    
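df = pd.DataFrame(data, columns = ['id', 'has_buddy', 'weeks-in-program'])
# The simulation above is unseeded, so every run produces different data.
# Reload the saved CSV so the figure stays the same; uncomment the next line
# to overwrite it with a fresh sample.
# df.to_csv('csv_output/ch7_fig3.csv', index=False)
df = pd.read_csv('csv_output/ch7_fig3.csv')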
df.head()


Out[1]:
id has_buddy weeks-in-program
0 0 yes 7
1 1 no 4
2 2 yes 11
3 3 no 2
4 4 yes 11

In [2]:
df = pd.read_csv('csv_output/ch7_fig3.csv')

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set_style("whitegrid")

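f, ax = plt.subplots(2,2, figsize=(8,5))

# Remove the two left-hand axes so the tall axis created below does not sit on top of them
ax[0][0].remove()
ax[1][0].remove()
ax1 = plt.subplot2grid((2,2),(0,0), rowspan=2)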

ds = df.groupby('has_buddy')['weeks-in-program'].mean().reset_index()
sns.barplot(y='has_buddy', x='weeks-in-program', data=ds, ax=ax1, palette=['cornflowerblue', 'lightblue']);


ds2 = df.groupby(['has_buddy', 'weeks-in-program']).id.count()
ds2_no = ds2['no'].reset_index()
ds2_yes = ds2['yes'].reset_index()

sns.barplot(x='weeks-in-program', y='id', data=ds2_no, ax=ax[0][1], color='cornflowerblue')
sns.barplot(x='weeks-in-program', y='id', data=ds2_yes, ax=ax[1][1], color='lightblue')

ax[0][1].set_ylim(0,80)
ax[0][1].set_ylabel('total count')
ax[1][1].set_ylabel('total count')
ax[0][1].set_title('without buddy: weeks-in-program distribution')
ax[1][1].set_title('with buddy: weeks-in-program distribution')

f.tight_layout()
f.savefig('svg_output/ch7_fig3.svg', format='svg')


The left panel shows that students who participate in the buddy program stay, on average, about 3 weeks longer than those who do not. The distributions on the right show that students with a buddy are more likely to stay past week 8 than students without one.
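
The two claims above can be checked numerically. The following is a minimal sketch, not the notebook's own code: it re-runs the same generating rules with a fixed seed (the seed, the helper names `simulate` and `summarize`, and the use of plain Python instead of pandas are all assumptions made to keep the example self-contained) and compares the simulated group means with the means implied by the rules.

```python
import random


def simulate(n=1000, seed=0):
    """Re-create the synthetic data; the fixed seed is an assumption for repeatability."""
    rng = random.Random(seed)
    weeks = {'yes': [], 'no': []}
    for _ in range(n):
        has_buddy = rng.choice(['yes', 'no'])
        if has_buddy == 'yes':
            # About 70% of buddy students stay 4-12 weeks, the rest 0-12 weeks
            dur = rng.randint(4, 12) if rng.random() >= .3 else rng.randint(0, 12)
        else:
            # About 60% of non-buddy students stay 0-7 weeks, the rest 0-12 weeks
            dur = rng.randint(0, 7) if rng.random() <= .6 else rng.randint(0, 12)
        weeks[has_buddy].append(dur)
    return weeks


def summarize(weeks):
    """Mean weeks in program and share of students staying past week 8, per group."""
    return {
        group: {
            'mean_weeks': sum(durs) / len(durs),
            'share_past_8_weeks': sum(d > 8 for d in durs) / len(durs),
        }
        for group, durs in weeks.items()
    }


# Means implied by the generating rules: randint(a, b) is uniform on a..b,
# so its mean is (a + b) / 2.
expected_yes = 0.7 * (4 + 12) / 2 + 0.3 * (0 + 12) / 2   # about 7.4 weeks
expected_no = 0.6 * (0 + 7) / 2 + 0.4 * (0 + 12) / 2     # about 4.5 weeks

summary = summarize(simulate())
for group, stats in summary.items():
    print(group, stats)
print('expected gap in weeks:', expected_yes - expected_no)
```

With the notebook's own `df`, the group means come from `df.groupby('has_buddy')['weeks-in-program'].mean()`, which is what the left panel plots. Because the sketch draws a fresh sample, its numbers will differ slightly from those in the saved CSV.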


In [3]:
%load_ext rpy2.ipython

In [4]:
%%R -w 480 -h 300 -u px
require(dplyr)

df = read.csv('csv_output/ch7_fig3.csv')
dgb = df %>% group_by(weeks.in.program, has_buddy)
df2 = summarize(dgb, count=n())

require(ggplot2)
ggplot(df2, aes(x=weeks.in.program, y=count, colour=has_buddy)) +
    geom_line(size=1) +
    geom_point(size=2) +
    ggtitle('weeks in program count \n (with versus without buddy)') +
    theme_bw()
# ggsave("svg_output/ch7_fig3_R.svg")