Ch15 Figure2



In [1]:

    
# Your team works with the web designers to run a few experiments. They randomly replace images of young, fit runners with images of older, less fit runners. The team then works with a data analyst to build reports that compare the two versions. The reports show a drop in overall sales.

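import random as rd  # assumed alias: rd.randint(0, 1) is inclusive, as in the stdlib
import pandas as pd

version = ['young fit runners', 'less fit and older']

data = []

for i in range(1000):
    date = random_date()  # helper assumed to be defined elsewhere in the notebook
    v = version[rd.randint(0, 1)]  # pick one of the two page versions at random
    value = rd.random() * 100  # base sale value in [0, 100)

    # v holds the version label, not its index, so compare against the label
    if v == version[0]:
        if rd.random() <= 0.8:
            value *= 1.3
        else:
            value *= .8

    data.append([i, date, v, value])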

df = pd.DataFrame(data, columns=['id', 'date', 'version', 'value'])
# df.to_csv('csv_output/ch15_fig2.csv', index=False)
df = pd.read_csv('csv_output/ch15_fig2.csv')
df.head()









    Out[1]:

       id                 date             version      value
    0   0  2009-03-29 01:16:34   young fit runners   8.853708
    1   1  2009-03-11 13:33:59   young fit runners  67.295266
    2   2  2009-03-25 07:32:48  less fit and older  83.722551
    3   3  2009-03-23 05:57:30  less fit and older  82.323605
    4   4  2009-03-20 21:43:40  less fit and older  25.289439
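
The first cell calls `random_date()` without showing its definition. A minimal sketch of what such a helper might look like, assuming it returns a uniformly random timestamp within March 2009 (the range the sample rows suggest). The default arguments and the implementation are assumptions, not the notebook's actual helper.

```python
import random as rd
from datetime import datetime, timedelta

def random_date(start=datetime(2009, 3, 1), end=datetime(2009, 4, 1)):
    """Hypothetical helper: uniform random timestamp in [start, end)."""
    span_seconds = int((end - start).total_seconds())
    # randrange excludes its upper bound, so the result is always before `end`
    return start + timedelta(seconds=rd.randrange(span_seconds))

rd.seed(0)  # fixed seed only so the sketch is reproducible
sample = [random_date() for _ in range(5)]
print(sample)
```

Drawing whole seconds keeps the timestamps in the same `YYYY-MM-DD HH:MM:SS` shape as the `date` column above.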



In [2]:

    
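import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('csv_output/ch15_fig2.csv')
# Truncate each timestamp to its calendar day so sales can be totalled per day
df['date_new'] = pd.to_datetime(df['date']).dt.normalize()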

%matplotlib inline
sns.set_style("white")

f, ax = plt.subplots(1,2, figsize=(10,6))

df2 = df.pivot_table(index='date_new', columns='version', values='value', aggfunc=np.sum).reset_index();
ax[0].plot_date(x=df2.date_new, y=df2['young fit runners'], fmt='o-');
ax[0].plot_date(x=df2.date_new, y=df2['less fit and older'], fmt='o--');
ax[0].legend(['young fit runners', 'less fit and older']);
ax[0].set_title('total sales value by day and version');

# Use the same 5-day window for both versions so the two lines are comparable
ax[1].plot_date(x=df2.date_new, y=df2['young fit runners'].rolling(center=False, window=5).mean(), fmt='o-');
ax[1].plot_date(x=df2.date_new, y=df2['less fit and older'].rolling(center=False, window=5).mean(), fmt='o--');
ax[1].legend(['young fit runners', 'less fit and older']);
ax[1].set_title('total sales value 5 days moving average by version');

f.autofmt_xdate()
f.tight_layout()

f.savefig('svg_output/ch15_fig2.svg', format='svg')

Looking at the daily time series alone, buyers who saw the 'less fit and older' version of the page have slightly lower total sales per day. In the 5-day moving average, the 'less fit and older' version is lower throughout the entire month.
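
The comparison can also be checked numerically rather than by eye. The sketch below is illustrative only: it builds a synthetic stand-in for `ch15_fig2.csv` (the seed, the date range, and the names `toy`, `daily`, `smooth`, and `share_lower` are all assumptions), applies the same uplift rule as the first cell, and measures on what share of days the smoothed 'less fit and older' line sits below the other.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
labels = ['young fit runners', 'less fit and older']

# Synthetic stand-in for the CSV: one row per sale during March 2009 (assumed)
n = 1000
toy = pd.DataFrame({
    'date_new': pd.to_datetime('2009-03-01')
                + pd.to_timedelta(rng.integers(0, 31, n), unit='D'),
    'version': rng.choice(labels, n),
    'value': rng.random(n) * 100,
})

# Same rule as the first cell: 'young fit runners' sales are scaled by 1.3
# with probability 0.8 and by 0.8 otherwise
boost = np.where(rng.random(n) <= 0.8, 1.3, 0.8)
is_young = toy['version'] == labels[0]
toy['value'] = np.where(is_young, toy['value'] * boost, toy['value'])

# Daily totals per version, then a 5-day moving average of those totals
daily = toy.pivot_table(index='date_new', columns='version',
                        values='value', aggfunc='sum')
smooth = daily.rolling(window=5).mean()  # first 4 rows are NaN

# Share of smoothed days on which 'less fit and older' is the lower line
gap = (smooth[labels[0]] - smooth[labels[1]]).dropna()
share_lower = (gap > 0).mean()
print(f"'less fit and older' is lower on {share_lower:.0%} of smoothed days")
```

A share close to 100% would match the reading of the right-hand panel. Because the moving average needs five days of data, the first four days have no smoothed value and are excluded from the share.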
