The number of shoes sold by an e-commerce company during the first three months (12 weeks) of the year was:
23 21 19 24 35 17 18 24 33 27 21 23
Meanwhile, the company developed a dynamic price optimization algorithm, and the sales for the next 12 weeks were:
31 28 19 24 32 27 16 41 23 32 29 33
Did the dynamic price optimization algorithm deliver superior results? Can it be trusted?
Before we get into the different approaches, let's quickly get a feel for the data.
In [1]:
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
%matplotlib inline
In [2]:
#Load the data
before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])
after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])
In [3]:
before_opt.mean()
Out[3]:
23.75
In [4]:
after_opt.mean()
Out[4]:
27.916666666666668
In [5]:
observed_difference = after_opt.mean() - before_opt.mean()
In [6]:
print("Difference between the means is:", observed_difference)
On average, sales after optimization are higher than sales before optimization. But is the difference legit? Could it be due to chance?
Classical Method: We will cover this method in detail later on. It entails doing a t-test (a quick sketch follows below).
Hacker's Method: Let's see if we can bring a hacker's perspective to this problem, similar to what we did in the previous notebook.
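For reference, here is a minimal sketch of the classical approach, assuming scipy is available (this cell is not part of the original notebook). Welch's t-test is used so we don't have to assume equal variances:
In [ ]:
#Classical approach (sketch): Welch's two-sample t-test
from scipy import stats

t_stat, p_value = stats.ttest_ind(after_opt, before_opt, equal_var=False)
print("t-statistic:", t_stat)
print("p-value:", p_value)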
In [7]:
#Step 1: Create the dataset. Let's give Label 0 to before_opt and Label 1 to after_opt
In [8]:
#Learn about the following three functions (a tiny demo follows the help lookups)
In [9]:
?np.append
In [10]:
?np.zeros
In [11]:
?np.ones
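If the help pages are not handy, this tiny demo (not in the original notebook) shows what the three functions return:
In [ ]:
#np.zeros / np.ones build arrays of a given length; np.append concatenates two arrays
print(np.zeros(3))                          #[0. 0. 0.]
print(np.ones(3))                           #[1. 1. 1.]
print(np.append(np.zeros(3), np.ones(3)))   #[0. 0. 0. 1. 1. 1.]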
In [12]:
shoe_sales = np.array([np.append(np.zeros(before_opt.shape[0]), np.ones(after_opt.shape[0])),
                       np.append(before_opt, after_opt)], dtype=int)
In [13]:
print("Shape:", shoe_sales.shape)
print("Data:", "\n", shoe_sales)
In [14]:
shoe_sales = shoe_sales.T
print("Shape:",shoe_sales.shape)
print("Data:", "\n", shoe_sales)
In [15]:
#This is the approach we are going to take:
#Randomly shuffle the labels, then compute the difference between the means of the two groups.
#Find the % of times the difference between the means is greater than or equal to what we observed above.
#If that % is less than 5%, we will make the call that the improvement is real.
In [16]:
np.random.shuffle(shoe_sales)  #Shuffles the rows in place; each label stays attached to its sales value
In [17]:
shoe_sales
Out[17]:
In [18]:
experiment_label = np.random.randint(0,2,shoe_sales.shape[0])
In [19]:
experiment_label
Out[19]:
In [20]:
experiment_data = np.array([experiment_label, shoe_sales[:,1]])
experiment_data = experiment_data.T
print(experiment_data)
In [21]:
#Select the sales column ([:,1]) before averaging; otherwise the label column gets averaged in too
experiment_diff_mean = experiment_data[experiment_data[:,0]==1][:,1].mean() \
                       - experiment_data[experiment_data[:,0]==0][:,1].mean()
In [22]:
experiment_diff_mean
Out[22]:
In [23]:
#Like the previous notebook, let's repeat this experiment 100 and then 100000 times
In [24]:
def shuffle_experiment(number_of_times):
    """Randomly relabel the sales figures and record the difference between the group means."""
    experiment_diff_mean = np.empty([number_of_times, 1])
    for times in np.arange(number_of_times):
        experiment_label = np.random.randint(0, 2, shoe_sales.shape[0])
        experiment_data = np.array([experiment_label, shoe_sales[:, 1]]).T
        #Again, select only the sales column before averaging
        experiment_diff_mean[times] = experiment_data[experiment_data[:, 0] == 1][:, 1].mean() \
                                      - experiment_data[experiment_data[:, 0] == 0][:, 1].mean()
    return experiment_diff_mean
In [25]:
experiment_diff_mean = shuffle_experiment(100)
In [26]:
experiment_diff_mean[:10]
Out[26]:
In [27]:
sns.distplot(experiment_diff_mean, kde=False)
Out[27]:
In [28]:
#Finding the % of times the difference of means is greater than or equal to the observed difference
print("Data: Difference in mean greater than observed:", \
experiment_diff_mean[experiment_diff_mean>=observed_difference])
print("Number of times diff in mean greater than observed:", \
experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0])
print("% of times diff in mean greater than observed:", \
experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]/float(experiment_diff_mean.shape[0])*100)
The thought process is this: if price optimization had no real effect, the "before" and "after" labels would be interchangeable, and shuffling them should make no difference. By shuffling, we simulate exactly that situation. If a large fraction of the shuffled trials shows an improvement at least as big as the one we observed, then the price optimization probably has no effect. In statistical terms, the observed difference could have occurred by chance.
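As a side note (not in the original notebook): a classic permutation test keeps the two group sizes fixed at 12 each by shuffling the existing labels rather than drawing fresh random ones. One shuffled trial of that variant would look like this:
In [ ]:
#Permutation-test variant: shuffle the existing labels so each group keeps 12 entries
permuted_labels = np.random.permutation(shoe_sales[:,0])
permuted_data = np.array([permuted_labels, shoe_sales[:,1]]).T
permuted_diff = permuted_data[permuted_data[:,0]==1][:,1].mean() \
                - permuted_data[permuted_data[:,0]==0][:,1].mean()
print("Difference in means for one permutation:", permuted_diff)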
Now, to show that the same difference in means can lead to a different conclusion, let's run the same experiment on a different dataset.
In [29]:
before_opt = np.array([230, 210, 190, 240, 350, 170, 180, 240, 330, 270, 210, 230])
after_opt = np.array([310, 180, 190, 240, 220, 240, 160, 410, 130, 320, 290, 210])
In [30]:
print("Mean sales before price optimization:", np.mean(before_opt))
print("Mean sales after price optimization:", np.mean(after_opt))
print("Difference in mean sales:", np.mean(after_opt) - np.mean(before_opt)) #Same as above
In [31]:
shoe_sales = np.array([np.append(np.zeros(before_opt.shape[0]), np.ones(after_opt.shape[0])),
                       np.append(before_opt, after_opt)], dtype=int)
shoe_sales = shoe_sales.T
In [32]:
experiment_diff_mean = shuffle_experiment(100000)
sns.distplot(experiment_diff_mean, kde=False)
Out[32]:
In [33]:
#Finding the % of times the difference of means is greater than or equal to the observed difference
print("Number of times diff in mean greater than observed:", \
experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0])
print("% of times diff in mean greater than observed:", \
experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]/float(experiment_diff_mean.shape[0])*100)
In [34]:
before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])
after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])
print("The % increase of sales in the first case:", \
(np.mean(after_opt) - np.mean(before_opt))/np.mean(before_opt)*100,"%")
In [35]:
before_opt = np.array([230, 210, 190, 240, 350, 170, 180, 240, 330, 270, 210, 230])
after_opt = np.array([310, 180, 190, 240, 220, 240, 160, 410, 130, 320, 290, 210])
print("The % increase of sales in the second case:", \
(np.mean(after_opt) - np.mean(before_opt))/np.mean(before_opt)*100,"%")
Would the business feel comfortable spending millions of dollars if the increase is going to be just 1.75%? Does it make sense? Maybe yes, if margins are thin and any increase is considered good. But if the returns from the price optimization module do not let the company break even, it makes no sense to take that path.
Someone tells you the result is statistically significant. The first question you should ask: how large is the effect?
To answer such a question, we will make use of the concept of a confidence interval.
In plain English, a confidence interval is the range of values the metric we are measuring is expected to take.
An example would be: 90% of the time, the increase in average sales (after vs. before price optimization) falls between 3.4 and 6.7.
(These numbers are illustrative. We will derive the actual numbers below.)
What is the hacker's way of doing it? We will do the following steps:
1. Draw a bootstrapped sample from each group: sample the observed values with replacement, keeping the sample size the same.
2. Compute the difference between the means of the two bootstrapped samples.
3. Repeat the experiment many times to build up a distribution of differences.
4. Read the confidence interval off the percentiles of that distribution.
In [36]:
#Load the data
before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])
after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])
In [37]:
#Generate a bootstrap sample: a uniform random sample of the same size, drawn with replacement
random_before_opt = np.random.choice(before_opt, size=before_opt.size, replace=True)
In [38]:
print("Actual sample before optimization:", before_opt)
print("Bootstrapped sample before optimization: ", random_before_opt)
In [39]:
print("Mean for actual sample:", np.mean(before_opt))
print("Mean for bootstrapped sample:", np.mean(random_before_opt))
In [40]:
random_after_opt = np.random.choice(after_opt, size=after_opt.size, replace=True)
print("Actual sample after optimization:", after_opt)
print("Bootstrapped sample after optimization: ", random_after_opt)
print("Mean for actual sample:", np.mean(after_opt))
print("Mean for bootstrapped sample:", np.mean(random_after_opt))
In [41]:
print("Difference in means of actual samples:", np.mean(after_opt) - np.mean(before_opt))
print("Difference in means of bootstrapped samples:", np.mean(random_after_opt) - np.mean(random_before_opt))
In [42]:
#Like always, we will repeat this experiment 100,000 times.
def bootstrap_experiment(number_of_times):
    mean_difference = np.empty([number_of_times, 1])
    for times in np.arange(number_of_times):
        random_before_opt = np.random.choice(before_opt, size=before_opt.size, replace=True)
        random_after_opt = np.random.choice(after_opt, size=after_opt.size, replace=True)
        mean_difference[times] = np.mean(random_after_opt) - np.mean(random_before_opt)
    return mean_difference
In [43]:
mean_difference = bootstrap_experiment(100000)
sns.distplot(mean_difference, kde=False)
Out[43]:
In [44]:
mean_difference = np.sort(mean_difference, axis=0)
In [45]:
mean_difference #Sorted difference
Out[45]:
In [46]:
np.percentile(mean_difference, [5,95])
Out[46]:
Reiterating what this means: 90% of the time, the mean difference lies between the limits shown above.
Exercise: Find the 95% confidence interval.
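(Hint, not in the original notebook: a 95% interval leaves 2.5% in each tail, so the same percentile call works.)
In [ ]:
#95% confidence interval: 2.5th and 97.5th percentiles of the sorted differences
print(np.percentile(mean_difference, [2.5, 97.5]))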
In [ ]:
Finally, there are two points to be made.
For the first one:
What if we had a single observation per group: sales in the month after the price change were 80, and in the month before they were 40? The difference is 40. The bootstrap confidence interval, built by sampling with replacement as explained above, would then always produce 40, since there is only one value per group to resample. The significance test, on the other hand, shuffles the labels, so the two values are equally likely to land in either group, and the test would conclude there is no real difference. Both behaviors remind us that the data is simply too small to support meaningful inference.
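Here is a quick sketch (not in the original notebook) that makes the point concrete with one observation per group:
In [ ]:
#With one observation per group, sampling with replacement can only return that
#observation, so every bootstrapped difference equals 40
tiny_before = np.array([40])
tiny_after = np.array([80])
tiny_diffs = np.empty(1000)
for i in range(1000):
    tiny_diffs[i] = np.mean(np.random.choice(tiny_after, size=1, replace=True)) \
                    - np.mean(np.random.choice(tiny_before, size=1, replace=True))
print(np.unique(tiny_diffs))  #[40.]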
For the second one:
Traditional statistical derivations assume a normal distribution. But what if the underlying distribution isn't normal? Resampling makes no such assumption. Also, people tend to relate to resampling much better :-)
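To illustrate that last point with a sketch (again, not in the original notebook): the same resampling recipe works unchanged for statistics that have no convenient normal-theory formula, such as the median:
In [ ]:
#Bootstrap the difference in medians; no normality assumption needed
median_difference = np.empty(10000)
for i in range(10000):
    median_difference[i] = np.median(np.random.choice(after_opt, size=after_opt.size, replace=True)) \
                           - np.median(np.random.choice(before_opt, size=before_opt.size, replace=True))
print("90% interval for the median difference:", np.percentile(median_difference, [5, 95]))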