At the time of this experiment, Udacity course home pages had two options: "start free trial" and "access course materials". If the student clicked "start free trial", they were asked to enter their credit card information and were then enrolled in a free trial of the paid version of the course. After 14 days, they were automatically charged unless they cancelled first. If the student clicked "access course materials", they could view the videos and take the quizzes for free, but they would not receive coaching support or a verified certificate, and they would not submit their final project for feedback.
In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. This screenshot shows what the experiment looks like.
The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continued past the free trial and eventually completed the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.
The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.
If the experiment is successful, I expect that users who cannot make the time commitment will not complete checkout and enroll in the trial. As a result, the number of user-ids should decrease, and of the users who do enroll in the trial, a higher percentage should convert into paying users. Based on this conjecture, my expectations for the evaluation metrics are as follows:
Baseline Values:
Metrics | Values |
---|---|
Unique cookies to view page per day | 40000 |
Unique cookies to click "Start free trial" per day | 3200 |
Enrollments per day | 660 |
Click-through-probability on "Start free trial" | 0.0800 |
Probability of enrolling, given click | 0.2063 |
Probability of payment, given enroll | 0.5300 |
Probability of payment, given click | 0.1093 |
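As a quick check, the derived baseline probabilities in the table follow directly from the daily counts; the sketch below simply reproduces them from the values above (no new data is introduced).

In [ ]:
# Sketch: derive the baseline probabilities from the daily counts above
pageviews_per_day = 40000.
clicks_per_day = 3200.
enrolls_per_day = 660.
p_pay_given_enroll = 0.53          # given as a baseline value above

ctp = clicks_per_day / pageviews_per_day                        # 0.0800
p_enroll_given_click = enrolls_per_day / clicks_per_day         # ~0.2063
p_pay_given_click = p_enroll_given_click * p_pay_given_enroll   # ~0.1093
print(ctp, p_enroll_given_click, p_pay_given_click)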
Standard Deviation for Evaluation Metric:
Metric | Baseline Value | SE (daily baseline counts) | SE (5,000-pageview sample) |
---|---|---|---|
Gross Conversion | 0.2063 | 0.0072 | 0.0202 |
Retention | 0.5300 | 0.0194 | 0.0549 |
Net Conversion | 0.1093 | 0.0055 | 0.0156 |
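The standard errors above can be reproduced analytically as sqrt(p*(1-p)/n), assuming the rightmost column corresponds to a sample of 5,000 pageviews (400 clicks and 82.5 enrollments at baseline rates); a minimal sketch:

In [ ]:
import math

# Analytical SE = sqrt(p*(1-p)/n), with n scaled to a 5,000-pageview sample
sample_pageviews = 5000.
sample_clicks = sample_pageviews * 3200 / 40000    # 400 clicks
sample_enrolls = sample_pageviews * 660 / 40000    # 82.5 enrollments

se_gross = math.sqrt(0.2063 * (1 - 0.2063) / sample_clicks)       # ~0.0202
se_retention = math.sqrt(0.5300 * (1 - 0.5300) / sample_enrolls)  # ~0.0549
se_net = math.sqrt(0.1093 * (1 - 0.1093) / sample_clicks)         # ~0.0156
print(se_gross, se_retention, se_net)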
For Gross Conversion and Net Conversion, the analytical estimates should be comparable to the empirical variability because the denominator of each metric is the number of cookies that clicked "Start free trial", which matches the unit of diversion (the cookie).
The Retention metric's analytical estimate may differ from the empirical variability, because its denominator (the number of enrolled users) does not match the unit of diversion. Because of this, the Retention metric will not be used as an evaluation metric.
The Bonferroni correction was not used in this phase because Gross Conversion and Net Conversion are highly correlated with each other. Using the Bonferroni correction would be too conservative, requiring too many pageviews for the experiment.
In [75]:
# Samples needed per group calculated using
# http://www.evanmiller.org/ab-testing/sample-size.html
# Required pageviews for Gross Conversion
samples_needed = 25839.
# Convert required clicks per group to pageviews:
# two groups, divided by the click-through probability (0.08)
pageviews = 2 * samples_needed / 0.08
print("Gross Conversion Pageviews Required: {}".format(pageviews))
In [76]:
# Required pageviews for Retention
samples_needed = 39115.
# Denominator is enrollments per pageview (660/40000 = 0.0165)
pageviews = 2 * samples_needed / 0.0165
print("Retention Pageviews Required: {}".format(pageviews))
In [77]:
# Required pageviews for Net Conversion
samples_needed = 27411.
# Two groups, divided by the click-through probability (0.08)
pageviews = 2 * samples_needed / 0.08
print("Net Conversion Pageviews Required: {}".format(pageviews))
This experiment does not affect other parts of the services that Udacity provides. Paying customers who are already enrolled are not affected by the experiment, and the change itself is not drastic. With that said, I consider this experiment to be low risk, so it will be launched on all traffic.
With a baseline of 40,000 pageviews a day and the experiment running on all traffic, the 4,741,212 pageviews required for Retention would take roughly 118 days, which is too long a duration. The Retention evaluation metric will therefore not be used.
The experiment will run for 18 days to cover the 685,275 pageviews required to satisfy the requirements for both Gross Conversion and Net Conversion.
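A quick sketch of the duration arithmetic behind this decision, using the pageview requirements computed above and the 40,000-pageviews-per-day baseline:

In [ ]:
# Days needed at 40,000 pageviews/day with all traffic diverted
daily_pageviews = 40000.
retention_days = 4741212. / daily_pageviews       # ~118.5 days -> too long
net_conversion_days = 685275. / daily_pageviews   # ~17.1 days -> round up to 18
print(retention_days, net_conversion_days)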
In [79]:
# Sanity check: Number of Cookies (invariant metric)
import math
pageviews_cont = 345543.
pageviews_exp = 344660.
pageviews_tot = pageviews_cont + pageviews_exp
# Cookies should split evenly, so the expected proportion per group is 0.5
p = 0.5
std_error = math.sqrt(p*(1-p)/(pageviews_tot))
print("Standard Error: {}".format(std_error))
margin = std_error * 1.96
ci = [0.5-margin,0.5+margin]
observed = pageviews_cont/pageviews_tot
print("Confidence Interval: {},\nObserved: {}".format(ci, observed))
In [80]:
# Sanity check: Click-Through Probability on "Start free trial" (invariant metric)
clicks_cont = 28378.
clicks_exp = 28325.
ctp_cont = clicks_cont / pageviews_cont
ctp_exp = clicks_exp / pageviews_exp
# Build a 95% confidence interval around the control CTP and check that the
# experiment CTP falls inside it
std_error = math.sqrt(ctp_cont*(1-ctp_cont)/(pageviews_cont))
print("Standard Error: {}".format(std_error))
margin = std_error * 1.96
ci = [ctp_cont-margin, ctp_cont+margin]
print("Confidence Interval: {},\nObserved: {}".format(ci, ctp_exp))
Both invariant metrics fall within their confidence intervals, so the sanity checks pass.
In [86]:
# Effect size: Gross Conversion (enrollments / clicks),
# using only the days with enrollment data
eclicks_cont = 17293.
eclicks_exp = 17260.
enroll_cont = 3785.
enroll_exp = 3423.
gross_conv_pool = ((enroll_cont + enroll_exp) /
(eclicks_cont + eclicks_exp))
se_pool = math.sqrt(gross_conv_pool *
(1 - gross_conv_pool) *
(1/eclicks_cont + 1/eclicks_exp))
print("Gross Conversion Pooled Probability: {}".format(gross_conv_pool))
print("Std. Error Pooled: {}".format(se_pool))
d = (enroll_exp / eclicks_exp) - (enroll_cont / eclicks_cont)
margin = se_pool * 1.96
ci = [d-margin, d+margin]
print("Confidence Interval: {},\nd: {}".format(ci, d))
Gross Conversion is statistically significant, since 0 lies outside the confidence interval, and practically significant for dmin = 0.01, since the entire interval lies beyond the practical significance boundary.
In [85]:
# Effect size: Net Conversion (payments / clicks),
# using only the days with enrollment data
eclicks_cont = 17293.
eclicks_exp = 17260.
pay_cont = 2033.
pay_exp = 1945.
net_conv_pool = (pay_cont + pay_exp) / (eclicks_cont + eclicks_exp)
se_pool = math.sqrt(net_conv_pool *
(1 - net_conv_pool) *
(1/eclicks_cont + 1/eclicks_exp))
print("Net Conversion Pooled Probability: {}".format(net_conv_pool))
print("Std. Error Pooled: {}".format(se_pool))
d = (pay_exp / eclicks_exp) - (pay_cont / eclicks_cont)
margin = se_pool * 1.96
ci = [d-margin, d+margin]
print("Confidence Interval: {},\nd: {}".format(ci, d))
Based on the resulting confidence interval, Net Conversion is not statistically significant, since 0 lies inside the interval, and it is not practically significant for dmin = 0.0075. Note, however, that the interval extends below -0.0075, so a practically significant decrease cannot be ruled out.
For Gross Conversion there were 4 successes out of 23 trials (days), yielding a two-tailed p-value of 0.0026. This is statistically significant at alpha = 0.05.
For Net Conversion there were 10 successes out of 23 trials (days), yielding a two-tailed p-value of 0.6776. This is not statistically significant at alpha = 0.05.
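These sign-test p-values can be reproduced as two-tailed binomial tests against a 50% success probability. The sketch below only reproduces the p-value calculation; the day-by-day successes themselves come from the raw experiment data, which is not shown here.

In [ ]:
from scipy.stats import binom

def two_tailed_sign_test(successes, trials):
    """Two-tailed p-value for a sign test under H0: P(success) = 0.5."""
    k = min(successes, trials - successes)
    return min(1.0, 2 * binom.cdf(k, trials, 0.5))

print("Gross Conversion: {:.4f}".format(two_tailed_sign_test(4, 23)))  # ~0.0026
print("Net Conversion: {:.4f}".format(two_tailed_sign_test(10, 23)))   # ~0.6776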
As stated before, the Bonferroni correction was not used in this phase because Gross Conversion and Net Conversion are highly correlated with each other. Using the Bonferroni correction would be too conservative, requiring too many pageviews for the experiment.
We are also choosing not to use the Bonferroni correction because we require both Net Conversion and Gross Conversion to show statistical and practical significance before launching. The Bonferroni correction would be useful in a case where any single metric showing statistical significance would be enough to trigger a launch.
Effect size and sign test results showed that the Net Conversion metric was neither practically nor statistically significant. However, the confidence interval extends below the negative practical significance boundary ($-d_{min}$), so the results suggest a possible meaningful decrease in net conversion, and our $d_{min}$ criterion was not met. Because of this, I am choosing not to launch this experiment.
Gross Conversion, however, did show practical and statistical significance, which indicates that the experiment did influence students' decisions about whether to enroll in the trial. The d value for Gross Conversion is negative, meaning that fewer users opted to enroll in the free trial.
Although launching this experiment could indeed improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course, the results suggest a possible decrease in net conversion, so there is a risk of Udacity losing potential paying enrollments.
On Udacity, each project has a recommended due date for a given cohort. However, there is currently nothing students can use to check, week to week, whether they are on track to complete a project. It is very easy for a student to fall off track and miss the recommended due date.
My hypothesis is that, with such a mechanism to help keep students on track week to week, Retention (the evaluation metric defined below) would increase.
Null Hypothesis: A week to week schedule on what to complete, starting from the date a user starts the 14 day trial period, will not increase Retention.
Alt Hypothesis: A week to week schedule on what to complete, starting from the date a user starts the 14 day trial period, will increase Retention.
Unit of Diversion: The user-id is the unit of diversion. We would like to compare retention between user-ids in the control group and those in the experiment group, which has the "week by week calendar feature" available to them.
Invariant Metric: The number of user-ids, that is, the number of users who enroll in the free trial, will be used as an invariant metric, since user-id is the unit of diversion and enrollments should split evenly between the two groups.
Evaluation Metric: Retention will be used as the evaluation metric, because we want to see whether users who have access to this week by week calendar resource are more likely to remain enrolled past the 14-day trial and make a payment.
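If this follow-up experiment were run, Retention could be analyzed in the same way as the metrics above, with enrolled user-ids as the denominator and payments as the numerator. The sketch below is only illustrative; the counts are hypothetical placeholders, not real data.

In [ ]:
import math

# Hypothetical placeholder counts -- not real experiment data
enroll_cont, enroll_exp = 3000., 3000.   # enrolled user-ids per group
pay_cont, pay_exp = 1590., 1680.         # user-ids that paid after the trial

retention_pool = (pay_cont + pay_exp) / (enroll_cont + enroll_exp)
se_pool = math.sqrt(retention_pool * (1 - retention_pool) *
                    (1/enroll_cont + 1/enroll_exp))
d = (pay_exp / enroll_exp) - (pay_cont / enroll_cont)
margin = se_pool * 1.96
print("Confidence Interval: {},\nd: {}".format([d - margin, d + margin], d))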