Detailed A/B Test Experiment

Problem

  • There are 2 options for the landing page:
    • "start free trial"
      • If the visitor clicks this option, they will be asked for credit card info, and after 14 days they will be charged automatically.
    • "access course materials"
      • If the visitor clicks this option, they can watch videos and take the quiz for free, but won't get coaching or a certificate.
  • The goal of this A/B test is to see which option maximizes course completion. This is hard to guess in advance: the free option may draw more clicks but students may be less likely to finish, while the paid trial may draw fewer visitors but those who enroll may be more likely to finish.

Hypothesis

  • H0: P_control - P_experiment = 0
    • P_control is the conversion rate of control group, P_experiment is the conversion rate of experiment group.
  • H1: P_control - P_experiment = d
    • d is the detectable effect
  • When the results appear to be significant, we reject H0.

Metrics

Invariant Metrics

  • They are mainly used for sanity checks.
  • Pick metrics that are not expected to change, and make sure they do not differ dramatically between the control and experiment groups.

Evaluation Metrics

  • These metrics are expected to change between the control and the experiment group. They are tied to the business goals.
  • Each metric has a Dmin, indicating the minimum change that is practically significant to the business.

Overall Process

  1. Choose invariant metrics and evaluation metrics.
  2. Estimation of baselines and Sample Size
    • Estimate population metrics baseline
    • Estimate sample metrics baseline
    • Estimate the sample size for each evaluation metric, and keep only the metrics whose required sample size is practical.
  3. Verify the null hypothesis - Control Group vs. Experiment Group
    • Sanity check with invariant metrics
    • Differences in evaluation metrics, using confidence intervals and Dmin to check both statistical and practical significance.
    • Differences in trend, using the p-value of a sign test to check statistical significance.

In [119]:
import pandas as pd
import math
import numpy as np
from scipy.stats import norm

Data Overview

  • Each pageview is a unique cookie
  • Control group: "access course materials" option
  • Experiment group: "start free trial" option

In [6]:
# Control group
control_df = pd.read_csv('ab_control.csv')
control_df.head()


Out[6]:
          Date  Pageviews  Clicks  Enrollments  Payments
0  Sat, Oct 11       7723     687        134.0      70.0
1  Sun, Oct 12       9102     779        147.0      70.0
2  Mon, Oct 13      10511     909        167.0      95.0
3  Tue, Oct 14       9871     836        156.0     105.0
4  Wed, Oct 15      10014     837        163.0      64.0

In [9]:
# Experiment group
experiment_df = pd.read_csv('ab_experiment.csv')
experiment_df.head()


Out[9]:
          Date  Pageviews  Clicks  Enrollments  Payments
0  Sat, Oct 11       7716     686        105.0      34.0
1  Sun, Oct 12       9288     785        116.0      91.0
2  Mon, Oct 13      10480     884        145.0      79.0
3  Tue, Oct 14       9867     827        138.0      92.0
4  Wed, Oct 15       9793     832        140.0      94.0

In [10]:
control_df.describe()


Out[10]:
          Pageviews      Clicks  Enrollments    Payments
count     37.000000   37.000000    23.000000   23.000000
mean    9339.000000  766.972973   164.565217   88.391304
std      740.239563   68.286767    29.977000   20.650202
min     7434.000000  632.000000   110.000000   56.000000
25%     8896.000000  708.000000   146.500000   70.000000
50%     9420.000000  759.000000   162.000000   91.000000
75%     9871.000000  825.000000   175.000000  102.500000
max    10667.000000  909.000000   233.000000  128.000000

In [11]:
experiment_df.describe()


Out[11]:
          Pageviews      Clicks  Enrollments    Payments
count     37.000000   37.000000    23.000000   23.000000
mean    9315.135135  765.540541   148.826087   84.565217
std      708.070781   64.578374    33.234227   23.060841
min     7664.000000  642.000000    94.000000   34.000000
25%     8881.000000  722.000000   127.000000   69.000000
50%     9359.000000  770.000000   142.000000   91.000000
75%     9737.000000  827.000000   172.000000   99.000000
max    10551.000000  884.000000   213.000000  123.000000

Step 1 - Choose Metrics

Invariant Metrics

  • Ck = unique daily cookies (pageviews) on a page
    • Dmin = 3000
  • Cl = unique daily clicks on the free trial button
    • Dmin = 240
  • CTP = Cl/Ck, free trial button click through probability
    • Dmin = 0.01

Evaluation Metrics

  • GConversion = enrolled/Cl
    • Gross conversion
    • Dmin = 0.01
    • Daily probability of enrolling, among users who clicked the free trial button
  • Retention = paid/enrolled
    • Dmin = 0.01
    • Daily probability of paying, among users who enrolled
  • NConversion = paid/Cl
    • Net conversion
    • Dmin = 0.0075
    • Daily probability of paying, among users who clicked the free trial button
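
To make these definitions concrete, here is a minimal sketch (an illustration only; the helper name add_daily_metrics is hypothetical, and the column names are those shown in the data overview above) that computes the daily metrics for one group:

# Illustrative helper: compute the daily invariant and evaluation metrics for one group
def add_daily_metrics(df):
    out = df.copy()
    out['CTP'] = out['Clicks'] / out['Pageviews']            # free trial click-through probability
    out['GConversion'] = out['Enrollments'] / out['Clicks']  # gross conversion
    out['Retention'] = out['Payments'] / out['Enrollments']  # retention
    out['NConversion'] = out['Payments'] / out['Clicks']     # net conversion
    return out

add_daily_metrics(control_df).head()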

Step 2.1 - Estimate Metrics Baseline Values

  • The baseline values are the values of these metrics before the change.
  • These values are provided by the data provider Udacity as rough estimates.

In [28]:
baseline = {'Cookies': 40000, 'Clicks': 3200, 'Enrollments': 660, 'CTP': 0.08,
           'GConversion': 0.20625, 'Retention': 0.53, 'NConversion': 0.109313}

Step 2.2 - Estimate Standard Deviation & Sample Size

Estimate Standard Deviation of Metrics

  • This is used later for estimating the sample size and confidence intervals.
  • The more variable a metric is, the harder it is to reach a significant result.
  • To estimate the variance, assume each metric's probability `p_hat` follows a binomial distribution; the standard deviation of the proportion is then:
    • std = sqrt(p_hat*(1-p_hat)/n)
      • p_hat: baseline probability of the event occurring
      • n: sample size
    • The binomial assumption fits because each unit has exactly two outcomes: the event either happens (option A) or it doesn't (option B).
  • This assumption is only valid when the unit of diversion of the experiment equals the unit of analysis (the denominator of the metric formula). If it does not hold, the analytical std can be off, and it is better to estimate the variability empirically; see the sketch below for a simple day-level illustration.
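
For a quick empirical illustration of the variability (a sketch only, using the day-level data loaded above rather than the cookie-level unit of diversion), one can look at the spread of the daily metric values:

# Illustration only (not used in the rest of this analysis):
# spread of the daily gross conversion values in the control data
daily_gc = (control_df['Enrollments'] / control_df['Clicks']).dropna()
print(daily_gc.mean(), daily_gc.std())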

In [29]:
sample_baseline = baseline.copy()

sample_baseline['Cookies'] = 5000  # assume sample size is 5000
sample_baseline['Clicks'] = baseline['Clicks'] * 5000/baseline['Cookies']
sample_baseline['Enrollments'] = baseline['Enrollments'] * 5000/baseline['Cookies']

sample_baseline


Out[29]:
{'Cookies': 5000,
 'Clicks': 400.0,
 'Enrollments': 82.5,
 'CTP': 0.08,
 'GConversion': 0.20625,
 'Retention': 0.53,
 'NConversion': 0.109313}

In [33]:
def get_binomial_std(p_hat, n):
    """
    p_hat: baseline probability of the event to occur
    n: sample size
    return: the standard deviation
    """
    std = round(math.sqrt(p_hat*(1-p_hat)/n),4)
    
    return std

Gross Conversion std

  • p_hat = enrolled/Cl
    • Daily probability of enrolling, among users who clicked the free trial button

In [34]:
gross_conversion = {}

gross_conversion['d_min'] = 0.01
gross_conversion['p_hat'] = sample_baseline['GConversion']
gross_conversion['n'] = sample_baseline['Clicks']
gross_conversion['std'] = get_binomial_std(gross_conversion['p_hat'],
                                          gross_conversion['n'])

gross_conversion


Out[34]:
{'d_min': 0.01, 'p_hat': 0.20625, 'n': 400.0, 'std': 0.0202}

Retention std


In [37]:
retention = {}

retention['d_min'] = 0.01
retention['p_hat'] = sample_baseline['Retention']
retention['n'] = sample_baseline['Enrollments']
retention['std'] = get_binomial_std(retention['p_hat'],
                                   retention['n'])

retention


Out[37]:
{'d_min': 0.01, 'p_hat': 0.53, 'n': 82.5, 'std': 0.0549}

Net Conversion std


In [54]:
net_conversion = {}

net_conversion['d_min'] = 0.0075
net_conversion['p_hat'] = sample_baseline['NConversion']
net_conversion['n'] = sample_baseline['Clicks']
net_conversion['std'] = get_binomial_std(net_conversion['p_hat'],
                                        net_conversion['n'])

net_conversion


Out[54]:
{'d_min': 0.0075, 'p_hat': 0.109313, 'n': 400.0, 'std': 0.0156}

Estimate Sample Size

Hypothesis

  • H0: P_control - P_experiment = 0
    • P_control is the conversion rate of control group, P_experiment is the conversion rate of experiment group.
  • H1: P_control - P_experiment = d
    • d is the detectable effect

Sample Size Formula

  • n = pow(Z_{1-α/2}*std1 + Z_{1-β}*std2, 2) / pow(d, 2)

    • Z_{1-α/2} is the Z score at 1-α/2, where α is the probability of a Type I error
    • Z_{1-β} is the Z score at 1-β (the power), where β is the probability of a Type II error
    • std1 = sqrt(2*p*(1-p))
    • std2 = sqrt(p*(1-p) + (p+d)*(1-(p+d)))
      • p is the baseline conversion rate (the p_hat from above)
      • d is the detectable effect (the d_min from above)
  • Online calculator for sample size: https://www.evanmiller.org/ab-testing/sample-size.html

    • Given p, d, α and 1-β

In [42]:
def get_z_score(alpha):
    return norm.ppf(alpha)


def get_stds(p, d):
    std1 = math.sqrt(2*p*(1-p))
    std2 = math.sqrt(p*(1-p) + (p+d)*(1-(p+d)))
    
    std_lst = [std1, std2]
    return std_lst
    
    
def get_sample_size(std_lst, alpha, beta, d):
    n = pow(get_z_score(1-alpha/2)*std_lst[0] + get_z_score(1-beta)*std_lst[1], 2)/pow(d,2)
    return n
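
As a rough cross-check on this formula (an optional addition, not part of the original analysis, and assuming statsmodels is installed), statsmodels' power calculator can solve for the per-group sample size. It works with Cohen's effect size for two proportions, so its answer is close to, but not identical to, the value computed below:

# Cross-check sketch using statsmodels; the function name is illustrative, not from the original notebook
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def approx_sample_size(p, d, alpha=0.05, beta=0.2):
    # Effect size between baseline p and p + d, then solve for the per-group sample size
    effect = proportion_effectsize(p, p + d)
    return NormalIndPower().solve_power(effect_size=effect, alpha=alpha, power=1-beta, ratio=1)

# e.g. approx_sample_size(0.20625, 0.01) lands in the same ballpark as the gross conversion value below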

In [44]:
alpha = 0.05
beta = 0.2

Gross Conversion Sample Size

  • The calculated sample_size is the estimated number of clicks needed in each group
  • To get the estimated page_views (unique cookies), divide sample_size by the click-through rate (400/5000) and multiply by 2
    • Multiplying by 2 covers both the control and experiment groups

In [49]:
gross_conversion['sample_size'] = round(get_sample_size(get_stds(gross_conversion['p_hat'],
                                                          gross_conversion['d_min']), alpha, beta,
                                                 gross_conversion['d_min']))
gross_conversion['page_views'] = 2*round(gross_conversion['sample_size']/(gross_conversion['n']/5000))

gross_conversion


Out[49]:
{'d_min': 0.01,
 'p_hat': 0.20625,
 'n': 400.0,
 'std': 0.0202,
 'sample_size': 25835.0,
 'page_views': 645876.0}

Retention Sample Size

  • The calculated sample_size is the estimated number of enrollments needed in each group
  • To get the estimated page_views (unique cookies), divide sample_size by the enrollment rate (82.5/5000) and multiply by 2
    • Multiplying by 2 covers both the control and experiment groups
  • The required page_views is too large: at 40,000 pageviews a day it would take well over 100 days (about 118) to collect the data, so Retention is dropped as an evaluation metric.

In [52]:
retention['sample_size'] = round(get_sample_size(get_stds(retention['p_hat'],
                                                          retention['d_min']), alpha, beta,
                                                 retention['d_min']))
retention['page_views'] = 2*round(retention['sample_size']/(retention['n']/5000))
retention


Out[52]:
{'d_min': 0.01,
 'p_hat': 0.53,
 'n': 82.5,
 'std': 0.0549,
 'sample_size': 39087.0,
 'page_views': 4737818.0}

Net Conversion Sample Size

  • The calculated sample_size is the estimated number of clicks needed in each group
  • To get the estimated page_views (unique cookies), divide sample_size by the click-through rate (400/5000) and multiply by 2
    • Multiplying by 2 covers both the control and experiment groups
  • Assuming 40,000 page_views per day, it takes roughly two and a half weeks (about 17 days) to collect this many page views.

In [55]:
net_conversion['sample_size'] = round(get_sample_size(get_stds(net_conversion['p_hat'],
                                                         net_conversion['d_min']), alpha, beta,
                                                         net_conversion['d_min']))
net_conversion['page_views'] = 2*round(net_conversion['sample_size']/(net_conversion['n']/5000))

net_conversion


Out[55]:
{'d_min': 0.0075,
 'p_hat': 0.109313,
 'n': 400.0,
 'std': 0.0156,
 'sample_size': 27413.0,
 'page_views': 685324.0}
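
To make the duration estimates above concrete, here is a quick sketch assuming all 40,000 daily pageviews are split between the two groups:

# Rough duration estimate: required pageviews divided by assumed daily traffic
daily_pageviews = 40000
for name, metric in [('Gross Conversion', gross_conversion),
                     ('Retention', retention),
                     ('Net Conversion', net_conversion)]:
    print(name, round(metric['page_views'] / daily_pageviews, 1), 'days')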

Step 3 - Control Group vs. Experiment Group


In [56]:
control_df.head()


Out[56]:
          Date  Pageviews  Clicks  Enrollments  Payments
0  Sat, Oct 11       7723     687        134.0      70.0
1  Sun, Oct 12       9102     779        147.0      70.0
2  Mon, Oct 13      10511     909        167.0      95.0
3  Tue, Oct 14       9871     836        156.0     105.0
4  Wed, Oct 15      10014     837        163.0      64.0

In [57]:
experiment_df.head()


Out[57]:
          Date  Pageviews  Clicks  Enrollments  Payments
0  Sat, Oct 11       7716     686        105.0      34.0
1  Sun, Oct 12       9288     785        116.0      91.0
2  Mon, Oct 13      10480     884        145.0      79.0
3  Tue, Oct 14       9867     827        138.0      92.0
4  Wed, Oct 15       9793     832        140.0      94.0

Step 3.1 - Differences in Invariant Metrics (Sanity Check)

  • The goal is to verify that the experiment was conducted as expected and is not distorted by other factors, and that the data collection is correct.
  • Invariant Metrics
    • Ck = unique daily cookies (pageviews) on a page
    • Cl = unique daily clicks on the free trial button
    • CTP = Cl/Ck, free trial button click through probability
  • We compare the invariant metrics between the two groups to make sure the differences are not significant.

In [60]:
p=0.5
alpha=0.05

In [62]:
def get_std(p, total_size):
    std = math.sqrt(p*(1-p)/total_size)
    return std

def get_marginOferror(std, alpha):
    me = round(get_z_score(1-alpha/2)*std, 4)
    return me

Compare pageviews

  • We want to verify that the difference in pageview counts between the 2 groups is not significant.
  • When the sample size n is large enough, the binomial can be approximated by a normal distribution. We test whether the observed p_hat = control group pageviews / total pageviews of both groups is significantly different from p = 0.5.
    • Margin of Error ME = Z_{1-α/2} * std
      • We calculate ME at the 95% confidence level
    • This gives the Confidence Interval CI = [p_hat - ME, p_hat + ME]
      • If p = 0.5 is within the CI, the difference between the 2 groups is within the expected range

In [61]:
control_pageviews = control_df['Pageviews'].sum()
experiment_pageviews = experiment_df['Pageviews'].sum()

print(control_pageviews, experiment_pageviews)


345543 344660

In [67]:
total_pageviews = control_pageviews + experiment_pageviews
p_hat = control_pageviews/(total_pageviews)
std = get_std(p, total_pageviews)
me = get_marginOferror(std, alpha)

print('If ' + str(p) +' is within [' + str(round(p_hat - me, 4)) + ', ' + str(round(p_hat + me, 4)) + '], then the difference is expected.')


If 0.5 is within [0.4994, 0.5018], then the difference is expected.

Compare clicks

  • Similar to pageviews comparison above.

In [90]:
control_clicks = control_df['Clicks'].sum()
experiment_clicks = experiment_df['Clicks'].sum()

print(control_clicks, experiment_clicks)


28378 28325

In [91]:
total_clicks = control_clicks + experiment_clicks
p_hat = control_clicks/(total_clicks)
std = get_std(p, total_clicks)
me = get_marginOferror(std, alpha)

print('If ' + str(p) +' is within [' + str(round(p_hat - me, 4)) + ', ' + str(round(p_hat + me, 4)) + '], then the difference is expected.')


If 0.5 is within [0.4964, 0.5046], then the difference is expected.

Compare CTP (Click Through Probability)

  • Because CTP is a proportion compared across the two groups, we use the pooled standard deviation to calculate the margin of error.
    • p_pool = (experiment_clicks + control_clicks)/(experiment_pageviews + control_pageviews)
    • std_pool = sqrt(p_pool*(1-p_pool)*(1/experiment_pageviews + 1/control_pageviews))

In [92]:
control_ctp = control_clicks/control_pageviews
experiment_ctp = experiment_clicks/experiment_pageviews
p_pool = (control_clicks + experiment_clicks)/(control_pageviews + experiment_pageviews)
std_pool = math.sqrt(p_pool*(1-p_pool)*(1/experiment_pageviews + 1/control_pageviews))
me = get_marginOferror(std_pool, alpha)

diff = round(experiment_ctp - control_ctp, 4)

print('If ' + str(diff) +' is within [' + str(round(0 - me, 4)) + ', ' + str(round(0 + me, 4)) + '], then the difference is expected.')


If 0.0001 is within [-0.0013, 0.0013], then the difference is expected.
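
As an optional cross-check (not part of the original analysis, and assuming statsmodels is available), a two-proportion z-test on CTP should agree with the interval check above, returning a p-value well above alpha:

# Cross-check sketch: two-proportion z-test on click-through probability
from statsmodels.stats.proportion import proportions_ztest

z_stat, p_val = proportions_ztest(count=[control_clicks, experiment_clicks],
                                  nobs=[control_pageviews, experiment_pageviews])
print(z_stat, p_val)  # a large p-value (> 0.05) means the CTP difference is not significant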

Summary for Sanity Check

  • We have checked all 3 invariant metrics between the 2 groups; none of the differences are significant, so the comparison between these 2 groups is unlikely to be distorted by other factors.

Step 3.2 - Differences in Evaluation Metrics

  • Similar to the sanity check above, here we check the differences in evaluation metrics between the 2 groups, to see:

    • Whether the differences are statistically significant.
    • Whether the differences are practically significant.
      • i.e. whether the changes are big enough to be beneficial to the business.
      • The criterion used here: the change is practically significant if the observed difference falls outside the confidence interval [Dmin - ME, Dmin + ME]
  • As found in Step 2, Gross Conversion and Net Conversion serve as the evaluation metrics, while Retention is dropped because its required sample size is not practical to collect.

  • Evaluation Metrics

    • GConversion = enrolled/Cl
      • Gross conversion
      • Dmin = 0.01
      • Daily probability of enrolling, among users who clicked the free trial button
    • NConversion = paid/Cl
      • Net conversion
      • Dmin = 0.0075
      • Daily probability of paying, among users who clicked the free trial button

In [81]:
print(control_df.isnull().sum())
print()
print(experiment_df.isnull().sum())


Date            0
Pageviews       0
Clicks          0
Enrollments    14
Payments       14
dtype: int64

Date            0
Pageviews       0
Clicks          0
Enrollments    14
Payments       14
dtype: int64

Compare Gross Conversion

  • The method here is almost the same as the one used in "Compare CTP" above.

  • Observation

    • As the results show, the change in the experiment group is both statistically and practically significant.
    • From the control group to the experiment group, there is a 2.06% decrease in gross conversion, and it is significant. So fewer people enroll when the site shows the "start free trial" option, compared with the "access course materials" option.

In [93]:
control_clicks = control_df['Clicks'].loc[control_df['Enrollments'].notnull()].sum()
experiment_clicks = experiment_df['Clicks'].loc[experiment_df['Enrollments'].notnull()].sum()
print('Clicks', control_clicks, experiment_clicks)

control_enrolls = control_df['Enrollments'].sum()
experiment_enrolls = experiment_df['Enrollments'].sum()
print('Enrollments', control_enrolls, experiment_enrolls)

control_GC = control_enrolls/control_clicks
experiment_GC = experiment_enrolls/experiment_clicks
print('Gross Conversion', control_GC, experiment_GC)


Clicks 17293 17260
Enrollments 3785.0 3423.0
Gross Conversion 0.2188746891805933 0.19831981460023174

In [94]:
p_pool = (control_enrolls + experiment_enrolls)/(control_clicks + experiment_clicks)
std_pool = math.sqrt(p_pool*(1-p_pool)*(1/control_clicks + 1/experiment_clicks))
me = get_marginOferror(std_pool, alpha)

print(p_pool, std_pool, me)


0.20860706740369866 0.004371675385225936 0.0086

In [97]:
# Statistical significance
GC_diff = round(experiment_GC - control_GC, 4)

print('If ' + str(GC_diff) +' is within [' + str(round(0 - me, 4)) + ', ' + str(round(0 + me, 4)) + '], then the difference is expected, and the change is not significant.')


If -0.0206 is within [-0.0086, 0.0086], then the difference is expected, and the change is not significant.

In [100]:
# Practically significance
d_min = gross_conversion['d_min']

print('If ' + str(GC_diff) +' is within [' + str(round(d_min - me, 4)) + ', ' + str(round(d_min + me, 4)) + '], then the change is not practically significant.')


If -0.0206 is within [0.0014, 0.0186], then the change is not practically significant.

Compare Net Conversion

  • Observation
    • As the results show, the change is not statistically significant, and its magnitude (a 0.49% drop) is below Dmin = 0.75%, so it does not clear the practical significance bar either.
    • However, the confidence interval around the difference extends below -Dmin, so a practically significant drop in payments with the "start free trial" option (compared with the "access course materials" option) cannot be ruled out. This risk matters to the business.

In [105]:
control_clicks = control_df['Clicks'].loc[control_df['Payments'].notnull()].sum()
experiment_clicks = experiment_df['Clicks'].loc[experiment_df['Payments'].notnull()].sum()
print('Clicks', control_clicks, experiment_clicks)

control_paid = control_df['Payments'].sum()
experiment_paid = experiment_df['Payments'].sum()
print('Payments', control_paid, experiment_paid)

control_NC = control_paid/control_clicks
experiment_NC = experiment_paid/experiment_clicks
print('Net Conversion', control_NC, experiment_NC)


Clicks 17293 17260
Payments 2033.0 1945.0
Net Conversion 0.11756201931417337 0.1126882966396292

In [106]:
p_pool = (control_paid + experiment_paid)/(control_clicks + experiment_clicks)
std_pool = math.sqrt(p_pool*(1-p_pool)*(1/control_clicks + 1/experiment_clicks))
me = get_marginOferror(std_pool, alpha)

print(p_pool, std_pool, me)


0.1151274853124186 0.0034341335129324238 0.0067

In [107]:
# Statistical significance
NC_diff = round(experiment_NC - control_NC, 4)

print('If ' + str(NC_diff) +' is within [' + str(round(0 - me, 4)) + ', ' + str(round(0 + me, 4)) + '], then the difference is expected, and the change is not significant.')


If -0.0049 is within [-0.0067, 0.0067], then the difference is expected, and the change is not significant.

In [108]:
# Practically significance
d_min = net_conversion['d_min']

print('If ' + str(NC_diff) +' is within [' + str(round(d_min - me, 4)) + ', ' + str(round(d_min + me, 4)) + '], then the change is not practically significant.')


If -0.0049 is within [0.0008, 0.0142], then the change is not practically significant.
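
An alternative, commonly used formulation (an addition here, not one of the original checks) centers the confidence interval on the observed difference and compares it against 0 and ±Dmin: the change is statistically significant if the CI excludes 0, and practically significant if the CI lies entirely beyond Dmin (or below -Dmin). A minimal sketch, reusing the sums and helpers defined above:

# Sketch only: CI around the observed (experiment - control) difference
def diff_confidence_interval(x_control, n_control, x_experiment, n_experiment, alpha=0.05):
    p_pool = (x_control + x_experiment) / (n_control + n_experiment)
    std_pool = math.sqrt(p_pool * (1 - p_pool) * (1/n_control + 1/n_experiment))
    diff = x_experiment/n_experiment - x_control/n_control
    me = get_z_score(1 - alpha/2) * std_pool
    return round(diff - me, 4), round(diff + me, 4)

# Gross conversion: the CI should exclude 0 and lie entirely below -0.01
print(diff_confidence_interval(control_enrolls, control_clicks, experiment_enrolls, experiment_clicks))
# Net conversion: the CI should include 0 but still dip below -0.0075
print(diff_confidence_interval(control_paid, control_clicks, experiment_paid, experiment_clicks))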

Step 3.3 - Differences in Trend (Sign Test)

  • The purpose of this test is to check whether the decrease/increase trend is consistent in the daily data.
  • prob(x successes) = (n!/(x! * (n-x)!)) * pow(p, x) * pow(1-p, n-x)
    • x - number of "successes"; a day counts as a success when the experiment group's metric is higher than the control group's
    • n - total number of records (days)
    • p - probability of a success; for this binomial sign test, p = 0.5 under the null hypothesis
  • The one-sided p-value is the sum of prob(i successes) for i = 0..x; the code below doubles it to get a two-sided p-value. When the p-value is smaller than alpha, the change is significant.
  • This is the online calculator to get the p-value: https://www.graphpad.com/quickcalcs/binomial1/
    • Given x, n, p
    • It provides both one-sided and two-sided p-values
  • Observation
    • Consistent with Step 3.2, the change in Gross Conversion is significant, while the change in Net Conversion is not statistically significant.
    • Note that whether we count successes or failures here, the significance conclusion is the same, even though the p-value can differ.

In [110]:
control_experiment_df = control_df.join(experiment_df, lsuffix='_control', rsuffix='_experiment')
print(control_experiment_df.shape)
control_experiment_df.head()


(37, 10)
Out[110]:
Date_control Pageviews_control Clicks_control Enrollments_control Payments_control Date_experiment Pageviews_experiment Clicks_experiment Enrollments_experiment Payments_experiment
0 Sat, Oct 11 7723 687 134.0 70.0 Sat, Oct 11 7716 686 105.0 34.0
1 Sun, Oct 12 9102 779 147.0 70.0 Sun, Oct 12 9288 785 116.0 91.0
2 Mon, Oct 13 10511 909 167.0 95.0 Mon, Oct 13 10480 884 145.0 79.0
3 Tue, Oct 14 9871 836 156.0 105.0 Tue, Oct 14 9867 827 138.0 92.0
4 Wed, Oct 15 10014 837 163.0 64.0 Wed, Oct 15 9793 832 140.0 94.0

In [111]:
control_experiment_df.isnull().sum()


Out[111]:
Date_control               0
Pageviews_control          0
Clicks_control             0
Enrollments_control       14
Payments_control          14
Date_experiment            0
Pageviews_experiment       0
Clicks_experiment          0
Enrollments_experiment    14
Payments_experiment       14
dtype: int64

In [113]:
control_experiment_df.dropna(inplace=True)
print(control_experiment_df.shape)
control_experiment_df.isnull().sum()


(23, 10)
Out[113]:
Date_control              0
Pageviews_control         0
Clicks_control            0
Enrollments_control       0
Payments_control          0
Date_experiment           0
Pageviews_experiment      0
Clicks_experiment         0
Enrollments_experiment    0
Payments_experiment       0
dtype: int64

In [122]:
# If it's "success", assign 1, otherwise 0

control_experiment_df['GC_increase'] = np.where(
    control_experiment_df['Enrollments_experiment']/control_experiment_df['Clicks_experiment'] \
    > control_experiment_df['Enrollments_control']/control_experiment_df['Clicks_control'], 1, 0)

control_experiment_df['NC_increase'] = np.where(
    control_experiment_df['Payments_experiment']/control_experiment_df['Clicks_experiment'] \
    > control_experiment_df['Payments_control']/control_experiment_df['Clicks_control'], 1, 0)

control_experiment_df[['GC_increase', 'NC_increase']].head()


Out[122]:
   GC_increase  NC_increase
0            0            0
1            0            1
2            0            0
3            0            0
4            0            1

In [126]:
print(control_experiment_df['GC_increase'].value_counts())
print(control_experiment_df['NC_increase'].value_counts())


0    19
1     4
Name: GC_increase, dtype: int64
0    13
1    10
Name: NC_increase, dtype: int64

In [143]:
GC_success_ct = control_experiment_df['GC_increase'].value_counts()[1]
NC_success_ct = control_experiment_df['NC_increase'].value_counts()[1]

print(GC_success_ct, NC_success_ct)


4 10

In [144]:
p = 0.5
alpha = 0.05
n = control_experiment_df.shape[0]

print(n)


23

In [152]:
def get_probability(x, n):
    # Binomial pmf: probability of exactly x successes out of n trials, using the global p (0.5)
    prob = round(math.factorial(n)/(math.factorial(x)*math.factorial(n-x))*pow(p,x)*pow(1-p, n-x), 4)
    return prob

def get_p_value(x, n):
    # One-sided tail probability P(X <= x), doubled to get a two-sided p-value
    p_value = 0

    for i in range(0, x+1):
        p_value += get_probability(i, n)

    return round(p_value*2, 4)  # 2-sided p_value
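
As a sanity check on this hand-rolled p-value (an optional addition, assuming SciPy >= 1.7 is available), scipy.stats.binomtest should give nearly identical two-sided values, up to the rounding inside get_probability:

# Cross-check sketch using scipy's exact binomial test (not part of the original notebook)
from scipy.stats import binomtest

print(binomtest(GC_success_ct, n, p=0.5).pvalue)  # expected to be close to 0.0026
print(binomtest(NC_success_ct, n, p=0.5).pvalue)  # expected to be close to 0.68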

In [153]:
print ("GC Change is significant if", get_p_value(GC_success_ct,n), "is smaller than", alpha)
print ("NC Change is significant if", get_p_value(NC_success_ct,n), "is smaller than", alpha)


GC Change is significant if 0.0026 is smaller than 0.05
NC Change is significant if 0.6774 is smaller than 0.05

Reference