Hypothesis Testing

(We are covering what is referred to as the frequentist method of hypothesis testing.)

We would like to know if the effects we see in the sample (the observed data) are likely to occur in the population.

The way classical hypothesis testing works is by conducting a statistical test to answer the following question:

Given the sample and an effect, what is the probability of seeing that effect just by chance?

Here are the steps we would follow:

Compute test statistic
Define null hypothesis
Compute p-value
Interpret the result

If the p-value is very low (more often than not, below 0.05), the effect is considered statistically significant. That means the effect is unlikely to have occurred by chance. The inference? The effect is likely to be seen in the population too.

This process is very similar to proof by contradiction. We first assume that the effect is not real. That's the null hypothesis. The next step is to compute the probability of seeing an effect at least as large as the observed one under that assumption (the p-value). If the p-value is very low (< 0.05 as a rule of thumb), we reject the null hypothesis.
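The "assume there is no effect, then see how surprising the data are" idea can be sketched with a permutation test. This is only an illustration: the two samples below are made-up numbers (not the cars data), and the names `group_a`, `group_b` and `n_iter` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical samples (made-up numbers, not the cars data)
group_a = rng.normal(loc=19.0, scale=3.5, size=200)
group_b = rng.normal(loc=16.0, scale=3.5, size=150)

# Test statistic: absolute difference in means
observed_effect = abs(group_a.mean() - group_b.mean())

# Null hypothesis: the group labels do not matter.
# If that is true, shuffling the pooled data should produce
# effects as large as the observed one fairly often.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_iter = 2000
count = 0
for _ in range(n_iter):
    shuffled = rng.permutation(pooled)
    effect = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
    if effect >= observed_effect:
        count += 1

# p-value: fraction of shuffles with an effect at least as large
p_value = count / n_iter
print("Observed effect:", observed_effect)
print("p-value:", p_value)
```

A small p-value here means that shuffled (no-effect) data rarely reproduce the observed difference, which is the grounds for rejecting the null hypothesis.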



In [1]:

    
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib as mpl
%matplotlib inline



In [2]:

    
import seaborn as sns
sns.set(color_codes=True)



In [26]:

    
cars = pd.read_csv("cars_v1.csv", encoding="ISO-8859-1")



In [27]:

    
mileage_is_null = cars.Mileage.isnull()



In [ ]:

    
mileage_is_null



In [29]:

    
# Keep only the rows where Mileage is present (.ix was removed from pandas; use .loc)
cars = cars.loc[~mileage_is_null]



In [30]:

    
cars.head()









    Out[30]:






  
    
      
	Make	Model	Price	Type	ABS	BootSpace	GearType	AirBag	Engine	FuelCapacity	Mileage
0	Ashok Leyland Stile	Ashok Leyland Stile LE 8-STR (Diesel)	750	MPV	No	500.0	Manual	No	1461.0	50.0	20.7
1	Ashok Leyland Stile	Ashok Leyland Stile LS 8-STR (Diesel)	800	MPV	No	500.0	Manual	No	1461.0	50.0	20.7
2	Ashok Leyland Stile	Ashok Leyland Stile LX 8-STR (Diesel)	830	MPV	No	500.0	Manual	No	1461.0	50.0	20.7
3	Ashok Leyland Stile	Ashok Leyland Stile LS 7-STR (Diesel)	850	MPV	No	500.0	Manual	No	1461.0	50.0	20.7
4	Ashok Leyland Stile	Ashok Leyland Stile LS 7-STR Alloy (Diesel)	880	MPV	No	500.0	Manual	No	1461.0	50.0	20.7

Is the average mileage of Automatic cars significantly different from the average mileage of Manual cars?



In [31]:

    
cars_Automatic = cars[cars.GearType==' Automatic'].copy().reset_index()
cars_Manual = cars[cars.GearType==' Manual'].copy().reset_index()



In [32]:

    
print("Average Mileage of Automatic cars:", cars_Automatic.Mileage.mean())
print("Average Mileage of Manual cars:", cars_Manual.Mileage.mean())









    



Average Mileage of Automatic cars: 15.745466237942114
Average Mileage of Manual cars: 19.112636363636376



In [33]:

    
#Mean and standard deviation of Manual Cars
print("Mean of Manual Cars:", cars_Manual.Mileage.mean())
print("Standard Deviation of Manual Cars:", cars_Manual.Mileage.std())









    



Mean of Manual Cars: 19.112636363636376
Standard Deviation of Manual Cars: 3.7554266333765174

Exercise



In [10]:

    
#Mean and standard deviation of Automatic Cars



In [ ]:



In [34]:

    
#Confidence interval on the mean of manual cars
stats.norm.interval(0.95, loc=cars_Manual.Mileage.mean(), scale = cars_Manual.Mileage.std()/np.sqrt(len(cars_Manual)))









    Out[34]:





(18.70745412179004, 19.517818605482713)
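To make the interval above less of a black box, here is a minimal sketch of the same calculation done by hand: mean plus or minus a z-value times the standard error. The sample is made up (it is not the cars data), and `sample`, `manual_ci` and `scipy_ci` are hypothetical names used only for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical mileage-like sample (made-up numbers, not the cars data)
sample = rng.normal(loc=19.0, scale=3.75, size=330)

mean = sample.mean()
# ddof=1 gives the sample standard deviation, matching pandas' default .std()
std_err = sample.std(ddof=1) / np.sqrt(len(sample))

# z-value that leaves 2.5% in each tail of the normal distribution
z = stats.norm.ppf(0.975)
manual_ci = (mean - z * std_err, mean + z * std_err)

# Same interval from scipy
scipy_ci = stats.norm.interval(0.95, loc=mean, scale=std_err)

print("By hand:", manual_ci)
print("scipy:  ", scipy_ci)
```

Note that the interval is built around the standard error (standard deviation divided by the square root of the sample size), not the standard deviation itself, so it narrows as the sample grows.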

Exercise



In [12]:

    
#Confidence interval on the mean of automatic cars









    Out[12]:





(15.258436053916315, 16.232496421967912)



In [ ]:

Effect Size



In [35]:

    
print("Effect size:", cars_Manual.Mileage.mean() - cars_Automatic.Mileage.mean())









    



Effect size: 3.367170125694262

Null Hypothesis: The mean mileages of Manual and Automatic cars are equal

Perform t-test and determine the p-value.



In [36]:

    
stats.ttest_ind(cars_Manual.Mileage, cars_Automatic.Mileage, equal_var=True)









    Out[36]:





Ttest_indResult(statistic=9.9313548109322571, pvalue=1.0337306243949073e-21)

The p-value is the probability of seeing an effect size at least this large if the null hypothesis were true. And here, the p-value is almost 0.

Conclusion: The mileage difference is statistically significant.
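For readers who want to see where the statistic and p-value come from, the following sketch computes the equal-variance (pooled) two-sample t-test by hand and compares it with `stats.ttest_ind`. The two samples are made up for illustration (they are not the cars data), and all variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical samples (made-up numbers, not the cars data)
sample_a = rng.normal(loc=19.0, scale=3.5, size=40)
sample_b = rng.normal(loc=17.5, scale=3.5, size=35)

n_a, n_b = len(sample_a), len(sample_b)
var_a = sample_a.var(ddof=1)  # sample variances
var_b = sample_b.var(ddof=1)

# Pooled variance, as assumed by equal_var=True
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t statistic: difference in means measured in standard errors
t_manual = (sample_a.mean() - sample_b.mean()) / std_err

# Two-sided p-value from the t distribution
dof = n_a + n_b - 2
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

result = stats.ttest_ind(sample_a, sample_b, equal_var=True)
print("By hand:", t_manual, p_manual)
print("scipy:  ", result.statistic, result.pvalue)
```

The statistic is just the effect size divided by its standard error, which is why a larger difference or a smaller standard error both push the p-value down.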

Assumption of t-test

One assumption is that the data came from a normal distribution.
The Shapiro-Wilk test checks for normality. Its null hypothesis is that the data are normally distributed, so a p-value below 0.05 is evidence against normality.



In [37]:

    
stats.shapiro(cars_Manual.Mileage)









    Out[37]:





(0.9872176647186279, 0.005204247776418924)



In [38]:

    
stats.shapiro(cars_Automatic.Mileage)









    Out[38]:





(0.9908203482627869, 0.04923483356833458)

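Errrrrr..... Both p-values are below 0.05 (the one for Automatic cars only just), so the normality assumption is doubtful for both samples.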
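As a rough guide to reading the Shapiro-Wilk output, the sketch below runs the test on two made-up samples: one drawn from a normal distribution and one from a clearly skewed (exponential) distribution. The samples and the `results` dictionary are hypothetical and exist only for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical samples (made-up numbers, not the cars data)
samples = {
    "normal": rng.normal(loc=19.0, scale=3.5, size=300),
    "skewed": rng.exponential(scale=3.5, size=300),
}

results = {}
for name, sample in samples.items():
    # shapiro returns the W statistic first and the p-value second
    statistic, p_value = stats.shapiro(sample)
    results[name] = (statistic, p_value)
    verdict = "evidence against normality" if p_value < 0.05 else "no evidence against normality"
    print(name, "W =", statistic, "p =", p_value, "->", verdict)
```

The second number in each output tuple is the p-value, which is the one compared against 0.05.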

A/B testing

Comparing two versions to check which one performs better. E.g., show two variants of the same webpage to visitors and find which one gives the better conversion rate (or other relevant metric). (wiki)
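One common way to analyse such an experiment is a two-proportion z-test on the conversion rates. The sketch below is a minimal version of that idea, not the only approach; the visitor and conversion counts are invented for illustration and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical A/B test counts (invented for illustration)
visitors_a, conversions_a = 5000, 400
visitors_b, conversions_b = 5000, 460

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Null hypothesis: both variants share one conversion rate,
# estimated by pooling the two groups
pooled_rate = (conversions_a + conversions_b) / (visitors_a + visitors_b)
std_err = np.sqrt(pooled_rate * (1 - pooled_rate) * (1 / visitors_a + 1 / visitors_b))

# z statistic: difference in rates measured in standard errors
z = (rate_b - rate_a) / std_err

# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))

print("Conversion rate A:", rate_a)
print("Conversion rate B:", rate_b)
print("z:", z, "p-value:", p_value)
```

The logic mirrors the t-test used earlier: state a null hypothesis (no difference between variants), compute a test statistic, then read off the p-value.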

Something to think about: Which of these gives smaller p-values?

Smaller effect size
Smaller standard error
Smaller sample size
Higher variance

Answer:

Chi-square tests

Chi-square tests are used when the data are frequencies (counts), rather than numerical values such as a score or a price.

The following two tests make use of chi-square statistic

chi-square test for goodness of fit
chi-square test for independence

Chi-square tests are non-parametric tests. They do not require assumptions about population parameters and they do not test hypotheses about population parameters.

Chi-square test for goodness of fit

$$ \chi^2 = \sum (O - E)^2/E $$

O is observed frequency
E is expected frequency
$ \chi^2 $ is the chi-square statistic
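The formula above can be checked against `stats.chisquare` with a small worked example. The counts below are invented (120 rolls of a die, tested against a fair-die expectation) and the variable names are hypothetical; this is a sketch of the calculation, not part of the cars analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for the six faces of a die (120 rolls)
observed = np.array([18, 22, 20, 25, 15, 20])

# Expected counts if the die is fair: 20 per face
expected = np.full(6, observed.sum() / 6)

# Chi-square statistic straight from the formula: sum((O - E)^2 / E)
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Same statistic, plus a p-value, from scipy
chi2_scipy, p_value = stats.chisquare(observed, expected)

print("By hand:", chi2_manual)
print("scipy:  ", chi2_scipy, "p-value:", p_value)
```

`stats.chisquare` expects the observed and expected counts to add up to the same total, which holds here because the expected counts are derived from the observed total.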



In [ ]:

    
stats.chisquare(Observed, Expected)


