Terminology

  1. Null Hypothesis
  2. Alternate Hypothesis
  3. p-value (Probability, assuming the null hypothesis is true, of observing a value of the metric at least as extreme as the one computed from the data)
  4. Bootstrap
  5. Acceptance Region
  6. Rejection Region
  7. t-test
  8. One-tailed test
  9. Two-tailed test
  10. Significance test
  11. Confidence interval
  12. Power of a test
  13. Type I error (Rejecting the null hypothesis when it is true). Also called a false positive.
  14. Type II error (Failing to reject the null hypothesis when it is false). Also called a false negative.
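
To make the p-value definition concrete, here is a minimal sketch of a one-tailed permutation test for a difference in means. The function name `permutation_p_value`, its parameters, and the sample numbers are illustrative assumptions, not part of the original notes. Under the null hypothesis the group labels are interchangeable, so the data are shuffled repeatedly and the sketch counts how often a difference at least as large as the observed one appears by chance.

```python
import random


def permutation_p_value(before, after, n_permutations=10_000, seed=0):
    """One-tailed p-value for the alternate hypothesis mean(after) > mean(before).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    observed = sum(after) / len(after) - sum(before) / len(before)

    pooled = list(before) + list(after)
    n_after = len(after)
    at_least_as_extreme = 0

    for _ in range(n_permutations):
        # Under the null hypothesis, labels carry no information,
        # so any split of the pooled data is equally likely.
        rng.shuffle(pooled)
        perm_after = pooled[:n_after]
        perm_before = pooled[n_after:]
        diff = sum(perm_after) / len(perm_after) - sum(perm_before) / len(perm_before)
        if diff >= observed:
            at_least_as_extreme += 1

    return at_least_as_extreme / n_permutations


# Made-up sales figures, purely for demonstration.
before = [20, 22, 19, 24, 21, 23, 20, 22]
after = [25, 27, 24, 29, 26, 28, 25, 27]
p = permutation_p_value(before, after)
print(f"p-value: {p:.4f}")
```

A small p-value means such a difference would rarely arise if the null hypothesis were true. Rejecting the null at a chosen significance level still carries the risk of a Type I error at that rate.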

Some Practical Thoughts

  1. Data could be biased. Confidence intervals may then not be representative.
  2. One way to handle biased data is to use bias-corrected confidence intervals.
  3. Outliers can impact confidence intervals.
  4. People often remove outliers too readily, but outliers might encode necessary information.
  5. One way to handle outliers is to use ranks instead of the actual values.
  6. If the sample size is small, bootstrapping tends to underestimate the width of the confidence interval.
  7. It is better to use significance testing when the sample size is small.
  8. Bootstrapping should not be used to estimate extreme values (e.g., the maximum sales of shoes, the 5th-largest sales of shoes).
  9. Use a rank transformation when bootstrapping if the data has outliers.
  10. Lack of representativeness is a problem for any statistical technique
  11. The experiment should be randomized (e.g., when doing A/B testing, assign subjects to groups at random). Experimental bias can lead to wrong inferences.
  12. Resampling time series data is tricky. The assumption we used, that each data point is independent, does not hold for time series data.
  13. Rank transformation changes the question. For our shoe sales example, a rank-transformed analysis would ask: "Do sales tend to be higher after price optimization?" (Our analysis asked: "Do post-price-optimization sales have a higher mean?")
  14. The power of a test increases as the sample size increases.
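
Two of the ideas above, bootstrap confidence intervals and rank transformation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names `bootstrap_ci_mean` and `rank_transform` are hypothetical, the interval is a plain percentile interval (not bias-corrected), and the sample values are made up.

```python
import random


def bootstrap_ci_mean(data, n_resamples=5_000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_resamples))
    alpha = 1 - confidence
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def rank_transform(values):
    """Replace each value by its 1-based rank; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


# Made-up sales figures with one large outlier.
sales = [20, 22, 19, 24, 21, 23, 20, 500]
print("CI on raw values:", bootstrap_ci_mean(sales))
print("Ranks:", rank_transform(sales))
```

After the rank transformation the outlier is simply the largest rank, so it no longer dominates the resampled means. As noted in the list, the analysis then answers a question about ordering rather than about the mean itself.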

Types of Error

  1. Sampling Bias
  2. Measurement Error
  3. Random Error