Workflow for statistics
Case 1: Suppose that you want to test for differences in one variable between groups
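- Step0: iid (independent and identically distributed):
- The observations must be independent, i.e. not correlated with each other (for instance, they are not measurements of the same units taken in different years)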
- Step1: Normality:
- Make sure this variable is normally distributed (Q-Q plot + histogram)
- If it's lognormally distributed, transform it with the logarithm
- If it follows any other distribution, you need a different transformation (or the non-parametric tests in Step4)
- Step2: Create your groups.
- Step3: Equal variance:
- Levene test, although it is rather strict
- As a rule of thumb, the variance of the group with the highest variance shouldn't be more than 4 times the variance of the group with the lowest variance.
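- Step4: Do ANOVA if the data are normally distributed with equal variances, otherwise Kruskal-Wallis (with only two groups, use a t-test or a Mann-Whitney U (MWU) test instead)
- Step5: Do a Tukey test if the data are normally distributed with equal variances. Otherwise, compare each pair of groups with an MWU test, using a Bonferroni-corrected significance level of 0.05/number of comparisons (e.g. 0.05/3 ≈ 0.0167 for three groups)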
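The Case 1 steps above can be sketched in Python with `scipy.stats`. This is a minimal, illustrative version under some assumptions: the groups are passed as a dict of 1-D arrays (the name `compare_groups` is hypothetical), the Shapiro-Wilk test is swapped in as a numeric complement to the visual Q-Q plot/histogram check, the 4x variance rule of thumb (not the strict Levene result) decides equal variance, and `scipy.stats.tukey_hsd` requires SciPy 1.8 or newer.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def compare_groups(groups, alpha=0.05):
    """Hypothetical sketch of the Case 1 workflow.

    groups: dict mapping a group name to a 1-D sequence of observations
    (each group needs at least 3 values for the Shapiro-Wilk test).
    """
    names = list(groups)
    samples = [np.asarray(groups[name], dtype=float) for name in names]

    # Step1 (assumption): Shapiro-Wilk per group as a numeric stand-in
    # for looking at the Q-Q plot and histogram.
    normal = all(stats.shapiro(s).pvalue > alpha for s in samples)

    # Step3: Levene test is reported, but the 4x variance rule decides.
    variances = [s.var(ddof=1) for s in samples]
    levene_p = float(stats.levene(*samples).pvalue)
    equal_var = max(variances) <= 4 * min(variances)

    parametric = normal and equal_var

    # Step4: overall test (t-test or MWU when there are only two groups).
    if len(samples) == 2:
        if parametric:
            overall = stats.ttest_ind(samples[0], samples[1])
        else:
            overall = stats.mannwhitneyu(samples[0], samples[1], alternative="two-sided")
    else:
        overall = stats.f_oneway(*samples) if parametric else stats.kruskal(*samples)

    result = {
        "normal": normal,
        "equal_var": equal_var,
        "levene_pvalue": levene_p,
        "overall_pvalue": float(overall.pvalue),
        "pairwise": {},
        "pairwise_threshold": alpha,
    }

    # Step5: pairwise comparisons, only needed with more than two groups.
    if len(samples) > 2:
        pairs = list(combinations(range(len(samples)), 2))
        if parametric:
            # Tukey HSD already controls the family-wise error rate.
            pvals = stats.tukey_hsd(*samples).pvalue
            for i, j in pairs:
                result["pairwise"][(names[i], names[j])] = float(pvals[i, j])
        else:
            # Bonferroni: compare each MWU p-value with alpha / number of comparisons.
            result["pairwise_threshold"] = alpha / len(pairs)
            for i, j in pairs:
                p = stats.mannwhitneyu(samples[i], samples[j], alternative="two-sided").pvalue
                result["pairwise"][(names[i], names[j])] = float(p)
    return result
```

A pair of groups counts as different when its entry in `result["pairwise"]` is below `result["pairwise_threshold"]`. The visual checks from Step1 are still worth doing; a formal normality test on small samples has little power.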
Case 2: Suppose that you want to find relationships between variables
- Step0:
- Do a correlation plot and a scatter matrix to understand how your variables correlate with each other.
- Step1:
- Step2:
- Check the assumptions of the method you plan to use and modify your variables as needed. The two most common fixes are:
- Transform your variables so all of them have similar distributions (e.g. take the logarithm of strongly right-skewed variables)
- Combine or drop some variables (see dimensionality reduction notebook)
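For Case 2, a minimal NumPy/SciPy sketch of the correlation step and of log-transforming skewed variables might look like the code below. The function names, the dict-of-arrays input, and the skewness cut-off of 1.0 are illustrative assumptions, not a standard; Spearman correlation is computed as the Pearson correlation of column-wise ranks.

```python
import numpy as np
from scipy import stats


def log_transform_skewed(data, skew_threshold=1.0):
    """Hypothetical helper: log-transform strongly right-skewed, strictly positive variables.

    data: dict mapping a variable name to a 1-D sequence of values.
    skew_threshold is an arbitrary illustrative cut-off.
    """
    out = {}
    for name, values in data.items():
        x = np.asarray(values, dtype=float)
        if np.all(x > 0) and stats.skew(x) > skew_threshold:
            out[name] = np.log(x)  # e.g. a lognormal variable becomes normal
        else:
            out[name] = x  # left untouched (not positive or not skewed enough)
    return out


def correlation_matrix(data, method="pearson"):
    """Hypothetical helper: return (names, matrix) of pairwise correlations."""
    names = list(data)
    X = np.column_stack([np.asarray(data[n], dtype=float) for n in names])
    if method == "spearman":
        # Spearman correlation = Pearson correlation of the column-wise ranks.
        X = stats.rankdata(X, axis=0)
    elif method != "pearson":
        raise ValueError("method must be 'pearson' or 'spearman'")
    return names, np.corrcoef(X, rowvar=False)
```

The matrix returned by `correlation_matrix` is what a correlation plot would display; the scatter matrix itself can be drawn with `pandas.plotting.scatter_matrix` on a DataFrame built from the (transformed) variables, which needs matplotlib installed.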