Hypothesis Test

  • $\chi^2$ test
  • OR test
  • LD test
  • Bhattacharyya distance
  • Fisher product test
  • Multiple Testing Problem

Hypothesis

  • Randomly choose a sub-sample set
  • Test the null hypothesis $H_0$

$\chi^2$ Test

If $X_1, X_2, \dots, X_n$ are independent random variables with the standard normal distribution, $X_i \sim N(0,1)$, then $$ Y = \sum_{i=1}^{n} X_i^2 \sim \chi^2(n) $$
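As a quick sanity check of this definition, the sketch below simulates $Y$ and compares its average with $n$, the mean of $\chi^2(n)$. The function names, the number of draws, and the seed are illustrative assumptions, not part of the original notes.

```python
import random

def sample_chi_square(n, rng):
    """Draw one value of Y = X_1^2 + ... + X_n^2 with X_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def empirical_mean(n, draws=20000, seed=0):
    """Average many simulated Y values; chi^2(n) has mean n."""
    rng = random.Random(seed)  # fixed seed (assumed) so the sketch is reproducible
    return sum(sample_chi_square(n, rng) for _ in range(draws)) / draws
```

Calling `empirical_mean(5)` should return a value close to 5.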

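For an allele sample $$ \sum_{i=1}^{M} \sum_{j=1}^{N} \frac {(O_{ij}-E_{ij})^2} {E_{ij}} \sim \chi^2((M-1)(N-1)) $$ where $O_{ij}$ and $E_{ij}$ are the observed and expected counts, $M$ is the number of types, and $N$ is the number of individual categories counted for each type. The statistic is compared with the quantile $\chi^2_p((M-1)(N-1))$, where $p=1-\alpha$. The chosen $p$ sets the confidence level at which a relationship between a specific type and the final result is declared.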
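The statistic can be sketched in a few lines of Python. The notes do not say how $E_{ij}$ is obtained, so this example assumes the usual independence estimate, (row total × column total) / grand total. The function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed):
    """Return (statistic, degrees_of_freedom) for an M x N table of counts.

    Assumption: expected counts E_ij come from the row and column totals,
    i.e. E_ij = row_total_i * col_total_j / grand_total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / total
            statistic += (o_ij - e_ij) ** 2 / e_ij

    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

For a 2 × 2 table such as `[[10, 20], [30, 40]]`, the returned statistic has one degree of freedom and would be compared with the $\chi^2_p(1)$ quantile for the chosen $\alpha$.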

Bonferroni correction

OR Test

For an allele with two types $\{A,T\}$, suppose the ill samples contain $n_A$ with $A$ and $n_T$ with $T$, and the healthy samples contain $m_A$ with $A$ and $m_T$ with $T$. The odds ratio (OR) is: $$ OR = \frac {n_A} {m_A} / \frac {n_T} {m_T} = \frac {n_A m_T} {n_T m_A} $$ $OR > 1$ means $A$ has a greater influence on illness than $T$, while $OR < 1$ means $A$ has a smaller influence.

$$ OR' = \max\left(OR,\frac{1}{OR}\right) $$

measures the strength of the relationship between the allele and the phenotype, regardless of its direction.
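A minimal sketch of both quantities, using the four counts defined above. The function names are hypothetical, and the code assumes all counts are positive so that no division by zero occurs.

```python
def odds_ratio(n_A, n_T, m_A, m_T):
    """OR = (n_A / m_A) / (n_T / m_T) = (n_A * m_T) / (n_T * m_A).

    n_A, n_T: ill samples carrying A or T.
    m_A, m_T: healthy samples carrying A or T.
    Assumes every count is positive.
    """
    return (n_A * m_T) / (n_T * m_A)

def symmetric_odds_ratio(n_A, n_T, m_A, m_T):
    """OR' = max(OR, 1 / OR): strength of association, ignoring direction."""
    ratio = odds_ratio(n_A, n_T, m_A, m_T)
    return max(ratio, 1.0 / ratio)
```

For example, with 30 ill and 20 healthy carriers of $A$, and 10 ill and 20 healthy carriers of $T$, `odds_ratio(30, 10, 20, 20)` gives 3.0. Swapping the roles of $A$ and $T$ gives an OR of 1/3 but the same $OR'$.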

Bhattacharyya distance

For two normal distributions $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$: $$ B = \frac{1}{4} \frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}+\frac{1}{2} \ln\left(\frac {\sigma_1^2+\sigma_2^2}{2\sigma_1\sigma_2}\right) $$
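The sketch below evaluates this distance for two univariate normal distributions. It assumes the standard closed form, whose second term is $\frac{1}{2}\ln\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1\sigma_2}$. The function name is hypothetical.

```python
import math

def bhattacharyya_normal(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    sigma1 and sigma2 are standard deviations and must be positive.
    """
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    spread_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term
```

Identical distributions give a distance of 0. Two unit-variance normals whose means differ by 2 give 0.5, which comes entirely from the mean term.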

Linkage disequilibrium (LD)

If $A$ and $B$ are independent, with frequencies $f(A)$ and $f(B)$, then: $$ f(A,B) = f(A)f(B) $$

If $A$ and $B$ are linked, then: $$ f(A,B) = f(A)f(B) + LD $$ and consequently $f(A|B) \ne f(A)$.

The correlation coefficient is: $$ r^2 = \frac {LD^2} {f(A)f(a)f(B)f(b)} $$ where $a$ and $b$ denote the alternative alleles at the two loci. LD is strong when $r^2$ is high.
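A short sketch of the two formulas, assuming each locus has exactly two alleles so that $f(a) = 1 - f(A)$ and $f(b) = 1 - f(B)$. The function name and argument names are hypothetical.

```python
def linkage_disequilibrium(f_AB, f_A, f_B):
    """Return (LD, r_squared) from the joint and marginal frequencies.

    Assumption: biallelic loci, so f(a) = 1 - f(A) and f(b) = 1 - f(B).
    The marginal frequencies must lie strictly between 0 and 1.
    """
    ld = f_AB - f_A * f_B  # deviation from independence
    denominator = f_A * (1.0 - f_A) * f_B * (1.0 - f_B)
    r_squared = ld ** 2 / denominator
    return ld, r_squared
```

With $f(A) = f(B) = 0.5$, a joint frequency of 0.25 gives $LD = 0$ and $r^2 = 0$ (independence), while a joint frequency of 0.5 gives $LD = 0.25$ and $r^2 = 1$.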

Fisher's Product Test

Given the $p$-values $\{p_i\}$ of all $n$ SNPs of a gene, when the $p_i$ are independent and every null hypothesis holds, we have: $$ X = -2 \ln\left(\prod_{i=1}^{n} p_i\right) = -2 \sum_{i=1}^{n} \ln(p_i) \sim \chi^2(2n) $$
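The combined statistic and its tail probability can be sketched with the standard library alone. Because the degrees of freedom $2n$ are even, the $\chi^2$ survival function has the closed form $e^{-x/2}\sum_{k=0}^{n-1}\frac{(x/2)^k}{k!}$. The function name is hypothetical, and the code assumes every $p_i$ lies in $(0, 1]$.

```python
import math

def fisher_product_test(p_values):
    """Return (X, degrees_of_freedom, combined_p) for a list of p-values.

    X = -2 * sum(ln p_i) is referred to chi^2(2n).
    Assumes each p-value is in (0, 1] and that the tests are independent.
    """
    n = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)

    # Survival function of chi^2 with 2n degrees of freedom (even dof):
    # exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    half = x / 2.0
    term = 1.0  # (x/2)^k / k!, starting at k = 0
    tail_sum = 0.0
    for k in range(n):
        tail_sum += term
        term *= half / (k + 1)
    combined_p = math.exp(-half) * tail_sum

    return x, 2 * n, combined_p
```

With a single SNP the combined $p$-value equals the input $p$-value, which is a convenient check that the tail formula is wired correctly.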

