$\chi^2$ Distribution

Let $Z_1, Z_2, ..., Z_n$ be $iid~\mathcal{N}(0,1)$

Let $Q=Z_1^2 + Z_2^2 + ...+Z_n^2$

This $Q$ has the $\chi^2_n$ distribution (chi squared distribution with $n$ degrees of freedom)
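This is easy to check by simulation (a sketch assuming numpy and scipy are available; the sample size and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5
# Draw many replicates of Q = Z_1^2 + ... + Z_n^2 with Z_i iid N(0,1)
Z = rng.standard_normal((100_000, n))
Q = (Z**2).sum(axis=1)

print(Q.mean())  # should be close to n, since E[chi^2_n] = n
# Kolmogorov-Smirnov distance to the chi^2_n cdf should be tiny
print(stats.kstest(Q, stats.chi2(df=n).cdf).statistic)
```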


Short Review on Exponential Random Variable

  • Note: $Z_1^2 + Z_2^2$ is exponential with

    • $\lambda = \displaystyle \frac{1}{2}$ because $\mathbf{E}[Z_1^2 + Z_2^2] = 2$

  • Consequently, $\chi^2_2 \sim \displaystyle \mathrm{Exp}(\frac{1}{2})$; i.e., it is $\displaystyle \Gamma(\alpha=1, \theta=\frac{1}{\lambda}=2)$

  • Recall also that adding independent Gammas with the same $\theta$ yields another Gamma.

  • So, if $n$ is even, $Q = Z_1^2 + Z_2^2 + ... + Z_{n-1}^2 + Z_n^2$

    • $Z_1^2 + Z_2^2 \sim \Gamma(1,2)$
    • $Z_3^2 + Z_4^2 \sim \Gamma(1,2)$
    • $....$
    • $Z_{n-1}^2 + Z_n^2 \sim \Gamma(1,2)$
      So $Q$ is a sum of $\frac{n}{2}$ independent $\Gamma(1,2)$'s. Therefore, $\chi_n^2 = \displaystyle \Gamma\left(\frac{n}{2}, 2\right)$.
      (We just proved this when $n$ is even. Proving it for $n$ odd requires working with MGFs.)
  • What about a non-central $\chi^2$?

    Now consider a new $Q$: $$Q = \left(Z_1 + \sqrt{\delta}\right)^2 + Z_2^2 + Z_3^2 + ... + Z_n^2$$ where $\delta \ge 0$ is a fixed constant.

    The distribution of $Q$ is called non-central chi-squared with $n$ degrees of freedom and non-centrality parameter $\delta$.

    Note:

    • When $\delta=0 \longrightarrow \mathbf{E}[Q] = n$
    • When $\delta>0 \longrightarrow \mathbf{E}[Q] = n+\delta$ (proof: just expand the first square: $\mathbf{E}[(Z_1+\sqrt{\delta})^2] = \mathbf{E}[Z_1^2 + 2Z_1\sqrt{\delta} + \delta] = 1 + 0 + \delta$, since $\mathbf{E}[Z_1]=0$ and $\mathbf{E}[Z_1^2]=1$)
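Both facts above can be checked numerically (a sketch assuming numpy and scipy; scipy parametrizes the Gamma by `a` $=\alpha$ and `scale` $=\theta$, and the non-centrality `nc` of its `ncx2` distribution matches our $\delta$):

```python
import numpy as np
from scipy import stats

# chi^2_n and Gamma(n/2, theta=2) have identical densities (also for odd n)
n = 7
x = np.linspace(0.1, 30, 200)
assert np.allclose(stats.chi2(df=n).pdf(x), stats.gamma(a=n / 2, scale=2).pdf(x))

# Non-central case: Q = (Z_1 + sqrt(delta))^2 + Z_2^2 + ... + Z_n^2
rng = np.random.default_rng(1)
delta = 3.0
Z = rng.standard_normal((200_000, n))
Z[:, 0] += np.sqrt(delta)  # shift the first normal by sqrt(delta)
Q = (Z**2).sum(axis=1)

print(Q.mean())  # should be close to n + delta
```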

Theorem

Let $X_i \sim \mathcal{N}(\mu_i, \sigma^2)$ be independent for $i=1,..,n$ (the $\sigma^2$ is the same for all $i$, but the $\mu_i$s can be different)

Then, $Q = \displaystyle \sum_{i=1}^n \frac{X_i^2}{\sigma^2} \sim \chi^2_n(\delta)$ (non-central chi-squared with $n$ degrees of freedom) where $\delta = \displaystyle \frac{1}{\sigma^2}\sum_{i=1}^n \mu_i^2$

  • Moreover (now assume all $\mu_i$s equal a common $\mu$), with the empirical (sample) mean $\bar{X} = \displaystyle \frac{\sum_{i=1}^n X_i}{n}$ and the empirical (sample) variance $S^2 = \displaystyle \frac{\sum_{i=1}^n(X_i - \bar{X})^2}{n-1}$

    • Then, $\bar{X}$ and $S^2$ are independent. Moreover:
      • $\displaystyle \bar{X} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$
      • $\displaystyle \frac{S^2(n-1)}{\sigma^2} \sim \chi^2_{n-1}$
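A quick simulation of the two distributional claims, plus a zero-correlation sanity check consistent with independence (a sketch assuming numpy/scipy; the values of $\mu$, $\sigma$, $n$ are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n = 10.0, 2.0, 8
X = rng.normal(mu, sigma, size=(100_000, n))

xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)  # sample variance with the n-1 divisor

# xbar ~ N(mu, sigma^2/n): mean near mu, variance near sigma^2/n
print(xbar.mean(), xbar.var())

# (n-1) S^2 / sigma^2 ~ chi^2_{n-1}
V = (n - 1) * S2 / sigma**2
print(stats.kstest(V, stats.chi2(df=n - 1).cdf).statistic)  # small

# independence of xbar and S^2: correlation should be near 0
print(np.corrcoef(xbar, S2)[0, 1])
```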

Student's t-Distribution

By William Sealy Gosset

Good distribution when looking to:

  • Reject a $\text{mean}=0$ null hypothesis
  • Reject that two means from two separate experiments might be the same

Definition

Let $Z \sim \mathcal{N}(0,1)$

Let $V \sim \displaystyle \chi^2_\nu$ where $\nu$ is the number of degrees of freedom

$Z$ and $V$ are independent

Let $\theta$ be a fixed constant.

Let $\displaystyle T = \frac{Z + \theta}{\sqrt{V/\nu}}$

The distribution of $T$ is called Student's t-distribution with $\nu$ degrees of freedom, and non-centrality parameter $\theta$


This definition is relevant for the following situation. $X_1,X_2,...,X_n$ are $iid ~\mathcal{N}(\mu, \sigma^2)$ where $\mu$ is known but $\sigma^2$ is not.

Then, the operation by which one divides by $\sqrt{S^2}$ instead of $\sqrt{\sigma^2}$ is called studentization rather than standardization.

Example: with the above data, let $\bar{X}$ & $S^2$ be the usual sample mean and sample variance. Then, we saw

$$\displaystyle \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim \mathcal{N}(0,1)$$

$$\displaystyle \frac{S^2(n-1)}{\sigma^2} \sim \chi^2_{n-1}$$

$$T = \frac{\displaystyle \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}}{\displaystyle \sqrt{\frac{S^2 (n-1)}{\sigma^2 (n-1)}}} = \frac{\bar{X} - \mu}{\displaystyle \sqrt{S^2/n}}$$

This random variable $T$ is by definition a Student's t random variable with $\nu = n-1$ degrees of freedom and $\theta = 0$ (i.e. central).

This $T$ is useful to build confidence intervals and test hypotheses.
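A simulation sketch of this fact, assuming numpy and scipy (the parameters are arbitrary): the studentized statistic matches the $t_{n-1}$ distribution even though $\sigma^2$ never enters the computation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n = 5.0, 3.0, 10
X = rng.normal(mu, sigma, size=(100_000, n))

xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)
T = (xbar - mu) / np.sqrt(S2 / n)  # studentized: divides by sqrt(S^2/n), not sqrt(sigma^2/n)

# Kolmogorov-Smirnov distance to the t_{n-1} cdf should be tiny
print(stats.kstest(T, stats.t(df=n - 1).cdf).statistic)
```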

In [Ex. 9.4.1], we found a $95\%$ confidence interval for the mean $\mu$ of our data even though $\sigma^2$ is not known.

In example [Ex. 9.4.2] we have two separate independent normal datasets, with the same variance $\sigma^2$ for everyone. We do not need to know $\sigma^2$. We want a hypothesis test for whether the means of the two datasets are the same. One additional detail: we must describe a "pooled" estimator for $\sigma^2$ from both experiments.
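As a sketch of the pooled construction (the datasets here are synthetic; `scipy.stats.ttest_ind` with `equal_var=True` implements exactly this pooled statistic):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.5, size=40)  # hypothetical dataset 1
y = rng.normal(0.0, 1.5, size=55)  # hypothetical dataset 2, same variance

# Pooled estimator of sigma^2, combining both samples
n1, n2 = len(x), len(y)
Sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
T = (x.mean() - y.mean()) / np.sqrt(Sp2 * (1 / n1 + 1 / n2))

# scipy's equal-variance two-sample t-test uses the same pooled statistic
res = stats.ttest_ind(x, y, equal_var=True)
print(T, res.statistic)  # identical up to floating point
```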

F-Distribution

F stands for Fisher.

Setup: two sets of normal data with $n_1$ and $n_2$ data points respectively

Let $S_1^2 ~\& ~S_2^2$ be the sample variances.

Let $\sigma_1^2 ~\&~\sigma_2^2$ be the unknown variances.

Therefore, with $\displaystyle V_i = \frac{S_i^2 (n_i-1)}{\sigma_i^2}$ for $i=1,2$

  • $V_i \sim \chi^2_{n_i-1}$

Definition

If $V_1 ~\&~ V_2$ are respectively $\chi^2_{\nu_1} ~\&~ \chi^2_{\nu_2}$, then $F = \displaystyle \frac{V_1/\nu_1}{V_2/\nu_2}$ has a distribution which is called "F-distribution with d.f.'s $\nu_1~\&~\nu_2$".

  • Notation: $F(\nu_1, \nu_2)$

  • This $F$ can be helpful for estimating the ratio of the unknown variances.
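The definition translates directly into a simulation check (a sketch assuming numpy/scipy; the degrees of freedom are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu1, nu2 = 6, 11
V1 = rng.chisquare(nu1, size=100_000)
V2 = rng.chisquare(nu2, size=100_000)
F = (V1 / nu1) / (V2 / nu2)  # ratio of independent chi-squareds over their d.f.'s

# Kolmogorov-Smirnov distance to the F(nu1, nu2) cdf should be tiny
print(stats.kstest(F, stats.f(dfn=nu1, dfd=nu2).cdf).statistic)
```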

For example, a hypothesis test might ask whether the two variances $\sigma^2_1$ and $\sigma^2_2$ are the same or not. The ratio $\displaystyle \frac{\sigma_2^2}{\sigma_1^2}$ is therefore the quantity of interest. Let's build a confidence interval for it.

Consider the statistic:

$$R = \frac{V_1/(n_1-1)}{V_2 / (n_2-1)} = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{S_1^2}{S_2^2} \times \frac{\sigma_2^2}{\sigma_1^2}$$

For $\alpha$ small (for instance $\alpha=0.05$), we write $F_{\alpha/2}$ and $F_{1-\alpha/2}$ as the corresponding quantiles for the $F(n_1-1, n_2-1)$ distribution.

We want:

$$1-\alpha = \displaystyle \mathbf{P}\left[F_{\frac{\alpha}{2}} \le R \le F_{1 - \frac{\alpha}{2}}\right] \\ = \mathbf{P}\left[F_{\frac{\alpha}{2}} ~~ \le ~~ \frac{S_1^2}{S_2^2} \times \frac{\sigma_2^2}{\sigma_1^2} ~~ \le ~~ F_{1 - \frac{\alpha}{2}}\right] \\ = \mathbf{P}\left[\frac{S_2^2}{S_1^2} \times F_{\frac{\alpha}{2}} ~~ \le~~ \frac{\sigma_2^2}{\sigma_1^2} ~~ \le~~ \frac{S_2^2}{S_1^2} \times F_{1 - \frac{\alpha}{2}}\right]$$

These two numbers (the left and right bounds given above) are the endpoints of a $(1-\alpha)100\%$ confidence interval for $\displaystyle \frac{\sigma_2^2}{\sigma_1^2}$
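A sketch of computing this interval with scipy (the data are synthetic; `stats.f.ppf` supplies the two $F(n_1-1, n_2-1)$ quantiles, and the sample-variance ratio $S_2^2/S_1^2$ multiplies them):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n1, n2 = 30, 40
x = rng.normal(0.0, 2.0, size=n1)  # sigma_1 = 2, pretend unknown
y = rng.normal(0.0, 1.0, size=n2)  # sigma_2 = 1, pretend unknown

s1sq, s2sq = x.var(ddof=1), y.var(ddof=1)
alpha = 0.05
Flo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)      # F_{alpha/2}
Fhi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # F_{1-alpha/2}

# 95% confidence interval for sigma_2^2 / sigma_1^2
lo = (s2sq / s1sq) * Flo
hi = (s2sq / s1sq) * Fhi
print(lo, hi)
```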
