Let $Z_1, Z_2, \dots, Z_n$ be i.i.d. $\mathcal{N}(0,1)$.
Let $Q = Z_1^2 + Z_2^2 + \dots + Z_n^2$.
This $Q$ has the $\chi^2_n$ distribution (chi-squared distribution with $n$ degrees of freedom).
Short Review of the Exponential Random Variable
Note: $Z_1^2 + Z_2^2$ is exponential with rate $\displaystyle \lambda = \frac{1}{2}$.
Consequently, $\chi^2_2 \sim \displaystyle Exp(\frac{1}{2})$; i.e., it is $\displaystyle \Gamma(\alpha=1, \theta=\frac{1}{\lambda}=2)$.
Recall also that adding independent Gammas with the same $\theta$ yields another Gamma.
So, if $n$ is even, $Q = (Z_1^2 + Z_2^2) + \dots + (Z_{n-1}^2 + Z_n^2)$ is a sum of $n/2$ independent $Exp(\frac{1}{2})$ variables; hence $Q \sim \displaystyle \Gamma(\alpha=\frac{n}{2}, \theta=2)$.
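This Gamma identity is easy to check by simulation; here is a minimal numpy sketch (the seed, $n=6$, and the sample size are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n = 6                           # an even number of degrees of freedom
Z = rng.standard_normal((100_000, n))
Q = (Z**2).sum(axis=1)          # samples of a chi-squared with n d.f.

# Gamma(alpha = n/2, theta = 2) has mean alpha*theta = n
# and variance alpha*theta^2 = 2n, so here mean 6 and variance 12.
print(Q.mean(), Q.var())
```

The empirical mean and variance should land close to $6$ and $12$.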
What about a non-central $\chi^2$?
Now consider a new $Q$: $$Q = \left(Z_1 + \sqrt{\delta}\right)^2 + Z_2^2 + Z_3^2 + \dots + Z_n^2$$ where $\delta$ is a fixed constant.
The distribution of $Q$ is called non-central chi-squared with $n$ degrees of freedom and non-centrality parameter $\delta$.
Note:
Let $X_i \sim \mathcal{N}(\mu_i, \sigma^2)$, independent, for $i=1,\dots,n$ ($\sigma^2$ is the same for all, but the $\mu_i$s can be different).
Then, $Q = \displaystyle \sum_{i=1}^n \frac{X_i^2}{\sigma^2} \sim \chi^2_n(\delta)$, where $\delta = \displaystyle \frac{1}{\sigma^2}\sum_{i=1}^n \mu_i^2$.
Moreover (now assume all $\mu_i$s are the same), with the empirical (sample) mean $\bar{X} = \displaystyle \frac{\sum_i X_i}{n}$ and the empirical (sample) variance $S^2 = \displaystyle \frac{\sum_{i=1}^n(X_i - \bar{X})^2}{n-1}$, we have $\displaystyle \frac{S^2(n-1)}{\sigma^2} \sim \chi^2_{n-1}$, independently of $\bar{X}$.
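The non-central fact above can also be checked numerically; a sketch with illustrative means (the particular $\mu_i$, $\sigma$, seed, and sample size are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
mu = np.array([1.0, -0.5, 2.0, 0.0])  # illustrative means, not from the notes
n = mu.size
delta = (mu**2).sum() / sigma**2      # non-centrality parameter

X = rng.normal(mu, sigma, size=(200_000, n))
Q = (X**2).sum(axis=1) / sigma**2

# A non-central chi-squared with n d.f. and non-centrality delta
# has mean n + delta and variance 2*(n + 2*delta).
print(Q.mean(), n + delta)
```

The simulated mean of $Q$ should be close to $n + \delta$.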
The following distribution is good when looking to do inference on a mean when the variance is unknown:
Let $Z \sim \mathcal{N}(0,1)$
Let $V \sim \displaystyle \chi^2_\nu$ where $\nu$ is the number of degrees of freedom.
Assume $Z ~\&~ V$ are independent.
Let $\theta$ be a fixed constant.
Let $\displaystyle T = \frac{Z + \theta}{\sqrt{V/\nu}}$
The distribution of $T$ is called Student's t-distribution with $\nu$ degrees of freedom and non-centrality parameter $\theta$.
This definition is relevant for the following situation. $X_1,X_2,...,X_n$ are $iid ~\mathcal{N}(\mu, \sigma^2)$ where $\mu$ is known but $\sigma^2$ is not.
Then, the operation by which one divides by $\sqrt{S^2}$ instead of $\sqrt{\sigma^2}$ is called studentization rather than standardization.
Example: with the above data, let $\bar{X}$ & $S^2$ be the usual sample mean and sample variance. Then, we saw
$$\displaystyle \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim \mathcal{N}(0,1)$$ $$\displaystyle \frac{S^2(n-1)}{\sigma^2} \sim \chi^2_{n-1}$$Therefore, with $$T = \frac{\bar{X} - \mu}{\sqrt{S^2/n}} = \frac{(\bar{X} - \mu)/\sqrt{\sigma^2/n}}{\sqrt{\dfrac{S^2(n-1)}{\sigma^2} \Big/ (n-1)}},$$this random variable $T$ is by definition a Student's t random variable with $\nu = n-1$ and central ($\theta = 0$).
This $T$ is useful to build confidence intervals and test hypotheses.
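The studentized statistic can be simulated to confirm it behaves like a central $t_{n-1}$; a minimal sketch with assumed toy parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 1.5, 8, 200_000  # toy parameters (assumed)
X = rng.normal(mu, sigma, size=(reps, n))

Xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)           # sample variance (n-1 in the denominator)
T = (Xbar - mu) / np.sqrt(S2 / n)    # studentized mean

# A central t with nu = n-1 = 7 d.f. has mean 0 and variance nu/(nu-2) = 1.4
print(T.mean(), T.var())
```

Note that neither $\sigma$ nor $\sigma^2$ appears in the computation of $T$ itself; only $S^2$ does.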
In the previous example [Ex. 9.4.1], we found a $95\%$ confidence interval for the mean $\mu$ of our data even though $\sigma^2$ is not known.
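Such an interval has the generic form $\bar{X} \pm t_{0.975,\,n-1}\sqrt{S^2/n}$; a sketch on toy data (not the data from Ex. 9.4.1), using scipy's t quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(10.0, 2.0, size=25)  # toy data (assumed); variance "unknown"

n = data.size
xbar = data.mean()
s = data.std(ddof=1)                   # square root of the sample variance
t_crit = stats.t.ppf(0.975, df=n - 1)  # 97.5% quantile of t with n-1 d.f.

lo = xbar - t_crit * s / np.sqrt(n)
hi = xbar + t_crit * s / np.sqrt(n)
print(lo, hi)  # 95% confidence interval for mu; sigma^2 is never used
```
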
In example [Ex. 9.4.2] we have two separate independent normal datasets, with the same variance $\sigma^2$ for everyone. We do not need to know $\sigma^2$. We want a hypothesis test for whether the means of the two datasets are the same. One additional detail: we must describe a "pooled" estimator for $\sigma^2$ from both experiments.
Setup: two sets of normal data, with $n_1$ and $n_2$ data points respectively.
Let $S_1^2 ~\& ~S_2^2$ be the sample variances.
Let $\sigma_1^2 ~\&~\sigma_2^2$ be the unknown variances.
Then, with $\displaystyle V_i = \frac{S_i^2 (n_i-1)}{\sigma_i^2}$ for $i=1,2$, we have $V_i \sim \chi^2_{n_i-1}$, and $V_1, V_2$ are independent.
If $V_1 ~\&~ V_2$ are respectively $\chi^2_{\nu_1} ~\&~ \chi^2_{\nu_2}$, then $F = \displaystyle \frac{V_1/\nu_1}{V_2/\nu_2}$ has a distribution which is called "F-distribution with d.f.'s $\nu_1~\&~\nu_2$".
Notation: $F(\nu_1, \nu_2)$
This $F$ can be helpful for estimating the ratio of the unknown variances.
For example, a hypothesis test might be whether the two variances $\sigma^2_1$ and $\sigma^2_2$ are the same or not. Therefore, the ratio $\displaystyle \frac{\sigma_2^2}{\sigma_1^2}$ is the quantity of interest. Let's build a confidence interval for it.
Consider the statistic:
$$R = \frac{V_1/(n_1-1)}{V_2 / (n_2-1)} = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{S_1^2}{S_2^2} \times \frac{\sigma_2^2}{\sigma_1^2}$$For $\alpha$ small (for instance $\alpha=0.05$), we write $F_{\alpha/2}$ and $F_{1-\alpha/2}$ as the corresponding quantiles for the $F(n_1-1, n_2-1)$ distribution.
We want:
$$1-\alpha = \displaystyle \mathbf{P}\left[F_{\frac{\alpha}{2}} \le R \le F_{1 - \frac{\alpha}{2}}\right] \\ = \mathbf{P}\left[F_{\frac{\alpha}{2}} ~~ \le ~~ \frac{S_1^2}{S_2^2} \times \frac{\sigma_2^2}{\sigma_1^2} ~~ \le ~~ F_{1 - \frac{\alpha}{2}}\right] \\ = \mathbf{P}\left[\frac{S_2^2}{S_1^2} \times F_{\frac{\alpha}{2}} ~~ \le~~ \frac{\sigma_2^2}{\sigma_1^2} ~~ \le~~ \frac{S_2^2}{S_1^2} \times F_{1 - \frac{\alpha}{2}}\right]$$These two numbers (the left and right bounds above) are the endpoints of a $(1-\alpha)100\%$ confidence interval for $\displaystyle \frac{\sigma_2^2}{\sigma_1^2}$. (Note that dividing through by $\frac{S_1^2}{S_2^2}$ multiplies the quantiles by $\frac{S_2^2}{S_1^2}$.)
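A sketch of this interval in Python (the two toy samples, seed, and sizes are assumptions; solving $F_{\alpha/2} \le \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \le F_{1-\alpha/2}$ for $\frac{\sigma_2^2}{\sigma_1^2}$ multiplies both quantiles by $S_2^2/S_1^2$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x1 = rng.normal(0.0, 3.0, size=30)  # toy sample 1 (assumed)
x2 = rng.normal(0.0, 3.0, size=40)  # toy sample 2 (assumed)

n1, n2 = x1.size, x2.size
S1sq, S2sq = x1.var(ddof=1), x2.var(ddof=1)  # sample variances

alpha = 0.05
F_lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)      # F_{alpha/2} quantile
F_hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # F_{1-alpha/2} quantile

# Endpoints of the (1-alpha)100% confidence interval for sigma_2^2 / sigma_1^2
lo = (S2sq / S1sq) * F_lo
hi = (S2sq / S1sq) * F_hi
print(lo, hi)
```

Here both samples were drawn with equal variances, so the interval should typically contain $1$.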