Definition: Let $X=(X_1,...,X_n)$ be some data (IID). Let $\theta$ be some parameter in the common law of this data. Let $\hat{\theta}(x)$ be an estimator.
We say $\hat{\theta}(x)$ is consistent for $\theta$ if $\hat{\theta}(x) \to \theta \text{ as }n\to \infty$ in probability.
Recall: convergence in probability (for a sequence $\hat{\theta}_n(x)$)
$$\mathbf{P}[|\hat{\theta}_n(x) - \theta|> \epsilon] \to 0 \text{ for every } \epsilon > 0$$Typically, to prove consistency, one appeals to the so-called Markov inequality (the case $p=2$ below is Chebyshev's inequality): $\displaystyle \mathbf{P}[|\hat{\theta}_n(x) - \theta|>\epsilon] \le \frac{\mathbf{E}[|\hat{\theta}_n(x) - \theta|]}{\epsilon}$
or, more generally, $\displaystyle \mathbf{P}[|\hat{\theta}_n(x) - \theta|>\epsilon] = \mathbf{P}[|\hat{\theta}_n(x) - \theta|^p>\epsilon^p] \le \frac{\mathbf{E}[|\hat{\theta}_n(x) - \theta|^p]}{\epsilon^p}$, where $p$ is a power, typically $\ge 1$.
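As a quick illustration of this strategy (not part of any proof), here is a minimal simulation comparing the empirical tail probability of a sample-mean error with the corresponding $p=2$ bound; the Exponential(1) data, sample size, and $\epsilon$ are illustrative choices, not from the notes.

```python
import numpy as np

# Sanity check of the p = 2 bound: P[|error| > eps] <= E[|error|^2] / eps^2.
# Data, n, and eps are illustrative choices.
rng = np.random.default_rng(0)
n, reps, eps = 100, 20_000, 0.2

samples = rng.exponential(scale=1.0, size=(reps, n))
errors = samples.mean(axis=1) - 1.0             # sample mean minus true mean (mu = 1)

empirical_tail = np.mean(np.abs(errors) > eps)  # estimated left-hand side
bound = np.mean(errors**2) / eps**2             # estimated right-hand side

print(f"P[|error| > {eps}] ~ {empirical_tail:.4f} <= bound ~ {bound:.4f}")
```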
Example: $X$ is an iid sample of size $n$ from $Uniform(0,\theta)$. Recall $M = \max(X_1,...,X_n)$.
We saw $\mathbf{E}[M] = -\frac{\theta}{n+1} + \theta$, and $\mathbf{Var}[M] = \frac{\theta^2 n}{(n+1)^2 (n+2)}\sim \frac{\theta^2}{n^2}$
Let $p=2$ in the previous strategy.
$$\mathbf{E}\left[|M-\theta|^2\right] = \mathbf{E}\left[|M - \mathbf{E}[M] +\mathbf{E}[M] -\theta|^2\right]$$By the triangle inequality for the $L^2$ norm (and since $\mathbf{E}[M]-\theta = -\frac{\theta}{n+1}$ is deterministic),
$$\sqrt{\mathbf{E}\left[|M-\theta|^2\right]} \le \sqrt{\mathbf{E}\left[|M - \mathbf{E}[M]|^2\right]} + \sqrt{\mathbf{E}\left[ \left|-\tfrac{\theta}{n+1}\right|^2\right]}\\ = \sqrt{\mathbf{Var}[M]} + \frac{\theta}{n+1} \\ \sim \sqrt{\frac{\theta^2}{n^2}} + \frac{\theta}{n+1} = \frac{\theta}{n} + \frac{\theta}{n+1} \sim \frac{2\theta}{n} \to 0$$By Chebyshev's inequality (the $p=2$ case above), this proves $\mathbf{P}[|M-\theta| > \epsilon] \le \displaystyle \frac{\mathbf{E}[|M-\theta|^2]}{\epsilon^2} \sim \frac{4\theta^2}{n^2 \epsilon^2} \to 0$. Therefore, $M$ is consistent for $\theta$.
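A small simulation (with arbitrary illustrative choices of $\theta$, $\epsilon$, and $n$, not part of the proof) showing that both the tail probability and $\mathbf{E}[|M-\theta|^2]$ shrink as $n$ grows:

```python
import numpy as np

# Consistency of M = max(X_1, ..., X_n) for Uniform(0, theta) data.
# theta, eps, and the n values are illustrative choices.
rng = np.random.default_rng(1)
theta, eps, reps = 2.0, 0.05, 20_000

for n in (10, 100, 1000):
    M = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
    tail = np.mean(np.abs(M - theta) > eps)     # P[|M - theta| > eps], estimated
    l2 = np.mean((M - theta) ** 2)              # E[|M - theta|^2], estimated
    print(f"n={n:5d}  tail ~ {tail:.4f}  E|M-theta|^2 ~ {l2:.5f}")
```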
Another way to prove this:
In some cases, the calculations are more explicit. In the previous example, we know that the CDF of $M$ is $F_M(x) = \left(\frac{x}{\theta}\right)^n$ for $x\in [0,\theta]$. Then we see that, as $n\to \infty$,
$$\left\{\begin{array}{lrr}F_M(x) \to 0 & \text{if} & x\in [0,\theta) \\ F_M(x) \to 1 & \text{if} & x = \theta\end{array}\right.$$Thus, $\mathbf{P}[|M-\theta| > \epsilon] = \mathbf{P}[\theta - M > \epsilon] = \mathbf{P}[M < \theta - \epsilon] = F_M(\theta - \epsilon) \to 0 \text{ as }n\to \infty$
This proves consistency.
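For concreteness, the probability $F_M(\theta-\epsilon) = \left(\frac{\theta-\epsilon}{\theta}\right)^n$ can be evaluated directly; the values of $\theta$ and $\epsilon$ below are illustrative.

```python
# Explicit value of P[M < theta - eps] = ((theta - eps) / theta) ** n.
# theta and eps are illustrative choices.
theta, eps = 2.0, 0.05
for n in (10, 100, 1000):
    print(f"n={n:5d}  P[M < theta - eps] = {((theta - eps) / theta) ** n:.6f}")
```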
Notice: compared to the first method, this method only gives us information about the CDF; thus it only proves convergence in probability. The other method actually gives us convergence in $L^2(\Omega)$.
Recall: the Cauchy distribution: $f(x) = \displaystyle \frac{1}{\pi} \frac{1}{1 + (x-\theta)^2}$
The Cauchy distribution has no finite moments (even the expectation integral diverges), so the classical method of moments will not work here.
It turns out that the sample median converges to $\theta$ in probability, so the sample median is a consistent estimator of $\theta$.
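A brief simulation (with illustrative values of $\theta$ and $n$) suggesting why the sample median is the right tool here: it settles near $\theta$ as $n$ grows, while the sample mean does not.

```python
import numpy as np

# Sample median vs. sample mean for Cauchy data centered at theta.
# theta and the n values are illustrative choices.
rng = np.random.default_rng(2)
theta = 3.0

for n in (100, 10_000, 1_000_000):
    x = theta + rng.standard_cauchy(size=n)
    print(f"n={n:8d}  median = {np.median(x): .4f}   mean = {np.mean(x): .4f}")
```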
Recall the setting with $k$ categories: given a sample of size $n$, let $X_n = \left(X_{1n}, ..., X_{kn}\right)$ be the category counts, where $X_{jn}$ is the number of observations falling in category $j$.
See the book for the joint PMF of all the $X_{jn}$'s.
Question: estimate $p_1, p_2, ..., p_k$
Note: we don't need the joint PMF of $X_{jn}$ because we know that distribution of $X_{jn}$ is $Binomial(n, p_j)$.
Since we do not have an actual iid sample from the distribution of $X_{jn}$, we cannot use the method of moments estimator, and probably not even the MLE, directly. But we can use $\hat{p}_{jn} = \frac{X_{jn}}{n}$ as an estimator for $p_j$.
Notice: $\mathbf{E}[\hat{p}_{jn}] \displaystyle = \frac{\mathbf{E}[X_{jn}]}{n} = \frac{np_j}{n} = p_j$
So, it's unbiased.
The next step is to use Markov to see if $\hat{p}_{jn}$ is consistent:
Like before, we use the power $p=2$ because it makes the computation much simpler: since $\hat{p}_{jn}$ is unbiased, the second moment of the error is just the variance: $$\mathbf{E}[|\hat{p}_{jn} - p_j|^2] = \mathbf{Var}[\hat{p}_{jn}] \\ = \frac{1}{n^2}\,\mathbf{Var}[X_{jn}] = \frac{1}{n^2}\, n p_j (1-p_j) \\ = \frac{p_j (1-p_j)}{n}$$
By Chebyshev (Markov with $p=2$): $\displaystyle \mathbf{P}[|\hat{p}_{jn} - p_j| > \epsilon] \le \frac{p_j(1-p_j)}{n\,\epsilon^2} \to 0 \text { as }n\to \infty$
So $\hat{p}_{jn}$ is consistent.
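A sketch of a simulation check (with illustrative probabilities $p_j$, threshold $\epsilon$, and sample sizes), estimating $\mathbf{P}[|\hat{p}_{jn} - p_j| > \epsilon]$ for one category:

```python
import numpy as np

# Consistency of p_hat_jn = X_jn / n for category counts.
# The probabilities p, eps, and the n values are illustrative choices.
rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])
eps = 0.02

for n in (100, 1000, 10_000):
    counts = rng.multinomial(n, p, size=20_000)       # rows: replications of (X_1n, ..., X_kn)
    p_hat = counts / n
    tail = np.mean(np.abs(p_hat[:, 0] - p[0]) > eps)  # category j = 1
    print(f"n={n:6d}  P[|p_hat - p_1| > {eps}] ~ {tail:.4f}")
```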
Theorem (the "delta method"): Let $\hat{\theta}_{n}$ be an estimator of $\theta$. Assume $Z_n = \displaystyle \frac{\hat{\theta}_{n} - \theta}{\sigma}\sqrt{n} \to \mathcal{N}(0,1)$ in distribution.
Let $h(\theta)$ be a $C^1$ function (meaning $h'(\theta)$ exists and is continuous), with $h'(\theta) \neq 0$.
Let $w_n =\displaystyle \frac{h(\hat{\theta}_{n}(x)) - h(\theta)}{\sigma h'(\theta)}\sqrt{n}$, then, $w_n \to \mathcal{N}(0,1)$ in distribution.
Why is this theorem important?
Because in many examples (including several we saw today), the variance of $\hat{\theta}_{n}(x)$ is of order $\displaystyle \frac{1}{n}$. Therefore, assuming $\hat{\theta}_{n}(x)$ is consistent, we would expect $\mathbf{Var}\left[\sqrt{n}\,\bigl(\hat{\theta}_{n}(x) - \theta\bigr)\right]$ to be of order $1$.
Therefore, the theorem's assumption is that a CLT holds for $Z_n$, and this assumption is typical (and often holds).
Another point: the value $\sigma$ in the definition of $Z_n$ is called the asymptotic standard deviation of $\hat{\theta}_{n}(x)$ (so $\sigma^2$ is its asymptotic variance).
The conclusion of the theorem is that $h(\hat{\theta}_{n}(x))$ is also an estimator, consistent for $h(\theta)$; it satisfies a CLT as well, and its asymptotic standard deviation is $\displaystyle \sigma\, |h'(\theta)|$.
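As an illustration of this conclusion (a hedged example of my own, not from the notes): take $\hat{\theta}_n = \bar{X}_n$ for Exponential(1) data, so $\theta = 1$ and $\sigma = 1$, and take $h(t) = t^2$, so $h'(\theta) = 2$. The simulated $w_n$ should then look approximately $\mathcal{N}(0,1)$.

```python
import numpy as np

# Delta-method-style check: w_n = sqrt(n) * (h(x_bar) - h(theta)) / (sigma * h'(theta)).
# Exp(1) data, h(t) = t**2; all choices are illustrative.
rng = np.random.default_rng(4)
n, reps = 2000, 50_000
theta, sigma = 1.0, 1.0
h = lambda t: t**2
h_prime = lambda t: 2 * t

x_bar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
w_n = np.sqrt(n) * (h(x_bar) - h(theta)) / (sigma * h_prime(theta))

print("mean ~", round(w_n.mean(), 3), "  std ~", round(w_n.std(), 3))
print("P[w_n <= 1.96] ~", round(np.mean(w_n <= 1.96), 3), " (N(0,1) gives 0.975)")
```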
Note: here is the idea behind the theorem's assumption in a special case. Let $X_1,X_2,..., X_n$ be iid and let $\bar{X}_n$ be the sample mean. Write $\mathbf{E}[X_i] = \mu$ and assume $\bar{X}_n \to \mu$ (consistency).
Under the theorem's assumption with $\hat{\theta}_{n}(x) = \bar{X}_n$, we can write down the following (approximate) confidence interval for $\mu$ based on $\bar{X}_n$.
In the theorem's assumption, we notice that the variance $\mathbf{Var}[\bar{X}_n] = \frac{\sigma^2}{n}$ where $\sigma^2 = \mathbf{Var}[X_i]$. This explains why it is useful to write down $Z_n = \frac{\bar{X}_n - \mu}{\sigma} \sqrt{n}$. $Z_n$ is the standardized version of $\bar{X}_n$.
Therefore, by the CLT assumption in the theorem,
$$\mathbf{P}[-1.96 \le Z_n \le 1.96] \approx 0.95$$This is the same thing as (the interval is symmetric, so flipping the sign of $Z_n$ does not change the event) $$\mathbf{P}\left[ -1.96 \le \frac{\mu - \bar{X}_n}{\sigma} \sqrt{n} \le 1.96\right] \approx 0.95$$
$$\Longleftrightarrow \mathbf{P}\left[ \bar{X}_n - \frac{1.96 ~\sigma}{\sqrt{n}} \le \mu\le \bar{X}_n + \frac{1.96~\sigma}{\sqrt{n}} \right] \approx 0.95$$Therefore, an approximate $95\%$ confidence interval for $\mu$ is $\displaystyle \left[\bar{X}_n - \frac{1.96~\sigma}{\sqrt{n}},\ \bar{X}_n + \frac{1.96~\sigma}{\sqrt{n}}\right]$.
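A quick coverage check of this interval (a sketch assuming Exponential(1) data, so $\mu = \sigma = 1$; these choices are illustrative):

```python
import numpy as np

# Coverage of [x_bar - 1.96*sigma/sqrt(n), x_bar + 1.96*sigma/sqrt(n)] with sigma known.
# Exp(1) data (mu = sigma = 1); n and reps are illustrative choices.
rng = np.random.default_rng(5)
n, reps, mu, sigma = 200, 20_000, 1.0, 1.0

x = rng.exponential(scale=1.0, size=(reps, n))
x_bar = x.mean(axis=1)
lo = x_bar - 1.96 * sigma / np.sqrt(n)
hi = x_bar + 1.96 * sigma / np.sqrt(n)

print("empirical coverage ~", np.mean((lo <= mu) & (mu <= hi)))  # should be close to 0.95
```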
Minor annoying problem: we don't necessarily know $\sigma$. One idea is to replace $\sigma$ in $Z_n$ with $\displaystyle S_n = \text{sample st. dev.} = \sqrt{\frac{1}{n-1} \sum_{k=1}^n (X_k - \bar{X}_n)^2}$.
We distinguish this new quantity by defining $\displaystyle \hat{Z}_n = \frac{\bar{X}_n - \mu}{S_n}\sqrt{n}$.
It turns out that in most cases $\hat{Z}_n$ also satisfies a CLT. This is because of the so-called "Slutsky's theorem".
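The same coverage check, but replacing $\sigma$ with $S_n$ as above (again with Exponential(1) data as an illustrative assumption); the coverage stays close to $95\%$, consistent with the Slutsky argument:

```python
import numpy as np

# Coverage of the interval built from hat{Z}_n, i.e., with sigma replaced by S_n.
# Exp(1) data (mu = 1); n and reps are illustrative choices.
rng = np.random.default_rng(6)
n, reps, mu = 200, 20_000, 1.0

x = rng.exponential(scale=1.0, size=(reps, n))
x_bar = x.mean(axis=1)
s_n = x.std(axis=1, ddof=1)                     # sample st. dev. (divides by n - 1)
lo = x_bar - 1.96 * s_n / np.sqrt(n)
hi = x_bar + 1.96 * s_n / np.sqrt(n)

print("empirical coverage ~", np.mean((lo <= mu) & (mu <= hi)))  # should be close to 0.95
```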
Example: $M = \max(X_1,..., X_n)$, where the $X_i$ are iid $Uniform[0,\theta]$.
Recall that the CDF of $M$ is $F_M(x) = \left(\frac{x}{\theta}\right)^n$ for $x \in [0,\theta]$.
Therefore, for $c \in [0,\theta]$, $\mathbf{P}[c \le M \le \theta] = 1 - F_M(c) = 1- \left(\frac{c}{\theta}\right)^n$
Similarly, for $c \in [0,1]$, $\mathbf{P}\left[c \le \frac{M}{\theta} \le 1\right] = 1 - c^n$
Now, solve for $\theta$:
$$\mathbf{P}\left[M \le \theta \le \frac{M}{c}\right] = 1 - c^n$$Therefore, to get a confidence interval at confidence level $\gamma\%$, we say the confidence interval for $\theta$ is $\left[M, \frac{M}{c}\right]$, where we choose $c$ so that $\displaystyle 1 - c^n = \frac{\gamma}{100}$. Thus $c = \left(1 - \frac{\gamma}{100}\right)^{1/n}$
For example, the $90\%$ confidence interval for $\theta$ is $\left[M,\ M\cdot 0.1^{-1/n}\right]$.
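A coverage check of this $90\%$ interval $\left[M,\ M\cdot 0.1^{-1/n}\right]$ (with illustrative $\theta$ and $n$); unlike the CLT-based intervals above, this one is exact, so the empirical coverage should match $0.90$ closely even for small $n$.

```python
import numpy as np

# Coverage of the exact 90% interval [M, M * 0.1**(-1/n)] for Uniform(0, theta).
# theta and n are illustrative choices.
rng = np.random.default_rng(7)
theta, n, reps = 2.0, 25, 50_000

M = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
upper = M * 0.1 ** (-1.0 / n)

print("empirical coverage ~", np.mean((M <= theta) & (theta <= upper)))  # about 0.90
```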