Consistency in Estimation

Consistency

Definition: Let $X=(X_1,...,X_n)$ be some data (IID). Let $\theta$ be some parameter in the common law of this data. Let $\hat{\theta}(x)$ be an estimator.

We say $\hat{\theta}(x)$ is consistent for $\theta$ if $\hat{\theta}(x) \to \theta \text{ as }n\to \infty$ in probability.

Recall: convergence in probability (for a sequence $\hat{\theta}_n(x)$) means that for every $\epsilon > 0$,

$$\mathbf{P}[|\hat{\theta}_n(x) - \theta|> \epsilon] \to 0$$

Typically, to prove consistency, one appeals to the Markov inequality (whose $p=2$ form below is also known as the Chebyshev inequality): $\displaystyle \mathbf{P}[|\hat{\theta}_n(x) - \theta|>\epsilon] \le \frac{\mathbf{E}[|\hat{\theta}_n(x) - \theta|]}{\epsilon}$

or, since the events are the same, $\displaystyle \mathbf{P}[|\hat{\theta}_n(x) - \theta|>\epsilon] = \mathbf{P}[|\hat{\theta}_n(x) - \theta|^p>\epsilon^p] \le \frac{\mathbf{E}[|\hat{\theta}_n(x) - \theta|^p]}{\epsilon^p}$, where $p$ is a power, typically $\ge 1$.
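As a quick numerical illustration of the $p=2$ (Chebyshev) bound, here is a minimal NumPy sketch; the $Uniform(0,1)$ sample mean, $n=100$, and $\epsilon=0.1$ are choices made only for this example:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, eps = 100, 50_000, 0.1

# sample means of n Uniform(0,1) draws; the parameter being estimated is the true mean 0.5
x_bar = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)

lhs = np.mean(np.abs(x_bar - 0.5) > eps)   # Monte Carlo estimate of P[|x_bar - 0.5| > eps]
rhs = (1 / (12 * n)) / eps ** 2            # Chebyshev bound: Var[x_bar] / eps^2 = (1 / 12n) / eps^2
print(f"P[|x_bar - 0.5| > {eps}] ≈ {lhs:.5f}   Chebyshev (p = 2) bound: {rhs:.5f}")
```

The bound is far from tight, but it goes to $0$ as $n$ grows, which is all consistency needs.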

Example:

$X$ is an iid sample of size $n$ from $Uniform(0,\theta)$. Recall, $M = \max(X_1,...,X_n)$.

We saw $\mathbf{E}[M] = -\frac{\theta}{n+1} + \theta$, and $\mathbf{Var}[M] = \frac{\theta^2 n}{(n+1)^2 (n+2)}\sim \frac{\theta^2}{n^2}$

Let $p=2$ in the previous strategy.

$$\mathbf{E}\left[|M-\theta|^2\right] = \mathbf{E}\left[|M - \mathbf{E}[M] +\mathbf{E}[M] -\theta|^2\right]$$

using the triangle inequality in $L^2$ (Minkowski's inequality),

$$\sqrt{\mathbf{E}\left[|M-\theta|^2\right]} \le \sqrt{\mathbf{E}\left[|M - \mathbf{E}[M]|^2\right]} + \sqrt{\mathbf{E}\left[\left|\mathbf{E}[M]-\theta\right|^2\right]}\\ = \sqrt{\mathbf{Var}[M]} + \frac{\theta}{n+1} \\ \sim \sqrt{\frac{\theta^2}{n^2}} + \frac{\theta}{n+1} = \frac{\theta}{n} + \frac{\theta}{n+1} \sim \frac{2\theta}{n} \to 0$$

This proves $\mathbf{E}\left[|M-\theta|^2\right] \lesssim \left(\frac{2\theta}{n}\right)^2$, so taking $p=2$ in the inequality above, $\displaystyle \mathbf{P}[|M-\theta| > \epsilon] \le \frac{\mathbf{E}\left[|M-\theta|^2\right]}{\epsilon^2} \lesssim \frac{4\theta^2}{n^2\epsilon^2} \to 0$. Therefore, $M$ is consistent for $\theta$.

Another way of proof:

In some cases, calculations are more explicit. In the previous example, we know that the CDF of $M$ is $\mathcal{F}_M(x) = \left(\frac{x}{\theta}\right)^n$ for $x\in [0,\theta]$. Then, we see that as $n\to \infty$,

$$\left\{\begin{array}{lll}\mathcal{F}_M(x) \to 0 & \text{if} & x\in [0,\theta) \\ \mathcal{F}_M(x) \to 1 & \text{if} & x = \theta\end{array}\right.$$

Thus, since $M \le \theta$ always, $\mathbf{P}[|M-\theta| > \epsilon] = \mathbf{P}[\theta - M > \epsilon] = \mathbf{P}[M < \theta - \epsilon] = \mathcal{F}_M(\theta - \epsilon) \to 0 \text{ as }n\to \infty$

This proves consistency.

Notice: compared to the first method, this method only gives us information about the CDF; thus it only proves convergence in probability. The other method actually gives us convergence in $L^2(\Omega)$.
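As a numerical sanity check of this consistency, here is a minimal Monte Carlo sketch ($\theta = 2$, $\epsilon = 0.05$, and the sample sizes are arbitrary choices): estimate $\mathbf{P}[|M-\theta|>\epsilon]$ and watch it vanish as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.05, 2000

for n in (10, 100, 1000):
    samples = rng.uniform(0, theta, size=(reps, n))   # reps independent samples of size n
    M = samples.max(axis=1)                           # sample maxima
    prob = np.mean(np.abs(M - theta) > eps)           # Monte Carlo estimate of P[|M - theta| > eps]
    print(f"n = {n:5d}   P[|M - theta| > {eps}] ≈ {prob:.4f}")
```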

Median of IID Cauchy Data

Recall: the Cauchy distribution with location parameter $\theta$ has density $f(x) = \displaystyle \frac{1}{\pi} \frac{1}{1 + (x-\theta)^2}$

We know that the Cauchy distribution has no finite expectation, so there is no classical method of moments. Note: the method of moments will not work here, because the Cauchy distribution does not have moments.

It turns out that the sample median $\to \theta$ in probability, so the sample median is a consistent estimator of $\theta$.
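A small simulation sketch ($\theta = 3$ and the sample sizes are arbitrary choices) illustrates this: the sample median settles near $\theta$ while the sample mean keeps jumping around.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 3.0

for n in (10, 100, 1000, 10_000):
    x = theta + rng.standard_cauchy(size=n)   # iid Cauchy sample centered at theta
    print(f"n = {n:6d}   sample median = {np.median(x):8.3f}   sample mean = {np.mean(x):10.3f}")
```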

Multinomial distribution

Recall:

  • Discrete distribution
  • While Bernoulli is for 2 categories (success/failure), multinomial has more than 2 categories.
  • This is a random vector $X=(X_1,..,X_k)$ where $X_j$ counts the number of elements of type $j$ in a sample of size $n$, where each element is independent of all others and has probability $p_j$ of being of type $j$.

Given a sample of size $n$, i.e., given $X_n = \left(X_{1n}, ..., X_{kn}\right)$

See book for joint PMF of all $X_{jn}$s.

Question: estimate $p_1, p_2, ..., p_k$

Note: we don't need the joint PMF of the $X_{jn}$'s because we know that the marginal distribution of $X_{jn}$ is $Binomial(n, p_j)$.

Since we do not have an actual iid sample from $X_{jn}$, we cannot use the method of moments estimator, and probably not even the MLE. But we can use $\hat{p}_{jn} = \frac{X_{jn}}{n}$ as an estimator for $p_j$.

Notice: $\mathbf{E}[\hat{p}_{jn}] \displaystyle = \frac{\mathbf{E}[X_{jn}]}{n} = \frac{np_j}{n} = p_j$

So, it's unbiased.

The next step is to use Markov to see if $\hat{p}_{jn}$ is consistent:

Like before, we use power $p=2$ because, since $\hat{p}_{jn}$ is unbiased, the second moment is just the variance: $$\mathbf{E}[|\hat{p}_{jn} - p_j|^2] = \mathbf{Var}[\hat{p}_{jn}] = \frac{1}{n^2}\,\mathbf{Var}[X_{jn}] = \frac{1}{n^2}\, n p_j (1-p_j) = \frac{p_j (1-p_j)}{n}$$

By Markov (Chebyshev): $\displaystyle \mathbf{P}[|\hat{p}_{jn} - p_j| > \epsilon] \le \frac{p_j(1-p_j)}{n\,\epsilon^2} \to 0 \text { as }n\to \infty$

So $\hat{p}_{jn}$ is consistent.
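A minimal simulation sketch (the category probabilities below are made up for illustration): draw multinomial counts, form $\hat{p}_{jn} = X_{jn}/n$, and watch the estimates settle on the true $p_j$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])          # true category probabilities (assumed for this sketch)

for n in (100, 10_000, 1_000_000):
    counts = rng.multinomial(n, p)     # (X_{1n}, ..., X_{kn})
    p_hat = counts / n                 # \hat{p}_{jn} = X_{jn} / n
    print(f"n = {n:8d}   p_hat = {np.round(p_hat, 4)}")
```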

Theorem:

Let $\hat{\theta}_{n}$ be an estimator of $\theta$. Assume $z_n = \displaystyle \frac{\hat{\theta}_{n} - \theta}{\sigma}\sqrt{n} \to \mathcal{N}(0,1)$ (in distribution).

Let $h(\theta)$ be a $C^1$ function (meaning $h'(\theta)$ exists and is continuous) with $h'(\theta) \neq 0$.

Let $w_n =\displaystyle \frac{h(\hat{\theta}_{n}(x)) - h(\theta)}{\sigma h'(\theta)}\sqrt{n}$, then, $w_n \to \mathcal{N}(0,1)$ in distribution.

Why is this theorem important?

Because in many examples, like several we saw today, the variance of $\hat{\theta}_{n}(x)$ is of order $\displaystyle \frac{1}{n}$. Therefore, assuming $\hat{\theta}_{n}(x)$ is consistent, we would expect $\mathbf{Var}\left[\sqrt{n}\left(\hat{\theta}_{n}(x) - \theta\right)\right]$ to be of order $1$.

Therefore, the theorem's assumption is that the CLT holds for $z_n$, and this assumption is typical (and often holds).

Another point: the value $\sigma$ in the definition of $z_n$ is called the asymptotic standard deviation of $\hat{\theta}_{n}(x)$ (and $\sigma^2$ its asymptotic variance).

The conclusion of this theorem is that $h(\hat{\theta}_{n}(x))$ is also an estimator, consistent for $h(\theta)$; it also satisfies a CLT, and its asymptotic standard deviation is $\displaystyle \sigma\,|h'(\theta)|$.
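As a numerical illustration, here is a minimal sketch of the theorem's conclusion; the Bernoulli($p$) sample mean with $p=0.3$, the choice $h(\theta)=\theta^2$, and the sample size are assumptions made only for this example. We simulate $w_n$ and check that it looks standard normal.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 0.3, 2000, 20_000
sigma = np.sqrt(p * (1 - p))                 # asymptotic std. dev. of the Bernoulli sample mean

def h(t): return t ** 2                      # the C^1 function h
def h_prime(t): return 2 * t                 # its derivative

theta_hat = rng.binomial(n, p, size=reps) / n                     # reps copies of the sample mean
w = np.sqrt(n) * (h(theta_hat) - h(p)) / (sigma * h_prime(p))     # w_n from the theorem

print("mean of w_n:", round(w.mean(), 3))                          # approximately 0
print("std  of w_n:", round(w.std(), 3))                           # approximately 1
print("P[|w_n| <= 1.96] ≈", round(np.mean(np.abs(w) <= 1.96), 3))  # approximately 0.95
```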

Note: the idea behind the previous theorem's assumption, in the special case of $X_1,X_2,..., X_n$ iid: let $\bar{X}_n$ be the sample mean, write $\mathbf{E}[X_i] = \mu$, and assume $\bar{X}_n \to \mu$ (consistency).

Under the theorem's assumption for $\hat{\theta}_{n}(x) = \bar{X}_n$, we can write down the following confidence interval for $\mu$ based on $\bar{X}_n$ (this is approximate).

In the theorem's assumption, we notice that the variance $\mathbf{Var}[\bar{X}_n] = \frac{\sigma^2}{n}$ where $\sigma^2 = \mathbf{Var}[X_i]$. This explains why it is useful to write down $Z_n = \frac{\bar{X}_n - \mu}{\sigma} \sqrt{n}$. $Z_n$ is the standardized version of $\bar{X}_n$.

Therefore, by the CLT assumption in the theorem,

$$\mathbf{P}[-1.96 \le Z_n \le 1.96] \approx 0.95$$

This is the same thing as $$\Longleftrightarrow \mathbf{P}\left[ -1.96 \le \frac{\mu - \bar{X}_n}{\sigma} \sqrt{n} \le 1.96\right] \approx 0.95$$

$$\Longleftrightarrow \mathbf{P}\left[ \frac{-1.96 ~\sigma}{\sqrt{n}} + \bar{X}_n \le \mu\le \frac{1.96~\sigma}{\sqrt{n}} + \bar{X}_n\right] \approx 0.95$$

Therefore, a $95\%$ confidence interval for $\mu$ is $\displaystyle \left[\bar{X}_n - \frac{1.96~\sigma}{\sqrt{n}},\; \bar{X}_n + \frac{1.96~\sigma}{\sqrt{n}}\right]$
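For instance, here is a minimal sketch with a simulated normal sample; $\mu = 5$, $\sigma = 2$, and $n = 400$ are arbitrary choices, with $\sigma$ assumed known:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 5.0, 2.0, 400
x = rng.normal(mu, sigma, size=n)      # simulated data; sigma treated as known

x_bar = x.mean()
half_width = 1.96 * sigma / np.sqrt(n)
print(f"approx. 95% CI for mu: [{x_bar - half_width:.3f}, {x_bar + half_width:.3f}]")
```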

Minor annoying problem: we don't necessarily know $\sigma$. One idea to solve this is to replace $\sigma$ in $Z_n$ by the sample standard deviation $\displaystyle S_n = \sqrt{\frac{1}{n-1} \sum_{k=1}^n (X_k - \bar{X}_n)^2}$.

We distinguish this new quantity by defining $\displaystyle \hat{Z}_n = \frac{\bar{X}_n - \mu}{S_n}\sqrt{n}$.
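In practice the interval is formed the same way with $S_n$ in place of $\sigma$; here is a sketch continuing the previous normal example, now treating $\sigma$ as unknown (the substitution is justified by Slutsky's theorem, discussed next):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, n = 5.0, 400
x = rng.normal(mu, 2.0, size=n)        # sigma = 2 is used only to generate data; treated as unknown

x_bar = x.mean()
s_n = x.std(ddof=1)                    # sample standard deviation S_n (divides by n - 1)
half_width = 1.96 * s_n / np.sqrt(n)
print(f"approx. 95% CI for mu: [{x_bar - half_width:.3f}, {x_bar + half_width:.3f}]")
```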

It turns out that in most cases $\hat{Z}_n$ also satisfies a CLT. This is because of the so-called Slutsky's theorem:

Slutsky's theorem

Assume $T_n \to T$ in distribution.

Assume $W_n \to c \text{(a constant)} $ in probability.

Assume that $h(t,w)$ is a continuous function.

Then, $h(T_n, W_n) \to h(T,c)$ in distribution.

(Applied with $T_n = Z_n$, $W_n = S_n/\sigma \to 1$ in probability, and $h(t,w) = t/w$, this gives $\hat{Z}_n = Z_n \cdot \frac{\sigma}{S_n} \to \mathcal{N}(0,1)$ in distribution.)

Example:

$M = \max(X_1,.., X_n)$, where the $X_i$'s are iid $Uniform[0,\theta]$.

Recall that CDF of $M$ is $F_M(x) = \left(\frac{x}{\theta}\right)^n$

Therefore, for $0 \le c \le \theta$, $\mathbf{P}[c \le M \le \theta] = 1 - F_M(c) = 1- \left(\frac{c}{\theta}\right)^n$

Similarly, rescaling so that now $c \in [0,1]$, $\mathbf{P}\left[c \le \frac{M}{\theta} \le 1\right] = 1 - c^n$

Now, solve for $\theta$:

$$\mathbf{P}[M \le \theta \le \frac{M}{c}] = 1 - c^n$$

Therefore, to get a confidence interval at confidence level $\gamma\%$, we say the confidence interval for $\theta$ is $\left[M, \frac{M}{c}\right]$, where we choose $c$ so that $\displaystyle 1 - c^n = \frac{\gamma}{100}$. Thus $c = \left(1 - \frac{\gamma}{100}\right)^{1/n}$

For example, the $90\%$ confidence interval for $\theta$ is $\left[M,\; M\cdot 0.1^{-1/n}\right]$
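A short sketch of computing this exact interval and checking its coverage by simulation ($\theta = 2$ and $n = 50$ are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, gamma = 2.0, 50, 90
c = (1 - gamma / 100) ** (1 / n)       # chosen so that 1 - c^n = gamma / 100

# one interval from one sample
M = rng.uniform(0, theta, size=n).max()
print(f"{gamma}% CI for theta: [{M:.4f}, {M / c:.4f}]")

# empirical coverage over many repeated samples (should be close to gamma / 100)
Ms = rng.uniform(0, theta, size=(20_000, n)).max(axis=1)
print("empirical coverage:", round(np.mean((Ms <= theta) & (theta <= Ms / c)), 4))
```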

