Sufficiency and Factorization

Definition:

Let $X \in \mathbb{R}^n$ (our data).

Let $T(X)$ be a statistic (a function of the data), typically used as an estimator for $\theta$ (or $g(\theta)$ if needed).

We say $T(X)$ is sufficient for $\theta$ if the conditional density $f(x \mid T(X)=t; \theta)$ does not depend on $\theta$.

Idea: suppose $x$ is our observation, and ask how much information about $\theta$ we get from observing only $T(X)$.

If, once $T(X)$ is known, observing the full $X$ brings nothing new as far as determining $\theta$ is concerned, then $T(X)$ is sufficient to determine $\theta$.

Another formal characterization: (Neyman Factorization)

Let $f(x; \theta)$ be the density (or probability) function for $\mathbf{X}=(X_1, \ldots, X_n)$ for $\theta \in \Omega$, and let $T(X)$ be a statistic. Then $T(X)$ is sufficient for $\theta$ if and only if there exist functions $g(t; \theta)$ and $h(x)$ such that

$$f (x; \theta) = g(T (x); \theta)h(x)$$

Example:

Let $X_1, \ldots, X_n \overset{iid}{\sim} \text{Bernoulli}(\theta)$.

We define $T = X_1 + X_2 + \cdots + X_n$ (which follows a $\text{Binomial}(n, \theta)$ distribution).

Is $T$ sufficient for $\theta$?

We compute the conditional PMF of $X$ given $T$, assuming $\theta$ is the parameter. For any $x$ with $x_1 + x_2 + \cdots + x_n = t$, the event $\{X = x\}$ is contained in $\{T = t\}$, so the joint probability in the numerator is just $P(X = x; \theta)$:

$$\begin{aligned} f\left(x \mid T=t; \theta\right) &= \frac{P(X=x,\, T=t;\, \theta)}{P(T=t;\, \theta)} \\ &= \frac{\theta^{x_1} (1-\theta)^{1-x_1}~\theta^{x_2} (1-\theta)^{1-x_2} \cdots \theta^{x_n} (1-\theta)^{1-x_n}}{\binom{n}{t}\theta^t (1-\theta)^{n-t}} \\ &= \frac{\theta^{t} (1-\theta)^{n-t}}{\binom{n}{t}\theta^t (1-\theta)^{n-t}} = \frac{1}{\binom{n}{t}} \end{aligned}$$

Since this does not depend on $\theta$, we conclude that $T$ is a sufficient statistic for $\theta$.
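The calculation can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `conditional_pmf` and the choices of $n = 4$, the vector `x`, and the values of $\theta$ are illustrative assumptions. It computes $P(X = x \mid T = t)$ by brute-force enumeration over all binary vectors and compares the result with $1/\binom{n}{t}$.

```python
from itertools import product
from math import comb


def conditional_pmf(x, theta):
    """P(X = x | T = sum(x)) for iid Bernoulli(theta), by enumeration.

    Hypothetical helper for illustration only.
    """
    n, t = len(x), sum(x)

    def joint(y):
        # Joint PMF of n iid Bernoulli(theta) variables at the point y
        s = sum(y)
        return theta**s * (1 - theta) ** (n - s)

    # P(T = t): add up the joint PMF over every binary vector whose sum is t
    p_t = sum(joint(y) for y in product((0, 1), repeat=n) if sum(y) == t)
    return joint(x) / p_t


x = (1, 0, 1, 0)  # an arbitrary observation with n = 4 and t = 2
values = [conditional_pmf(x, theta) for theta in (0.2, 0.5, 0.9)]
expected = 1 / comb(len(x), sum(x))  # 1 / C(n, t)

for theta, v in zip((0.2, 0.5, 0.9), values):
    print(f"theta={theta}: conditional PMF={v:.6f}, 1/C(n,t)={expected:.6f}")
```

The conditional probability should agree with $1/\binom{n}{t}$ for every value of $\theta$ tried, up to floating-point error, which is what sufficiency of $T$ predicts.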

Recap:

Let $X$ be data, $\theta$ be a parameter. If the conditional density $f(x|T(X)=t; \theta)$ does not depend on $\theta$ (in other words, it depends only on $x$ and $t$), this means that knowing $T(X)$ gives you all the information you can possibly gather from the data $x$ regarding the value of $\theta$.

Note: we also have the Neyman factorization:

When $f(x;\theta) = g(T(x);\theta) h(x)$, then $T(X)$ is sufficient (and the converse is also true).
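As an illustration, the factorization can be applied to the Bernoulli example, with $T(x) = x_1 + \cdots + x_n$. The particular choice of $g$ and $h$ below is one valid option, not the only one:

$$f(x;\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i} = \underbrace{\theta^{T(x)}(1-\theta)^{n-T(x)}}_{g(T(x);\,\theta)} \cdot \underbrace{1}_{h(x)}$$

Here $g(t;\theta) = \theta^{t}(1-\theta)^{n-t}$ depends on $x$ only through $t = T(x)$, and $h(x) = 1$ does not involve $\theta$, so the factorization gives the same conclusion as the direct computation of the conditional PMF: $T$ is sufficient for $\theta$.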