Let $X_1, X_2, \dots$ be i.i.d. random variables, all with mean $\mu$ and variance $\sigma^2$.
Let the sample mean be denoted by $\bar{X_n} = \frac{1}{n}\sum_{j=1}^{n}X_j$, where $n$ is the sample size.
What can we say about $\bar{X_n}$ when $n$ gets large?
Another way to think about the Law of Large Numbers is to see that
\begin{align} \lim_{n \to \infty} \, \bar{X_n} - \mu &= 0 \end{align}However, this only tells us about the limiting value; it says nothing about the distribution of $\bar{X_n}$ around $\mu$ for large $n$.
One way to study the distribution of $\bar{X_n}$ is to multiply $(\bar{X_n} - \mu)$ by some factor that itself grows to $\infty$, so that the shrinking difference is magnified rather than washed out.
Consider:
\begin{align} n^{?} \, (\bar{X_n} - \mu) \end{align}We could learn more about the distribution of $\bar{X_n}$ by choosing some positive power of $n$ and thinking about what happens as $n$ grows.
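One way to see which exponent works is to compute the variance of the scaled quantity for a general exponent $c > 0$ (a short side calculation): \begin{align} \operatorname{Var}\left( n^{c} \, (\bar{X_n} - \mu) \right) = n^{2c} \operatorname{Var}(\bar{X_n}) = n^{2c} \cdot \frac{\sigma^2}{n} = n^{2c - 1} \sigma^2 \end{align}If $c < \frac{1}{2}$ this variance shrinks to $0$ (the distribution collapses to a point, which is just the Law of Large Numbers again), and if $c > \frac{1}{2}$ it blows up to $\infty$; only $c = \frac{1}{2}$ gives a finite, nonzero limiting variance, namely $\sigma^2$.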
Taking the power to be $\frac{1}{2}$ and dividing by $\sigma$ gives the standardized version, which the Central Limit Theorem says converges in distribution to a standard Normal: \begin{align} &\quad \frac{\sum_{j=1}^{n} X_j - n\mu}{\sqrt{n} \, \sigma} = \frac{\sqrt{n} \, (\bar{X_n} - \mu)}{\sigma} \rightarrow \mathcal{N}(0,1) \end{align}
To prove this, assume (without loss of generality, since we can always standardize) that $\mu = 0$ and $\sigma^2 = 1$, and assume the MGF of the $X_j$ exists. Let $S_n = \sum_{j=1}^{n} X_j$; we want to show $M \left[ \frac{S_n}{\sqrt{n}} \right] \rightarrow M \left[ \mathcal{N}(0,1) \right]$, where $M[\cdot]$ denotes the moment generating function.
Here are some quick facts about Moment Generating Functions to keep in mind as we go along the proof: the MGF of a sum of independent random variables is the product of their MGFs, so if $M(t)$ is the MGF of $X_1$, then the MGF of $S_n$ is $M(t)^n$; an MGF (when it exists) uniquely determines the distribution, and if a sequence of MGFs converges to the MGF of some distribution, then the corresponding distributions converge to that distribution; and, under our assumptions, $M(0) = 1$, $M'(0) = E[X_1] = 0$, and $M''(0) = E[X_1^2] = 1$.
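Here is a sketch of the computation, using one standard route (substituting $y = \frac{1}{\sqrt{n}}$ and applying L'Hôpital's rule twice): the MGF of $\frac{S_n}{\sqrt{n}}$ is $M\!\left(\frac{t}{\sqrt{n}}\right)^n$, so it suffices to show that its log converges to $\frac{t^2}{2}$. \begin{align} \lim_{n \to \infty} n \log M\!\left(\tfrac{t}{\sqrt{n}}\right) &= \lim_{y \to 0} \frac{\log M(ty)}{y^2} \\ &= \lim_{y \to 0} \frac{t \, M'(ty)}{2y \, M(ty)} && \text{(L'Hôpital, since } \log M(0) = 0\text{)} \\ &= \frac{t}{2} \lim_{y \to 0} \frac{M'(ty)}{y} && \text{(since } M(ty) \to M(0) = 1\text{)} \\ &= \frac{t}{2} \lim_{y \to 0} t \, M''(ty) && \text{(L'Hôpital again, since } M'(0) = 0\text{)} \\ &= \frac{t^2}{2} M''(0) = \frac{t^2}{2} \end{align}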
But $\frac{t^2}{2}$ is the log of $e^{t^2 / 2}$, and that is the MGF of $\mathcal{N}(0,1)$. QED.
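As an optional sanity check, here is a small simulation sketch (assuming `numpy` and `scipy` are available; the choice of Exponential(1) draws, the seed, and the sample sizes are all just illustrative): standardize many sample means and compare the empirical probabilities to $\Phi$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n, reps = 1_000, 10_000      # sample size and number of repeated samples
mu, sigma = 1.0, 1.0         # Expo(1) has mean 1 and variance 1

# Draw `reps` samples of size `n` from an Exponential(1) distribution
# (any i.i.d. distribution with finite variance would work), then
# standardize each sample mean: sqrt(n) * (xbar - mu) / sigma.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma

# The empirical CDF of the standardized means should be close to Phi.
for c in (-1.96, 0.0, 1.96):
    print(f"P(Z <= {c:5.2f}): empirical {np.mean(z <= c):.4f}  vs  Phi {norm.cdf(c):.4f}")
```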
Let $X \sim \operatorname{Bin}(n,p)$, and think of $X = \sum_{j=1}^{n} X_j$, where the $X_j \sim \operatorname{Bern}(p)$ are i.i.d.
By the Central Limit Theorem, we can approximate $X$ with a Normal distribution if $n$ is large enough, and if we standardize $X$ first.
\begin{align} P(a \leq X \leq b) &= P\left( \frac{a - np}{\sqrt{npq}} \leq \frac{X - np}{\sqrt{npq}} \leq \frac{b - np}{\sqrt{npq}} \right) \\ &\approx \Phi\left( \frac{b - np}{\sqrt{npq}} \right) - \Phi\left( \frac{a - np}{\sqrt{npq}} \right) \end{align}... where $q = 1 - p$. Contrast the above Normal approximation of a binomial with a Poisson approximation. With a Poisson approximation of a binomial, we assumed that $n$ is large, $p$ is small, and $np$ is moderate.
But in the case of a Normal approximation, while we do wish $n$ to be large, it is best if $p$ is close to $\frac{1}{2}$. Why?
Remember that the Normal distribution in the CLT is symmetric about $\mu = 0$. If $p$ is too far from $\frac{1}{2}$, then the Binomial distribution gets very skewed, and a symmetric bell curve is a poor fit. If $n$ is really, really large, then the CLT will still work no matter what $p$ might be, but you will need to be careful when $n$ is not that large.
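Here is a small sketch for experimenting with this (assuming `scipy` is available; the helper functions, the values of $n$ and $p$, and the interval $[a, b]$ are all illustrative choices, not from the notes): it compares the exact Binomial probability of an interval with the Normal approximation above, once with $p$ near $\frac{1}{2}$ and once with $p$ far from it.

```python
import math
from scipy.stats import binom, norm

def normal_approx(a, b, n, p):
    """Phi((b - np)/sqrt(npq)) - Phi((a - np)/sqrt(npq)), with q = 1 - p."""
    sd = math.sqrt(n * p * (1 - p))
    return norm.cdf((b - n * p) / sd) - norm.cdf((a - n * p) / sd)

def exact(a, b, n, p):
    """P(a <= X <= b) for X ~ Bin(n, p), from the Binomial CDF."""
    return binom.cdf(b, n, p) - binom.cdf(a - 1, n, p)

n = 100
for p in (0.5, 0.05):
    mean = n * p
    a, b = int(mean - 5), int(mean + 5)   # an interval around the mean
    print(f"p={p}: exact {exact(a, b, n, p):.4f}, "
          f"Normal approx {normal_approx(a, b, n, p):.4f}")
```

Varying $n$, $p$, and the interval shows how the quality of the fit changes.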
Now in the above example, we are approximating a discrete distribution using a continuous one.
What would we do if we instead started with something like $P(X=a)$, where $a$ is some integer?
\begin{align} P(X=a) &= P(a - \epsilon \leq X \leq a + \epsilon) \end{align}... which holds for any $0 < \epsilon < 1$, since $X$ only takes integer values. This lets us look at a small range centered at $a$ instead of the single value $a$, so we can apply the Normal approximation as before. The usual choice is $\epsilon = \frac{1}{2}$, which is known as the continuity correction.
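Here is a short sketch of that correction in code (again assuming `scipy`; the particular $n$, $p$, and $a$ are just illustrative): it compares $P(X = a)$ from the Binomial PMF with $\Phi\!\left(\frac{a + \frac{1}{2} - np}{\sqrt{npq}}\right) - \Phi\!\left(\frac{a - \frac{1}{2} - np}{\sqrt{npq}}\right)$.

```python
import math
from scipy.stats import binom, norm

n, p = 100, 0.5
q = 1 - p
a = 50                         # an integer value near the mean np

sd = math.sqrt(n * p * q)

exact = binom.pmf(a, n, p)     # P(X = a) for X ~ Bin(n, p)
# Continuity correction: treat {X = a} as the interval [a - 1/2, a + 1/2].
approx = norm.cdf((a + 0.5 - n * p) / sd) - norm.cdf((a - 0.5 - n * p) / sd)

print(f"exact P(X = {a}) = {exact:.5f},  Normal approximation = {approx:.5f}")
```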