Standard normal distribution takes a bell curve. It is also called as **gaussian distribution**. Values in nature are believed to take a normal distribution. The equation for normal distribution is

where $\mu$ is mean

$\sigma$ is standard deviation

$\pi$ = 3.14159..

$e$ = 2.71828.. (natural log)

```
In [2]:
```import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

```
In [10]:
```vals = np.random.standard_normal(100000)
len(vals)

```
Out[10]:
```

```
In [19]:
```fig, ax = plt.subplots(1,1)
hist_vals = ax.hist(vals, bins=200, color='red', density=True)

```
```

The above is the standard normal distribution. Its mean is 0 and SD is 1. About `95%`

values fall within $\mu \pm 2 SD$ and `98%`

within $\mu \pm 3 SD$

The area under this curve is `1`

which gives the probability of values falling within the range of standard normal.

A common use is to find the probability of a value falling at a particular range. For instance, find $p(-2 \le z \le 2)$ which is the probability of a value falling within $\mu \pm 2SD$. This calculated by summing the area under the curve between these bounds.

$$p(-2 \le z \le 2) = 0.9544$$which is `95.44%`

probability. Its `z`

score is `0.9544`

.

Similarly $$p(z \ge 5.1) = 0.00000029$$

By rule of thumb, a `z`

score greater than `0.005`

is considered **significant** as such a value has a very low probability of occuring. Thus, there is less chance of it occurring **randomly** and hence, there is probably a force acting on it (significant force, not random chance).

If the distribution of a phenomena follows normal dist, then you can transform it to standard normal, so you can measure the `z`

scores. To do so,
$$std normal value = \frac{observed - \mu}{\sigma}$$ You subtract the mean and divide by SD of the distribution.

**Example**
Let X be age of US presidents at inaugration. $X \in N(\mu = 54.8, \sigma=6.2)$. What is the probability of choosing a president at random that is less than `44`

years of age.

We need to find $p(x<44)$. First we need to transform to standard normal.

$$p(z< \frac{44-54.8}{6.2})$$$$p(z<-1.741) = 0.0409 \approx 4\%$$```
In [2]:
```import scipy.stats as st
# compute the p value for a z score
st.norm.cdf(-1.741)

```
Out[2]:
```

Let us try for some common `z scores`

:

```
In [4]:
```[st.norm.cdf(-3), st.norm.cdf(-1), st.norm.cdf(0), st.norm.cdf(1), st.norm.cdf(2)]

```
Out[4]:
```

`norm.cdf()`

function gives the **cumulative probability** (**left tail**) from `-3`

to `3`

approx. If you need right tailed distribution, you simply subtract this value from `1`

.

```
In [7]:
```# Find Z score for a probability of 0.97 (2sd)
st.norm.ppf(0.97)

```
Out[7]:
```

```
In [8]:
```[st.norm.ppf(0.95), st.norm.ppf(0.97), st.norm.ppf(0.98), st.norm.ppf(0.99)]

```
Out[8]:
```

As is the `ppf()`

function gives only positive `z`

scores, you need to apply $\pm$ to it.

Transforming features to standard normal has applications in machine learning. As each feature has a different unit, their range, standard deviation vary. Hence we scale them all to standard normal distribution with **mean=0** and **SD=1**. This way a learner finds those variables that are truly influencial and not simply because it has a larger range.

To accomplish this easily, we use `scikit-learn`

's `StandardScaler`

object as shown below:

```
In [3]:
```demo_dist = 55 + np.random.randn(200) * 3.4
std_normal = np.random.randn(200)

```
In [6]:
```[demo_dist.mean(), demo_dist.std(), demo_dist.min(), demo_dist.max()]

```
Out[6]:
```

```
In [7]:
```[std_normal.mean(), std_normal.std(), std_normal.min(), std_normal.max()]

```
Out[7]:
```

Now let us use scikit-learn to easily transform this dataset

```
In [8]:
```from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

```
In [22]:
```demo_dist = demo_dist.reshape(200,1)
demo_dist_scaled = scaler.fit_transform(demo_dist)

```
In [23]:
```[round(demo_dist_scaled.mean(),3), demo_dist_scaled.std(), demo_dist_scaled.min(), demo_dist_scaled.max()]

```
Out[23]:
```

```
In [49]:
```fig, axs = plt.subplots(2,2, figsize=(15,8))
p1 = axs[0][0].scatter(sorted(demo_dist), sorted(std_normal))
axs[0][0].set_title("Scatter of original dataset against standard normal")
p1 = axs[0][1].scatter(sorted(demo_dist_scaled), sorted(std_normal))
axs[0][1].set_title("Scatter of scaled dataset against standard normal")
p2 = axs[1][0].hist(demo_dist, bins=50)
axs[1][0].set_title("Histogram of original dataset against standard normal")
p3 = axs[1][1].hist(demo_dist_scaled, bins=50)
axs[1][1].set_title("Histogram of scaled dataset against standard normal")

```
Out[49]:
```

As you see above, the shape of distribution is the same, just the values are scaled.

```
In [20]:
```demo_dist = 55 + np.random.randn(200) * 3.4
std_normal = np.random.randn(200)

```
In [21]:
```demo_dist = sorted(demo_dist)
std_normal = sorted(std_normal)

```
In [22]:
```plt.scatter(demo_dist, std_normal)

```
Out[22]:
```

As can be seen, we use statistics to estimate population mean $\mu$ from sample mean $\bar x$. Standard error of $\bar x$ represents on average, how far will it be from $\mu$.

As you suspect, the quality of $\bar x$, or its **standard error** will depend on the sample size. In addition, it also depends on population standard deviation. Thus for a tighter population, it is much easy to estimate mean from a small sample as there are fewer outliers.

Nonetheless,
$$SE(\bar x) = \frac{\sigma}{\sqrt{n}}$$ where $\sigma$ is **population SD** and $n$ is sample size.

Empirically, $SE(\bar x)$ is same as the **SD** of a distribution of sample means. If you were to collect a number of samples, find their means to form a distribution, the SD of this distribution represents the **standard error** of that estimate (mean in this case).

From a population, many samples of size > `30`

is drawn and their means are computed and plotted, then with $\bar x$ or $\bar y$ -> mean of a sample and $n$ -> size of 1 sample, $\sigma_{\bar x}$ or $\sigma_{\bar y}$ is SD of distribution of samples, you can observe that

- $\mu_{\bar x} = \mu$ (mean of distribution of sample means equals population mean)
- $\frac{\sigma}{\sqrt n} = \sigma_{\bar x}$ (SD of population over sqrt of sample size equals SD of sampling distribution)
- relationship between population mean and mean of a single sample is
- $$ \mu = \bar y \pm z(\frac{\sigma}{\sqrt n})$$

The `z`

table and normal distribution are used to derive confidence intervals. Popular intervals and their corresponding `z`

scores are

interval | z-value |
---|---|

99% | $\pm 2.576$ |

98% | $\pm 2.326$ |

95% | $\pm 1.96$ |

90% | $\pm 1.645$ |

As you imagine, these are the values of `z`

on X axis of the standard normal distribution and the area they cover.

For a normal distribution, confidence intervals for an estimate (such as mean) can be given as
$$CI = \bar x \pm z\frac{s}{\sqrt{n}}$$
where $s$ is sample SD that is substituted in place of population SD, if sample size is larger than **30**.

**Example**
The average TV viewing times of `40`

adults sampled in Iowa is `7.75`

hours per week. The SD of this sample is `12.5`

. **Find the 95% CI population's average TV viewing times**.

$\bar x = 40$, $s=12.5$, $n=40$, $Z=1.96$ for 95% CI. Thus $$95\%CI = 7.75 \pm 1.96\frac{12.5}{\sqrt{40}}$$ $$95\%CI = (3.877 | 11.623)$$

Thus the `95`

% CI is pretty wide. Intuitively, if SD of sample is smaller, then so is the CI.

```
In [ ]:
```