Covariance

Definition:

Let $X$ and $Y$ be two random variables on the same sample space. Then their covariance is $$\mathbf{Cov}[X,Y] = \mathbf{E}\left[(X-\mu_X)(Y-\mu_Y)\right]$$

where $\mu_X = \mathbf{E}[X]$ and $\mu_Y = \mathbf{E}[Y]$.

Fact: if $X$ and $Y$ are independent, then $\mathbf{Cov}[X,Y] = 0$

Note: the converse is false. If the covariance is $0$, we cannot conclude that $X$ and $Y$ are independent.

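Zero covariance implies independence only in special cases; the two standard ones are when $X$ and $Y$ are jointly normal and when both take only two values (for example, indicator variables).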
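A small exact computation can make the failure of the converse concrete. The example below is an assumption chosen for illustration, not taken from the notes: $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is completely determined by $X$ and yet the covariance comes out as zero. The helper name `expect` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example (not from the notes): X uniform on {-1, 0, 1}, Y = X**2.
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

# Joint pmf of (X, Y): all mass sits on the points (x, x**2).
joint = {(x, x * x): p for x, p in pmf_x.items()}


def expect(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Marginal pmf of Y, obtained by summing the joint pmf over x.
pmf_y = {}
for (x, y), p in joint.items():
    pmf_y[y] = pmf_y.get(y, 0) + p

# Independence would require P(X=x, Y=y) = P(X=x) P(Y=y) for every pair.
independent = all(
    joint.get((x, y), 0) == pmf_x[x] * pmf_y[y]
    for x, y in product(pmf_x, pmf_y)
)

print("Cov[X, Y] =", cov_xy)
print("independent:", independent)
```

Using `Fraction` keeps the arithmetic exact, so the zero covariance is not a rounding artifact.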

Fact: Similar to the formula $\mathbf{Var}[X] = \mathbf{E}[X^2] - \mu_X^2$, we have $\mathbf{Cov}[X,Y] = \mathbf{E}[XY] - \mu_X\mu_Y$.

Fact: $\mathbf{Var}[X+Y] = \mathbf{Var}[X] + \mathbf{Var}[Y] + 2\mathbf{Cov}[X,Y]$

As a corollary, $$\mathbf{Var}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mathbf{Var}[X_i] + \sum_{i \ne j} \mathbf{Cov} [X_i, X_j] = \sum_{i=1}^n \mathbf{Var}[X_i] + 2\sum_{i < j} \mathbf{Cov} [X_i, X_j]$$
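The variance-of-a-sum identity also holds exactly for sample variances and covariances, which gives a quick numerical check. The sketch below uses made-up data (three correlated variables generated with NumPy) and assumes the sample (`ddof=1`) convention throughout; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 correlated variables, 1000 observations each (rows = variables).
base = rng.normal(size=1000)
xs = np.vstack([
    base + rng.normal(size=1000),
    2.0 * base + rng.normal(size=1000),
    -base + rng.normal(size=1000),
])

cov = np.cov(xs)                               # 3x3 sample covariance matrix (ddof=1)
var_of_sum = np.var(xs.sum(axis=0), ddof=1)    # Var[X_1 + X_2 + X_3]

sum_of_vars = np.trace(cov)                    # sum_i Var[X_i]
off_diagonal = cov.sum() - sum_of_vars         # sum over i != j of Cov[X_i, X_j]
upper = np.triu(cov, k=1).sum()                # sum over i < j of Cov[X_i, X_j]

# All three printed values should agree up to floating-point error.
print(var_of_sum, sum_of_vars + off_diagonal, sum_of_vars + 2 * upper)
```

Summing every entry of the covariance matrix already counts each variance once and each pair twice, which is why the off-diagonal sum equals twice the sum over $i < j$.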

Sample without replacement

We pick five people out of a group with $8$ women, and $12$ men.

The number $X$ of women in this sample has the hypergeometric distribution: we draw a sample of size $n=5$ from a population of size $N=20$ in which $K=8$ are women, so the proportion of women (the success probability on the first draw) is $p = \frac{K}{N} = \frac{8}{20}$.

$$\mathbf{Var}[X] = n\,p\,(1-p)\,\frac{N-n}{N-1}$$

The expression $\frac{N-n}{N-1}$ is called the finite sample correction (also known as the finite population correction).

For this example, $N=20$ and $n=5$. The finite sample correction is $\frac{20-5}{20-1} = \frac{15}{19} \ne 1$, so $\mathbf{Var}[X] = 5\cdot\frac{8}{20}\cdot\frac{12}{20}\cdot\frac{15}{19} = \frac{18}{19}$.

If $N$ is very large compared with $n$, then the finite sample correction is $\frac{1-n/N}{1-1/N} \approx 1$.

The case of sampling without replacement with $N\gg n$ is almost exactly like counting Binomial successes: $\mathbf{Var}[X] \approx n p(1-p)$, where the right-hand side is the variance of $Binom(n,p)$.
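As a sanity check, the variance for the group of $8$ women and $12$ men can be computed directly from the hypergeometric probabilities and compared with the closed form: sample size times $p(1-p)$ times (population size minus sample size) over (population size minus one). The sketch below is illustrative only; the names `population`, `women` and `sample_size` are chosen here and do not come from the notes.

```python
from fractions import Fraction
from math import comb

population, women, sample_size = 20, 8, 5

# Hypergeometric pmf: P(X = k) = C(women, k) * C(men, sample_size - k) / C(population, sample_size)
men = population - women
pmf = {
    k: Fraction(comb(women, k) * comb(men, sample_size - k), comb(population, sample_size))
    for k in range(0, min(women, sample_size) + 1)
}

mean = sum(k * prob for k, prob in pmf.items())
var = sum((k - mean) ** 2 * prob for k, prob in pmf.items())

p = Fraction(women, population)
correction = Fraction(population - sample_size, population - 1)
var_formula = sample_size * p * (1 - p) * correction   # without replacement
var_binomial = sample_size * p * (1 - p)               # with replacement, for comparison

print("mean:", mean)
print("variance from pmf:", var)
print("variance from formula:", var_formula)
print("binomial variance:", var_binomial)
```

With only $20$ people the correction is noticeably below $1$, so the variance without replacement is smaller than the Binomial variance; for a much larger population with the same proportion the two would nearly coincide.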

Take home message:

- Sampling with replacement, counting successes and failures, gives the Binomial distribution.
- Sampling without replacement instead gives the hypergeometric distribution, which is more complicated than the Binomial.
- When the population size $N$ is much larger than the sample size and the proportion $p$ is fixed, sampling with and without replacement are almost the same.

Correlation Coefficient

Let $X$ and $Y$ be given with $\mathbf{E}[X] = \mu_X$, $\mathbf{E}[Y] =\mu_Y$, $\mathbf{Var}[X] = \sigma_X^2$ and $\mathbf{Var}[Y] = \sigma_Y^2$.

Let $\displaystyle Z_X = \frac{X - \mu_X}{\sigma_X}$, and $\displaystyle Z_Y = \frac{Y - \mu_Y}{\sigma_Y}$.

We see that

$\mathbf{E}[Z_X] = \mathbf{E}[Z_Y] = 0$
$\mathbf{Var}[Z_X] = \mathbf{Var}[Z_Y] = 1$

This operation is called standardization. Some people call it normalization, but that usage is best avoided.

Definition: Correlation coefficient

$$\rho(X,Y) = \mathbf{Cov}[Z_X, Z_Y]$$
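The definition can be tried out on data: standardize each variable, then take the covariance of the standardized versions. The sketch below uses made-up data and assumes the population form of the standard deviation (`ddof=0`); the helper `standardize` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y depends linearly on x plus noise.
x = rng.normal(loc=3.0, scale=2.0, size=500)
y = 1.5 * x + rng.normal(scale=1.0, size=500)


def standardize(v):
    """Subtract the mean and divide by the standard deviation (ddof=0)."""
    return (v - v.mean()) / v.std()


z_x, z_y = standardize(x), standardize(y)

# Since both have mean 0 and variance 1, Cov[Z_X, Z_Y] reduces to E[Z_X * Z_Y].
rho = np.mean(z_x * z_y)

print(z_x.mean(), z_x.var())          # approximately 0 and 1
print(rho, np.corrcoef(x, y)[0, 1])   # the two values should agree
```

The comparison with `np.corrcoef` is only a cross-check; it computes the same Pearson correlation by a different route.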

Some properties of correlation

$\displaystyle \rho(X,Y) = \frac{\mathbf{Cov}[X, Y]}{\sigma_X \ \sigma_Y}$
$\rho(aX+b, cY+d) = \rho(X,Y)$ whenever $ac > 0$ (the sign flips if $ac < 0$)

$-1 \le \rho(X,Y) \le +1$
- Note: if $\rho(X,Y)=1$, then $X$ and $Y$ are linearly related: there exist $\alpha_0$ and $\alpha_1 > 0$ such that $Y = \alpha_0 + \alpha_1 X$.
- If $\rho(X,Y)$ is close to $1$, then $Y = \alpha_0 + \alpha_1 X + \text{error}$ with a small error term. The whole idea behind linear regression is to make the error term as small as possible (minimizing the sum of squared errors by choosing appropriate $\alpha_0$ and $\alpha_1$).
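The link between correlation and the least-squares line can be sketched numerically. The data below are invented for illustration, and the closed form used for comparison (slope equal to $\mathbf{Cov}[X,Y]/\mathbf{Var}[X]$, intercept equal to $\bar y - \text{slope}\cdot\bar x$) is the standard least-squares solution rather than something stated in the notes. The last lines also check how $\rho$ behaves under changes of scale and shift.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: a linear relation with a small error term.
x = rng.normal(size=400)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=400)

rho = np.corrcoef(x, y)[0, 1]

# Least-squares line y ≈ alpha_0 + alpha_1 * x; np.polyfit returns the highest degree first.
alpha_1, alpha_0 = np.polyfit(x, y, deg=1)

# Closed-form least-squares solution for comparison.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Rescaling and shifting: rho is unchanged when the two scale factors have
# the same sign, and changes sign when they have opposite signs.
rho_same = np.corrcoef(3 * x + 1, 2 * y - 5)[0, 1]
rho_flip = np.corrcoef(-3 * x + 1, 2 * y - 5)[0, 1]

print("rho:", rho)
print("fitted line:", alpha_0, alpha_1)
print("closed form:", intercept, slope)
print("rho after rescaling:", rho_same, rho_flip)
```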


