# Lesson 07 - Sampling Distributions

• If we know the mean and standard deviation of a normally distributed population, we can locate any value in that distribution, because we can find the percentage of the population below it and the percentage above it.
• Measures of center can be used to compare distributions. In the same way, we can compare the distribution of a sample with other distributions by finding and comparing their means.
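As a minimal sketch of the first point, Python's standard-library `statistics.NormalDist` can turn a value into the percentages below and above it. The population parameters (mean 100, SD 15) and the value 130 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal population: mean 100, standard deviation 15
population = NormalDist(mu=100, sigma=15)

value = 130  # hypothetical value to locate in the distribution
z = (value - population.mean) / population.stdev  # number of SDs above the mean
below = population.cdf(value)  # proportion of the population less than value
above = 1 - below              # proportion of the population greater than value

print(f"z = {z:.2f}")
print(f"{below:.2%} below, {above:.2%} above")
```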

## Gambling in Vegas

• We roll a tetrahedral die (4 sides)
• Each roll gives 1, 2, 3 or 4, all equally likely
• Expected value $\mu = (1 + 2 + 3 + 4) / 4 = 2.5$
• We win the gamble if the average of 2 rolls is at least 3
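The gamble can be checked by brute force: list every ordered pair of rolls and count the winning ones. This is a minimal sketch; using exact `Fraction`s instead of floats is just a choice to keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]  # tetrahedral die

# Expected value of a single roll: every face is equally likely
mu = Fraction(sum(faces), len(faces))

# All 4 * 4 = 16 equally likely ordered pairs of rolls
pairs = list(product(faces, repeat=2))
sample_means = [Fraction(a + b, 2) for a, b in pairs]

# We win when the average of the two rolls is at least 3
wins = sum(1 for m in sample_means if m >= 3)
p_win = Fraction(wins, len(pairs))

print(mu)     # 5/2
print(p_win)  # 3/8
```

Six of the 16 pairs have an average of at least 3, so the chance of winning is 3/8.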

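Note that our population consists of the four faces {1, 2, 3, 4}, and we are taking samples of size 2 from this population.

We have the following 16 equally likely cases. Each cell is one ordered pair of rolls, shown by the mean of its two rolls:

| Roll 1 / Roll 2 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| **1** | 1 | 1.5 | 2 | 2.5 |
| **2** | 1.5 | 2 | 2.5 | 3 |
| **3** | 2 | 2.5 | 3 | 3.5 |
| **4** | 2.5 | 3 | 3.5 | 4 |

Taking the mean of each of these samples gives us a distribution of sample means.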

• Mean of the sample means (i.e. the mean of the 16 cases above): $M = 2.5$
• The distribution of sample means is called the sampling distribution
• For our case it can be seen at http://www.wolframalpha.com/input/?i=1,+1.5,+2,+2.5,+1.5,+2,+2.5,+3,+2,+2.5,+3,+3.5,+2.5,+3,+3.5,+4
• The mean of the sampling distribution equals the population mean: $M = \mu$. Taking a lot of random samples and averaging their means gives a value close to $\mu$.
• To compare this sample with other samples we also need the spread of the sampling distribution, which depends on the SD of the population, $\sigma$.
• The standard deviation of the distribution of sample means for samples of size $n$ is called the standard error, SE
• $\sigma / SE = \sqrt{n}$, i.e. $SE = \sigma / \sqrt{n}$
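Both facts can be checked on the tetrahedral die by enumerating every sample of size 2. This sketch uses Python's `statistics.pstdev`, which, like STDEVP, divides by the number of values (population SD).

```python
import math
from itertools import product
from statistics import mean, pstdev

faces = [1, 2, 3, 4]  # tetrahedral die
n = 2                 # sample size

# Every possible sample of size n (ordered, drawn with replacement)
sample_means = [mean(s) for s in product(faces, repeat=n)]

mu = mean(faces)           # population mean
sigma = pstdev(faces)      # population SD
M = mean(sample_means)     # mean of the sampling distribution
SE = pstdev(sample_means)  # SD of the sampling distribution (standard error)

print(M, mu)                     # both 2.5
print(sigma / SE, math.sqrt(n))  # both about 1.414
```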

## Central limit theorem

For any population distribution (with finite variance), if we draw a lot of samples and plot the distribution of their means, that sampling distribution is approximately normal, provided the sample size is large enough.
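A small simulation illustrates the theorem. As an assumption for illustration, the population here is a strongly right-skewed exponential distribution with rate 1 (mean 1, SD 1), the sample size is 30, and 10,000 samples are drawn. If the sampling distribution is close to normal, about 68% of sample means should fall within 1 SE of $\mu$ and about 95% within 2 SE.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is reproducible

# Skewed population: exponential with rate 1, so mean 1 and SD 1
mu, sigma = 1.0, 1.0
n = 30               # sample size
num_samples = 10_000

# Mean of each of num_samples random samples of size n
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

se = sigma / n ** 0.5  # predicted SD of the sampling distribution

# Share of sample means within 1 and 2 standard errors of mu
within_1 = sum(abs(m - mu) < se for m in sample_means) / num_samples
within_2 = sum(abs(m - mu) < 2 * se for m in sample_means) / num_samples

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"SD of sample means:   {pstdev(sample_means):.3f} (predicted {se:.3f})")
print(f"within 1 SE: {within_1:.1%}, within 2 SE: {within_2:.1%}")
```

Even though individual exponential values are far from normal, the means of samples of 30 land close to the 68% / 95% pattern of a normal distribution.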

## Finding Standard error

Let's find the standard deviation of the sampling distribution created when rolling 2 six-sided dice and taking their mean (and, for comparison, 5 dice).

The population standard deviation can be found using the STDEVP function in Google Sheets.

```python
import math

def mean_standard_deviation(population):
    """Return the mean and the population standard deviation (what STDEVP computes)."""
    mean = sum(population) / len(population)
    differences = [element - mean for element in population]
    squared_differences = [diff ** 2 for diff in differences]
    mean_squared_differences = sum(squared_differences) / len(squared_differences)
    SD = math.sqrt(mean_squared_differences)
    return mean, SD

def standard_deviation_sample(population, sample_size):
    """Standard error: SD of the sampling distribution of the mean, sigma / sqrt(n)."""
    _, sd_population = mean_standard_deviation(population)
    return sd_population / math.sqrt(sample_size)

# Population: the faces of a fair six-sided die
population = list(range(1, 7))
print(standard_deviation_sample(population, 2))  # samples of size 2
print(standard_deviation_sample(population, 5))  # samples of size 5
```

```
1.2076147288491197
0.7637626158259733
```

As the sample size increases, the SD of the sampling distribution (the standard error) decreases, so the sampling distribution becomes skinnier.
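The shrinking can be confirmed without the formula by enumerating every possible sample of a fair six-sided die for sizes 1, 2 and 3 and taking the SD of their means directly. This is a sketch; `statistics.pstdev` plays the role of STDEVP here, and the dictionary name `se_by_n` is just an illustrative choice.

```python
import math
from itertools import product
from statistics import mean, pstdev

faces = range(1, 7)    # fair six-sided die
sigma = pstdev(faces)  # population SD, the same quantity STDEVP computes

se_by_n = {}
for n in (1, 2, 3):
    # Exact sampling distribution: the mean of every possible sample of size n
    means = [mean(s) for s in product(faces, repeat=n)]
    se_by_n[n] = pstdev(means)
    # Enumerated SE next to the formula sigma / sqrt(n)
    print(n, round(se_by_n[n], 4), round(sigma / math.sqrt(n), 4))
```

The two columns agree for every n, and both get smaller as n grows.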

A sampling distribution applet is available at http://onlinestatbook.com/stat_sim/sampling_dist/index.html

## Importance of Sampling distribution

It helps us find where the sample we have lies on the sampling distribution. That tells us whether our sample is typical or whether something unusual is going on compared with other possible samples.
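As a sketch of locating a sample on the sampling distribution, the code below converts a sample mean into a z-score using the standard error and then reads off a percentile. The sample of five rolls is hypothetical, and the percentile relies on a normal approximation to the sampling distribution, which is only rough for a sample as small as 5.

```python
import math
from statistics import NormalDist, pstdev

faces = range(1, 7)           # population: a fair six-sided die
mu = sum(faces) / len(faces)  # population mean, 3.5
sigma = pstdev(faces)         # population SD

sample = [6, 5, 4, 6, 5]      # hypothetical sample of 5 rolls
n = len(sample)
x_bar = sum(sample) / n       # sample mean, 5.2

se = sigma / math.sqrt(n)     # standard error for samples of size n
z = (x_bar - mu) / se         # how many SEs the sample mean lies from mu

# Normal approximation (central limit theorem) to the sampling distribution
share_below = NormalDist().cdf(z)
print(f"z = {z:.2f}; about {share_below:.1%} of sample means would be lower")
```

A z-score above 2 suggests this sample mean is unusually high compared with most other samples of 5 rolls.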