Stats practice

Testing for normality



In [1]:

    
%matplotlib inline
from matplotlib import pyplot as plt



In [2]:

    
from random import normalvariate, uniform, weibullvariate



In [3]:

    
# Make several sets of data; one randomly sampled 
# from a normal distribution and others that aren't.

n = 100
d_norm = [normalvariate(0,1) for x in range(n)]
d_unif = [uniform(0,1) for x in range(n)]
d_weib = [weibullvariate(1,1.5) for x in range(n)]



In [4]:

    
fig,ax = plt.subplots(1,1,figsize=(5,5))
bins = 20
xmin,xmax = -3,3
ax.hist(d_norm,histtype='step',bins=bins,range=(xmin,xmax),lw=2,
        color='red',label='normal')
ax.hist(d_unif,histtype='step',bins=bins,range=(xmin,xmax),lw=2,
        color='green',label='uniform')
ax.hist(d_weib,histtype='step',bins=bins,range=(xmin,xmax),lw=2,
        color='blue',label='Weibull')
ax.legend(loc='upper left',fontsize=10);

Make probability plots



In [5]:

    
from scipy.stats import norm,probplot



In [6]:

    
dists = (d_norm,d_unif,d_weib)
labels = ('Normal','Uniform','Weibull')
fig,axarr = plt.subplots(1,3,figsize=(14,4))
for d,ax,l in zip(dists,axarr.ravel(),labels):
    probplot(d, dist=norm, plot=ax)
    ax.set_title(l)

Interesting. Normal distribution follows the quantiles well and has the highest $R^2$ value, but both the uniform and Weibull distributions aren't very different. Need to temper what I think of as a convincing $R^2$ value.

Run Anderson-Darling test



In [7]:

    
from scipy.stats import anderson

Note that critical and significance values are always the same in the Anderson-Darling test regardless of the input. The A^2 value must be compared to them; if the test statistic is greater than the critical value at a given significance, then the null hypothesis is rejected with that level of confidence.



In [8]:

    
for d,l in zip(dists,labels):
    a2, crit, sig = anderson(d,dist='norm')
    if a2 > crit[2]:
        print "Anderson-Darling value for {:7} is A^2={:.3f}; reject H0 at 95%.".format(l,a2)
    else:
        print "Anderson-Darling value for {:7} is A^2={:.3f}; cannot reject H0 at 95%.".format(l,a2)









    



Anderson-Darling value for Normal  is A^2=0.304; cannot reject H0 at 95%.
Anderson-Darling value for Uniform is A^2=1.308; reject H0 at 95%.
Anderson-Darling value for Weibull is A^2=1.449; reject H0 at 95%.

Practice problems

Gender ratio

In a certain country, girls are highly prized. Every couple having children wants exactly one girl. When they begin having children, if they have a girl, they stop. If they have a boy, they keep having children until they get a girl.

What is the expected ratio of boys to girls in the country?



In [3]:

    
from numpy.random import binomial



In [7]:

    
# Monte Carlo solution

N = 100000
p_girl = 0.5
p_boy = 1 - p_girl

n_girl = 0
n_boy = 0

for i in range(N):
    has_girl = False
    
    while not has_girl:
        child = binomial(1,p_girl)
        if child:
            n_girl += 1
            has_girl = True
        else:
            n_boy += 1

n_child = n_girl + n_boy
print "Gender ratio is {:.1f}%/{:.1f}% boy/girl.".format(n_boy * 100./n_child, n_girl * 100./n_child)









    



Gender ratio is 50.2%/49.8% boy/girl.



In [ ]: