In [ ]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt


  • np.random.seed
    Sets the seed to allow reproducible randomness
  • List and array operations:
    • np.random.choice, np.random.shuffle
      Choose or shuffle random entries from lists
  • Distributions:
    • np.random.normal, np.random.binomial,
    • np.random.uniform, np.random.poisson
      Sample random numbers from distributions
  • plt.hist
    Plots histograms

So what is all of this useful for, anyway?

Adding Noise

If you are developing an analysis pipeline, you probably want to simulate the noise you expect from your experimental data.

Quick noise example


In [ ]:
x = np.arange(-10,10,0.2)
y = np.cos(x)

noisy_y = y + np.random.normal(0,0.3,len(y))

plt.plot(x,y)
plt.plot(x,noisy_y)

Simulating sampling

You might want to run an experiment computationally before you actually do it at the bench, to make sure you'll be able to detect the effect you care about.

You are measuring the length of microtubule bundles in S. pombe yeast.

  • The average length of these bundles is $5.00 \pm 1.4 \mu m$.
  • You introduce a mutation and expect that the microtubules will now be longer: $5.5 \pm 1.4 \mu m$.
  • You only have time to measure bundle length for 5 wildtype and 5 mutant cells.

Assuming your expectation is right, will you be able to tell that the mutant had any effect?

Simulate the sampling

  • 5 samples from $5 \pm 1.4$
  • 5 samples from $5.5 \pm 1.4$

In [ ]:

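One way to sketch this simulation with np.random.normal (the seed value here is an arbitrary choice, just to make the numbers reproducible):

```python
import numpy as np

np.random.seed(42)  # fix the seed so the simulated "experiment" is reproducible

wildtype = np.random.normal(5.0, 1.4, 5)   # 5 wildtype bundle lengths (um)
mutant   = np.random.normal(5.5, 1.4, 5)   # 5 mutant bundle lengths (um)

print(wildtype)
print(mutant)
```

Because the standard deviation (1.4) is large relative to the expected difference in means (0.5), five samples per genotype may not look very different at all.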
How do we test to see if these are different?


In [ ]:
import scipy.stats

Figure out how to use scipy.stats.ttest_ind.

  • Determine the p-value for a t-test between your 5 wildtype and 5 mutant measurements.
  • Can you figure out how many samples you need to measure to reliably get a p-value < 0.05 for this expected difference in means?

In [ ]:


In [ ]:

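One possible approach to the exercise (a sketch; the exact p-values depend on the seed, and the simulation counts here are arbitrary choices):

```python
import numpy as np
import scipy.stats

np.random.seed(0)

# Single simulated experiment: 5 measurements per genotype
wt  = np.random.normal(5.0, 1.4, 5)
mut = np.random.normal(5.5, 1.4, 5)
t_stat, p_value = scipy.stats.ttest_ind(wt, mut)
print("n = 5, p =", p_value)

# Estimate how often p < 0.05 as the sample size grows
for n in (5, 25, 100, 200):
    n_sim = 200
    hits = sum(
        scipy.stats.ttest_ind(np.random.normal(5.0, 1.4, n),
                              np.random.normal(5.5, 1.4, n))[1] < 0.05
        for _ in range(n_sim)
    )
    print("n = {}: fraction of simulations with p < 0.05 = {:.2f}".format(n, hits / n_sim))
```

The fraction of simulations that reach p < 0.05 is an estimate of the statistical power; with this effect size it stays low at n = 5 and climbs as n grows.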
scipy.stats provides a wide variety of statistical tests

  • t-test: scipy.stats.ttest_ind
  • One-way ANOVA: scipy.stats.f_oneway
  • Wilcoxon rank-sum: scipy.stats.ranksums
  • $\chi^{2}$: scipy.stats.chisquare
  • Pearson's Correlation: scipy.stats.pearsonr

scipy.stats also provides access to probability distributions


In [ ]:
d = scipy.stats.norm()
x = np.arange(-5,5,0.01)
prob_density = d.pdf(x)
cum_density = d.cdf(x)
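A frozen distribution like this also exposes sampling and inverse-CDF methods; a quick sketch:

```python
import scipy.stats

d = scipy.stats.norm()          # standard normal, mu = 0, sigma = 1

print(d.interval(0.95))         # central 95% interval (about -1.96 to 1.96)
print(d.ppf(0.975))             # inverse CDF ("percent point function")
samples = d.rvs(size=1000)      # draw 1000 random samples from the distribution
```

interval(0.95) is what the plotting function below uses to shade the tails outside the central 95% of the distribution.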

In [ ]:
def plot_distrib(d,x,name):
    """
    Function that plots the probability density and cumulative density
    functions of a distribution over the range defined in x.
    """

    fig, ax = plt.subplots(1,2,figsize=(12,5))
    ax[0].plot(x,d.pdf(x),"k-")
    ax[0].set_title("Probability Density Function ({})".format(name))
    ax[0].set_xlabel("x")
    ax[0].set_ylabel("p(x)")

    ax[1].plot(x,d.cdf(x),"k-")
    ax[1].set_title("Cumulative Distribution Function ({})".format(name))
    ax[1].set_xlabel("x")
    ax[1].set_ylabel(r"P(X $\leq$ x)")
    
    bottom = np.arange(np.min(x),d.interval(0.95)[0],0.01)
    top    = np.arange(d.interval(0.95)[1],np.max(x),0.01)
    
    ax[0].fill_between(bottom,d.pdf(bottom),color="gray")
    ax[0].fill_between(top,d.pdf(top),color="gray")

In [ ]:
plot_distrib(d,x,r"Normal $\mu = 0$, $\sigma = 1$")

In [ ]:
pareto = scipy.stats.pareto(1)
x = np.arange(1,50,0.01)
plot_distrib(pareto,x,"Pareto c = 1")

Summary

  • np.random lets you sample things (flip coins, add noise, simulate experiments)
  • scipy.stats lets you run statistical tests and work with probability distributions to analyze simulated or experimental results.