In [ ]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt


  • np.random.seed
    Sets the seed to allow reproducible randomness
  • List and array operations:
    • np.random.choice, np.random.shuffle
      Choose or shuffle random entries from lists
  • Distributions:
    • np.random.normal, np.random.binomial,
    • np.random.uniform, np.random.poisson
      Sample random numbers from distributions
  • plt.hist
    Plots histograms

So what is all of this useful for, anyway?

Adding Noise

If you are developing an analysis pipeline, you probably want to simulate the noise you expect from your experimental data.

Quick noise example


In [ ]:
x = np.arange(-10,10,0.2)
y = np.cos(x)

noisy_y = y + np.random.normal(0,0.3,len(y))

plt.plot(x,y)
plt.plot(x,noisy_y)

Simulating sampling

You might want to run an experiment computationally before you actually do it at the bench, to make sure you'll be able to detect the effect you care about.

You are measuring the length of microtubule bundles in S. pombe yeast.

  • The average length of these bundles is $5.00 \pm 1.4 \mu m$.
  • You introduce a mutation and expect that the microtubules will now be longer: $5.5 \pm 1.4 \mu m$.
  • You only have time to measure bundle length for 5 wildtype and 5 mutant cells.

Assuming your expectation is right, will you be able to tell that the mutant had any effect?

Simulate the sampling

  • 5 samples from $5 \pm 1.4$
  • 5 samples from $5.5 \pm 1.4$

In [ ]:

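One way to sketch this simulation with np.random.normal (the seed value here is an arbitrary choice, just to make the numbers reproducible):

```python
import numpy as np

np.random.seed(42)  # fix the seed so the simulated "experiment" is reproducible

wildtype = np.random.normal(5.0, 1.4, 5)   # 5 wildtype bundle lengths (um)
mutant   = np.random.normal(5.5, 1.4, 5)   # 5 mutant bundle lengths (um)

print(wildtype)
print(mutant)
```

Because the standard deviation (1.4) is large relative to the expected difference in means (0.5), five samples per genotype may not look very different at all.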
How do we test to see if these are different?


In [ ]:
import scipy.stats

Figure out how to use scipy.stats.ttest_ind.

  • Determine the p-value for a t-test between your 5 wildtype and 5 mutant measurements.
  • Can you figure out how many samples you need to measure to reliably get a p-value < 0.05 for this expected difference in means?

In [ ]:


In [ ]:

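One possible approach to the exercise (a sketch; the exact p-values depend on the seed, and the simulation counts here are arbitrary choices):

```python
import numpy as np
import scipy.stats

np.random.seed(0)

# Single simulated experiment: 5 measurements per genotype
wt  = np.random.normal(5.0, 1.4, 5)
mut = np.random.normal(5.5, 1.4, 5)
t_stat, p_value = scipy.stats.ttest_ind(wt, mut)
print("n = 5, p =", p_value)

# Estimate how often p < 0.05 as the sample size grows
for n in (5, 25, 100, 200):
    n_sim = 200
    hits = sum(
        scipy.stats.ttest_ind(np.random.normal(5.0, 1.4, n),
                              np.random.normal(5.5, 1.4, n))[1] < 0.05
        for _ in range(n_sim)
    )
    print("n = {}: fraction of simulations with p < 0.05 = {:.2f}".format(n, hits / n_sim))
```

The fraction of simulations that reach p < 0.05 is an estimate of the statistical power; with this effect size it stays low at n = 5 and climbs as n grows.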
scipy.stats provides a wide variety of statistical tests

  • t-test: scipy.stats.ttest_ind
  • One-way ANOVA: scipy.stats.f_oneway
  • Wilcoxon rank-sum: scipy.stats.ranksums
  • $\chi^{2}$: scipy.stats.chisquare
  • Pearson's Correlation: scipy.stats.pearsonr

scipy.stats also provides access to probability distributions


In [ ]:
d = scipy.stats.norm()
x = np.arange(-5,5,0.01)
prob_density = d.pdf(x)
cum_density = d.cdf(x)
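A frozen distribution like this also exposes sampling and inverse-CDF methods; a quick sketch:

```python
import scipy.stats

d = scipy.stats.norm()          # standard normal, mu = 0, sigma = 1

print(d.interval(0.95))         # central 95% interval (about -1.96 to 1.96)
print(d.ppf(0.975))             # inverse CDF ("percent point function")
samples = d.rvs(size=1000)      # draw 1000 random samples from the distribution
```

interval(0.95) is what the plotting function below uses to shade the tails outside the central 95% of the distribution.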

In [ ]:
def plot_distrib(d,x,name):
    """
    Function that plots the probability density and cumulative density
    functions of a distribution over the range defined in x.
    """

    fig, ax = plt.subplots(1,2,figsize=(12,5))
    ax[0].plot(x,d.pdf(x),"k-")
    ax[0].set_title("Probability Density Function ({})".format(name))
    ax[0].set_xlabel("x")
    ax[0].set_ylabel("p(x)")

    ax[1].plot(x,d.cdf(x),"k-")
    ax[1].set_title("Cumulative Distribution Function ({})".format(name))
    ax[1].set_xlabel("x")
    ax[1].set_ylabel(r"P(X $\leq$ x)")
    
    bottom = np.arange(np.min(x),d.interval(0.95)[0],0.01)
    top    = np.arange(d.interval(0.95)[1],np.max(x),0.01)
    
    ax[0].fill_between(bottom,d.pdf(bottom),color="gray")
    ax[0].fill_between(top,d.pdf(top),color="gray")

In [ ]:
plot_distrib(d,x,r"Normal $\mu = 0$, $\sigma = 1$")

In [ ]:
pareto = scipy.stats.pareto(1)
x = np.arange(1,50,0.01)
plot_distrib(pareto,x,"Pareto c = 1")

Summary

  • np.random lets you sample things (flip coins, add noise, simulate experiments)
  • scipy.stats lets you run statistical tests and work with probability distributions to analyze simulated or experimental results.