Goals: the break statement; the while loop, and when it is more appropriate than the for loop; the scipy.stats module.
In [1]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import matplotlib
In [2]:
#Iterate through my favorite pets
best_pets = ['fox', 'cat', 'dog', 'crow']
index = 0 #Keep track of the index
for pet in best_pets:
    print(pet, index)
    index += 1
In [3]:
#Now stop and report the dog index
best_pets = ['fox', 'cat', 'dog', 'crow']
index = 0
for pet in best_pets:
    if pet == 'dog':
        print('The dog has index {}'.format(index))
        break
    index += 1
print(best_pets[index])
Here's an example where we want to know when the sum of integers exceeds 50.
In [4]:
#Sum integers and stop when the sum exceeds 50
ints = range(100)
isum = 0
for i in ints:
    isum += i
    if isum > 50:
        break
print('{} was the integer that pushed us over 50'.format(i))
In [5]:
#Sum integers and stop when the sum exceeds 10000
ints = range(100)
isum = 0
for i in ints:
    isum += i
    if isum > 10000:
        break
print('{} was the integer that pushed us to {}'.format(i, isum))
We never reached 10,000, because the sum of the integers 0 through 99 is only 4950. We needed to know before starting the loop how many integers to loop over!
We can get around this using a while loop.
In [6]:
i = 0
isum = 0
while isum < 10000:
    isum += i
    i += 1
#i was incremented once more after the final addition, so report i - 1
print('{} was the integer that pushed us to {}'.format(i - 1, isum))
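As a quick sanity check (a sketch using the arithmetic-series formula, which isn't part of the lecture), the sum $0 + 1 + \dots + k$ equals $k(k+1)/2$ and first reaches 10,000 at $k = 141$:
In [ ]:
#closed-form check: k(k + 1)/2 is the sum of the integers 0..k
k = 141
print((k - 1) * k // 2)  #9870, the sum up to 140, still below 10,000
print(k * (k + 1) // 2)  #10011, the sum up to 141, first to reach 10,000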
Assume a geometric distribution with $p=0.1$. Find the smallest $x$ such that: $$P(n \leq x) \geq 0.9$$
First, let's just sum all the probabilities to check that they total (approximately) $1$ and that our code is correct.
In [7]:
Q = range(1, 100)
p = 0.1
psum = 0
for n in Q:
    psum += (1 - p) ** (n - 1) * p
print(psum)
Now we'll stop early, once we've reached $0.9$ probability.
I ask for the smallest such $x$ because otherwise you could trivially pick $x = \infty$, which is not what we want.
In [1]:
Q = range(1, 100)
p = 0.1
psum = 0
for n in Q:
    psum += (1 - p) ** (n - 1) * p
    if psum >= 0.9:
        break
print('The number which pushed our sum over 0.9 is {}. The sum is {}'.format(n, psum))
With a prediction level of 90%, success will occur within 22 trials/attempts
The 90% prediction interval is [1,22]
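As a cross-check (a sketch using the closed-form geometric CDF $P(n \leq x) = 1 - (1-p)^x$, which we did not derive above), 22 is indeed the smallest value that reaches 0.9:
In [ ]:
#closed-form geometric CDF: P(n <= x) = 1 - (1 - p)**x
p = 0.1
print(1 - (1 - p)**21)  #about 0.891, still below 0.9
print(1 - (1 - p)**22)  #about 0.902, first to reach 0.9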
In [17]:
#geometric probability of 5 with p=0.2
p = 0.2
n = 5
(1 - p )**(n - 1) * p
Out[17]:
In [18]:
from scipy import stats as ss
ss.geom.pmf(5, p=0.2)
Out[18]:
You can plot these values too, since scipy.stats works with numpy arrays.
In [19]:
n = np.arange(1,15)
pn = ss.geom.pmf(n, p=0.1)
plt.plot(n, pn, 'bo')
plt.show()
Scipy stats has all the cumulative distributions as well.
In [20]:
#amount of probability between n=1 and n=22, inclusive
ss.geom.cdf(22, p=0.1)
Out[20]:
Finally, there is a function, the percent point function (ppf), to ask about prediction intervals.
In [21]:
#at what value of the random variable does the CDF reach a given probability?
In [22]:
ss.geom.ppf(0.9, p=0.1)
Out[22]:
For now, I will have you use scipy stats to check your work. You may not use it on homework unless specified or you're working with the normal distribution.
One note about scipy stats is that you need to visit the documentation website to get complete function help. The docstrings often don't provide enough info.
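For example (a small illustration, not from the original notes), you can read a docstring from inside Python with help(), or with a trailing ? in Jupyter, and then go to the scipy documentation website for the full details:
In [ ]:
from scipy import stats as ss
#print the (often terse) docstring; the online documentation has much more detail
help(ss.geom)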
There is not much we can do with the normal probability density function by itself, however, since we must integrate it to compute probabilities.
For a normal distribution, let's try $\mu = 2$, $\sigma = 0.5$. What is the probability of a sample falling between $2$ and $3$?
$$P(2 < x < 3) = \int_2^3 \frac{1}{\sqrt{2\pi \times 0.5^2}} e^{-\frac{(x - 2)^2}{2\times 0.5^2}} \, dx$$
We can use ss.norm.cdf, where CDF is the cumulative distribution function. This is the definition of the CDF:
$$\textrm{CDF}(x) = P(X < x) = \int_{-\infty}^{x} p(x')\, dx'$$
Using some math, you can show that:
$$P(a < x < b) = \textrm{CDF}(b) - \textrm{CDF}(a)$$
so for our example:
In [1]:
from scipy import stats as ss
prob = ss.norm.cdf(3, scale=0.5, loc=2) - ss.norm.cdf(2, scale=0.5, loc=2)
print(prob)
Since different normal distributions have different means and standard deviations, people invented the idea of the standard normal distribution:
$$Z = \frac{x - \mu}{\sigma}$$
and then you treat this $Z$ as a sample from the $\mathcal{N}(0, 1)$ distribution. These are called $Z$-scores. The input, $x$, should always be a bound of some interval. $Z$-scores are not used to compute probability densities, only to compute probabilities over intervals.
Let's see how this looks. Say we have a random variable distributed according to $\mathcal{N}(-2, 3)$. That is shorthand for $\mu = -2$, $\sigma = 3$. Let's say we want to know $P(x > 0)$. We can rewrite that as:
$$P(x > 0) = 1 - P(x < 0) = 1 - \int_{-\infty}^0 \mathcal{N}(-2, 3) \, dx$$
Now it's in a form where we can use the CDF:
In [4]:
1 - ss.norm.cdf(0, scale=3, loc=-2)
Out[4]:
Now we can do the same problem using $Z$-scores. Let's compute the $Z$-score:
$$Z = \frac{x - \mu}{\sigma} = \frac{0 - (-2)}{3} = \frac{2}{3}$$
In [5]:
Z = 2 / 3.
1 - ss.norm.cdf(Z)
Out[5]:
Notice that by default, scipy assumes a standard normal distribution!
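As a quick check of that claim (a sketch, not from the original notes), the CDF evaluated with loc and scale agrees with the CDF of the $Z$-scores under the default standard normal:
In [ ]:
import numpy as np
from scipy import stats as ss
mu, sigma = -2, 3
x = np.linspace(-10, 10, 50)
#CDF with explicit loc/scale vs. CDF of the Z-scores with the default N(0, 1)
print(np.allclose(ss.norm.cdf(x, loc=mu, scale=sigma), ss.norm.cdf((x - mu) / sigma)))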
Now for a new example: say annual snowfall is distributed according to $\mathcal{N}(5, 1.5)$, that is $\mu = 5$ and $\sigma = 1.5$ inches. What is $P(3 < x < 5)$? Let's do this with $Z$-scores.
$$Z_{hi} = \frac{5 - 5}{1.5}$$
$$Z_{lo} = \frac{3 - 5}{1.5}$$
$$P(3 < x < 5) = \int_{Z_{lo}}^{Z_{hi}} \mathcal{N}(0,1)\,dx$$
In [6]:
Zhi = (5 - 5)/1.5
Zlo = (3 - 5)/1.5
prob = ss.norm.cdf(Zhi) - ss.norm.cdf(Zlo)
print(prob)
The normal model technically allows negative snowfall, which is impossible in reality. We can estimate how bad this approximation is by computing the probability of negative snowfall.
$$P(x < 0) = \int_{-\infty}^{0} \mathcal{N}(5, 1.5)\,dx$$
In [7]:
ss.norm.cdf(0, loc=5, scale=1.5)
Out[7]:
In [8]:
Z = (0 - 5)/1.5
ss.norm.cdf(Z)
Out[8]:
Looks great! The model assigns essentially zero probability to negative snowfall, about as much chance as there is in reality.
In [14]:
sigma = 1.
mu = 3
x = np.linspace(0, 6, 100)
#the (positive) exponent of the normal distribution
dist = (mu - x)**2 / (2 * sigma**2)
plt.plot(x, dist)
plt.show()
In [16]:
#normalization constant of the normal distribution
Q = 1 / np.sqrt(2 * sigma**2 * np.pi)
#the normal probability density, built by hand
norm = Q * np.exp(-dist)
plt.plot(x, norm)
plt.show()
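As a sanity check (a sketch, not from the original notes), the hand-built density should agree with scipy's normal PDF for the same $\mu$ and $\sigma$:
In [ ]:
import numpy as np
from scipy import stats as ss
mu, sigma = 3, 1.0
x = np.linspace(0, 6, 100)
hand_built = 1 / np.sqrt(2 * sigma**2 * np.pi) * np.exp(-(mu - x)**2 / (2 * sigma**2))
#compare the hand-built density to scipy's normal PDF with the same parameters
print(np.allclose(hand_built, ss.norm.pdf(x, loc=mu, scale=sigma)))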
In [20]:
plt.plot(x, ss.norm.pdf(x, loc=3, scale=0.5), label=r'$\sigma=0.5$')
plt.plot(x, ss.norm.pdf(x, loc=3, scale=1.0), label=r'$\sigma=1.0$')
plt.plot(x, ss.norm.pdf(x, loc=3, scale=1.5), label=r'$\sigma=1.5$')
plt.legend()
plt.show()
In [2]:
ss.norm.ppf(0.99, scale=1.5, loc=5)
Out[2]:
So a single sample will fall between $-\infty$ and about $8.5$ inches with 99% probability.
To flip your interval, you have to rely on normalization. Let's say we want the lowest amount of snowfall with 99% probability. We can flip that and instead ask for the highest snowfall with 1% probability.
In [3]:
ss.norm.ppf(0.01, scale=1.5, loc=5)
Out[3]:
So the snow will be between $-\infty$ and about 1.5 inches with 1% probability, and by normalization it will be between 1.5 inches and $\infty$ with 99% probability.
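As a final check (a sketch, not from the original notes), the two statements are consistent: the CDF at the 1% quantile is 0.01, so by normalization the probability above it is 0.99:
In [ ]:
from scipy import stats as ss
low = ss.norm.ppf(0.01, loc=5, scale=1.5)  #about 1.5 inches
print(ss.norm.cdf(low, loc=5, scale=1.5))      #0.01: probability of less snowfall than this
print(1 - ss.norm.cdf(low, loc=5, scale=1.5))  #0.99: probability of more, by normalization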