In [1]:
import pandas as pd
import numpy as np
from math import *

from ggplot import *
%matplotlib inline

In [4]:
x = np.random.uniform(size=100000)
df = pd.DataFrame({'x':x})

In [5]:
df


Out[5]:
x
0 0.072823
1 0.333318
2 0.589766
3 0.725427
4 0.254256
5 0.994160
6 0.356303
7 0.968033
8 0.526983
9 0.232518
10 0.527262
11 0.683994
12 0.069104
13 0.442857
14 0.972388
15 0.369894
16 0.624445
17 0.072818
18 0.508043
19 0.267760
20 0.634653
21 0.382272
22 0.185888
23 0.670246
24 0.447122
25 0.930639
26 0.037773
27 0.762964
28 0.338128
29 0.308757
... ...
99970 0.968960
99971 0.976037
99972 0.599949
99973 0.062972
99974 0.222659
99975 0.425420
99976 0.364649
99977 0.853154
99978 0.868834
99979 0.302208
99980 0.065341
99981 0.176708
99982 0.400384
99983 0.133862
99984 0.561546
99985 0.907003
99986 0.873324
99987 0.553295
99988 0.743294
99989 0.208195
99990 0.467326
99991 0.074584
99992 0.230796
99993 0.313990
99994 0.585190
99995 0.411894
99996 0.795058
99997 0.907080
99998 0.568661
99999 0.434893

100000 rows × 1 columns


In [6]:
ggplot(aes(x='x'), data=df) + geom_histogram()


stat_bin: binwidth defaulted to range/30.
    Use 'binwidth = x' to adjust this.
Out[6]:
<ggplot: (281526977)>

In [9]:
%%latex
x = y


x = y

Inversion sampling example

First find the normalizing constant: $$ \begin{align} f_X(X=x) &&= cx^2, 0 \leq x \leq 2 \\ 1 &&= c\int_0^2 x^2 dx \\ &&= c[\frac{1}{3}x^3 + d]_0^2 \\ &&= c[\frac{8}{3} + d - d] \\ &&= c[\frac{8}{3}] \\ f_X(X=x) &&= \frac{3}{8}x^2, 0 \leq x \leq 2 \end{align} $$

Next find the cumulative distribution function:

  • $F_X(X=x) = \int_0^x \frac{3}{8}x^2dx$
  • $=\frac{3}{8}[\frac{1}{3}x^3 + d]_0^x$
  • $=\frac{3}{8}[\frac{1}{3}x^3 + d - d]$
  • $=\frac{1}{8}x^3$

We can randomly generate values from a standard uniform distribution and set equal to the CDF. Solve for $x$. Plug the randomly generated values into the equation and plot the histogram or density of $x$ to get the shape of the distribution:

  • $u = \frac{1}{8}x^3$
  • $x^3 = 8u$
  • $x = 2u^{\frac{1}{3}}$

In [13]:
%%latex

\begin{align*}
    f_X(X=x) &= cx^2, 0 \leq x \leq 2 \\
    1 &= c\int_0^2 x^2 dx \\
    &= c[\frac{1}{3}x^3 + d]_0^2 \\
    &= c[\frac{8}{3} + d - d] \\
    &= c[\frac{8}{3}] \\
    f_X(X=x) &= \frac{3}{8}x^2, 0 \leq x \leq 2
\end{align*}


\begin{align*} f_X(X=x) &= cx^2, 0 \leq x \leq 2 \\ 1 &= c\int_0^2 x^2 dx \\ &= c[\frac{1}{3}x^3 + d]_0^2 \\ &= c[\frac{8}{3} + d - d] \\ &= c[\frac{8}{3}] \\ f_X(X=x) &= \frac{3}{8}x^2, 0 \leq x \leq 2 \end{align*}

In [41]:
u = np.random.uniform(size=100000)
x = 2 * u**.3333

df = pd.DataFrame({'x':x})
print df.describe()
ggplot(aes(x='x'), data=df) + geom_histogram()


                   x
count  100000.000000
mean        1.499664
std         0.387398
min         0.021286
25%         1.260849
50%         1.587595
75%         1.816844
max         1.999999
Out[41]:
<ggplot: (286026045)>

Joint Distribution

Find the normalizing constant:

  • $f_{X,Y}(X=x,Y=y) = c(2x + y), 0 \leq x \leq 2, 0 \leq y \leq 2$
  • $1 = \int_0^2 \int_0^2 c(2x + y) dy dx$
  • $ = c\int_0^2 [2xy + \frac{1}{2}y^2 + d]_0^2 dx$
  • $ = c\int_0^2 [4x + \frac{1}{2}4 + d - d] dx$
  • $ = c\int_0^2 (4x + 2) dx$
  • $ = c[2x^2 + 2x + d]_0^2$
  • $ = c[2(4) + 2(2) + d - d]$
  • $ = 12c$
  • $c = \frac{1}{12}$
  • $f_{X,Y}(X=x,Y=y) = \frac{1}{12}(2x + y), 0 \leq x \leq 2, 0 \leq y \leq 2$

In [5]:
x = np.random.uniform(size=10000)
y = np.random.uniform(size=10000)

Find the marginal distribution:

  • $f_{X,Y}(X=x,Y=y) = \frac{1}{12}(2x + y), 0 \leq x \leq 2, 0 \leq y \leq 2$
  • $f_X(X=x) = \int_0^2 \frac{1}{12}(2x + y) dy$
  • $ = \frac{1}{12}[2xy + \frac{1}{2}y^2 + d]_0^2$
  • $ = \frac{1}{12}[4x + 2 + d - d]$
  • $ = \frac{4x + 2}{12}$
  • $ = \frac{2x + 1}{6}$

Inversion sampling example:

  • $F_X(X=x) = \int_0^x \dfrac{2x+1}{6}dx$
  • $= \frac{1}{6}[x^2 + x + d]_0^x$
  • $= \frac{x(x + 1)}{6}$
  • $u = \frac{x^2 + x}{6}$
  • $0 = x^2 + x - 6u$
  • $x = \frac{-1 \pm \sqrt{1 + 4 \times 6u}}{2}$

In [8]:
u = np.random.uniform(size=100000)
x = (-1 + (1 + 24*u)**.5) / 2
df = pd.DataFrame({'x':x})
ggplot(aes(x='x'), data=df) + geom_histogram()


Out[8]:
<ggplot: (279378645)>

In [ ]: