It would be nice to draw a Judea Pearl-style DAG for what follows.
Let's say we're interested in predicting a college-football game. What are all the things that influence the outcome? Here's a list of things that come to mind:
Obviously, this list is incomplete: there are missing variables (perhaps each team's previous-week result), and some variables are aggregates of more finely grained variables (for example, offensive ability is a combination of passing ability and rushing ability). But to make things easy, pretend that only these variables determine the outcome of football games, and that they do so in the following way:
$MOV = (A_o - B_d) - (B_o - A_d) + (A_s - B_s) + (A_h - B_h) + H + R + X$,
where $MOV$ is Team A's margin of victory. $MOV$ can take positive and negative values. A negative $MOV$ means Team B wins.
We can use this equation to make predictions. For example, given two equal teams ($A_o = B_o$, $A_d = B_d$, $A_s = B_s$, and $A_h = B_h$) and unbiased refs ($R = 0$), A will win by $H + X$ points.
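To make the data-generating process concrete, here is a minimal sketch of it in Python (the function name and the example numbers are mine, chosen only for illustration):

In [ ]:
def margin_of_victory(A_o, A_d, A_s, A_h, B_o, B_d, B_s, B_h, H, R, X):
    # Team A's margin of victory under the assumed data-generating process
    return (A_o - B_d) - (B_o - A_d) + (A_s - B_s) + (A_h - B_h) + H + R + X

# Two equal teams and unbiased refs: A wins by H + X points.
print(margin_of_victory(A_o=10, A_d=5, A_s=3, A_h=2,
                        B_o=10, B_d=5, B_s=3, B_h=2,
                        H=3, R=0, X=1))  # prints 4 = H + X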
Equations require consistency of units:
We call this precisely defined relationship between the causes (on the right-hand side) and the effect (on the left) the data-generating process. This equation is easily interpreted:
A random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process.
It is easy to confuse random variables with algebraic variables, but the two differ. The value of an algebraic variable is deterministic: the variable can take multiple values, but given the inputs to the deterministic process, there is only one possible value it can take. The value of a random variable, by contrast, is at least partly determined by a random process: even if a deterministic process underlies a random variable, knowing the inputs to that process is not enough to know the variable's value with certainty. Here are a few examples:
Often we treat deterministic processes as random because it is simpler to think of them that way. For example, if we knew the exact weight and measurements of a die and the speed, height, rotation, etc. at which it was tossed, we might be able to figure out exactly which side would come up (this has been demonstrated with coin tosses). But getting that information and doing those calculations is a burden, and treating the toss as random is simpler.
Formally, a random variable is a function from a probability space, typically to the real numbers, that is measurable. (For finite probability spaces, the measurability requirement is superfluous.) Random variables can be classified as either discrete (taking a finite number of values or an infinite sequence of values) or continuous (taking any numerical value in an interval or collection of intervals). A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the potential values of a quantity whose already-existing value is uncertain (for example, as a result of incomplete information or imprecise measurements).
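A quick illustration of the two kinds, using numpy's random generators (the particular distributions here are arbitrary choices for illustration):

In [ ]:
import numpy
rng = numpy.random.default_rng(0)  # seeded so the output is reproducible

# Discrete random variable: a die roll takes one of six values.
print(rng.integers(1, 7, size=5))

# Continuous random variable: a uniform draw can be any value in [0, 1).
print(rng.uniform(0, 1, size=5))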
In [2]:
import numpy
import matplotlib.pyplot as plt
%matplotlib inline

# x**2 is x**(-k) with k = -2: with a negative k the curve grows without
# bound as x increases, so it cannot work as the tail of a density.
x = numpy.linspace(0.1, 10, 99)
y = x**2
plt.plot(x, y)
plt.show()
To force the density toward 0 as $x$ grows large, $k$ needs to be positive. And with $k$ positive, $x^{-k}$ blows up as $x$ approaches 0, so we restrict the support to $x \geq 1$.
Let's find the normalizing constant ($c$):
$1 = \int_{1}^{\infty} c x^{-k} dx$
$1 = c \bigl[ \frac{1}{1-k} x^{1-k} \bigr]_{1}^{\infty}$
$1 = \frac{c}{1-k} \bigl[ 0 - 1 \bigr]$,
where the $x^{1-k}$ term vanishes at the upper limit only if $k > 1$, so that condition is required for the density to normalize.
$1 = \frac{c}{k-1}$
$c = k-1$
So the power law density function is $\begin{equation} f_X(X=x | k)=\begin{cases} (k-1)x^{-k} & \text{if }1 \leq x < \infty \text{ and } k > 1 \\ 0 & \text{otherwise}. \end{cases} \end{equation}$
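As a quick numerical sanity check (using scipy's quadrature, with $k=2$ as an arbitrary choice), the density should integrate to 1:

In [ ]:
import numpy
from scipy.integrate import quad

k = 2  # any k > 1 works
total, _ = quad(lambda x: (k - 1) * x**-k, 1, numpy.inf)
print(total)  # ~1.0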
Here's what $f_X(X=x | k=2)$ looks like:
In [12]:
# the density f(x | k=2) = (k-1) * x**-k = x**-2 on [1, inf), and 0 below 1
power_k2 = lambda x: x**-2 if x >= 1 else 0
x = numpy.linspace(0.1, 10, 99)
y = [power_k2(z) for z in x]
plt.plot(x, y)
plt.show()
Nassim Taleb offered the following quiz that uses a power-law distribution. Note his typo ("$q=.07$" should be "$q=.007$"). Using our equation, what is $k$? <img src="taleb_tweet.png" width="500">
First find the probability that $X$ exceeds some value $y$, which is one minus the CDF: $1-F_X(y \mid k) = 1 - \int_1^y (k-1)x^{-k} dx$
$1-F_X(y \mid k) = 1 - \biggl[ (k-1) \bigl[ \frac{1}{1-k} x^{1-k} \bigr]_1^y \biggr]$
$1-F_X(y \mid k) = 1 - \biggl[ (-1) \bigl[ y^{1-k} - 1 \bigr] \biggr]$
$1-F_X(y \mid k) = 1 - \biggl[ 1 - y^{1-k} \biggr]$
$1-F_X(y \mid k) = y^{1-k}$
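We can spot-check this tail formula by simulation ($k=2$ and $y=3$ are arbitrary choices; inverse-transform sampling gives draws with exactly this CDF):

In [ ]:
import numpy
rng = numpy.random.default_rng(0)

# Inverse-transform sampling: if U ~ Uniform(0,1), then X = U**(1/(1-k))
# has CDF F(x) = 1 - x**(1-k) on [1, inf).
k, y = 2, 3.0
x = rng.uniform(size=100_000)**(1 / (1 - k))
print((x > y).mean())  # empirical tail probability
print(y**(1 - k))      # theoretical value: 1/3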
Then, plugging in the numbers from Taleb's quiz:
$.45 = .007^{1-k}$
$\ln{.45} = (1-k) \ln{.007}$
$k = 1 - \frac{ \ln{.45} }{ \ln{.007} }$
$k \approx .84$
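The same arithmetic in code:

In [ ]:
import math

k = 1 - math.log(.45) / math.log(.007)
print(k)  # ~0.84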
Is this a mixture distribution?
What happens as data move from simple to complex? We can look at this using a convex combination of a simple distribution (uniform) and a complex distribution (power law).
First the uniform:
$\begin{equation} f_X(X=x)=\begin{cases} 1 & \text{if }1 \leq x \leq 2 \\ 0 & \text{otherwise}. \end{cases} \end{equation}$
Then the power:
$\begin{equation} g_X(X=x | k)=\begin{cases} (k-1)x^{-k} & \text{if }1 \leq x < \infty \text{ and } k > 1 \\ 0 & \text{otherwise}. \end{cases} \end{equation}$
And the convex combination, with mixing weight $0 \leq \alpha \leq 1$:
$\begin{equation} h_X(X=x | \alpha, k)=\begin{cases} \alpha + (1-\alpha)(k-1)x^{-k} & \text{if }1 \leq x < 2 \text{ and } k > 1 \\ (1-\alpha)(k-1)x^{-k} & \text{if }2 \leq x < \infty \text{ and } k > 1 \\ 0 & \text{otherwise}. \end{cases} \end{equation}$
(No normalizing constant is needed because the input densities are already normalized, and any convex combination of normalized densities integrates to 1.)
In [9]:
def convex_dist(x, alpha, k):
    # convex combination: alpha * uniform-on-[1,2] + (1 - alpha) * power law
    if 1 <= x < 2 and k > 1:
        return alpha + (1 - alpha) * (k - 1) * x**-k
    elif x >= 2 and k > 1:
        return (1 - alpha) * (k - 1) * x**-k
    else:
        return 0

x = numpy.linspace(0.1, 10, 99)
y0 = [convex_dist(z, 0, 2) for z in x]    # alpha = 0: pure power law
y5 = [convex_dist(z, 0.5, 2) for z in x]  # alpha = 0.5: half-and-half
y1 = [convex_dist(z, 1, 2) for z in x]    # alpha = 1: pure uniform
plt.plot(x, y0, label='alpha = 0')
plt.plot(x, y5, label='alpha = 0.5')
plt.plot(x, y1, label='alpha = 1')
plt.legend()
plt.show()