This dry definition provides a function with the right properties to describe our intuitive understanding of probability.
Let $\Omega$ be the set of states available to a system of fixed energy, e.g. a box full of gas particles.
With one additional (physics) assumption, that it's equally probable for the system to occupy any state in $\Omega$, this is the microcanonical ensemble in statistical mechanics.
Very often the type of event we're interested in lives in a continuous sample space. E.g., the Hubble parameter is $h=0.7$.
Our axioms mostly translate straightforwardly; in this example $P(\Omega)=1$ becomes the normalization condition
$\int_{-\infty}^{\infty} p(h=x)\,dx = 1$
We can always describe the discrete case as a continuous one where $p$ is a sum of Dirac delta functions.
If $X$ takes real values, then $p(X=x)$ is a probability density function, or PDF.
The first bullet is highly relevant if we ever want to change variables, e.g. $x\rightarrow y(x)$; the rule is written out below.
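For reference, the standard one-dimensional rule follows from equating the probability in corresponding intervals, $p(y)\,|dy| = p(x)\,|dx|$, so that

$p(y) = p(x) \left|\frac{dx}{dy}\right|$

The Jacobian factor is easy to forget, and forgetting it breaks normalization.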
The cumulative distribution function (CDF) is the probability that $X\leq x$: $F(x) = P(X\leq x) = \int_{-\infty}^{x} p(x')\,dx'$.
The quantile function is the inverse of the CDF, $F^{-1}(P)$.
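As a quick numerical illustration (a sketch assuming `scipy` is available; the standard normal here is just a stand-in distribution), `scipy.stats` exposes the CDF as `cdf` and the quantile function as `ppf`:

In [ ]:
# Illustration: the quantile function inverts the CDF.
import scipy.stats as st

dist = st.norm(loc=0.0, scale=1.0)  # stand-in continuous distribution
P = dist.cdf(1.0)                   # F(x): probability that X <= 1.0
x = dist.ppf(P)                     # F^{-1}(P) recovers x = 1.0
print(P, x)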
The marginal probability of $y$, $p(y) = \int p(x,y)\,dx$, means the probability of $y$ irrespective of what $x$ is.
The conditional probability of $y$ given a value of $x$, $p(y|x)$, is most easily understood through the factorization

$p(x,y) = p(y|x)\,p(x)$

i.e., $p$ of getting $x$ AND $y$ can be factorized into the product of the probability of getting $x$ and the probability of getting $y$ given $x$.
$p(y|x)$ is a (normalized) slice through $p(x,y)$ rather than an integral.
$x$ and $y$ are independent if $p(y|x) = p(y)$.
Equivalently, $p(x,y) = p(x)\,p(y)$.
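To make the slice-versus-integral distinction concrete, here is a toy discrete joint distribution (the numbers are invented for illustration):

In [ ]:
# Toy joint distribution: rows index x, columns index y; entries sum to 1.
import numpy as np

p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)            # marginal p(x): sum (integrate) over y
p_y = p_xy.sum(axis=0)            # marginal p(y): sum (integrate) over x

p_y_given_x0 = p_xy[0] / p_x[0]   # conditional p(y|x=0): a normalized slice
print(p_y, p_y_given_x0)          # these differ, so x and y are not independent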
Take the coin tossing example from earlier, where $P(\mathrm{heads})=q$ and $P(\mathrm{tails})=1-q$ for a given toss. Assume that this holds independently for each toss.
Find: the probability of obtaining exactly $n$ heads in $N$ tosses.
The answer to the previous exercise is the probability mass function of the binomial distribution
$P(n|q,N) = {N \choose n} q^n (1-q)^{N-n}$
To introduce some notation, we might write this as
$n \sim \mathrm{Binom}(q,N)$
Here the squiggle means "is a random variable that is distributed as" (as opposed to "has the same order of magnitude as" or "scales with", the common usages in physics).
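In code (a sketch; the values of $q$ and $N$ are arbitrary, and `scipy` is assumed), this distribution can be evaluated and sampled directly:

In [ ]:
# Evaluate and sample the binomial distribution.
import numpy as np
import scipy.stats as st

q, N = 0.6, 10
n = np.arange(N + 1)
pmf = st.binom.pmf(n, N, q)           # P(n|q,N) for each possible n
draws = st.binom.rvs(N, q, size=5)    # five realizations of n ~ Binom(q,N)
print(pmf.sum(), draws)               # probabilities sum to 1; draws vary per run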
Recall that a key assumption was that each toss (trial) was independent. If we write the mean number of heads as $\mu=qN$ and also assume that $q$ is small while $N$ is large, then a series of irritating limits and substitutions yields the Poisson distribution
$P(n|\mu) = \frac{\mu^n e^{-\mu}}{n!}$
This is an extremely important result, given that most astronomy and physics experiments boil down to counting events that are rare compared with the number of time intervals in which they might happen (and be recorded).
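Those irritating limits can at least be checked numerically (a sketch; $\mu=3$ and the values of $N$ are arbitrary choices):

In [ ]:
# Binomial -> Poisson as q -> 0 and N -> infinity with mu = qN fixed.
import numpy as np
import scipy.stats as st

mu = 3.0
n = np.arange(20)
for N in (10, 100, 10000):
    diff = np.abs(st.binom.pmf(n, N, mu / N) - st.poisson.pmf(n, mu)).max()
    print(N, diff)                    # the difference shrinks as N grows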
The Poisson distribution has the following (probably familiar) properties:

* its mean is $\mu$;
* its variance is also $\mu$;
* the sum of two Poisson-distributed variables is itself Poisson-distributed, with mean equal to the sum of the two means.
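The first two can be spot-checked (a sketch assuming `scipy`; $\mu=3$ is arbitrary):

In [ ]:
# The mean and variance of Poisson(mu) are both mu.
import scipy.stats as st

mean, var = st.poisson.stats(3.0, moments='mv')
print(mean, var)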
Another important theorem, the central limit theorem, states in its most common form: the sum of $n$ independent random variables with finite means and variances tends to a Gaussian-distributed random variable as $n\rightarrow\infty$.
Among other things, this implies that a Poisson distribution with large enough $\mu$ closely resembles a Gaussian.
This is a powerful result, but we need to keep some things in mind.
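A quick numerical comparison (a sketch; the $\mu$ values are arbitrary) shows the Gaussian resemblance improving with $\mu$:

In [ ]:
# Poisson(mu) vs. a Gaussian with matching mean and variance.
import numpy as np
import scipy.stats as st

for mu in (2.0, 100.0):
    n = np.arange(int(mu + 10 * np.sqrt(mu)))
    diff = np.abs(st.poisson.pmf(n, mu)
                  - st.norm.pdf(n, loc=mu, scale=np.sqrt(mu))).max()
    print(mu, diff)                   # agreement improves as mu grows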
Go further with the previous exercise! Consider the function $b=\tan(\theta)$, which is sometimes used to reparametrize the slope of a line ($b$) in terms of the angle the line makes in a plot ($\theta$).
In [ ]:
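A possible starting point (a sketch, not a unique solution; drawing $\theta$ uniformly over $(-\pi/2, \pi/2)$ is an assumption) is to sample $\theta$, transform, and compare with the change-of-variables prediction $p(b) = \frac{1}{\pi}\frac{1}{1+b^2}$:

In [ ]:
# Histogram of b = tan(theta) for uniform theta vs. the Jacobian prediction.
import numpy as np
import matplotlib.pyplot as plt

theta = np.random.uniform(-np.pi / 2, np.pi / 2, size=100000)
b = np.tan(theta)

grid = np.linspace(-10.0, 10.0, 200)
# density=True normalizes over the plotted range only, so the histogram
# sits ~7% above the curve (about 6% of the mass lies beyond |b| = 10).
plt.hist(b, bins=grid, density=True, label='samples')
plt.plot(grid, (1.0 / np.pi) / (1.0 + grid**2), label='Jacobian prediction')
plt.xlabel('$b$')
plt.legend()
plt.show()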