Understanding mathematics and statistics is difficult enough when you are new to an idea or topic, so we as mathematicians and statisticians (yes, myself included) need to be careful to introduce ideas with consistent notation, allowing a reader to focus on relating the notation to the idea rather than deciphering our inconsistencies.
In this set of notes, I would like to identify a few conventions of common statistical notation. A misunderstanding of notation can frequently lead to a misconception of the idea altogether. With that, let's get started.
Parameters are defined as numeric summaries of a population, while statistics are defined as numeric summaries of a sample. In notation, we represent parameters and statistics differently.
Parameters are frequently identified by Greek letters. For example, $\mu$ is the symbol for a population mean (a parameter). However, we use $\bar{X}$ to identify a sample mean (a statistic). We might also choose to represent a sample mean with the notation $\hat{\mu}$, where the 'hat' indicates an estimate of the parameter of interest. Let's consider some common statistic-parameter pairs:
| Description | Parameter | Statistic |
|---|---|---|
| Mean | $\mu$ | $\bar{X}$ |
| St. Dev. | $\sigma$ | $s$ |
| Variance | $\sigma^2$ | $s^2$ |
| Proportion | $\pi$ | $p$ |
| Intercept | $\beta_0$ | $b_0$ |
| Slope | $\beta_1$ | $b_1$ |
We could also create the above table using our 'hat' notation in the following way:
| Description | Parameter | Statistic |
|---|---|---|
| Mean | $\mu$ | $\hat{\mu}$ |
| St. Dev. | $\sigma$ | $\hat{\sigma}$ |
| Variance | $\sigma^2$ | $\hat{\sigma}^2$ |
| Proportion | $\pi$ | $\hat{\pi}$ |
| Intercept | $\beta_0$ | $\hat{\beta}_0$ |
| Slope | $\beta_1$ | $\hat{\beta}_1$ |
Although you might be comfortable with either representation, the second table is appealingly consistent in how it moves from a parameter to an estimate of that parameter (and perhaps preferable for this reason).
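To make these pairs concrete, here is a minimal sketch (using NumPy, with made-up 'true' parameter values) that simulates data from a known population and computes the corresponding statistics:

```python
import numpy as np

rng = np.random.default_rng(42)

# 'True' population parameters (assumed values for this simulation)
mu, sigma, pi = 5.0, 2.0, 0.3
beta_0, beta_1 = 1.0, 0.5

# Statistics computed from a sample estimate those parameters
sample = rng.normal(loc=mu, scale=sigma, size=200)
successes = rng.random(200) < pi

x_bar = sample.mean()       # estimates mu
s = sample.std(ddof=1)      # estimates sigma (ddof=1 gives the sample st. dev.)
s_sq = sample.var(ddof=1)   # estimates sigma^2
p = successes.mean()        # estimates pi

# Least-squares fit; np.polyfit returns the highest-degree coefficient first
x = rng.uniform(0, 10, size=200)
y = beta_0 + beta_1 * x + rng.normal(scale=1.0, size=200)
b_1, b_0 = np.polyfit(x, y, deg=1)  # b_0 estimates beta_0, b_1 estimates beta_1

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, s^2 = {s_sq:.2f}, p = {p:.2f}")
print(f"b_0 = {b_0:.2f}, b_1 = {b_1:.2f}")
```

With a larger sample, each statistic would typically land closer to its parameter.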
Given the above notation as a basis, notice that we write model representations for our data (statistical distributions, linear models, etc.) using population parameters. This is because the parameters describe the 'true' population process we believe generated the data. Similarly, hypothesis testing and confidence intervals are always performed on parameters.
Performing hypothesis tests for statistics doesn't make much sense (as statistics are calculated from our data), so the following statement is pretty ridiculous:
$H_0: \bar{X} = 5$
Once the data are collected, the sample mean either is 5 or it is not. Calculating a p-value associated with such a claim doesn't make sense. However, a statement like the following asks whether a parameter of the population takes a particular value:
$H_0: \mu = 5$
Based on the observed data, we can compute the probability of obtaining data like ours if the above claim is true.
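As a sketch, such a test can be carried out in Python with scipy's one-sample t-test (the data here are simulated, so the output is illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.4, scale=2.0, size=50)  # a simulated sample

# Test H0: mu = 5 against the two-sided alternative
result = stats.ttest_1samp(data, popmean=5)
print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```

A small p-value would suggest the observed data are unlikely under $H_0: \mu = 5$.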
In statistical notation, uppercase and lowercase representations of the same letter actually have different meanings. That is, $X$ is not the same as $x$. Common practice is that if a particular letter represents a random variable, we capitalize it. However, if we observe an occurrence of this random variable, it is notated with a lowercase letter.
For example, suppose we are modeling monthly revenue. We could say that revenue (uppercase $X$) is a random variable, varying from month to month. We then observe a revenue of \$100k (lowercase $x$). In the case of functions, we frequently use a lowercase letter for a particular function, such as the probability density function $f(x)$, and the corresponding uppercase letter for its cumulative distribution function (cdf) $F(x)$. An example of this idea is shown directly on the Wikipedia page for cumulative distribution functions.
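scipy follows this convention: a distribution's density $f(x)$ is exposed as `pdf` and its cumulative distribution function $F(x) = P(X \le x)$ as `cdf`. A quick sketch with the standard normal:

```python
from scipy import stats

x = 1.0                   # an observed value of the random variable X
f_x = stats.norm.pdf(x)   # f(x): density of the standard normal at x
F_x = stats.norm.cdf(x)   # F(x) = P(X <= x)
print(f"f({x}) = {f_x:.4f}, F({x}) = {F_x:.4f}")  # roughly 0.2420 and 0.8413
```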
Consider we have data that follows a Beta distribution. We might write this as:
$X \sim Beta(\alpha, \beta)$
Notice, our data ($X$) is discussed with a capital letter. The parameters ($\alpha$, $\beta$) are represented by Greek letters. If you are not familiar with the Beta distribution, it essentially models data that falls between the values of 0 and 1. You can think of the parameters as encoding an expected success rate of $\alpha$ successes in $\alpha + \beta$ trials. We might observe $x$ (lowercase) from this distribution equal to 0.32, along with a number of other $x$ values. From those values, we could try to estimate $\alpha$ and $\beta$ as $\hat{\alpha}$ and $\hat{\beta}$ using a method of moments, maximum likelihood, or (if we are feeling a little fast and loose) Bayesian approach.
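As a sketch of that last step, scipy can produce maximum likelihood estimates of $\hat{\alpha}$ and $\hat{\beta}$, and the method-of-moments estimates follow from the sample mean and variance (the 'true' parameter values below are made up for the simulation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha_true, beta_true = 2.0, 5.0               # assumed 'true' parameters
x = rng.beta(alpha_true, beta_true, size=500)  # observed lowercase x's

# Maximum likelihood estimates, fixing loc=0 and scale=1
alpha_hat, beta_hat, loc, scale = stats.beta.fit(x, floc=0, fscale=1)

# Method-of-moments estimates from the sample mean and variance
m, v = x.mean(), x.var(ddof=1)
common = m * (1 - m) / v - 1
alpha_mom, beta_mom = m * common, (1 - m) * common

print(f"MLE: alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
print(f"MoM: alpha_hat = {alpha_mom:.2f}, beta_hat = {beta_mom:.2f}")
```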
If you have questions, please do not hesitate to contact me.