Quantiles

By Evgenia "Jenny" Nitishinskaya and Delaney Granizo-Mackenzie

Notebook released under the Creative Commons Attribution 4.0 License.


Quantiles are values defined similarly to the median. Two common kinds are quartiles, which divide the data set into fourths, and percentiles, which divide it into hundredths. The median can also be called the second quartile or the 50th percentile. In general, the $y$th percentile is the value such that $y$ percent of the observations are less than it, and the $k$th quartile is the value such that $k$ fourths of the observations are less than it.
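A minimal sketch of how quartiles, percentiles, and the median relate, using NumPy's `np.percentile` with its default settings. The array reuses the 21 sorted observations from the example output later in this notebook. Because no small data set splits perfectly, the fraction of observations below each quartile only roughly matches 1/4, 1/2, and 3/4.

```python
import numpy as np

# The 21 sorted observations from the example output later in this notebook
X = np.array([1, 4, 10, 14, 22, 23, 24, 27, 35, 40, 44,
              45, 53, 60, 61, 71, 73, 79, 88, 93, 93])

# Quartiles are the 25th, 50th and 75th percentiles
q1, q2, q3 = np.percentile(X, [25, 50, 75])

# The second quartile (50th percentile) is the median
assert q2 == np.median(X)

# Fraction of observations strictly below each quartile: roughly 1/4, 1/2, 3/4
for name, q in (('Q1', q1), ('Q2', q2), ('Q3', q3)):
    print(name, q, np.mean(X < q))
```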

As with the median, there is not always a value that divides the data set perfectly, especially for small data sets. One common convention places the $y$th percentile of a list of observations $X_1, X_2, \ldots, X_n$, sorted in increasing order, at position $$ L_y = (n + 1) \frac{y}{100} $$ where $n$ is the number of observations. If this position is fractional, we interpolate linearly between the observations just before and after it. For instance, if $L_y = 7.85$, the $y$th percentile is $0.85$ of the way from $X_7$ to $X_8$, which evaluates to $X_7 + 0.85(X_8 - X_7)$. Other conventions exist: by default, NumPy's `np.percentile` uses the position $1 + (n - 1)\frac{y}{100}$ instead, so its results can differ from this formula, as the 11th percentile below shows.
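The $(n + 1)$ rule can be sketched in plain Python as follows. The helper names `percentile_position` and `percentile_n_plus_1` are hypothetical, and so is the handling of positions outside $[1, n]$: the text does not cover that case, so this sketch assumes clamping to the smallest or largest observation. NumPy 1.22 and later also offer `np.percentile(..., method='weibull')`, which should follow the same $(n + 1)$ convention; check the NumPy documentation before relying on that.

```python
import math

def percentile_position(n, y):
    """1-indexed position L_y = (n + 1) * y / 100 for a sorted list of length n."""
    return (n + 1) * y / 100.0

def percentile_n_plus_1(sorted_x, y):
    """y-th percentile of an ascending list using the (n + 1) rule.

    ASSUMPTION (not specified in the text): positions below 1 or above n are
    clamped to the first or last observation.
    """
    n = len(sorted_x)
    pos = min(max(percentile_position(n, y), 1.0), float(n))  # assumed clamping
    k = int(math.floor(pos))  # whole part: index of X_k in 1-indexed notation
    frac = pos - k            # fractional part: how far from X_k toward X_{k+1}
    lower = sorted_x[k - 1]   # X_k (Python lists are 0-indexed)
    if k == n:                # position is exactly n: no X_{n+1} to move toward
        return float(lower)
    upper = sorted_x[k]       # X_{k+1}
    return lower + frac * (upper - lower)

# Usage with the 21 sorted observations from the example output below
X = [1, 4, 10, 14, 22, 23, 24, 27, 35, 40, 44,
     45, 53, 60, 61, 71, 73, 79, 88, 93, 93]
for y in (50, 95, 11):
    print(y, percentile_n_plus_1(X, y))
```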


import numpy as np

# Generate 21 random integers in [0, 100) and sort them in place
X = np.random.randint(100, size=21)
X.sort()
print(X)

# By default, np.percentile interpolates linearly between neighboring observations
print('50th percentile:', np.percentile(X, 50))
print('95th percentile:', np.percentile(X, 95))
print('11th percentile:', np.percentile(X, 11))


[ 1  4 10 14 22 23 24 27 35 40 44 45 53 60 61 71 73 79 88 93 93]
50th percentile: 44.0
95th percentile: 93.0
11th percentile: 10.8
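The 11th percentile printed above (10.8) does not follow the $(n + 1)$ rule, which would give $L_{11} = 22 \times 0.11 = 2.42$ and hence $X_2 + 0.42(X_3 - X_2) = 4 + 0.42 \times 6 = 6.52$. The sketch below, built around a hypothetical helper `percentile_numpy_linear`, reproduces NumPy's default rule by hand using the 0-indexed position $(n - 1)\frac{y}{100}$. It hard-codes the printed observations so the comparison is repeatable.

```python
import numpy as np

def percentile_numpy_linear(sorted_x, y):
    """Hand-rolled version of np.percentile's default ('linear') rule.

    The 0-indexed position is (n - 1) * y / 100, i.e. 1 + (n - 1) * y / 100 in
    the 1-indexed notation of the text. Assumes 0 <= y <= 100 and sorted input.
    """
    n = len(sorted_x)
    pos = (n - 1) * y / 100.0
    k = int(np.floor(pos))   # 0-indexed lower neighbor
    frac = pos - k           # how far toward the next observation
    if k >= n - 1:           # y = 100 lands exactly on the last observation
        return float(sorted_x[-1])
    return float(sorted_x[k] + frac * (sorted_x[k + 1] - sorted_x[k]))

# The 21 sorted observations printed above
X = np.array([1, 4, 10, 14, 22, 23, 24, 27, 35, 40, 44,
              45, 53, 60, 61, 71, 73, 79, 88, 93, 93])

# Compare the hand-rolled rule with NumPy's own result
for y in (50, 95, 11):
    print(y, percentile_numpy_linear(X, y), np.percentile(X, y))
```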

Quantiles are often used to report rankings or to refer to groups by their performance. For instance, if you are examining companies of different sizes, you can take the companies whose market value of equity is above the 95th percentile as your sample of large companies.
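A minimal sketch of selecting that large-company sample, assuming the market values of equity are already in a NumPy array. The data here is hypothetical (randomly generated lognormal values standing in for market capitalizations), as are the `company_ids`; with real data you would substitute your own values and identifiers. The point is that the 95th percentile is a single cutoff value, and the sample is every company at or above it.

```python
import numpy as np

# HYPOTHETICAL data: market values of equity for 1,000 companies, drawn from a
# lognormal distribution purely for illustration
rng = np.random.default_rng(seed=0)
market_values = rng.lognormal(mean=7.0, sigma=1.5, size=1000)
company_ids = np.arange(market_values.size)  # hypothetical company identifiers

# The 95th percentile is one cutoff value...
cutoff = np.percentile(market_values, 95)

# ...and the "large companies" sample is every company at or above that cutoff
is_large = market_values >= cutoff
large_ids = company_ids[is_large]

print('95th percentile cutoff: %.1f' % cutoff)
print('Number of large companies:', large_ids.size)  # about 5% of 1,000
```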