Getting started with mpl-probscale

Installation

mpl-probscale is developed on Python 3.6. It is also tested on Python 3.4, 3.5, and even 2.7 (for the time being).

From conda

Official releases of mpl-probscale can be found on conda-forge:

conda install --channel=conda-forge mpl-probscale

Fairly recent builds of the development versions are available on my channel:

conda install --channel=conda-forge mpl-probscale

From PyPI

Official source releases are also available on PyPI:

pip install probscale

From source

mpl-probscale is a pure Python package, so it should be fairly trivial to install from source on any platform. To do that, download or clone the repository from GitHub, unzip the archive if necessary, then run:

cd mpl-probscale # or wherever the setup.py got placed
pip install .

I recommend pip install . over python setup.py install for reasons I don't fully understand.


In [ ]:
%matplotlib inline

In [ ]:
import warnings
warnings.simplefilter('ignore')

import numpy
from matplotlib import pyplot
from scipy import stats
import seaborn

clear_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'}
seaborn.set(style='ticks', context='talk', color_codes=True, rc=clear_bkgd)

Background

Built-in matplotlib scales

As a casual user, you can set matplotlib scales to either "linear" or "log" (logarithmic). There are others (e.g., logit, symlog), but I haven't seen them much in the wild.

Linear scales are the default:


In [ ]:
fig, ax = pyplot.subplots()
seaborn.despine(fig=fig)

Logarithmic scales can work well when your data cover several orders of magnitude, and they don't have to be in base 10.


In [ ]:
fig, (ax1, ax2) = pyplot.subplots(nrows=2, figsize=(8,3))
ax1.set_xscale('log')
ax1.set_xlim(left=1e-3, right=1e3)
ax1.set_xlabel("Base 10")
ax1.set_yticks([])

ax2.set_xscale('log', base=2)  # matplotlib >= 3.3; older versions use basex=2
ax2.set_xlim(left=2**-3, right=2**3)
ax2.set_xlabel("Base 2")
ax2.set_yticks([])

seaborn.despine(fig=fig, left=True)
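
For completeness, here is a minimal sketch of the two less common built-in scales mentioned above, symlog and logit. The axis limits and labels are arbitrary choices for illustration, and both scales are left at their default parameters.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
from matplotlib import pyplot

fig, (ax1, ax2) = pyplot.subplots(nrows=2, figsize=(8, 3))

# symlog: logarithmic away from zero, linear in a small band around it,
# so the axis can span negative and positive values
ax1.set_xscale('symlog')
ax1.set_xlim(left=-1e3, right=1e3)
ax1.set_xlabel("Symmetric log")
ax1.set_yticks([])

# logit: for values strictly between 0 and 1
ax2.set_xscale('logit')
ax2.set_xlim(left=1e-3, right=1 - 1e-3)
ax2.set_xlabel("Logit")
ax2.set_yticks([])
```

The logit scale is the closest built-in relative of a probability scale: it stretches both tails of the (0, 1) interval, but it is based on the logistic function rather than on a probability distribution of your choosing.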

Probability Scales

mpl-probscale lets you use probability scales. All you need to do is import it.

Before importing, there is no probability scale available in matplotlib:


In [ ]:
try:
    fig, ax = pyplot.subplots()
    ax.set_xscale('prob')
except ValueError as e:
    pyplot.close(fig)
    print(e)

To access probability scales, simply import the probscale module.


In [ ]:
import probscale
fig, ax = pyplot.subplots(figsize=(8, 3))
ax.set_xscale('prob')
ax.set_xlim(left=0.5, right=99.5)
ax.set_xlabel('Normal probability scale (%)')
seaborn.despine(fig=fig)

Probability scales default to the standard normal distribution. Note that the tick labels are formatted as percentages.
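
To get a feel for why the ticks bunch up in the middle and spread out in the tails, here is a rough sketch of the idea behind a normal probability scale. It assumes that each percentage is placed at the standard normal quantile (inverse CDF) of the corresponding probability. The list of percentages is an arbitrary choice for illustration, not the library's actual tick locations.

```python
import numpy
from scipy import stats

# Percent values where one might expect tick labels on a probability axis.
percents = numpy.array([1, 5, 10, 25, 50, 75, 90, 95, 99])

# Assumption: the position of each tick along the axis is the standard
# normal quantile of the probability it represents.
positions = stats.norm.ppf(percents / 100)

for pct, pos in zip(percents, positions):
    print(f"{int(pct):>3d}% -> {pos:+.3f}")
```

In this sketch the 50% tick sits at zero, the positions are symmetric around it, and the gap between 1% and 5% comes out wider than the gap between 10% and 25%.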

You can even use different probability distributions, though it can be tricky. You have to pass a frozen distribution from either scipy.stats or paramnormal to the dist kwarg in ax.set_[x|y]scale.

Here's a standard normal scale with two different beta scales and a linear scale for comparison.


In [ ]:
fig, (ax1, ax2, ax3, ax4) = pyplot.subplots(figsize=(9, 5), nrows=4)

for ax in [ax1, ax2, ax3, ax4]:
    ax.set_xlim(left=2, right=98)
    ax.set_yticks([])

ax1.set_xscale('prob')
ax1.set_xlabel('Normal probability scale, as percents')

beta1 = stats.beta(a=3, b=2)
ax2.set_xscale('prob', dist=beta1)
ax2.set_xlabel('Beta probability scale (α=3, β=2)')

beta2 = stats.beta(a=2, b=7)
ax3.set_xscale('prob', dist=beta2)
ax3.set_xlabel('Beta probability scale (α=2, β=7)')

ax4.set_xticks(ax1.get_xticks()[12:-12])
ax4.set_xlabel('Linear scale (for reference)')

seaborn.despine(fig=fig, left=True)

Ready-made probability plots

mpl-probscale ships with a small viz module that can help you make a probability plot of a sample.

With only the sample data, probscale.probplot will create a figure, compute the plotting position and non-exceedance probabilities, and plot everything:


In [ ]:
numpy.random.seed(0)
sample = numpy.random.normal(loc=4, scale=2, size=37)

fig = probscale.probplot(sample)
seaborn.despine(fig=fig)
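
If you are curious what "plotting positions" means here, the following is a minimal sketch of one common family of formulas, written with plain numpy. The helper name `plotting_positions` is hypothetical, and the `alpha` and `beta` values are illustrative assumptions rather than a statement of what probscale uses internally.

```python
import numpy


def plotting_positions(data, alpha=0.4, beta=0.4):
    """Return (probabilities, sorted values) for a 1-D sample.

    Hypothetical helper for illustration. alpha and beta are illustrative
    choices, not necessarily the defaults that probscale uses.
    """
    values = numpy.sort(numpy.asarray(data, dtype=float))
    n = values.size
    ranks = numpy.arange(1, n + 1)  # 1 for the smallest value, n for the largest
    probs = (ranks - alpha) / (n + 1 - alpha - beta)
    return probs, values


rng = numpy.random.default_rng(0)
sample = rng.normal(loc=4, scale=2, size=37)
probs, values = plotting_positions(sample)
```

Each sorted value gets a non-exceedance probability strictly between 0 and 1, which matters because a probability scale cannot place a point at exactly 0% or 100%.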

You should specify the matplotlib axes on which the plot should occur if you want to customize the plot using matplotlib commands directly:


In [ ]:
fig, ax = pyplot.subplots(figsize=(7, 3))

probscale.probplot(sample, ax=ax)

ax.set_ylabel('Normal Values')
ax.set_xlabel('Non-exceedance probability')
ax.set_xlim(left=1, right=99)
seaborn.despine(fig=fig)

Lots of other options are directly accessible from the probplot function signature.


In [ ]:
fig, ax = pyplot.subplots(figsize=(3, 7))

numpy.random.seed(0)
new_sample = numpy.random.lognormal(mean=2.0, sigma=0.75, size=37)

probscale.probplot(
    new_sample,
    ax=ax,
    probax='y', # flip the plot
    datascale='log', # scale of the non-probability axis
    bestfit=True, # draw a best-fit line
    estimate_ci=True,
    datalabel='Lognormal Values',  # labels and markers...
    problabel='Non-exceedance probability',
    scatter_kws=dict(marker='d', zorder=2, mew=1.25, mec='w', markersize=10),
    line_kws=dict(color='0.17', linewidth=2.5, zorder=0, alpha=0.75),
)

ax.set_ylim(bottom=1, top=99)
seaborn.despine(fig=fig)
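
As a rough illustration of what a best-fit line on a plot like this represents, the sketch below fits a straight line in the transformed space: normal quantiles on the probability side and logarithms on the data side. This is an assumption about the general approach, not a reproduction of the library's exact procedure, and the plotting-position formula (alpha = beta = 0.4) is an illustrative choice.

```python
import numpy
from scipy import stats

rng = numpy.random.default_rng(0)
data = numpy.sort(rng.lognormal(mean=2.0, sigma=0.75, size=37))

# Symmetric plotting positions (illustrative alpha = beta = 0.4).
n = data.size
probs = (numpy.arange(1, n + 1) - 0.4) / (n + 0.2)

# Transform both axes into the space where the points should fall on a line:
# normal quantiles for the probability axis, logs for the data axis.
z = stats.norm.ppf(probs)
log_data = numpy.log(data)

# Ordinary least squares for a first-degree polynomial.
slope, intercept = numpy.polyfit(z, log_data, deg=1)

# Map the fitted line back to data units for drawing.
fitted = numpy.exp(intercept + slope * z)
```

Under these assumptions, the intercept and slope are rough estimates of the mean and standard deviation of the log-transformed values.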

Percentile and Quantile plots

For convenience, you can do percentile and quantile plots with the same function.

.. note:: The percentile and probability axes are plotted against the same values. The difference is only that "percentiles" are plotted on a linear scale.

In [ ]:
fig, (ax1, ax2, ax3) = pyplot.subplots(nrows=3, figsize=(8, 7))

probscale.probplot(sample, ax=ax1, plottype='pp', problabel='Percentiles')
probscale.probplot(sample, ax=ax2, plottype='qq', problabel='Quantiles')
probscale.probplot(sample, ax=ax3, plottype='prob', problabel='Probabilities')

ax2.set_xlim(left=-2.5, right=2.5)
ax3.set_xlim(left=0.5, right=99.5)
fig.tight_layout()
seaborn.despine(fig=fig)

Working with seaborn FacetGrids

Good news, everyone. The probplot function generally works as expected with FacetGrids.


In [ ]:
plot = (
    seaborn.load_dataset("tips")
        .assign(pct=lambda df: 100 * df['tip'] / df['total_bill'])
        .pipe(seaborn.FacetGrid, hue='sex', col='time', row='smoker', margin_titles=True, aspect=1., height=4)  # seaborn < 0.9 uses size=4
        .map(probscale.probplot, 'pct', bestfit=True, scatter_kws=dict(alpha=0.75), probax='y')
        .add_legend()
        .set_ylabels('Non-Exceedance Probability')
        .set_xlabels('Tips as percent of total bill')
        .set(ylim=(0.5, 99.5), xlim=(0, 100))
)