Matplotlib is highly customizable, and having a huge code base means it might not be easy to find what I need quickly.

A recurring problem that I often face is customizing figure legend. Although Matplotlib website provides excellent document, I decided to write down some tricks that I found useful on the topic of handling figure legends.

First, as always, load in useful libraries and enable matplotlib magic.


In [77]:
%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
from itertools import izip
import pandas as pd
import numpy as np

The first thing I found useful is to create a figure legend out of nowhere.

In this example, I synthesized poll data with 'yes' or 'no' as the answer, and try to plot color-coded bar graph from these two data points.


In [54]:
ax = plt.subplot(111)
answers = ['yes','no']
votes = [10,20]
sns.barplot(x=answers, y = votes, palette=colors, ax=ax)
sns.despine()


From the above plot, legend cannot be plotted using ax.legend(), since they were not labelled. In this case, I need to use patches from matplotlib to make the legend handles, and add to the figure by ax.legend()


In [57]:
ax = plt.subplot(111)
answers = ['yes','no']
votes = [10,20]
sns.barplot(x=answers, y = votes, palette=colors, ax=ax)
sns.despine()
colors = ['blue','red']
pat = [mpatches.Patch(color=col, label=lab) for col, lab in zip(colors, answers)]
ax.legend(handles=pat, bbox_to_anchor = (1,0.5))


Out[57]:
<matplotlib.legend.Legend at 0x7463d0d55f10>

Another frequently-encountered problem is the duplicate legend labels.

To illustrate this problem, I will simulate a data of movement of 10 particles of two types bwtween two time points in a 2D space (x1, y1 are the initial coordinates; x2, y2 are the new coordinates; the column 'label' indicates the particle types). I am also writing a color encoder function for assigning distintive color to each particle type.


In [95]:
def color_encoder(xs, colors=sns.color_palette('Dark2',8)):
    '''
    color encoding a categoric vector
    '''
    xs = pd.Series(xs)
    encoder = {x:col for x, col in izip(xs.unique(), colors)}
    return xs.map(encoder)

sim = pd.DataFrame(np.random.rand(10,4), columns = ['x1','x2', 'y1','y2']) \
    .assign(label = lambda d: np.random.binomial(1, 0.5, 10)) \
    .assign(color = lambda d: color_encoder(d.label))
sim.head()


Out[95]:
x1 x2 y1 y2 label color
0 0.902625 0.755530 0.211558 0.512878 1 (0.105882352941, 0.619607843137, 0.466666666667)
1 0.327010 0.275663 0.876240 0.821259 1 (0.105882352941, 0.619607843137, 0.466666666667)
2 0.193913 0.934108 0.746931 0.826095 1 (0.105882352941, 0.619607843137, 0.466666666667)
3 0.190888 0.263192 0.331592 0.081737 0 (0.850980392157, 0.372549019608, 0.0078431372549)
4 0.884696 0.221513 0.346046 0.071234 0 (0.850980392157, 0.372549019608, 0.0078431372549)

To plot the movement, I iterate over the pandas DataFrame object and plotted a line between the initial and the new coodinate for each particle at a time.


In [96]:
fig = plt.figure()
ax = fig.add_subplot(111)
for index , row in sim.iterrows():
    ax.plot([row['x1'], row['x2']], [row['y1'],row['y2']], 
               label = row['label'], 
               color = row['color'])
ax.legend()
sns.despine()


And the default legend is producing a handler for each line. To simplify the legend, I found a elegant solution of stackoverflow, that used dict object in python to remove redundant legend labels.


In [97]:
fig = plt.figure()
ax = fig.add_subplot(111)
for index , row in sim.iterrows():
    ax.plot([row['x1'], row['x2']], [row['y1'],row['y2']], 
               label = row['label'], 
               color = row['color'])
ax.legend()
handles, labels = ax.get_legend_handles_labels()  
lgd = dict(zip(labels, handles))
plt.legend(lgd.values(), lgd.keys())
sns.despine()