In [1]:
from __future__ import print_function

When analyzing data, I usually use the following three modules: pandas for data management, filtering, grouping, and processing; numpy for basic array math; and toyplot for rendering the charts.

In [2]:
import pandas
import numpy
import toyplot
import toyplot.pdf
import toyplot.png
import toyplot.svg

print('Pandas version:  ', pandas.__version__)
print('Numpy version:   ', numpy.__version__)
print('Toyplot version: ', toyplot.__version__)

Pandas version:   0.19.2
Numpy version:    1.12.0
Toyplot version:  0.14.0-dev

Load in the "auto" dataset. This is a fun collection of data on cars manufactured between 1970 and 1982. The source for this data can be found at

The data are stored in a text file arranged in columns. We use the pandas.read_table() function to parse the file and load it into a pandas DataFrame. The file does not contain a header row, so we need to specify the column names manually.
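As a minimal sketch of loading a headerless text file with manually supplied column names, the example below parses a few made-up rows from an in-memory string. The sample rows, the whitespace separator, and the three-column layout are assumptions for illustration only; the real file has its own layout and more columns.

```python
import io
import pandas

# Hypothetical sample rows (NOT the real dataset): whitespace-separated,
# with no header row.
sample_text = """\
18.0  70  car-a
24.0  71  car-b
31.5  82  car-c
"""

column_names = ['MPG', 'Model Year', 'Car Name']

# header=None tells pandas the first line is data, not column labels;
# names= supplies the labels instead.
data = pandas.read_table(io.StringIO(sample_text),
                         sep=r'\s+',
                         header=None,
                         names=column_names)
print(data)
```

Passing header=None together with names keeps the first data row from being consumed as column labels.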

In [3]:
column_names = ['MPG',
                'Model Year',
                'Car Name']
data = pandas.read_table('',

In this plot we show how the average miles per gallon (MPG) rating changes over successive model years. This period saw a significant increase in MPG, driven by the U.S. fuel crisis. The pivot_table feature of pandas extracts this information from the data. (Excel and other spreadsheets offer similar functionality.)

In [4]:
average_mpg_per_year = data.pivot_table(columns='Model Year',

Model Year
70    17.689655
71    21.250000
72    18.714286
73    17.100000
74    22.703704
75    20.266667
76    21.573529
77    23.375000
78    24.061111
79    25.093103
80    33.696552
81    30.334483
82    31.709677
Name: MPG, dtype: float64
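To make the pivot step concrete, here is a small sketch on a made-up table. The toy values are hypothetical, and the call places the model year on index rather than columns, so it is an illustration of the idea and not necessarily the exact call used in this notebook. A groupby is included as a cross-check.

```python
import pandas

# Hypothetical toy table (NOT the real dataset).
toy = pandas.DataFrame({
    'MPG': [10.0, 20.0, 30.0, 26.0],
    'Model Year': [70, 70, 71, 71],
})

# Pivot: one row per model year, averaging the MPG values that share it.
pivoted = toy.pivot_table(index='Model Year', values='MPG', aggfunc='mean')
average_mpg = pivoted['MPG']

# The same averages computed with groupby, as a cross-check.
grouped = toy.groupby('Model Year')['MPG'].mean()

print(average_mpg)
```

Both routes give one average per model year; pivot_table is convenient when the result will later be spread across several row or column keys.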

Now use toyplot to plot this trend on a standard x-y chart.

In [5]:
canvas = toyplot.Canvas('4in', '2.6in')

axes = canvas.cartesian(bounds=(41,-11,1,-41),
                        xlabel = 'Model Year',
                        ylabel = 'Average MPG')

axes.plot(average_mpg_per_year.index + 1900, average_mpg_per_year)

# It's usually best to make the y-axis 0-based.
axes.y.domain.min = 0

[Figure: line chart of Average MPG (y-axis, 0 to 30) versus Model Year (x-axis, 1970 to 1982).]

In [6]:
toyplot.pdf.render(canvas, 'XY_Trend.pdf')
toyplot.svg.render(canvas, 'XY_Trend.svg')
toyplot.png.render(canvas, 'XY_Trend.png', scale=5)
