In [1]:
from __future__ import print_function

When analyzing data, I usually use three modules: pandas for data management, filtering, grouping, and processing; numpy for basic array math; and toyplot for rendering the charts.


In [2]:
import pandas
import numpy
import toyplot
import toyplot.pdf
import toyplot.png
import toyplot.svg

print('Pandas version:  ', pandas.__version__)
print('Numpy version:   ', numpy.__version__)
print('Toyplot version: ', toyplot.__version__)


Pandas version:   0.19.2
Numpy version:    1.12.0
Toyplot version:  0.14.0-dev

Load in the "auto" dataset. This is a fun collection of data on cars manufactured between 1970 and 1982. The source for this data can be found at https://archive.ics.uci.edu/ml/datasets/Auto+MPG.

The data are stored in a text file of whitespace-separated columns. We use the pandas.read_table() function to parse the data and load it into a pandas DataFrame. The file does not contain a header row, so we need to specify the names of the columns manually.


In [3]:
column_names = ['MPG',
                'Cylinders',
                'Displacement',
                'Horsepower',
                'Weight',
                'Acceleration',
                'Model Year',
                'Origin',
                'Car Name']
data = pandas.read_table('auto-mpg.data',
                         delim_whitespace=True,
                         names=column_names,
                         index_col=False)
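
Newer pandas releases deprecate the `delim_whitespace` argument in favor of `sep=r'\s+'`. The following is a minimal sketch of the same load on such a release. It assumes a recent pandas and reads three illustrative rows, laid out like `auto-mpg.data`, from an in-memory string. The names `sample_rows` and `sample` and the rows themselves are stand-ins for the real file, not part of the original notebook.

```python
import io

import pandas

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin', 'Car Name']

# Illustrative rows in the same whitespace-separated layout as auto-mpg.data
# (assumption: stand-ins for the file, used only to keep the sketch runnable).
sample_rows = (
    '18.0   8   307.0   130.0   3504.   12.0   70  1\t"chevrolet chevelle malibu"\n'
    '15.0   8   350.0   165.0   3693.   11.5   70  1\t"buick skylark 320"\n'
    '27.0   4   97.00   88.00   2130.   14.5   71  3\t"datsun pl510"\n'
)

# sep=r'\s+' splits on any run of spaces or tabs; quoted car names stay intact.
sample = pandas.read_csv(io.StringIO(sample_rows),
                         sep=r'\s+',
                         names=column_names,
                         index_col=False)
print(sample[['MPG', 'Model Year', 'Car Name']])
```

To read the real file this way, the `io.StringIO(...)` argument would presumably be replaced with the `'auto-mpg.data'` path used above.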

In this plot we are going to show the trend of the average miles per gallon (MPG) rating across model years. This time period saw a significant increase in MPG driven by the U.S. fuel crisis. We can use the pivot_table feature of pandas to compute the average MPG for each model year. (Excel and other spreadsheets have similar functionality.)


In [4]:
average_mpg_per_year = data.pivot_table(columns='Model Year',
                                        values='MPG',
                                        aggfunc='mean')
average_mpg_per_year


Out[4]:
Model Year
70    17.689655
71    21.250000
72    18.714286
73    17.100000
74    22.703704
75    20.266667
76    21.573529
77    23.375000
78    24.061111
79    25.093103
80    33.696552
81    30.334483
82    31.709677
Name: MPG, dtype: float64
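
The result above is a Series because of the pandas version used here. Since pandas 0.20, `pivot_table` always returns a DataFrame, so the same call yields a one-row table instead. A `groupby` computes the same per-year means and returns a Series on either version. Here is a minimal sketch using a small made-up frame (`toy`, `toy_mpg_per_year`, and the values are hypothetical, not taken from the dataset):

```python
import pandas

# Hypothetical stand-in for the auto data: two model years, two cars each.
toy = pandas.DataFrame({'Model Year': [70, 70, 71, 71],
                        'MPG': [18.0, 16.0, 25.0, 27.0]})

# Group rows by model year, then average the MPG values in each group.
toy_mpg_per_year = toy.groupby('Model Year')['MPG'].mean()
print(toy_mpg_per_year)
```

With the real data, `data.groupby('Model Year')['MPG'].mean()` should be able to stand in for the pivot table wherever a Series indexed by model year is expected.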

Now use toyplot to plot this trend on a standard x-y chart.


In [5]:
canvas = toyplot.Canvas('4in', '2.6in')

axes = canvas.cartesian(bounds=(41, -11, 1, -41),
                        xlabel='Model Year',
                        ylabel='Average MPG')

axes.plot(average_mpg_per_year.index + 1900, average_mpg_per_year)

# It's usually best to make the y-axis 0-based.
axes.y.domain.min = 0


[Figure: line chart of Average MPG (y-axis, 0 to 30) versus Model Year (x-axis, 1970 to 1982).]

In [6]:
toyplot.pdf.render(canvas, 'XY_Trend.pdf')
toyplot.svg.render(canvas, 'XY_Trend.svg')
toyplot.png.render(canvas, 'XY_Trend.png', scale=5)
