In [1]:
    
from __future__ import print_function
    
When analyzing data, I usually use the following three modules. I use pandas for data management, filtering, grouping, and processing. I use numpy for basic array math. I use toyplot for rendering the charts.
In [2]:
    
import pandas
import numpy
import toyplot
import toyplot.pdf
import toyplot.png
import toyplot.svg
print('Pandas version:  ', pandas.__version__)
print('Numpy version:   ', numpy.__version__)
print('Toyplot version: ', toyplot.__version__)
    
    
Load in the "auto" dataset. This is a fun collection of data on cars manufactured between 1970 and 1982. The source for this data can be found at https://archive.ics.uci.edu/ml/datasets/Auto+MPG.
The data are stored in a text file of whitespace-delimited columns. We use the pandas.read_table() function to parse the file and load it into a pandas DataFrame. The file does not contain a header row, so we need to specify the column names manually.
In [3]:
    
column_names = ['MPG',
                'Cylinders',
                'Displacement',
                'Horsepower',
                'Weight',
                'Acceleration',
                'Model Year',
                'Origin',
                'Car Name']
data = pandas.read_table('auto-mpg.data',
                         sep=r'\s+',  # whitespace-delimited; delim_whitespace is deprecated in newer pandas
                         names=column_names,
                         index_col=False)
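
To see how the names and whitespace-separator options fit together without needing the data file, here is a minimal sketch that parses a small in-memory string. The two rows and the `sample*` variable names are hypothetical placeholders that only imitate the file's layout; they are not records from the real dataset.

```python
import io
import pandas

# Hypothetical rows that imitate the layout of auto-mpg.data;
# they are placeholders, not records from the real file.
sample_text = '''\
20.0  4  100.0  90.0  2500.  15.0  70  1  "example car a"
30.0  4  90.0   70.0  2000.  16.0  71  3  "example car b"
'''

sample_columns = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                  'Weight', 'Acceleration', 'Model Year', 'Origin',
                  'Car Name']

# Split on runs of whitespace and supply the column names ourselves,
# since the text has no header row.
sample = pandas.read_table(io.StringIO(sample_text),
                           sep=r'\s+',
                           names=sample_columns,
                           index_col=False)
print(sample[['MPG', 'Model Year', 'Car Name']])
```

The quoted car names contain spaces, so the quotes are what keep each name in a single column.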
    
In this plot we show how the average miles per gallon (MPG) rating changes across successive model years. This time period saw a significant increase in MPG driven by the U.S. fuel crisis. We can use the pivot_table feature of pandas to compute the average MPG for each model year. (Excel and other spreadsheets have similar functionality.)
In [4]:
    
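# Pivot with the model years as the row index and select the MPG column,
# so the result is a Series indexed by model year.
average_mpg_per_year = data.pivot_table(index='Model Year',
                                        values='MPG',
                                        aggfunc='mean')['MPG']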
average_mpg_per_year
    
    Out[4]:
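
As a sanity check on this kind of aggregation, the sketch below computes a per-year mean on a tiny frame two ways: with pivot_table (pivoting on `index=` so the result is keyed by year) and with groupby. The values and the names `toy`, `by_pivot`, and `by_groupby` are made up for illustration and are not taken from the auto dataset.

```python
import pandas

# Made-up values for illustration only; not taken from the auto dataset.
toy = pandas.DataFrame({'Model Year': [70, 70, 71, 71],
                        'MPG': [10.0, 20.0, 30.0, 40.0]})

# Pivot with the years as the row index, then select the MPG column
# to get a Series of per-year means.
by_pivot = toy.pivot_table(index='Model Year',
                           values='MPG',
                           aggfunc='mean')['MPG']

# The same aggregation expressed as a groupby.
by_groupby = toy.groupby('Model Year')['MPG'].mean()

print(by_pivot)
print(by_groupby)
```

Both approaches should give one mean per model year; pivot_table is convenient when you later want to spread a second key across the columns.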
Now use toyplot to plot this trend on a standard x-y chart.
In [5]:
    
canvas = toyplot.Canvas('4in', '2.6in')
axes = canvas.cartesian(bounds=(41,-11,1,-41),
                        xlabel = 'Model Year',
                        ylabel = 'Average MPG')
axes.plot(average_mpg_per_year.index + 1900, average_mpg_per_year)
# It's usually best to make the y-axis 0-based.
axes.y.domain.min = 0
    
    
In [6]:
    
toyplot.pdf.render(canvas, 'XY_Trend.pdf')
toyplot.svg.render(canvas, 'XY_Trend.svg')
toyplot.png.render(canvas, 'XY_Trend.png', scale=5)
    