In [1]:
from __future__ import print_function
When analyzing data, I usually use the following three modules. I use pandas for data management, filtering, grouping, and processing. I use numpy for basic array math. I use toyplot for rendering the charts.
In [2]:
import pandas
import numpy
import toyplot
import toyplot.pdf
import toyplot.png
import toyplot.svg
print('Pandas version: ', pandas.__version__)
print('Numpy version: ', numpy.__version__)
print('Toyplot version: ', toyplot.__version__)
Load in the "auto" dataset. This is a fun collection of data on cars manufactured between 1970 and 1982. The source for this data can be found at https://archive.ics.uci.edu/ml/datasets/Auto+MPG.
The data are stored in a whitespace-delimited text file. We use the pandas.read_table() function to parse the data and load it into a pandas DataFrame. The file does not contain a header row, so we need to specify the names of the columns manually.
In [3]:
column_names = ['MPG',
                'Cylinders',
                'Displacement',
                'Horsepower',
                'Weight',
                'Acceleration',
                'Model Year',
                'Origin',
                'Car Name']
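# sep=r'\s+' splits fields on runs of whitespace; it replaces
# delim_whitespace=True, which newer pandas releases deprecate.
data = pandas.read_table('auto-mpg.data',
                         sep=r'\s+',
                         names=column_names,
                         index_col=False)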
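As a quick sanity check on these parsing options, here is a minimal sketch that reads a couple of rows from an in-memory string instead of the file. The sample rows and car names are made up for illustration (they only mimic the layout of the file), and the names sample_text, sample_columns, and sample are hypothetical.

```python
import io
import pandas

# Hypothetical sample rows laid out like the data file: whitespace-separated
# fields, no header row, and a quoted car name that contains spaces.
sample_text = '''\
18.0  8  307.0  130.0  3504.  12.0  70  1  "example car one"
27.0  4   97.0   88.0  2130.  14.5  71  3  "example car two"
'''

sample_columns = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                  'Acceleration', 'Model Year', 'Origin', 'Car Name']

# names= supplies the missing header; sep=r'\s+' splits on runs of whitespace.
sample = pandas.read_table(io.StringIO(sample_text),
                           sep=r'\s+',
                           names=sample_columns,
                           index_col=False)

# One row per line of text, one column per supplied name.
print(sample.shape)
print(sample[['MPG', 'Model Year', 'Car Name']])
```

Checking the shape and a few columns like this is a cheap way to confirm that the quoted car names stay in a single column before working with the full file.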
In this plot we show how the average miles per gallon (MPG) rating changes over successive model years. This period saw a significant increase in MPG, driven by the U.S. fuel crisis. We can use the pivot_table feature of pandas to compute these averages from the data. (Excel and other spreadsheets have similar functionality.)
In [4]:
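# Pivot with the model year as the index and select the 'MPG' column so the
# result is a Series indexed by model year (recent pandas versions return a
# DataFrame from pivot_table).
average_mpg_per_year = data.pivot_table(index='Model Year',
                                        values='MPG',
                                        aggfunc='mean')['MPG']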
average_mpg_per_year
Out[4]:
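For readers more used to grouping than pivoting, the same per-year average can be sketched with groupby. The table below is a made-up miniature (toy and its values are hypothetical, not taken from the dataset); it only illustrates that the two approaches should agree.

```python
import pandas

# Hypothetical miniature table with the same two columns used above.
toy = pandas.DataFrame({'Model Year': [70, 70, 71, 71],
                        'MPG': [10.0, 20.0, 30.0, 50.0]})

# Group the rows by model year and average the MPG within each group.
by_groupby = toy.groupby('Model Year')['MPG'].mean()

# The same averages from a pivot table indexed by model year.
by_pivot = toy.pivot_table(index='Model Year',
                           values='MPG',
                           aggfunc='mean')['MPG']

print(by_groupby)
print(by_pivot)
```

Either form gives a Series indexed by model year, which is a convenient shape for plotting.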
Now use toyplot to plot this trend on a standard x-y chart.
In [5]:
canvas = toyplot.Canvas('4in', '2.6in')
axes = canvas.cartesian(bounds=(41, -11, 1, -41),
                        xlabel='Model Year',
                        ylabel='Average MPG')
axes.plot(average_mpg_per_year.index + 1900, average_mpg_per_year)
# It's usually best to make the y-axis 0-based.
axes.y.domain.min = 0
In [6]:
toyplot.pdf.render(canvas, 'XY_Trend.pdf')
toyplot.svg.render(canvas, 'XY_Trend.svg')
toyplot.png.render(canvas, 'XY_Trend.png', scale=5)