In [1]:
from __future__ import division, unicode_literals
import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline
matplotlib.style.use('ggplot')
In [2]:
df = pd.read_excel('./input/complete_data.xls')
df.head()
Out[2]:
There are several charting libraries in Python, and we will look at some of the most popular today. We will start by building some absolute basics, like bar and line charts, then move on to something a little more fun. All the charts below are fed from the DataFrame object we created earlier.
The first is Matplotlib, the granddaddy of them all. It's powerful, flexible, and totally ubiquitous. Unfortunately, with great power comes great complexity. Here is an example of how to draw a simple line chart using only the matplotlib API. Today Matplotlib is usually used as a base library that is further abstracted by higher-level tools.
In [13]:
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.cbook as cbook
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
yearsFmt = mdates.DateFormatter('%Y')
# load up some sample data to plot
datafile = cbook.get_sample_data('goog.npy')
r = np.load(datafile, encoding='bytes').view(np.recarray)
fig, ax = plt.subplots()
ax.plot(r.date, r.adj_close)
# format the ticks
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
datemin = datetime.date(r.date.min().year, 1, 1)
datemax = datetime.date(r.date.max().year + 1, 1, 1)
ax.set_xlim(datemin, datemax)
# format the coords message box
def price(x):
    return '$%1.2f' % x
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = price
ax.grid(True)
# rotates and right aligns the x labels, and moves the bottom of the
# axes up to make room for them
fig.autofmt_xdate()
plt.show()
Pandas uses matplotlib internally as its low-level graphing API, but presents us with an API that abstracts away the complexity. This leaves us with a simple way to produce charts and graphs, while still having the raw power to customize should we be so bold.
http://pandas.pydata.org/pandas-docs/stable/visualization.html
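To make the "raw power to customize" point concrete, here is a minimal sketch. It assumes a tiny made-up DataFrame (the `demo` frame and its values are hypothetical, not the notebook's data) and relies on the fact that the pandas `.plot()` call returns a matplotlib Axes object, which we can keep tweaking with plain matplotlib calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the notebook's data: one total per account
demo = pd.DataFrame({
    "name": ["Acme", "Bolt", "Cog"],
    "ext price": [1200.0, 850.5, 430.25],
}).set_index("name")

# The one-line pandas call draws the chart and hands back a matplotlib Axes
ax = demo.plot(kind="bar", title="Total Sales by Account", legend=False)

# From here on we are talking to matplotlib directly
ax.set_xlabel("Account")
ax.set_ylabel("Sales ($)")
for label in ax.get_xticklabels():
    label.set_rotation(45)  # tilt the account names so they do not overlap
ax.figure.tight_layout()
```

In a notebook with `%matplotlib inline` the `matplotlib.use("Agg")` line is not needed; it is only there so the sketch also runs as a plain script.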
In [5]:
sales_by_month = df[['date', 'quantity']].set_index('date')
sales_by_month.resample('M').sum().plot(title="Total Sales by Month")
Out[5]:
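If the resampling step feels like magic, the following sketch shows what it does on a handful of rows. The `orders` frame and its values are made up for illustration. It uses the month-start alias `'MS'` rather than the month-end alias in the cell above, because the spelling of the month-end alias differs between pandas releases; the monthly totals are the same either way, only the date each bucket is labelled with changes.

```python
import pandas as pd

# Hypothetical order data: one row per order, mirroring the 'date' and 'quantity' columns
orders = pd.DataFrame({
    "date": pd.to_datetime([
        "2014-01-05", "2014-01-20",
        "2014-02-03", "2014-02-28",
        "2014-03-15",
    ]),
    "quantity": [10, 5, 7, 3, 12],
})

# Index by date, then add up every order that falls in the same calendar month
monthly = orders.set_index("date")["quantity"].resample("MS").sum()
print(monthly)
```

Each row of `monthly` is one calendar month, so calling `.plot()` on it gives one point per month instead of one per order.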
In [4]:
summary = df[['ext price', 'name']].groupby('name').sum()
summary.plot(kind='bar', title="Total Sales by Account")
Out[4]:
In [6]:
purchase_patterns = df[['ext price','date']]
purchase_patterns.head()
purchase_plot = purchase_patterns['ext price'].hist(bins=40)
purchase_plot.set_title("Purchase Patterns")
purchase_plot.set_xlabel("Order Amount($)")
purchase_plot.set_ylabel("Number of orders")
Out[6]:
In [7]:
sales_by_sku = df[['name', 'sku', 'ext price']].groupby(['name', 'sku']).sum()
sales_by_sku.head()
Out[7]:
In [8]:
sales_by_sku.unstack().head()
Out[8]:
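Since the cell outputs are not visible here, this small sketch shows what the group-then-unstack step does to the shape of the data. The `demo` frame is hypothetical; only the column names are borrowed from the notebook.

```python
import pandas as pd

# Hypothetical line items: two customers buying a few SKUs
demo = pd.DataFrame({
    "name": ["Acme", "Acme", "Bolt", "Bolt", "Bolt"],
    "sku": ["S1", "S2", "S1", "S1", "B1"],
    "ext price": [100.0, 50.0, 20.0, 30.0, 75.0],
})

# Long form: one row per (customer, sku) pair with its total
by_sku = demo.groupby(["name", "sku"]).sum()

# Wide form: one row per customer, one column per sku;
# pairs that never occurred show up as NaN
wide = by_sku.unstack()
print(wide)
```

The wide form is what makes the stacked bar chart possible: each row becomes one bar and each SKU column becomes one segment of that bar.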
In [9]:
my_plot = sales_by_sku.unstack().plot(kind='bar', stacked=True, title="Total Sales by Customer and SKU", legend=None)
my_plot.set_xlabel("Customers")
my_plot.set_ylabel("Sales")
Out[9]:
Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets.
In [10]:
from bokeh.charts import Bar, Histogram, show, output_notebook
output_notebook()
In [11]:
b = Bar(summary, label='name', values='ext price', title="Total Sales by Account")
show(b)
Out[11]:
In [12]:
hist = Histogram(df, values='ext price', bins=40, legend=True)
show(hist)
Out[12]: