The pivot table is a powerful tool to summarize and present data. Pandas has a function which allows you to quickly convert a DataFrame to a pivot table - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html

This function is very useful but sometimes it can be tricky to remember how to use it to get the data formatted in a way you need.

This notebook will walk through how to use the pivot_table.

The full blog post for this article is here - http://pbpython.com/pandas-pivot-table-explained.html

Read in the data



In [1]:

    
import pandas as pd
import numpy as np

Read in our sales funnel data into our DataFrame



In [2]:

    
df = pd.read_excel("../in/sales-funnel.xlsx")
df.head()









    Out[2]:






  
    
      
      Account
      Name
      Rep
      Manager
      Product
      Quantity
      Price
      Status
    
  
  
    
      0
       714466
                    Trantow-Barrows
        Craig Booker
       Debra Henley
               CPU
       1
       30000
       presented
    
    
      1
       714466
                    Trantow-Barrows
        Craig Booker
       Debra Henley
          Software
       1
       10000
       presented
    
    
      2
       714466
                    Trantow-Barrows
        Craig Booker
       Debra Henley
       Maintenance
       2
        5000
         pending
    
    
      3
       737550
       Fritsch, Russel and Anderson
        Craig Booker
       Debra Henley
               CPU
       1
       35000
        declined
    
    
      4
       146832
                       Kiehn-Spinka
       Daniel Hilton
       Debra Henley
               CPU
       2
       65000
             won

For convenience sake, let's define the status column as a category and set the order we want to view.

This isn't strictly required but helps us keep the order we want as we work through analyzing the data.



In [3]:

    
df["Status"] = df["Status"].astype("category")
df["Status"].cat.set_categories(["won","pending","presented","declined"],inplace=True)

Pivot the data

As we build up the pivot table, I think it's easiest to take it one step at a time. Add items one at a time and check each step to verify you are getting the results you expect.

The simplest pivot table must have a dataframe and an index. In this case, let's use the Name as our index.



In [4]:

    
pd.pivot_table(df,index=["Name"])









    Out[4]:






  
    
      
      Account
      Price
      Quantity
    
    
      Name
      
      
      
    
  
  
    
      Barton LLC
       740150
        35000
       1.000000
    
    
      Fritsch, Russel and Anderson
       737550
        35000
       1.000000
    
    
      Herman LLC
       141962
        65000
       2.000000
    
    
      Jerde-Hilpert
       412290
         5000
       2.000000
    
    
      Kassulke, Ondricka and Metz
       307599
         7000
       3.000000
    
    
      Keeling LLC
       688981
       100000
       5.000000
    
    
      Kiehn-Spinka
       146832
        65000
       2.000000
    
    
      Koepp Ltd
       729833
        35000
       2.000000
    
    
      Kulas Inc
       218895
        25000
       1.500000
    
    
      Purdy-Kunde
       163416
        30000
       1.000000
    
    
      Stokes LLC
       239344
         7500
       1.000000
    
    
      Trantow-Barrows
       714466
        15000
       1.333333

You can have multiple indexes as well. In fact, most of the pivot_table args can take multiple values via a list.



In [5]:

    
pd.pivot_table(df,index=["Name","Rep","Manager"])









    Out[5]:






  
    
      
      
      
      Account
      Price
      Quantity
    
    
      Name
      Rep
      Manager
      
      
      
    
  
  
    
      Barton LLC
      John Smith
      Debra Henley
       740150
        35000
       1.000000
    
    
      Fritsch, Russel and Anderson
      Craig Booker
      Debra Henley
       737550
        35000
       1.000000
    
    
      Herman LLC
      Cedric Moss
      Fred Anderson
       141962
        65000
       2.000000
    
    
      Jerde-Hilpert
      John Smith
      Debra Henley
       412290
         5000
       2.000000
    
    
      Kassulke, Ondricka and Metz
      Wendy Yule
      Fred Anderson
       307599
         7000
       3.000000
    
    
      Keeling LLC
      Wendy Yule
      Fred Anderson
       688981
       100000
       5.000000
    
    
      Kiehn-Spinka
      Daniel Hilton
      Debra Henley
       146832
        65000
       2.000000
    
    
      Koepp Ltd
      Wendy Yule
      Fred Anderson
       729833
        35000
       2.000000
    
    
      Kulas Inc
      Daniel Hilton
      Debra Henley
       218895
        25000
       1.500000
    
    
      Purdy-Kunde
      Cedric Moss
      Fred Anderson
       163416
        30000
       1.000000
    
    
      Stokes LLC
      Cedric Moss
      Fred Anderson
       239344
         7500
       1.000000
    
    
      Trantow-Barrows
      Craig Booker
      Debra Henley
       714466
        15000
       1.333333

This is interesting but not particularly useful. What we probably want to do is look at this by Manager and Director. It's easy enough to do by changing the index.



In [6]:

    
pd.pivot_table(df,index=["Manager","Rep"])









    Out[6]:






  
    
      
      
      Account
      Price
      Quantity
    
    
      Manager
      Rep
      
      
      
    
  
  
    
      Debra Henley
      Craig Booker
       720237.0
       20000.000000
       1.250000
    
    
      Daniel Hilton
       194874.0
       38333.333333
       1.666667
    
    
      John Smith
       576220.0
       20000.000000
       1.500000
    
    
      Fred Anderson
      Cedric Moss
       196016.5
       27500.000000
       1.250000
    
    
      Wendy Yule
       614061.5
       44250.000000
       3.000000

You can see that the pivot table is smart enough to start aggregating the data and summarizing it by grouping the reps with their managers. Now we start to get a glimpse of what a pivot table can do for us.

For this purpose, the Account and Quantity columns aren't really useful. Let's remove it by explicitly defining the columns we care about using the values field.



In [7]:

    
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"])









    Out[7]:






  
    
      
      
      Price
    
    
      Manager
      Rep
      
    
  
  
    
      Debra Henley
      Craig Booker
       20000
    
    
      Daniel Hilton
       38333
    
    
      John Smith
       20000
    
    
      Fred Anderson
      Cedric Moss
       27500
    
    
      Wendy Yule
       44250

The price column automatically averages the data but we can do a count or a sum. Adding them is simple using aggfunc.



In [8]:

    
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=np.sum)









    Out[8]:






  
    
      
      
      Price
    
    
      Manager
      Rep
      
    
  
  
    
      Debra Henley
      Craig Booker
        80000
    
    
      Daniel Hilton
       115000
    
    
      John Smith
        40000
    
    
      Fred Anderson
      Cedric Moss
       110000
    
    
      Wendy Yule
       177000

aggfunc can take a list of functions. Let's try a mean using the numpy functions and len to get a count.



In [9]:

    
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],aggfunc=[np.mean,len])









    Out[9]:






  
    
      
      
      mean
      len
    
    
      
      
      Price
      Price
    
    
      Manager
      Rep
      
      
    
  
  
    
      Debra Henley
      Craig Booker
       20000
       4
    
    
      Daniel Hilton
       38333
       3
    
    
      John Smith
       20000
       2
    
    
      Fred Anderson
      Cedric Moss
       27500
       4
    
    
      Wendy Yule
       44250
       4

If we want to see sales broken down by the products, the columns variable allows us to define one or more columns.



In [10]:

    
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],
               columns=["Product"],aggfunc=[np.sum])









    Out[10]:






  
    
      
      
      sum
    
    
      
      
      Price
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Rep
      
      
      
      
    
  
  
    
      Debra Henley
      Craig Booker
        65000
       5000
        NaN
       10000
    
    
      Daniel Hilton
       105000
        NaN
        NaN
       10000
    
    
      John Smith
        35000
       5000
        NaN
         NaN
    
    
      Fred Anderson
      Cedric Moss
        95000
       5000
        NaN
       10000
    
    
      Wendy Yule
       165000
       7000
       5000
         NaN

The NaN's are a bit distracting. If we want to remove them, we could use fill_value to set them to 0.



In [11]:

    
pd.pivot_table(df,index=["Manager","Rep"],values=["Price"],
               columns=["Product"],aggfunc=[np.sum],fill_value=0)









    Out[11]:






  
    
      
      
      sum
    
    
      
      
      Price
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Rep
      
      
      
      
    
  
  
    
      Debra Henley
      Craig Booker
        65000
       5000
          0
       10000
    
    
      Daniel Hilton
       105000
          0
          0
       10000
    
    
      John Smith
        35000
       5000
          0
           0
    
    
      Fred Anderson
      Cedric Moss
        95000
       5000
          0
       10000
    
    
      Wendy Yule
       165000
       7000
       5000
           0

I think it would be useful to add the quantity as well. Add Quantity to the values list.



In [12]:

    
pd.pivot_table(df,index=["Manager","Rep"],values=["Price","Quantity"],
               columns=["Product"],aggfunc=[np.sum],fill_value=0)









    Out[12]:






  
    
      
      
      sum
    
    
      
      
      Price
      Quantity
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Rep
      
      
      
      
      
      
      
      
    
  
  
    
      Debra Henley
      Craig Booker
        65000
       5000
          0
       10000
       2
       2
       0
       1
    
    
      Daniel Hilton
       105000
          0
          0
       10000
       4
       0
       0
       1
    
    
      John Smith
        35000
       5000
          0
           0
       1
       2
       0
       0
    
    
      Fred Anderson
      Cedric Moss
        95000
       5000
          0
       10000
       3
       1
       0
       1
    
    
      Wendy Yule
       165000
       7000
       5000
           0
       7
       3
       2
       0

What's interesting is that you can move items to the index to get a different visual representation. We can add the Products to the index.



In [13]:

    
pd.pivot_table(df,index=["Manager","Rep","Product"],
               values=["Price","Quantity"],aggfunc=[np.sum],fill_value=0)









    Out[13]:






  
    
      
      
      
      sum
    
    
      
      
      
      Price
      Quantity
    
    
      Manager
      Rep
      Product
      
      
    
  
  
    
      Debra Henley
      Craig Booker
      CPU
        65000
       2
    
    
      Maintenance
         5000
       2
    
    
      Software
        10000
       1
    
    
      Daniel Hilton
      CPU
       105000
       4
    
    
      Software
        10000
       1
    
    
      John Smith
      CPU
        35000
       1
    
    
      Maintenance
         5000
       2
    
    
      Fred Anderson
      Cedric Moss
      CPU
        95000
       3
    
    
      Maintenance
         5000
       1
    
    
      Software
        10000
       1
    
    
      Wendy Yule
      CPU
       165000
       7
    
    
      Maintenance
         7000
       3
    
    
      Monitor
         5000
       2

For this data set, this representation makes more sense. Now, what if I want to see some totals? margins=True does that for us.



In [14]:

    
pd.pivot_table(df,index=["Manager","Rep","Product"],
               values=["Price","Quantity"],
               aggfunc=[np.sum,np.mean],fill_value=0,margins=True)









    Out[14]:






  
    
      
      
      
      sum
      mean
    
    
      
      
      
      Price
      Quantity
      Price
      Quantity
    
    
      Manager
      Rep
      Product
      
      
      
      
    
  
  
    
      Debra Henley
      Craig Booker
      CPU
        65000
        2
       32500.000000
       1.000000
    
    
      Maintenance
         5000
        2
        5000.000000
       2.000000
    
    
      Software
        10000
        1
       10000.000000
       1.000000
    
    
      Daniel Hilton
      CPU
       105000
        4
       52500.000000
       2.000000
    
    
      Software
        10000
        1
       10000.000000
       1.000000
    
    
      John Smith
      CPU
        35000
        1
       35000.000000
       1.000000
    
    
      Maintenance
         5000
        2
        5000.000000
       2.000000
    
    
      Fred Anderson
      Cedric Moss
      CPU
        95000
        3
       47500.000000
       1.500000
    
    
      Maintenance
         5000
        1
        5000.000000
       1.000000
    
    
      Software
        10000
        1
       10000.000000
       1.000000
    
    
      Wendy Yule
      CPU
       165000
        7
       82500.000000
       3.500000
    
    
      Maintenance
         7000
        3
        7000.000000
       3.000000
    
    
      Monitor
         5000
        2
        5000.000000
       2.000000
    
    
      All
      
      
       522000
       30
       30705.882353
       1.764706

Let's move the analysis up a level and look at our pipeline at the manager level. Notice how the status is ordered based on our earlier category definition.



In [15]:

    
pd.pivot_table(df,index=["Manager","Status"],values=["Price"],
               aggfunc=[np.sum],fill_value=0,margins=True)









    Out[15]:






  
    
      
      
      sum
    
    
      
      
      Price
    
    
      Manager
      Status
      
    
  
  
    
      Debra Henley
      declined
        70000
    
    
      pending
        50000
    
    
      presented
        50000
    
    
      won
        65000
    
    
      Fred Anderson
      declined
        65000
    
    
      pending
         5000
    
    
      presented
        45000
    
    
      won
       172000
    
    
      All
      
       522000

A really handy feature is the ability to pass a dictionary to the aggfunc so you can perform different functions on each of the values you select. This has a side-effect of making the labels a little cleaner.



In [16]:

    
pd.pivot_table(df,index=["Manager","Status"],columns=["Product"],values=["Quantity","Price"],
               aggfunc={"Quantity":len,"Price":np.sum},fill_value=0)









    Out[16]:






  
    
      
      
      Price
      Quantity
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Status
      
      
      
      
      
      
      
      
    
  
  
    
      Debra Henley
      declined
        70000
           0
          0
           0
       2
       0
       0
       0
    
    
      pending
        40000
       10000
          0
           0
       1
       2
       0
       0
    
    
      presented
        30000
           0
          0
       20000
       1
       0
       0
       2
    
    
      won
        65000
           0
          0
           0
       1
       0
       0
       0
    
    
      Fred Anderson
      declined
        65000
           0
          0
           0
       1
       0
       0
       0
    
    
      pending
            0
        5000
          0
           0
       0
       1
       0
       0
    
    
      presented
        30000
           0
       5000
       10000
       1
       0
       1
       1
    
    
      won
       165000
        7000
          0
           0
       2
       1
       0
       0

You can provide a list of aggfunctions to apply to each value too:



In [17]:

    
table = pd.pivot_table(df,index=["Manager","Status"],columns=["Product"],values=["Quantity","Price"],
               aggfunc={"Quantity":len,"Price":[np.sum,np.mean]},fill_value=0)
table









    Out[17]:






  
    
      
      
      Price
      Quantity
    
    
      
      
      mean
      sum
      len
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Status
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Debra Henley
      declined
       35000
          0
          0
           0
        70000
           0
          0
           0
       2
       0
       0
       0
    
    
      pending
       40000
       5000
          0
           0
        40000
       10000
          0
           0
       1
       2
       0
       0
    
    
      presented
       30000
          0
          0
       10000
        30000
           0
          0
       20000
       1
       0
       0
       2
    
    
      won
       65000
          0
          0
           0
        65000
           0
          0
           0
       1
       0
       0
       0
    
    
      Fred Anderson
      declined
       65000
          0
          0
           0
        65000
           0
          0
           0
       1
       0
       0
       0
    
    
      pending
           0
       5000
          0
           0
            0
        5000
          0
           0
       0
       1
       0
       0
    
    
      presented
       30000
          0
       5000
       10000
        30000
           0
       5000
       10000
       1
       0
       1
       1
    
    
      won
       82500
       7000
          0
           0
       165000
        7000
          0
           0
       2
       1
       0
       0

It can look daunting to try to pull this all together at once but as soon as you start playing with the data and slowly add the items, you can get a feel for how it works.

Advanced Pivot Table Filtering

Once you have generated your data, it is in a DataFrame so you can filter on it using your normal DataFrame functions.



In [18]:

    
table.query('Manager == ["Debra Henley"]')









    Out[18]:






  
    
      
      
      Price
      Quantity
    
    
      
      
      mean
      sum
      len
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Status
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Debra Henley
      declined
       35000
          0
       0
           0
       70000
           0
       0
           0
       2
       0
       0
       0
    
    
      pending
       40000
       5000
       0
           0
       40000
       10000
       0
           0
       1
       2
       0
       0
    
    
      presented
       30000
          0
       0
       10000
       30000
           0
       0
       20000
       1
       0
       0
       2
    
    
      won
       65000
          0
       0
           0
       65000
           0
       0
           0
       1
       0
       0
       0



In [19]:

    
table.query('Status == ["pending","won"]')









    Out[19]:






  
    
      
      
      Price
      Quantity
    
    
      
      
      mean
      sum
      len
    
    
      
      Product
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
      CPU
      Maintenance
      Monitor
      Software
    
    
      Manager
      Status
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Debra Henley
      pending
       40000
       5000
       0
       0
        40000
       10000
       0
       0
       1
       2
       0
       0
    
    
      won
       65000
          0
       0
       0
        65000
           0
       0
       0
       1
       0
       0
       0
    
    
      Fred Anderson
      pending
           0
       5000
       0
       0
            0
        5000
       0
       0
       0
       1
       0
       0
    
    
      won
       82500
       7000
       0
       0
       165000
        7000
       0
       0
       2
       1
       0
       0

I hope this tutorial has shown you how to use pivot tables on your data sets.

	Account	Name	Rep	Manager	Product	Quantity	Price	Status
0	714466	Trantow-Barrows	Craig Booker	Debra Henley	CPU	1	30000	presented
1	714466	Trantow-Barrows	Craig Booker	Debra Henley	Software	1	10000	presented
2	714466	Trantow-Barrows	Craig Booker	Debra Henley	Maintenance	2	5000	pending
3	737550	Fritsch, Russel and Anderson	Craig Booker	Debra Henley	CPU	1	35000	declined
4	146832	Kiehn-Spinka	Daniel Hilton	Debra Henley	CPU	2	65000	won

	Account	Price	Quantity
Name
Barton LLC	740150	35000	1.000000
Fritsch, Russel and Anderson	737550	35000	1.000000
Herman LLC	141962	65000	2.000000
Jerde-Hilpert	412290	5000	2.000000
Kassulke, Ondricka and Metz	307599	7000	3.000000
Keeling LLC	688981	100000	5.000000
Kiehn-Spinka	146832	65000	2.000000
Koepp Ltd	729833	35000	2.000000
Kulas Inc	218895	25000	1.500000
Purdy-Kunde	163416	30000	1.000000
Stokes LLC	239344	7500	1.000000
Trantow-Barrows	714466	15000	1.333333

			Account	Price	Quantity
Name	Rep	Manager
Barton LLC	John Smith	Debra Henley	740150	35000	1.000000
Fritsch, Russel and Anderson	Craig Booker	Debra Henley	737550	35000	1.000000
Herman LLC	Cedric Moss	Fred Anderson	141962	65000	2.000000
Jerde-Hilpert	John Smith	Debra Henley	412290	5000	2.000000
Kassulke, Ondricka and Metz	Wendy Yule	Fred Anderson	307599	7000	3.000000
Keeling LLC	Wendy Yule	Fred Anderson	688981	100000	5.000000
Kiehn-Spinka	Daniel Hilton	Debra Henley	146832	65000	2.000000
Koepp Ltd	Wendy Yule	Fred Anderson	729833	35000	2.000000
Kulas Inc	Daniel Hilton	Debra Henley	218895	25000	1.500000
Purdy-Kunde	Cedric Moss	Fred Anderson	163416	30000	1.000000
Stokes LLC	Cedric Moss	Fred Anderson	239344	7500	1.000000
Trantow-Barrows	Craig Booker	Debra Henley	714466	15000	1.333333

		mean	len
		Price	Price
Manager	Rep
Debra Henley	Craig Booker	20000	4
	Daniel Hilton	38333	3
	John Smith	20000	2
Fred Anderson	Cedric Moss	27500	4
Fred Anderson	Wendy Yule	44250	4

		sum
		Price
	Product	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000	5000	NaN	10000
	Daniel Hilton	105000	NaN	NaN	10000
	John Smith	35000	5000	NaN	NaN
Fred Anderson	Cedric Moss	95000	5000	NaN	10000
Fred Anderson	Wendy Yule	165000	7000	5000	NaN

		sum
		Price				Quantity
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Rep
Debra Henley	Craig Booker	65000	5000	0	10000	2	2	0	1
	Daniel Hilton	105000	0	0	10000	4	0	0	1
	John Smith	35000	5000	0	0	1	2	0	0
Fred Anderson	Cedric Moss	95000	5000	0	10000	3	1	0	1
Fred Anderson	Wendy Yule	165000	7000	5000	0	7	3	2	0

		sum
		Price
Manager	Status
Debra Henley	declined	70000
	pending	50000
	presented	50000
	won	65000
Fred Anderson	declined	65000
	pending	5000
	presented	45000
	won	172000
All		522000

		Price								Quantity
		mean				sum				len
	Product	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software	CPU	Maintenance	Monitor	Software
Manager	Status
Debra Henley	declined	35000	0	0	0	70000	0	0	0	2	0	0	0
	pending	40000	5000	0	0	40000	10000	0	0	1	2	0	0
	presented	30000	0	0	10000	30000	0	0	20000	1	0	0	2
	won	65000	0	0	0	65000	0	0	0	1	0	0	0
Fred Anderson	declined	65000	0	0	0	65000	0	0	0	1	0	0	0
	pending	0	5000	0	0	0	5000	0	0	0	1	0	0
	presented	30000	0	5000	10000	30000	0	5000	10000	1	0	1	1
	won	82500	7000	0	0	165000	7000	0	0	2	1	0	0