Pivot tables

Adapted from http://nbviewer.jupyter.org/url/pbpython.com/extras/Pandas-Pivot-Table-Explained.ipynb Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html



In [ ]:

    
import pandas as pd
import numpy as np



In [ ]:

    
df = pd.read_excel('sales-funnel.xlsx')
df.head()



In [ ]:

    
df["Status"] = df["Status"].astype("category")
df["Status"].cat.set_categories(["won","pending","presented","declined"],inplace=True)
df.head()



In [ ]:

    
df.pivot_table(index="Name")  # change index and show aggregate mean of numerical features



In [ ]:

    
pivot1 = df.pivot_table(index=["Name", "Rep", "Manager"])   # can have multi index
pivot1

How pivot tables work

Sets the index to the desired features, then groupby the new multi index and aggregate and sort the features.



In [ ]:

    
df.set_index(["Name", "Rep", "Manager"])



In [ ]:

    
df2 = df.set_index(["Name", "Rep", "Manager"])
gr = df2.groupby(level=[0,1,2])
pivot2 = gr.agg('mean').sort_index(axis=1)
pivot2



In [ ]:

    
all(pivot2 == pivot1)

Can change the default aggregate function

Default is mean by can change that to something else.



In [ ]:

    
pd.pivot_table(df, index=["Manager","Rep"], values=["Price"], aggfunc='sum')

Can also provide apply a list of aggregate functions (though not a list of strings, nor a mix of strings and functions)



In [ ]:

    
pd.pivot_table(df, index=["Manager","Rep"], values=["Price"], aggfunc=[np.mean, np.sum])

Can provide a dict that maps features to one or more aggregate functions.



In [ ]:

    
table = pd.pivot_table(df,index=["Manager","Status"],columns=["Product"],values=["Quantity","Price"],
               aggfunc={"Quantity":len,"Price":[np.sum,np.mean]},fill_value=0)
table



In [ ]:

    
pd.pivot_table(df, index=["Manager","Rep"], aggfunc={"Quantity":len,"Price":[np.sum,np.mean]})

Can decompose aggregate values accross columns by another feature



In [ ]:

    
pd.pivot_table(df, index=["Manager","Rep"], aggfunc={"Quantity":len,"Price":[np.sum,np.mean]}, columns="Product")

Can replace the "scary" NaN values by something else.



In [ ]:

    
table = pd.pivot_table(df, index=["Manager","Rep"], aggfunc={"Quantity":len,"Price":[np.sum,np.mean]}, columns="Product", fill_value=0)



In [ ]: