Pandas 피봇과 그룹 연산

피봇 테이블

피봇 테이블(pivot table)이란 데이터 열(column) 중에서 두 개를 키(key)로 사용하여 데이터를 선택하는 방법을 말한다.

피봇 테이블을 사용하기 위해서는 키가 될 수 있는 두 개의 열(column) 혹은 필드(field)를 선택하여 이 두 열을

  • 행 인덱스 (row index)
  • 열 인덱스 (column index)

로 변경해야 한다.

  • pivot 메서드를 사용하면 행 인덱스, 열 인덱스, 자료가 될 3가지의 열(column)을 지정할 수 있다.

In [1]:
data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 2.5, 3.0, 2.5, 3.5]
}
df = pd.DataFrame(data, columns=["state", "year", "pop"])
df


Out[1]:
state year pop
0 Ohio 2000 1.5
1 Ohio 2001 2.5
2 Ohio 2002 3.0
3 Nevada 2001 2.5
4 Nevada 2002 3.5

In [2]:
df.pivot("state", "year", "pop")


Out[2]:
year 2000 2001 2002
state
Nevada NaN 2.5 3.5
Ohio 1.5 2.5 3.0
  • 행 인덱스와, 열 인덱스가 될 자료는 키(key)의 역할을 해야 한다. 즉, 이 값으로 데이터가 유일하게(unique) 결정되어야 한다.

In [5]:
df.pivot("year", "pop", "state")


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-749f5b6847f3> in <module>()
----> 1 df.pivot("year", "pop", "state")

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   3844         """
   3845         from pandas.core.reshape import pivot
-> 3846         return pivot(self, index=index, columns=columns, values=values)
   3847 
   3848     def stack(self, level=-1, dropna=True):

C:\Anaconda3\lib\site-packages\pandas\core\reshape.py in pivot(self, index, columns, values)
    330         indexed = Series(self[values].values,
    331                          index=MultiIndex.from_arrays([index, self[columns]]))
--> 332         return indexed.unstack(columns)
    333 
    334 

C:\Anaconda3\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
   2041         """
   2042         from pandas.core.reshape import unstack
-> 2043         return unstack(self, level, fill_value)
   2044 
   2045     # ----------------------------------------------------------------------

C:\Anaconda3\lib\site-packages\pandas\core\reshape.py in unstack(obj, level, fill_value)
    405     else:
    406         unstacker = _Unstacker(obj.values, obj.index, level=level,
--> 407                                fill_value=fill_value)
    408         return unstacker.get_result()
    409 

C:\Anaconda3\lib\site-packages\pandas\core\reshape.py in __init__(self, values, index, level, value_columns, fill_value)
     99 
    100         self._make_sorted_values_labels()
--> 101         self._make_selectors()
    102 
    103     def _make_sorted_values_labels(self):

C:\Anaconda3\lib\site-packages\pandas\core\reshape.py in _make_selectors(self)
    137 
    138         if mask.sum() < len(self.index):
--> 139             raise ValueError('Index contains duplicate entries, '
    140                              'cannot reshape')
    141 

ValueError: Index contains duplicate entries, cannot reshape

In [4]:
df.set_index(["state", "year"])


Out[4]:
pop
state year
Ohio 2000 1.5
2001 2.5
2002 3.0
Nevada 2001 2.5
2002 3.5

In [5]:
df.set_index(["state", "year"]).unstack()


Out[5]:
pop
year 2000 2001 2002
state
Nevada NaN 2.5 3.5
Ohio 1.5 2.5 3.0

그룹 연산

그룹 연산은 피봇 테이블과 달리 키에 의해서 결정되는 데이터가 복수개가 있어도 괜찮다. 대신 연산을 통해 복수개의 그룹 데이터에 대한 대표값을 정한다. 이를 split-apply-combine 연산이라고도 한다.

  • split 단계
    • 특정 Key 값에 따라 데이터 그룹을 만든다.
  • apply 단계

    • 각각의 그룹에 대해 원하는 연산을 하여 대표값을 생성한다.
      • count(), mean(), median(), min(), max()
      • sum(), prod(), std(), var(), quantile()
      • first(), last()
  • combine 단계

    • 그룹의 Key 값에 대해 원하는 연산의 결과를 Value로 지정한 dict를 생성한다.


In [6]:
np.random.seed(0)
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df


Out[6]:
data1 data2 key1 key2
0 1.764052 -0.977278 a one
1 0.400157 0.950088 a two
2 0.978738 -0.151357 b one
3 2.240893 -0.103219 b two
4 1.867558 0.410599 a one
  • 문제: key1 값에 따른 data1의 평균은?

In [7]:
df.data1.groupby(df.key1).mean()


Out[7]:
key1
a    1.343923
b    1.609816
Name: data1, dtype: float64

In [8]:
gs = df.data1.groupby(df.key1)
gs


Out[8]:
<pandas.core.groupby.SeriesGroupBy object at 0x000000000A4E0940>

In [9]:
print("="*50)
for n, g in gs:
    print("[key]:", n)
    print("[group]:", type(g))
    print("-"*50)
    print(g)
    print("-"*50)
    print("[mean]:", g.mean())
    print("="*50)


==================================================
[key]: a
[group]: <class 'pandas.core.series.Series'>
--------------------------------------------------
0    1.764052
1    0.400157
4    1.867558
Name: data1, dtype: float64
--------------------------------------------------
[mean]: 1.343922514828285
==================================================
[key]: b
[group]: <class 'pandas.core.series.Series'>
--------------------------------------------------
2    0.978738
3    2.240893
Name: data1, dtype: float64
--------------------------------------------------
[mean]: 1.6098155916535986
==================================================

In [10]:
gs.mean()


Out[10]:
key1
a    1.343923
b    1.609816
Name: data1, dtype: float64
  • 문제: 복합 key (key1, key2) 값에 따른 data1의 평균은?

In [11]:
means = df.data1.groupby([df.key1, df.key2]).mean()
means


Out[11]:
key1  key2
a     one     1.815805
      two     0.400157
b     one     0.978738
      two     2.240893
Name: data1, dtype: float64

In [10]:
means = df.data1.groupby([df.key1, df.key2]).mean()
means


Out[10]:
key1  key2
a     one     1.815805
      two     0.400157
b     one     0.978738
      two     2.240893
Name: data1, dtype: float64

groupby 명령의 인수

  • groupby 명령에서 Key 인수로 입력할 수 있는 값은 다음과 같다.
    • 열 또는 열의 리스트
    • 행 인덱스
    • 사전/함수: Column의 값을 사전에 매핑(mapping)하거나 함수 처리하여 나온 결괏값을 키로 인식

In [15]:
np.random.seed(0)
people = pd.DataFrame(np.random.randn(5,5),
                      columns=['a','b','c','d','e'],
                      index=['Joe','Steve','Wes','Jim','Travis'])
people.ix[2:3, ['b', 'c']] = np.nan
people


Out[15]:
a b c d e
Joe 1.764052 0.400157 0.978738 2.240893 1.867558
Steve -0.977278 0.950088 -0.151357 -0.103219 0.410599
Wes 0.144044 NaN NaN 0.121675 0.443863
Jim 0.333674 1.494079 -0.205158 0.313068 -0.854096
Travis -2.552990 0.653619 0.864436 -0.742165 2.269755

In [16]:
print("="*80)
for n, g in people.groupby(people.index):
    print("[key]:", n)
    print("[group]:", type(g))
    print("-"*80)
    print(g)
    print("="*80)


================================================================================
[key]: Jim
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
            a         b         c         d         e
Jim  0.333674  1.494079 -0.205158  0.313068 -0.854096
================================================================================
[key]: Joe
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
            a         b         c         d         e
Joe  1.764052  0.400157  0.978738  2.240893  1.867558
================================================================================
[key]: Steve
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
              a         b         c         d         e
Steve -0.977278  0.950088 -0.151357 -0.103219  0.410599
================================================================================
[key]: Travis
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
              a         b         c         d         e
Travis -2.55299  0.653619  0.864436 -0.742165  2.269755
================================================================================
[key]: Wes
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
            a   b   c         d         e
Wes  0.144044 NaN NaN  0.121675  0.443863
================================================================================

In [17]:
mapping = {'Joe': 'J', 'Jim': 'J', 'Steve': 'S', 'Wes': 'S', 'Travis': 'S'}
print("="*80)
for n, g in people.groupby(mapping):
    print("[key]:", n)
    print("[group]:", type(g))
    print("-"*80)
    print(g)
    print("="*80)


================================================================================
[key]: J
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
            a         b         c         d         e
Joe  1.764052  0.400157  0.978738  2.240893  1.867558
Jim  0.333674  1.494079 -0.205158  0.313068 -0.854096
================================================================================
[key]: S
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               a         b         c         d         e
Steve  -0.977278  0.950088 -0.151357 -0.103219  0.410599
Wes     0.144044       NaN       NaN  0.121675  0.443863
Travis -2.552990  0.653619  0.864436 -0.742165  2.269755
================================================================================

In [18]:
cap1 = lambda x: x[0].upper()
print("="*80)
for n, g in people.groupby(cap1):
    print("[key]:", n)
    print("[group]:", type(g))
    print("-"*80)
    print(g)
    print("="*80)


================================================================================
[key]: J
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
            a         b         c         d         e
Joe  1.764052  0.400157  0.978738  2.240893  1.867558
Jim  0.333674  1.494079 -0.205158  0.313068 -0.854096
================================================================================
[key]: S
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
              a         b         c         d         e
Steve -0.977278  0.950088 -0.151357 -0.103219  0.410599
================================================================================
[key]: T
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
              a         b         c         d         e
Travis -2.55299  0.653619  0.864436 -0.742165  2.269755
================================================================================
[key]: W
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
            a   b   c         d         e
Wes  0.144044 NaN NaN  0.121675  0.443863
================================================================================

In [21]:
print("="*80)
for n, g in people.groupby(people.columns, axis=1):
    print("[key]:", n)
    print("[group]:", type(g))
    print("-"*80)
    print(g)
    print("="*80)


================================================================================
[key]: a
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               a
Joe     1.764052
Steve  -0.977278
Wes     0.144044
Jim     0.333674
Travis -2.552990
================================================================================
[key]: b
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               b
Joe     0.400157
Steve   0.950088
Wes          NaN
Jim     1.494079
Travis  0.653619
================================================================================
[key]: c
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               c
Joe     0.978738
Steve  -0.151357
Wes          NaN
Jim    -0.205158
Travis  0.864436
================================================================================
[key]: d
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               d
Joe     2.240893
Steve  -0.103219
Wes     0.121675
Jim     0.313068
Travis -0.742165
================================================================================
[key]: e
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               e
Joe     1.867558
Steve   0.410599
Wes     0.443863
Jim    -0.854096
Travis  2.269755
================================================================================

In [22]:
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f' : 'orange'}
for n, g in people.groupby(mapping, axis=1):
    print("[key]:", n)
    print("[group]:", type(g))
    print("-"*80)
    print(g)
    print("="*80)


[key]: blue
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               c         d
Joe     0.978738  2.240893
Steve  -0.151357 -0.103219
Wes          NaN  0.121675
Jim    -0.205158  0.313068
Travis  0.864436 -0.742165
================================================================================
[key]: red
[group]: <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------------------------------------
               a         b         e
Joe     1.764052  0.400157  1.867558
Steve  -0.977278  0.950088  0.410599
Wes     0.144044       NaN  0.443863
Jim     0.333674  1.494079 -0.854096
Travis -2.552990  0.653619  2.269755
================================================================================

특별한 group 별 연산

  • 통계

    • describe()
  • 그룹을 대표하는 하나의 값을 계산

    • agg(), aggregate()
  • 대표값으로 필드를 교체

    • transform()
  • 그룹 전체를 변형하는 계산

    • apply()

TIP 데이터 예제


In [23]:
%cd /home/dockeruser/data/pydata-book-master


[WinError 3] 지정된 경로를 찾을 수 없습니다: '/home/dockeruser/data/pydata-book-master'
C:\Users\Administrator\Documents\@수학자료\160502월_1일차_분석 환경, 소개

In [27]:
tips = pd.read_csv('../../pydata-book-master/ch08/tips.csv')

In [28]:
tips.head()


Out[28]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

In [32]:
tips['tip_pct'] = tips['tip'] / tips['total_bill']
tips.tail()


Out[32]:
total_bill tip sex smoker day time size tip_pct
239 29.03 5.92 Male No Sat Dinner 3 0.203927
240 27.18 2.00 Female Yes Sat Dinner 2 0.073584
241 22.67 2.00 Male Yes Sat Dinner 2 0.088222
242 17.82 1.75 Male No Sat Dinner 2 0.098204
243 18.78 3.00 Female No Thur Dinner 2 0.159744

In [33]:
tips.describe()


Out[33]:
total_bill tip size tip_pct
count 244.000000 244.000000 244.000000 244.000000
mean 19.785943 2.998279 2.569672 0.160803
std 8.902412 1.383638 0.951100 0.061072
min 3.070000 1.000000 1.000000 0.035638
25% 13.347500 2.000000 2.000000 0.129127
50% 17.795000 2.900000 2.000000 0.154770
75% 24.127500 3.562500 3.000000 0.191475
max 50.810000 10.000000 6.000000 0.710345

그룹별 통계


In [34]:
tips.groupby(["sex", "smoker"])[["tip", "tip_pct"]].describe()


Out[34]:
tip tip_pct
sex smoker
Female No count 54.000000 54.000000
mean 2.773519 0.156921
std 1.128425 0.036421
min 1.000000 0.056797
25% 2.000000 0.139708
50% 2.680000 0.149691
75% 3.437500 0.181630
max 5.200000 0.252672
Yes count 33.000000 33.000000
mean 2.931515 0.182150
std 1.219916 0.071595
min 1.000000 0.056433
25% 2.000000 0.152439
50% 2.880000 0.173913
75% 3.500000 0.198216
max 6.500000 0.416667
Male No count 97.000000 97.000000
mean 3.113402 0.160669
std 1.489559 0.041849
min 1.250000 0.071804
25% 2.000000 0.131810
50% 2.740000 0.157604
75% 3.710000 0.186220
max 9.000000 0.291990
Yes count 60.000000 60.000000
mean 3.051167 0.152771
std 1.500120 0.090588
min 1.000000 0.035638
25% 2.000000 0.101845
50% 3.000000 0.141015
75% 3.820000 0.191697
max 10.000000 0.710345

그룹별 연산


In [35]:
gs = tips.groupby(["sex", "smoker"])
gs_pct = gs["tip_pct"]

In [36]:
gs_pct.mean()


Out[36]:
sex     smoker
Female  No        0.156921
        Yes       0.182150
Male    No        0.160669
        Yes       0.152771
Name: tip_pct, dtype: float64

In [37]:
gs_pct.agg('mean')


Out[37]:
sex     smoker
Female  No        0.156921
        Yes       0.182150
Male    No        0.160669
        Yes       0.152771
Name: tip_pct, dtype: float64

In [38]:
def peak_to_peak(arr):
    return arr.max() - arr.min()

gs_pct.agg(['mean', 'std', peak_to_peak])


Out[38]:
mean std peak_to_peak
sex smoker
Female No 0.156921 0.036421 0.195876
Yes 0.182150 0.071595 0.360233
Male No 0.160669 0.041849 0.220186
Yes 0.152771 0.090588 0.674707

In [39]:
gs.agg({'tip_pct': 'mean', 'total_bill': peak_to_peak})


Out[39]:
total_bill tip_pct
sex smoker
Female No 28.58 0.156921
Yes 41.23 0.182150
Male No 40.82 0.160669
Yes 43.56 0.152771

그룹의 값을 대표값으로 대체


In [40]:
gs.agg("mean")


Out[40]:
total_bill tip size tip_pct
sex smoker
Female No 18.105185 2.773519 2.592593 0.156921
Yes 17.977879 2.931515 2.242424 0.182150
Male No 19.791237 3.113402 2.711340 0.160669
Yes 22.284500 3.051167 2.500000 0.152771

In [41]:
tips2 = tips.copy()
tips2["tips"] = gs.transform("mean")["tip_pct"]
tips2.tail(15)


Out[41]:
total_bill tip sex smoker day time size tip_pct tips
229 22.12 2.88 Female Yes Sat Dinner 2 0.130199 0.182150
230 24.01 2.00 Male Yes Sat Dinner 4 0.083299 0.152771
231 15.69 3.00 Male Yes Sat Dinner 3 0.191205 0.152771
232 11.61 3.39 Male No Sat Dinner 2 0.291990 0.160669
233 10.77 1.47 Male No Sat Dinner 2 0.136490 0.160669
234 15.53 3.00 Male Yes Sat Dinner 2 0.193175 0.152771
235 10.07 1.25 Male No Sat Dinner 2 0.124131 0.160669
236 12.60 1.00 Male Yes Sat Dinner 2 0.079365 0.152771
237 32.83 1.17 Male Yes Sat Dinner 2 0.035638 0.152771
238 35.83 4.67 Female No Sat Dinner 3 0.130338 0.156921
239 29.03 5.92 Male No Sat Dinner 3 0.203927 0.160669
240 27.18 2.00 Female Yes Sat Dinner 2 0.073584 0.182150
241 22.67 2.00 Male Yes Sat Dinner 2 0.088222 0.152771
242 17.82 1.75 Male No Sat Dinner 2 0.098204 0.160669
243 18.78 3.00 Female No Thur Dinner 2 0.159744 0.156921

그룹 자체를 대체

  • apply 메소드는 수치값이 아닌 Group을 출력
  • 단순히 대표값을 계산하는 것 뿐 아니라
  • 순서 정렬, 일부 삭제 등 그룹 내의 레코드 자체를 변형하는 것도 가능

In [42]:
def top(df, n=5, column='tip_pct'):
    return df.sort_values(by=column)[-n:]

In [43]:
top(tips, n=6)


Out[43]:
total_bill tip sex smoker day time size tip_pct
109 14.31 4.00 Female Yes Sat Dinner 2 0.279525
183 23.17 6.50 Male Yes Sun Dinner 4 0.280535
232 11.61 3.39 Male No Sat Dinner 2 0.291990
67 3.07 1.00 Female Yes Sat Dinner 1 0.325733
178 9.60 4.00 Female Yes Sun Dinner 2 0.416667
172 7.25 5.15 Male Yes Sun Dinner 2 0.710345

In [44]:
tips.groupby('smoker').apply(top)


Out[44]:
total_bill tip sex smoker day time size tip_pct
smoker
No 88 24.71 5.85 Male No Thur Lunch 2 0.236746
185 20.69 5.00 Male No Sun Dinner 5 0.241663
51 10.29 2.60 Female No Sun Dinner 2 0.252672
149 7.51 2.00 Male No Thur Lunch 2 0.266312
232 11.61 3.39 Male No Sat Dinner 2 0.291990
Yes 109 14.31 4.00 Female Yes Sat Dinner 2 0.279525
183 23.17 6.50 Male Yes Sun Dinner 4 0.280535
67 3.07 1.00 Female Yes Sat Dinner 1 0.325733
178 9.60 4.00 Female Yes Sun Dinner 2 0.416667
172 7.25 5.15 Male Yes Sun Dinner 2 0.710345

In [45]:
tips.groupby(['smoker', 'day']).apply(top, n=1, column='total_bill')


Out[45]:
total_bill tip sex smoker day time size tip_pct
smoker day
No Fri 94 22.75 3.25 Female No Fri Dinner 2 0.142857
Sat 212 48.33 9.00 Male No Sat Dinner 4 0.186220
Sun 156 48.17 5.00 Male No Sun Dinner 6 0.103799
Thur 142 41.19 5.00 Male No Thur Lunch 5 0.121389
Yes Fri 95 40.17 4.73 Male Yes Fri Dinner 4 0.117750
Sat 170 50.81 10.00 Male Yes Sat Dinner 3 0.196812
Sun 182 45.35 3.50 Male Yes Sun Dinner 3 0.077178
Thur 197 43.11 5.00 Female Yes Thur Lunch 4 0.115982

In [46]:
f = lambda x: x.describe()
tips.groupby(['smoker']).apply(f)


Out[46]:
total_bill tip size tip_pct
smoker
No count 151.000000 151.000000 151.000000 151.000000
mean 19.188278 2.991854 2.668874 0.159328
std 8.255582 1.377190 1.017984 0.039910
min 7.250000 1.000000 1.000000 0.056797
25% 13.325000 2.000000 2.000000 0.136906
50% 17.590000 2.740000 2.000000 0.155625
75% 22.755000 3.505000 3.000000 0.185014
max 48.330000 9.000000 6.000000 0.291990
Yes count 93.000000 93.000000 93.000000 93.000000
mean 20.756344 3.008710 2.408602 0.163196
std 9.832154 1.401468 0.810751 0.085119
min 3.070000 1.000000 1.000000 0.035638
25% 13.420000 2.000000 2.000000 0.106771
50% 17.920000 3.000000 2.000000 0.153846
75% 26.860000 3.680000 3.000000 0.195059
max 50.810000 10.000000 5.000000 0.710345

pivot_table

  • pivot 명령과 groupby 명령의 중간적 성격
  • pivot을 수행하지만 데이터가 유니크하게 선택되지 않으면 aggfunc 인수로 정의된 함수를 수행하여 대표값 계산
  • 디폴트 aggfunc 은 평균 계산

In [47]:
tips.pivot_table(index=['sex', 'smoker'])


Out[47]:
size tip tip_pct total_bill
sex smoker
Female No 2.592593 2.773519 0.156921 18.105185
Yes 2.242424 2.931515 0.182150 17.977879
Male No 2.711340 3.113402 0.160669 19.791237
Yes 2.500000 3.051167 0.152771 22.284500

In [48]:
tips.pivot_table(['tip_pct', 'size'], index=['sex', 'day'], columns='smoker')


Out[48]:
tip_pct size
smoker No Yes No Yes
sex day
Female Fri 0.165296 0.209129 2.500000 2.000000
Sat 0.147993 0.163817 2.307692 2.200000
Sun 0.165710 0.237075 3.071429 2.500000
Thur 0.155971 0.163073 2.480000 2.428571
Male Fri 0.138005 0.144730 2.000000 2.125000
Sat 0.162132 0.139067 2.656250 2.629630
Sun 0.158291 0.173964 2.883721 2.600000
Thur 0.165706 0.164417 2.500000 2.300000

In [49]:
tips.pivot_table(['tip_pct', 'size'], index=['sex', 'day'],
                 columns='smoker', margins=True)


Out[49]:
tip_pct size
smoker No Yes All No Yes All
sex day
Female Fri 0.165296 0.209129 0.199388 2.500000 2.000000 2.111111
Sat 0.147993 0.163817 0.156470 2.307692 2.200000 2.250000
Sun 0.165710 0.237075 0.181569 3.071429 2.500000 2.944444
Thur 0.155971 0.163073 0.157525 2.480000 2.428571 2.468750
Male Fri 0.138005 0.144730 0.143385 2.000000 2.125000 2.100000
Sat 0.162132 0.139067 0.151577 2.656250 2.629630 2.644068
Sun 0.158291 0.173964 0.162344 2.883721 2.600000 2.810345
Thur 0.165706 0.164417 0.165276 2.500000 2.300000 2.433333
All 0.159328 0.163196 0.160803 2.668874 2.408602 2.569672

In [50]:
tips.pivot_table('tip_pct', index=['sex', 'smoker'], columns='day',
                 aggfunc=len, margins=True)


Out[50]:
day Fri Sat Sun Thur All
sex smoker
Female No 2.0 13.0 14.0 25.0 54.0
Yes 7.0 15.0 4.0 7.0 33.0
Male No 2.0 32.0 43.0 20.0 97.0
Yes 8.0 27.0 15.0 10.0 60.0
All 19.0 87.0 76.0 62.0 244.0

In [51]:
tips.pivot_table('size', index=['time', 'sex', 'smoker'],
                 columns='day', aggfunc='sum', fill_value=0)


Out[51]:
day Fri Sat Sun Thur
time sex smoker
Dinner Female No 2 30 43 2
Yes 8 33 10 0
Male No 4 85 124 0
Yes 12 71 39 0
Lunch Female No 3 0 0 60
Yes 6 0 0 17
Male No 0 0 0 50
Yes 5 0 0 23