Content to accompany blog post on Practical Business Python
In [1]:
import numpy as np
import pandas as pd
from sparklines import sparklines
In [2]:
df = pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/2018_Sales_Total.xlsx?raw=true')
In [3]:
df.head()
Out[3]:
account number
name
sku
quantity
unit price
ext price
date
0
740150
Barton LLC
B1-20000
39
86.69
3380.91
2018-01-01 07:21:51
1
714466
Trantow-Barrows
S2-77896
-1
63.16
-63.16
2018-01-01 10:00:47
2
218895
Kulas Inc
B1-69924
23
90.70
2086.10
2018-01-01 13:24:58
3
307599
Kassulke, Ondricka and Metz
S1-65481
41
21.05
863.05
2018-01-01 15:05:22
4
412290
Jerde-Hilpert
S2-34077
6
83.21
499.26
2018-01-01 23:26:55
Do a simple groupby to look at the performance by customr
In [4]:
df.groupby('name')['ext price'].agg(['mean', 'sum'])
Out[4]:
mean
sum
name
Barton LLC
1334.615854
109438.50
Cronin, Oberbrunner and Spencer
1339.321642
89734.55
Frami, Hills and Schmidt
1438.466528
103569.59
Fritsch, Russel and Anderson
1385.366790
112214.71
Halvorson, Crona and Champlin
1206.971724
70004.36
Herman LLC
1336.532258
82865.00
Jerde-Hilpert
1265.072247
112591.43
Kassulke, Ondricka and Metz
1350.797969
86451.07
Keeling LLC
1363.977027
100934.30
Kiehn-Spinka
1260.870506
99608.77
Koepp Ltd
1264.152927
103660.54
Kuhn-Gusikowski
1247.866849
91094.28
Kulas Inc
1461.191064
137351.96
Pollich LLC
1196.536712
87347.18
Purdy-Kunde
1469.777547
77898.21
Sanford and Sons
1391.872958
98822.98
Stokes LLC
1271.332222
91535.92
Trantow-Barrows
1312.567872
123381.38
White-Trantow
1579.558023
135841.99
Will LLC
1411.318919
104437.60
Style the currency using python's string formatting
In [5]:
(df.groupby('name')['ext price']
.agg(['mean', 'sum'])
.style.format('${0:,.2f}'))
Out[5]:
mean sum name
Barton LLC
$1,334.62
$109,438.50
Cronin, Oberbrunner and Spencer
$1,339.32
$89,734.55
Frami, Hills and Schmidt
$1,438.47
$103,569.59
Fritsch, Russel and Anderson
$1,385.37
$112,214.71
Halvorson, Crona and Champlin
$1,206.97
$70,004.36
Herman LLC
$1,336.53
$82,865.00
Jerde-Hilpert
$1,265.07
$112,591.43
Kassulke, Ondricka and Metz
$1,350.80
$86,451.07
Keeling LLC
$1,363.98
$100,934.30
Kiehn-Spinka
$1,260.87
$99,608.77
Koepp Ltd
$1,264.15
$103,660.54
Kuhn-Gusikowski
$1,247.87
$91,094.28
Kulas Inc
$1,461.19
$137,351.96
Pollich LLC
$1,196.54
$87,347.18
Purdy-Kunde
$1,469.78
$77,898.21
Sanford and Sons
$1,391.87
$98,822.98
Stokes LLC
$1,271.33
$91,535.92
Trantow-Barrows
$1,312.57
$123,381.38
White-Trantow
$1,579.56
$135,841.99
Will LLC
$1,411.32
$104,437.60
Round the results to 0 decimals
In [6]:
(df.groupby('name')['ext price']
.agg(['mean', 'sum'])
.style.format('${0:,.0f}'))
Out[6]:
mean sum name
Barton LLC
$1,335
$109,438
Cronin, Oberbrunner and Spencer
$1,339
$89,735
Frami, Hills and Schmidt
$1,438
$103,570
Fritsch, Russel and Anderson
$1,385
$112,215
Halvorson, Crona and Champlin
$1,207
$70,004
Herman LLC
$1,337
$82,865
Jerde-Hilpert
$1,265
$112,591
Kassulke, Ondricka and Metz
$1,351
$86,451
Keeling LLC
$1,364
$100,934
Kiehn-Spinka
$1,261
$99,609
Koepp Ltd
$1,264
$103,661
Kuhn-Gusikowski
$1,248
$91,094
Kulas Inc
$1,461
$137,352
Pollich LLC
$1,197
$87,347
Purdy-Kunde
$1,470
$77,898
Sanford and Sons
$1,392
$98,823
Stokes LLC
$1,271
$91,536
Trantow-Barrows
$1,313
$123,381
White-Trantow
$1,580
$135,842
Will LLC
$1,411
$104,438
More complex analysis of performance by month
In [7]:
monthly_sales = df.groupby([pd.Grouper(key='date', freq='M')])['ext price'].agg(['sum']).reset_index()
monthly_sales['pct_of_total'] = monthly_sales['sum'] / df['ext price'].sum()
monthly_sales
Out[7]:
date
sum
pct_of_total
0
2018-01-31
185361.66
0.091818
1
2018-02-28
146211.62
0.072426
2
2018-03-31
203921.38
0.101012
3
2018-04-30
174574.11
0.086475
4
2018-05-31
165418.55
0.081940
5
2018-06-30
174089.33
0.086235
6
2018-07-31
191662.11
0.094939
7
2018-08-31
153778.59
0.076174
8
2018-09-30
168443.17
0.083438
9
2018-10-31
171495.32
0.084950
10
2018-11-30
119961.22
0.059423
11
2018-12-31
163867.26
0.081171
Use a format dictionary to control formatting per column
In [8]:
format_dict = {'sum':'${0:,.0f}', 'date': '{:%m-%Y}', 'pct_of_total': '{:.2%}'}
monthly_sales.style.format(format_dict).hide_index()
Out[8]:
date sum pct_of_total
01-2018
$185,362
9.18%
02-2018
$146,212
7.24%
03-2018
$203,921
10.10%
04-2018
$174,574
8.65%
05-2018
$165,419
8.19%
06-2018
$174,089
8.62%
07-2018
$191,662
9.49%
08-2018
$153,779
7.62%
09-2018
$168,443
8.34%
10-2018
$171,495
8.49%
11-2018
$119,961
5.94%
12-2018
$163,867
8.12%
Introduce the highlight functions
In [9]:
(monthly_sales
.style
.format(format_dict)
.hide_index()
.highlight_max(color='lightgreen')
.highlight_min(color='#cd4f39'))
Out[9]:
date sum pct_of_total
01-2018
$185,362
9.18%
02-2018
$146,212
7.24%
03-2018
$203,921
10.10%
04-2018
$174,574
8.65%
05-2018
$165,419
8.19%
06-2018
$174,089
8.62%
07-2018
$191,662
9.49%
08-2018
$153,779
7.62%
09-2018
$168,443
8.34%
10-2018
$171,495
8.49%
11-2018
$119,961
5.94%
12-2018
$163,867
8.12%
Introduce bar formatting for table cells
In [10]:
(monthly_sales
.style
.format(format_dict)
.hide_index()
.bar(color='#FFA07A', vmin=100_000, subset=['sum'], align='zero')
.bar(color='lightgreen', vmin=0, subset=['pct_of_total'], align='zero')
.set_caption('2018 Sales Performance'))
Out[10]:
2018 Sales Performance date sum pct_of_total
01-2018
$185,362
9.18%
02-2018
$146,212
7.24%
03-2018
$203,921
10.10%
04-2018
$174,574
8.65%
05-2018
$165,419
8.19%
06-2018
$174,089
8.62%
07-2018
$191,662
9.49%
08-2018
$153,779
7.62%
09-2018
$168,443
8.34%
10-2018
$171,495
8.49%
11-2018
$119,961
5.94%
12-2018
$163,867
8.12%
In [11]:
(monthly_sales.style
.format(format_dict)
.background_gradient(subset=['sum'],cmap='BuGn'))
Out[11]:
date sum pct_of_total
0
01-2018
$185,362
9.18%
1
02-2018
$146,212
7.24%
2
03-2018
$203,921
10.10%
3
04-2018
$174,574
8.65%
4
05-2018
$165,419
8.19%
5
06-2018
$174,089
8.62%
6
07-2018
$191,662
9.49%
7
08-2018
$153,779
7.62%
8
09-2018
$168,443
8.34%
9
10-2018
$171,495
8.49%
10
11-2018
$119,961
5.94%
11
12-2018
$163,867
8.12%
Cool example of using sparklines from Peter Baumgartner https://twitter.com/pmbaumgartner/status/1084645440224559104
In [12]:
def sparkline_str(x):
bins=np.histogram(x)[0]
sl = ''.join(sparklines(bins))
return sl
sparkline_str.__name__ = "sparkline"
In [13]:
df.groupby('name')['quantity', 'ext price'].agg(['mean', sparkline_str])
Out[13]:
quantity
ext price
mean
sparkline
mean
sparkline
name
Barton LLC
24.890244
▄▄▃▂▃▆▄█▁▄
1334.615854
█▄▃▆▄▄▁▁▁▁
Cronin, Oberbrunner and Spencer
24.970149
█▄▁▄▄▇▅▁▄▄
1339.321642
█▅▅▃▃▃▂▂▁▁
Frami, Hills and Schmidt
26.430556
▄▄▁▂▇█▂▂▅▅
1438.466528
█▅▄▇▅▃▄▁▁▁
Fritsch, Russel and Anderson
26.074074
▁▄▇▃▂▂█▃▄▄
1385.366790
▇█▃▄▂▂▁▂▁▁
Halvorson, Crona and Champlin
22.137931
▇▆▆▇█▁▄▂▄▃
1206.971724
██▆▅▁▃▂▂▂▂
Herman LLC
24.806452
▄▃▅▁▆▄▂▆▃█
1336.532258
█▅▇▄▅▄▁▃▂▂
Jerde-Hilpert
22.460674
▄▄█▁▂▅▃▂▄▃
1265.072247
█▄▅▂▁▂▃▂▂▁
Kassulke, Ondricka and Metz
25.734375
▂▂▁▁▂▂▁▅▄█
1350.797969
█▆▆▄▄▃▂▁▁▂
Keeling LLC
24.405405
▁▄▇▃▅█▃▄▃▆
1363.977027
▅█▆▃▄▂▂▁▁▁
Kiehn-Spinka
22.227848
▃▂█▂▃▅▄▁▄▁
1260.870506
█▇▄▃▃▂▁▂▁▁
Koepp Ltd
21.829268
▅▇█▆▄▇▅▁▅▇
1264.152927
█▇▅▂▄▂▂▂▁▁
Kuhn-Gusikowski
22.808219
▂▄█▄▃▁█▄▂▄
1247.866849
▆█▄▃▃▃▃▁▁▁
Kulas Inc
24.095745
▇▃▇▂▇▁▅▆█▆
1461.191064
█▅█▂▄▄▄▂▁▁
Pollich LLC
23.383562
█▂▅▂▃▆▁▂▄▇
1196.536712
█▆▆▃▃▃▂▂▁▁
Purdy-Kunde
27.358491
▄▅█▁▄▇▅█▇█
1469.777547
█▅▄▃▄▃▃▂▁▂
Sanford and Sons
24.000000
▂▂▂▆▁█▂▂▁▅
1391.872958
▅▇█▃▃▄▁▃▁▂
Stokes LLC
24.527778
▄▄▄▁▃▂▄█▄▄
1271.332222
█▇▆▆▄▂▂▃▁▂
Trantow-Barrows
24.159574
▅▄▅▂▃▃▄▄▁█
1312.567872
█▇▅▆▃▂▂▁▁▁
White-Trantow
26.255814
▃▂▆▆▃██▁██
1579.558023
▄█▆▅▄▃▄▂▁▃
Will LLC
24.702703
▂█▁▄▄▄▂▅▅▄
1411.318919
▆█▄▅▄▂▁▂▁▂
In [ ]:
Content source: chris1610/pbpython
Similar notebooks: