1.) Import the modules we will need


In [1]:
from __future__ import division, unicode_literals
import pandas as pd
import numpy as np
import glob
import warnings
import calendar
warnings.filterwarnings("ignore")

2.) Take a look at the files we have to work with


In [2]:
glob.glob('./input/sales-*.xlsx')


Out[2]:
['./input/sales-feb-2014.xlsx',
 './input/sales-mar-2014.xlsx',
 './input/sales-jan-2014.xlsx']

a.) Now let's grab all the files and concatenate them together


In [3]:
all_data = pd.DataFrame()
for f in glob.glob('./input/sales-*.xlsx'):
    df = pd.read_excel(f)
    all_data = pd.concat([all_data, df])
all_data.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1742 entries, 0 to 133
Data columns (total 8 columns):
account number    1742 non-null int64
date              1742 non-null object
discount          1500 non-null float64
ext price         1742 non-null float64
name              1742 non-null object
quantity          1742 non-null int64
sku               1742 non-null object
unit price        1742 non-null float64
dtypes: float64(3), int64(2), object(3)
memory usage: 122.5+ KB

b.) We can get quick stats on numeric columns just by running a "describe"


In [4]:
all_data.describe()


Out[4]:
account number discount ext price quantity unit price
count 1742.000000 1500.00 1742.000000 1742.000000 1742.000000
mean 485766.487945 0.05 1349.229392 24.319173 54.985454
std 223750.660792 0.00 1094.639319 14.502759 26.108490
min 141962.000000 0.05 -97.160000 -1.000000 10.030000
25% 257198.000000 0.05 468.592500 12.000000 32.132500
50% 527099.000000 0.05 1049.700000 25.000000 55.465000
75% 714466.000000 0.05 2074.972500 37.000000 77.607500
max 786968.000000 0.05 4824.540000 49.000000 99.850000

c.) Let's look at the raw data


In [5]:
all_data.head()


Out[5]:
account number date discount ext price name quantity sku unit price
0 383080 2014-02-01 09:04:59 NaN 235.83 Will LLC 7 B1-20000 33.69
1 412290 2014-02-01 11:51:46 NaN 232.32 Jerde-Hilpert 11 S1-27722 21.12
2 412290 2014-02-01 17:24:32 NaN 107.97 Jerde-Hilpert 3 B1-86481 35.99
3 412290 2014-02-01 19:56:48 NaN 1814.70 Jerde-Hilpert 23 B1-20000 78.90
4 672390 2014-02-02 03:45:20 NaN 2679.36 Kuhn-Gusikowski 48 S1-06532 55.82

In [6]:
all_data.tail()


Out[6]:
account number date discount ext price name quantity sku unit price
129 383080 2014-01-30 23:25:46 NaN 1151.07 Will LLC 17 S2-11481 67.71
130 737550 2014-01-30 23:56:20 NaN 34.81 Fritsch, Russel and Anderson 1 S1-50961 34.81
131 146832 2014-01-31 02:53:16 NaN 3071.31 Kiehn-Spinka 33 B1-53636 93.07
132 729833 2014-01-31 19:43:49 NaN 650.88 Koepp Ltd 24 S2-16558 27.12
133 383080 2014-01-31 22:51:18 NaN 3447.31 Will LLC 43 B1-05914 80.17

3.) Time to do a little house cleaning and transformation

a.) First the clean up


In [7]:
all_data['discount'].fillna(0, inplace=True)          #  Let's fill mising values with zeros
all_data['ext price'] = abs(all_data['ext price'])    #  Let's make the negative numbers go bye bye
all_data['unit price'] = abs(all_data['unit price'])
all_data['quantity'] = abs(all_data['quantity'])
all_data['date'] = pd.to_datetime(all_data['date'])   #  Let's make this a date object so we can do fancy stuff
all_data.head()


Out[7]:
account number date discount ext price name quantity sku unit price
0 383080 2014-02-01 09:04:59 0.0 235.83 Will LLC 7 B1-20000 33.69
1 412290 2014-02-01 11:51:46 0.0 232.32 Jerde-Hilpert 11 S1-27722 21.12
2 412290 2014-02-01 17:24:32 0.0 107.97 Jerde-Hilpert 3 B1-86481 35.99
3 412290 2014-02-01 19:56:48 0.0 1814.70 Jerde-Hilpert 23 B1-20000 78.90
4 672390 2014-02-02 03:45:20 0.0 2679.36 Kuhn-Gusikowski 48 S1-06532 55.82

b.) Now let's add a couple calculated columns


In [8]:
all_data['total price'] = all_data['unit price'] * all_data['quantity'] * (1 - all_data['discount'])
all_data['month'] = all_data['date'].dt.month
all_data['month'] = all_data['month'].apply(lambda x: calendar.month_name[x])
all_data.tail()


Out[8]:
account number date discount ext price name quantity sku unit price total price month
129 383080 2014-01-30 23:25:46 0.0 1151.07 Will LLC 17 S2-11481 67.71 1151.07 January
130 737550 2014-01-30 23:56:20 0.0 34.81 Fritsch, Russel and Anderson 1 S1-50961 34.81 34.81 January
131 146832 2014-01-31 02:53:16 0.0 3071.31 Kiehn-Spinka 33 B1-53636 93.07 3071.31 January
132 729833 2014-01-31 19:43:49 0.0 650.88 Koepp Ltd 24 S2-16558 27.12 650.88 January
133 383080 2014-01-31 22:51:18 0.0 3447.31 Will LLC 43 B1-05914 80.17 3447.31 January

The road so far

  • Merged 3 data files of different format and scale
  • Cleaned and normalized data
  • Added calculated columns
  • Only 11 lines of "real" code used so far

4.) Let's join our data to another data set


In [9]:
customer_data = pd.read_excel('./input/customer-status.xlsx')
customer_data.head()


Out[9]:
account number name address city state status
0 740150 Barton LLC 123 Main st Norfolk Virginia gold
1 714466 Trantow-Barrows 467 Some Rd Dover Delawhere? silver
2 218895 Kulas Inc 155 Baker St Lenordtown Maryland bronze
3 307599 Kassulke, Ondricka and Metz 101 Wall St Richmond Verginya bronze
4 412290 Jerde-Hilpert 302 Broadway Celebration Florda bronze

In [10]:
all_data = pd.merge(all_data, customer_data, how='left')    #  Just like in SQL, we can left join data sets
all_data.head()


Out[10]:
account number date discount ext price name quantity sku unit price total price month address city state status
0 383080 2014-02-01 09:04:59 0.0 235.83 Will LLC 7 B1-20000 33.69 235.83 February NaN NaN NaN NaN
1 412290 2014-02-01 11:51:46 0.0 232.32 Jerde-Hilpert 11 S1-27722 21.12 232.32 February 302 Broadway Celebration Florda bronze
2 412290 2014-02-01 17:24:32 0.0 107.97 Jerde-Hilpert 3 B1-86481 35.99 107.97 February 302 Broadway Celebration Florda bronze
3 412290 2014-02-01 19:56:48 0.0 1814.70 Jerde-Hilpert 23 B1-20000 78.90 1814.70 February 302 Broadway Celebration Florda bronze
4 672390 2014-02-02 03:45:20 0.0 2679.36 Kuhn-Gusikowski 48 S1-06532 55.82 2679.36 February 185 South Park Huston Texes silver

a.) Looks like we have some missing values again, let's fix that


In [11]:
all_data['status'].fillna('bronze',inplace=True)    #  Let's replace missing values with 'bronze' as a default
all_data['address'].fillna('?',inplace=True)        #  Let's replace missing values with '?' as a default
all_data['city'].fillna('?',inplace=True)           #  Let's replace missing values with '?' as a default
all_data['state'].fillna('?',inplace=True)          #  Let's replace missing values with '?' as a default
all_data.head()


Out[11]:
account number date discount ext price name quantity sku unit price total price month address city state status
0 383080 2014-02-01 09:04:59 0.0 235.83 Will LLC 7 B1-20000 33.69 235.83 February ? ? ? bronze
1 412290 2014-02-01 11:51:46 0.0 232.32 Jerde-Hilpert 11 S1-27722 21.12 232.32 February 302 Broadway Celebration Florda bronze
2 412290 2014-02-01 17:24:32 0.0 107.97 Jerde-Hilpert 3 B1-86481 35.99 107.97 February 302 Broadway Celebration Florda bronze
3 412290 2014-02-01 19:56:48 0.0 1814.70 Jerde-Hilpert 23 B1-20000 78.90 1814.70 February 302 Broadway Celebration Florda bronze
4 672390 2014-02-02 03:45:20 0.0 2679.36 Kuhn-Gusikowski 48 S1-06532 55.82 2679.36 February 185 South Park Huston Texes silver

5.) Lets try to join another dirty data set to get location info

a.) Insert a blank column as a placeholder


In [12]:
all_data.insert(13, "abbrev", np.nan)

b.) Open a JSON file that has abbreviations for all the states


In [13]:
import json
states = json.loads(open('./input/state_abbreviations.json').read())

c.) Create a function we will use to lookup state abbreviations


In [14]:
from fuzzywuzzy import process
def convert_state(row):
    abbrev = process.extractOne(row["state"], choices=states.keys(), score_cutoff=80)
    if abbrev:
        return states[abbrev[0]]
    return np.nan

d.) Let's apply our function to populate the correct abbreviations


In [15]:
all_data['abbrev'] = all_data.apply(convert_state, axis=1)
all_data.head()


WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
WARNING:root:Applied processor reduces input query to empty string, all comparisons will have score 0. [Query: '?']
Out[15]:
account number date discount ext price name quantity sku unit price total price month address city state abbrev status
0 383080 2014-02-01 09:04:59 0.0 235.83 Will LLC 7 B1-20000 33.69 235.83 February ? ? ? NaN bronze
1 412290 2014-02-01 11:51:46 0.0 232.32 Jerde-Hilpert 11 S1-27722 21.12 232.32 February 302 Broadway Celebration Florda FL bronze
2 412290 2014-02-01 17:24:32 0.0 107.97 Jerde-Hilpert 3 B1-86481 35.99 107.97 February 302 Broadway Celebration Florda FL bronze
3 412290 2014-02-01 19:56:48 0.0 1814.70 Jerde-Hilpert 23 B1-20000 78.90 1814.70 February 302 Broadway Celebration Florda FL bronze
4 672390 2014-02-02 03:45:20 0.0 2679.36 Kuhn-Gusikowski 48 S1-06532 55.82 2679.36 February 185 South Park Huston Texes TX silver

The road so far

  • Merged 3 data files of different format and scale
  • Cleaned and normalized data
  • Added calculated columns
  • Left joined 4th data file, and filled in missing values
  • Built function to auto-magically populate column based on dirty data
  • Only 25 lines of "real" code used so far

6.) Now that our data is combined and cleaned up, let's pivot!


In [16]:
pt = pd.pivot_table(
    all_data,                            #  This is the Dataframe we will pivot
    index=["name", 'sku'],               #  These are the fields, in order, that we want to pivot
    columns=['month'],                   #  Let's group by this these columns
    values=['quantity', 'total price'],  #  These are the value we want to agg
    aggfunc=[np.sum],                    #  This is the methods of aggrigation we want to use
    fill_value=0                         #  Since we know this will happen, let's fill missing values with zero
)
pt


Out[16]:
sum
quantity ... total price
month April August December February January July June March May November ... December February January July June March May November October September
name sku
Barton LLC B1-04202 0 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 2311.5400
B1-05914 0 0 0 0 0 0 0 0 24 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 308.4840 0.000 0.0000 0.0000
B1-20000 0 1 0 56 78 0 0 20 0 0 ... 0.0000 4443.8940 6592.7745 0.0000 0.0000 1404.6700 0.0000 0.000 0.0000 0.0000
B1-33087 0 0 0 70 0 0 9 0 0 0 ... 0.0000 6364.3125 0.0000 0.0000 851.4090 0.0000 0.0000 0.000 0.0000 0.0000
B1-33364 0 46 0 0 12 0 0 0 0 0 ... 0.0000 0.0000 288.2880 0.0000 0.0000 0.0000 0.0000 0.000 346.0375 0.0000
B1-38851 42 63 0 40 0 0 0 0 5 0 ... 0.0000 3157.4595 0.0000 0.0000 0.0000 0.0000 305.9475 0.000 0.0000 0.0000
B1-50809 75 0 0 0 16 0 0 0 31 0 ... 0.0000 0.0000 305.7600 0.0000 0.0000 0.0000 1588.2385 0.000 0.0000 0.0000
B1-53102 0 0 3 0 2 0 0 0 0 6 ... 154.0995 0.0000 132.7170 0.0000 0.0000 0.0000 0.0000 190.950 0.0000 0.0000
B1-53636 31 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 1809.3320
B1-65551 7 0 0 0 0 0 0 0 38 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1451.9420 0.000 323.2850 2478.0750
B1-69924 1 0 0 0 0 0 0 0 38 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1614.7530 0.000 0.0000 0.0000
B1-86481 0 0 0 0 40 0 0 0 0 0 ... 0.0000 0.0000 1185.9900 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 807.1200
S1-06532 49 0 0 0 20 68 45 0 27 0 ... 0.0000 0.0000 1284.4650 3789.7590 556.1775 0.0000 1474.1055 0.000 0.0000 0.0000
S1-27722 0 49 0 0 0 0 38 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 2333.5040 0.0000 0.0000 0.000 0.0000 0.0000
S1-30248 0 36 0 42 0 0 0 0 0 0 ... 0.0000 575.3475 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-47412 0 0 0 0 54 40 0 0 0 0 ... 0.0000 0.0000 1901.7180 1641.6000 0.0000 0.0000 0.0000 0.000 2441.3670 0.0000
S1-50961 0 38 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 268.6125 0.0000
S1-65481 0 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 1409.8000 1775.1890
S1-82801 0 16 0 0 0 34 0 29 0 28 ... 0.0000 0.0000 0.0000 981.5970 0.0000 1675.3155 0.0000 1862.000 0.0000 0.0000
S1-93683 0 0 0 42 0 0 0 0 0 21 ... 0.0000 423.4230 0.0000 0.0000 0.0000 0.0000 0.0000 1423.233 0.0000 0.0000
S2-10342 0 0 28 94 0 0 40 0 37 0 ... 2060.9680 8860.7220 0.0000 0.0000 2658.1000 0.0000 2254.5210 0.000 0.0000 430.5875
S2-11481 0 0 0 0 0 0 25 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 692.0750 0.0000 0.0000 0.000 0.0000 0.0000
S2-16558 0 0 25 0 4 0 0 0 0 0 ... 419.1875 0.0000 354.5490 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-34077 0 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 4186.5360 1253.3160
S2-77896 0 47 0 0 0 0 52 0 0 30 ... 0.0000 0.0000 0.0000 0.0000 2849.2780 0.0000 0.0000 1179.900 0.0000 0.0000
S2-78676 0 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 2485.7700
S2-82423 33 0 0 0 0 0 0 0 10 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 711.1700 0.000 0.0000 0.0000
S2-83881 6 44 0 0 0 0 0 12 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 257.8680 0.0000 0.000 0.0000 0.0000
Cronin, Oberbrunner and Spencer B1-04202 0 0 0 0 0 0 0 0 0 28 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 2320.318 0.0000 2481.7800
B1-05914 0 28 0 0 16 30 0 0 0 0 ... 0.0000 0.0000 359.5800 2123.2500 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
White-Trantow S2-82423 30 12 0 0 0 11 0 38 0 26 ... 0.0000 0.0000 0.0000 111.1880 0.0000 1838.5730 0.0000 2290.925 0.0000 0.0000
S2-83881 0 32 5 84 0 0 0 0 0 0 ... 418.9025 7239.9600 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
Will LLC B1-04202 0 0 0 0 0 0 0 0 6 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 346.6170 0.000 94.4965 0.0000
B1-05914 1 0 0 0 136 0 0 0 0 0 ... 0.0000 0.0000 11522.1795 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-20000 45 0 0 22 0 0 0 45 0 0 ... 0.0000 733.1805 0.0000 0.0000 0.0000 1278.2250 0.0000 0.000 0.0000 0.0000
B1-33087 0 0 0 0 0 0 0 19 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 1024.1570 0.0000 0.000 1535.3330 966.0170
B1-33364 0 0 73 0 138 0 0 0 0 0 ... 5480.0275 0.0000 11370.3525 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-38851 3 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 1243.1320
B1-53102 0 6 0 154 0 0 32 0 0 0 ... 0.0000 10336.1310 0.0000 0.0000 1368.9120 0.0000 0.0000 0.000 0.0000 0.0000
B1-53636 0 0 0 0 74 0 0 0 0 0 ... 0.0000 0.0000 1038.9600 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-65551 0 0 17 0 0 41 0 0 0 0 ... 565.2500 0.0000 0.0000 2321.8095 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-69924 0 0 0 66 0 0 0 0 98 9 ... 0.0000 6138.3465 0.0000 0.0000 0.0000 0.0000 5859.4575 670.662 0.0000 0.0000
B1-86481 0 0 0 0 0 0 0 0 22 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1681.8230 0.000 0.0000 0.0000
S1-06532 0 0 0 0 0 0 8 0 44 0 ... 0.0000 0.0000 0.0000 0.0000 139.4600 0.0000 3009.5240 0.000 0.0000 0.0000
S1-27722 0 0 0 0 0 51 0 0 0 0 ... 0.0000 0.0000 0.0000 4268.4545 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-30248 16 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-47412 0 28 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 1169.2600 0.0000
S1-50961 7 0 0 0 0 0 0 37 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 2966.2705 0.0000 0.000 0.0000 0.0000
S1-65481 0 0 0 0 0 6 0 0 0 0 ... 0.0000 0.0000 0.0000 563.7870 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-82801 0 0 0 42 0 0 0 41 0 0 ... 0.0000 557.7390 0.0000 0.0000 0.0000 1661.4550 0.0000 0.000 0.0000 0.0000
S1-93683 0 0 0 82 90 0 37 28 0 0 ... 0.0000 7778.3355 4967.5275 0.0000 455.1925 2416.8760 0.0000 0.000 0.0000 0.0000
S2-00301 0 0 38 0 0 0 38 0 0 0 ... 1676.4840 0.0000 0.0000 0.0000 507.2145 0.0000 0.0000 0.000 0.0000 0.0000
S2-10342 15 10 45 0 0 0 0 0 0 0 ... 2843.7300 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-11481 0 0 31 0 34 0 0 0 0 32 ... 1367.6580 0.0000 2244.5865 0.0000 0.0000 0.0000 0.0000 1647.376 0.0000 0.0000
S2-16558 0 0 0 0 0 6 0 0 0 0 ... 0.0000 0.0000 0.0000 392.8440 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-23246 0 0 0 94 46 0 0 0 0 0 ... 0.0000 1001.7345 2690.1030 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-34077 19 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-77896 0 0 0 0 28 0 0 0 0 48 ... 0.0000 0.0000 653.8350 0.0000 0.0000 0.0000 0.0000 731.880 2085.8200 1919.5415
S2-78676 0 0 0 0 0 38 0 0 0 0 ... 0.0000 0.0000 0.0000 2919.7680 0.0000 0.0000 0.0000 0.000 1846.1540 0.0000
S2-83881 0 0 0 0 70 0 0 0 7 0 ... 0.0000 0.0000 6372.5025 0.0000 0.0000 0.0000 366.6810 0.000 0.0000 0.0000

544 rows × 24 columns

a.) Did you know that you can cross section pivots?


In [17]:
pt.xs('Will LLC', level=0)    #  Let's cut out just the data for one account


Out[17]:
sum
quantity ... total price
month April August December February January July June March May November ... December February January July June March May November October September
sku
B1-04202 0 0 0 0 0 0 0 0 6 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 346.6170 0.000 94.4965 0.0000
B1-05914 1 0 0 0 136 0 0 0 0 0 ... 0.0000 0.0000 11522.1795 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-20000 45 0 0 22 0 0 0 45 0 0 ... 0.0000 733.1805 0.0000 0.0000 0.0000 1278.2250 0.0000 0.000 0.0000 0.0000
B1-33087 0 0 0 0 0 0 0 19 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 1024.1570 0.0000 0.000 1535.3330 966.0170
B1-33364 0 0 73 0 138 0 0 0 0 0 ... 5480.0275 0.0000 11370.3525 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-38851 3 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 1243.1320
B1-53102 0 6 0 154 0 0 32 0 0 0 ... 0.0000 10336.1310 0.0000 0.0000 1368.9120 0.0000 0.0000 0.000 0.0000 0.0000
B1-53636 0 0 0 0 74 0 0 0 0 0 ... 0.0000 0.0000 1038.9600 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-65551 0 0 17 0 0 41 0 0 0 0 ... 565.2500 0.0000 0.0000 2321.8095 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
B1-69924 0 0 0 66 0 0 0 0 98 9 ... 0.0000 6138.3465 0.0000 0.0000 0.0000 0.0000 5859.4575 670.662 0.0000 0.0000
B1-86481 0 0 0 0 0 0 0 0 22 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1681.8230 0.000 0.0000 0.0000
S1-06532 0 0 0 0 0 0 8 0 44 0 ... 0.0000 0.0000 0.0000 0.0000 139.4600 0.0000 3009.5240 0.000 0.0000 0.0000
S1-27722 0 0 0 0 0 51 0 0 0 0 ... 0.0000 0.0000 0.0000 4268.4545 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-30248 16 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-47412 0 28 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 1169.2600 0.0000
S1-50961 7 0 0 0 0 0 0 37 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 2966.2705 0.0000 0.000 0.0000 0.0000
S1-65481 0 0 0 0 0 6 0 0 0 0 ... 0.0000 0.0000 0.0000 563.7870 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S1-82801 0 0 0 42 0 0 0 41 0 0 ... 0.0000 557.7390 0.0000 0.0000 0.0000 1661.4550 0.0000 0.000 0.0000 0.0000
S1-93683 0 0 0 82 90 0 37 28 0 0 ... 0.0000 7778.3355 4967.5275 0.0000 455.1925 2416.8760 0.0000 0.000 0.0000 0.0000
S2-00301 0 0 38 0 0 0 38 0 0 0 ... 1676.4840 0.0000 0.0000 0.0000 507.2145 0.0000 0.0000 0.000 0.0000 0.0000
S2-10342 15 10 45 0 0 0 0 0 0 0 ... 2843.7300 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-11481 0 0 31 0 34 0 0 0 0 32 ... 1367.6580 0.0000 2244.5865 0.0000 0.0000 0.0000 0.0000 1647.376 0.0000 0.0000
S2-16558 0 0 0 0 0 6 0 0 0 0 ... 0.0000 0.0000 0.0000 392.8440 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-23246 0 0 0 94 46 0 0 0 0 0 ... 0.0000 1001.7345 2690.1030 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-34077 19 0 0 0 0 0 0 0 0 0 ... 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.000 0.0000 0.0000
S2-77896 0 0 0 0 28 0 0 0 0 48 ... 0.0000 0.0000 653.8350 0.0000 0.0000 0.0000 0.0000 731.880 2085.8200 1919.5415
S2-78676 0 0 0 0 0 38 0 0 0 0 ... 0.0000 0.0000 0.0000 2919.7680 0.0000 0.0000 0.0000 0.000 1846.1540 0.0000
S2-83881 0 0 0 0 70 0 0 0 7 0 ... 0.0000 0.0000 6372.5025 0.0000 0.0000 0.0000 366.6810 0.000 0.0000 0.0000

28 rows × 24 columns

7.) Now let's write all this data to the files we need

a.) An Excel file with the full data set


In [18]:
all_data.to_excel('./output/all_data.xls')

b.) An Excel file of the pivoted data


In [19]:
pt.to_excel('./output/pivot_data.xls')

c.) An Excel file with a different cross-section on each tab


In [20]:
writer = pd.ExcelWriter('./output/pivoted_x-sections.xlsx')
for name in pt.index.get_level_values(0).unique():
    temp_df = pt.xs(name, level=0)
    temp_df.to_excel(writer,name)
writer.save()

All said and done

  • Merged 3 data files of different format and scale
  • Cleaned and normalized data
  • Added calculated columns
  • Left joined 4th data file, and filled in missing values
  • Built function to auto-magically populate column based on dirty data
  • Dumped all data to flat excel file
  • Built pivot table and exported to excel
  • Built x-sectioned file of pivots and exported to excel
  • Only 35 lines of "real" code used (note including imports and exploration steps)