In [1]:

    
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

Advanced Tables

Why are databases so complex?

Data stored in a database may be split into multiple tables, each containing multiple columns. A column stores a single attribute of the data; a table stores a collection of related attributes.
The database also keeps track of the relationships between different tables.
Databases are designed to minimize redundancy and maintain data integrity, particularly when data is added, changed, or deleted.
1. Consistency when updating: no duplicate places for information https://en.wikipedia.org/wiki/Database_normalization
2. Performance https://en.wikipedia.org/wiki/Star_schema
Side note: you may also have to think about isolation level when working with a database where someone may be updating data as you're trying to read it. The isolation level determines the database read behavior in this situation. See https://en.wikipedia.org/wiki/Isolation_(database_systems).

Working with multiple tables

Two tables can be joined at a time. 'Join' is a binary operator. See https://en.wikipedia.org/wiki/Join_(SQL).
Tables must have key values that can be matched. Usually one table has a primary key and the other table has a foreign key.

Pandas

Pandas allows "merge", "join", and "concatenate" operations. See http://pandas.pydata.org/pandas-docs/version/0.18.1/merging.html#merge-join-and-concatenate for additional reading.
Pandas also allows reshaping and pivoting data tables, see http://pandas.pydata.org/pandas-docs/version/0.18.1/reshaping.html.

In this class, we will cover table joining, merging and concatenation. We will also go over using some of the time-series handling capabilities in Pandas.



In [2]:

    
import pandas as pd
import numpy as np

Concatenating tables in Pandas

To introduce join operations, we will be working with the AdventureWorks dataset, a standard dataset from Microsoft SLQ Server for learing to work with databases. It contains data for the fictitious bicycle manufacturer (Adventure Works Cycles).

Let's starts by importing some tables from AdventureWorks in /home/data/AdventureWorks. These tables contain data on AdventureWorks employees, sales territories, customers, and orders placed by the customers.



In [3]:

    
Employees = pd.read_excel('/home/data/AdventureWorks/Employees.xls')
Territory = pd.read_excel('/home/data/AdventureWorks/SalesTerritory.xls')
Customers = pd.read_excel('/home/data/AdventureWorks/Customers.xls')
Orders = pd.read_excel('/home/data/AdventureWorks/ItemsOrdered.xls')

Let's take a look at the data we'll be working with:



In [4]:

    
Employees.head()









    Out[4]:






  
    
      
      EmployeeID
      ManagerID
      TerritoryID
      Title
      FirstName
      MiddleName
      LastName
      Suffix
      JobTitle
      NationalIDNumber
      ...
      SickLeaveHours
      PhoneNumber
      PhoneNumberType
      EmailAddress
      AddressLine1
      AddressLine2
      City
      StateProvinceName
      PostalCode
      CountryName
    
  
  
    
      0
      259
      250.0
      NaN
      NaN
      Ben
      T
      Miller
      NaN
      Buyer
      20269531
      ...
      47
      151-555-0113
      Work
      ben0@adventure-works.com
      101 Candy Rd.
      NaN
      Redmond
      Washington
      98052
      United States
    
    
      1
      278
      274.0
      6.0
      NaN
      Garrett
      R
      Vargas
      NaN
      Sales Representative
      234474252
      ...
      36
      922-555-0165
      Work
      garrett1@mapleleafmail.ca
      10203 Acorn Avenue
      NaN
      Calgary
      Alberta
      T2P 2G8
      Canada
    
    
      2
      204
      26.0
      NaN
      NaN
      Gabe
      B
      Mares
      NaN
      Production Technician - WC40
      440379437
      ...
      48
      310-555-0117
      Work
      gabe0@adventure-works.com
      1061 Buskrik Avenue
      NaN
      Edmonds
      Washington
      98020
      United States
    
    
      3
      78
      26.0
      NaN
      NaN
      Reuben
      H
      D'sa
      NaN
      Production Supervisor - WC40
      370989364
      ...
      56
      191-555-0112
      Work
      reuben0@adventure-works.com
      1064 Slow Creek Road
      NaN
      Seattle
      Washington
      98104
      United States
    
    
      4
      255
      250.0
      NaN
      NaN
      Gordon
      L
      Hee
      NaN
      Buyer
      466142721
      ...
      46
      230-555-0144
      Cell
      gordon0@adventure-works.com
      108 Lakeside Court
      NaN
      Bellevue
      Washington
      98004
      United States
    
  

5 rows × 26 columns



In [5]:

    
Territory.head()









    Out[5]:






  
    
      
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      0
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      1
      2
      Northeast
      US
      North America
      2402176.85
      3607148.94
    
    
      2
      3
      Central
      US
      North America
      3072175.12
      3205014.08
    
    
      3
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
    
    
      4
      5
      Southeast
      US
      North America
      2538667.25
      3925071.43



In [6]:

    
Customers.head()









    Out[6]:






  
    
      
      CustomerID
      SalesTerritoryID
      FirstName
      LastName
      City
      StateName
    
  
  
    
      0
      10101
      1
      John
      Gray
      Lynden
      Washington
    
    
      1
      10298
      4
      Leroy
      Brown
      Pinetop
      Arizona
    
    
      2
      10299
      1
      Elroy
      Keller
      Snoqualmie
      Washington
    
    
      3
      10315
      3
      Lisa
      Jones
      Oshkosh
      Wisconsin
    
    
      4
      10325
      1
      Ginger
      Schultz
      Pocatello
      Idaho



In [7]:

    
Orders.head()

Let's construct a slightly artificial example. Suppose that AdventureWorks was formed by merging two companies, AdventuresUSA which operated in the US and AdventuresWorld, which operated in other countries. Now we want information on their combined sales territories.

The Pandas "concat" function is good for stacking tables on top of each other. We will use it to combine the AdventuresUSA and AdventuresWorld territories data tables.



In [8]:

    
help(pd.concat)









    



Help on function concat in module pandas.tools.merge:

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)
    Concatenate pandas objects along a particular axis with optional set logic
    along the other axes. Can also add a layer of hierarchical indexing on the
    concatenation axis, which may be useful if the labels are the same (or
    overlapping) on the passed axis number
    
    Parameters
    ----------
    objs : a sequence or mapping of Series, DataFrame, or Panel objects
        If a dict is passed, the sorted keys will be used as the `keys`
        argument, unless it is passed, in which case the values will be
        selected (see below). Any None objects will be dropped silently unless
        they are all None in which case a ValueError will be raised
    axis : {0, 1, ...}, default 0
        The axis to concatenate along
    join : {'inner', 'outer'}, default 'outer'
        How to handle indexes on other axis(es)
    join_axes : list of Index objects
        Specific indexes to use for the other n - 1 axes instead of performing
        inner/outer set logic
    verify_integrity : boolean, default False
        Check whether the new concatenated axis contains duplicates. This can
        be very expensive relative to the actual data concatenation
    keys : sequence, default None
        If multiple levels passed, should contain tuples. Construct
        hierarchical index using the passed keys as the outermost level
    levels : list of sequences, default None
        Specific levels (unique values) to use for constructing a
        MultiIndex. Otherwise they will be inferred from the keys
    names : list, default None
        Names for the levels in the resulting hierarchical index
    ignore_index : boolean, default False
        If True, do not use the index values along the concatenation axis. The
        resulting axis will be labeled 0, ..., n - 1. This is useful if you are
        concatenating objects where the concatenation axis does not have
        meaningful indexing information. Note the index values on the other
        axes are still respected in the join.
    copy : boolean, default True
        If False, do not copy data unnecessarily
    
    Notes
    -----
    The keys, levels, and names arguments are all optional
    
    Returns
    -------
    concatenated : type of objects



In [9]:

    
# constructing the territory tables... as noted, this is an artificial example
TerritoryUSA = Territory[Territory.CountryCode=='US']; TerritoryUSA['RepID'] = np.random.randint(1,1000,5)
TerritoryWorld = Territory[Territory.CountryCode!='US']









    



/usr/lib/python3.4/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app



In [10]:

    
TerritoryUSA









    Out[10]:






  
    
      
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
      RepID
    
  
  
    
      0
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
      445
    
    
      1
      2
      Northeast
      US
      North America
      2402176.85
      3607148.94
      717
    
    
      2
      3
      Central
      US
      North America
      3072175.12
      3205014.08
      328
    
    
      3
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      206
    
    
      4
      5
      Southeast
      US
      North America
      2538667.25
      3925071.43
      886



In [11]:

    
TerritoryWorld









    Out[11]:






  
    
      
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      5
      6
      Canada
      CA
      North America
      6771829.14
      5693988.860
    
    
      6
      7
      France
      FR
      Europe
      4772398.31
      2396539.760
    
    
      7
      8
      Germany
      DE
      Europe
      3805202.35
      1307949.790
    
    
      8
      9
      Australia
      AU
      Pacific
      5977814.92
      2278548.980
    
    
      9
      10
      United Kingdom
      GB
      Europe
      5012905.37
      1635823.400
    
    
      10
      11
      Brazil
      BR
      South America
      0.00
      261589.958
    
    
      11
      12
      Mexico
      MX
      North America
      0.00
      0.000



In [12]:

    
# we'll concatenate the databases, but keep separate keys so that we can keep track of which entries came from AdventuresUSA and 
# which from AdventuresWorld.
# We'll use "join='inner'" to only keep colunms that are common to both tables; 
# that is, we will drop the no-longer needed RepID in AdventuresUSA. 
Territory2 = pd.concat([TerritoryUSA, TerritoryWorld], keys=['usa', 'world'], join='inner')



In [13]:

    
Territory2









    Out[13]:






  
    
      
      
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      usa
      0
      1
      Northwest
      US
      North America
      7887186.79
      3298694.490
    
    
      1
      2
      Northeast
      US
      North America
      2402176.85
      3607148.940
    
    
      2
      3
      Central
      US
      North America
      3072175.12
      3205014.080
    
    
      3
      4
      Southwest
      US
      North America
      10510853.87
      5366575.710
    
    
      4
      5
      Southeast
      US
      North America
      2538667.25
      3925071.430
    
    
      world
      5
      6
      Canada
      CA
      North America
      6771829.14
      5693988.860
    
    
      6
      7
      France
      FR
      Europe
      4772398.31
      2396539.760
    
    
      7
      8
      Germany
      DE
      Europe
      3805202.35
      1307949.790
    
    
      8
      9
      Australia
      AU
      Pacific
      5977814.92
      2278548.980
    
    
      9
      10
      United Kingdom
      GB
      Europe
      5012905.37
      1635823.400
    
    
      10
      11
      Brazil
      BR
      South America
      0.00
      261589.958
    
    
      11
      12
      Mexico
      MX
      North America
      0.00
      0.000

Pandas "append" behaves just like "concat" with axis=0 and join='outer' (i.e., keep all column names). Missing values are set to NaN.



In [14]:

    
help(pd.DataFrame.append)









    



Help on function append in module pandas.core.frame:

append(self, other, ignore_index=False, verify_integrity=False)
    Append rows of `other` to the end of this frame, returning a new
    object. Columns not in this frame are added as new columns.
    
    Parameters
    ----------
    other : DataFrame or Series/dict-like object, or list of these
        The data to append.
    ignore_index : boolean, default False
        If True, do not use the index labels.
    verify_integrity : boolean, default False
        If True, raise ValueError on creating index with duplicates.
    
    Returns
    -------
    appended : DataFrame
    
    Notes
    -----
    If a list of dict/series is passed and the keys are all contained in
    the DataFrame's index, the order of the columns in the resulting
    DataFrame will be unchanged.
    
    See also
    --------
    pandas.concat : General function to concatenate DataFrame, Series
        or Panel objects
    
    Examples
    --------
    
    >>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
    >>> df
       A  B
    0  1  2
    1  3  4
    >>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
    >>> df.append(df2)
       A  B
    0  1  2
    1  3  4
    0  5  6
    1  7  8
    
    With `ignore_index` set to True:
    
    >>> df.append(df2, ignore_index=True)
       A  B
    0  1  2
    1  3  4
    2  5  6
    3  7  8



In [15]:

    
Territory3 = TerritoryUSA.append(TerritoryWorld)



In [16]:

    
Territory3









    Out[16]:






  
    
      
      CountryCode
      Name
      Region
      RepID
      SalesLastYear
      SalesYTD
      TerritoryID
    
  
  
    
      0
      US
      Northwest
      North America
      445.0
      3298694.490
      7887186.79
      1
    
    
      1
      US
      Northeast
      North America
      717.0
      3607148.940
      2402176.85
      2
    
    
      2
      US
      Central
      North America
      328.0
      3205014.080
      3072175.12
      3
    
    
      3
      US
      Southwest
      North America
      206.0
      5366575.710
      10510853.87
      4
    
    
      4
      US
      Southeast
      North America
      886.0
      3925071.430
      2538667.25
      5
    
    
      5
      CA
      Canada
      North America
      NaN
      5693988.860
      6771829.14
      6
    
    
      6
      FR
      France
      Europe
      NaN
      2396539.760
      4772398.31
      7
    
    
      7
      DE
      Germany
      Europe
      NaN
      1307949.790
      3805202.35
      8
    
    
      8
      AU
      Australia
      Pacific
      NaN
      2278548.980
      5977814.92
      9
    
    
      9
      GB
      United Kingdom
      Europe
      NaN
      1635823.400
      5012905.37
      10
    
    
      10
      BR
      Brazil
      South America
      NaN
      261589.958
      0.00
      11
    
    
      11
      MX
      Mexico
      North America
      NaN
      0.000
      0.00
      12

Joining and merging tables in Pandas

Join and merge are powerful tools for working with multiple tables. We will use them to answer some questions about the AdventureWorks dataset that you might encounter in real-life situations.

Join does fast table joining on a shared index. Merge does the same thing, but gives you the option to specify columns to join on. The idea of joining on a column will become clearer with some examples.

Example 1. "I want a list of all employees, and if any are salespeople then show me the details about their sales territory"

From AdventureWorks, we have a table "Employees" that gives a lot of information about AdventureWorks employees, like 'EmployeeID', 'ManagerID', 'TerritoryID', 'Title', 'FirstName','MiddleName', 'LastName', 'Suffix', 'JobTitle', 'NationalIDNumber', 'BirthDate', 'MaritalStatus', 'Gender', 'HireDate', 'SalariedFlag', 'VacationHours', 'SickLeaveHours', 'PhoneNumber', 'PhoneNumberType', 'EmailAddress', 'AddressLine1', 'AddressLine2', 'City', 'StateProvinceName', 'PostalCode', 'CountryName'. \

Since we're just being asked for a list of employees, we'll give the EmployeeID and their first, middle, and last names, and their role in the company (since additional information is requested for salespeople only). Then, for the salespeople, we must attach information about their sales territories, which is contained in the Territories table.

Notice that the Employees table has a column 'TerritoryID', which corresponds to the primary key in the 'Territory' table (in 'Territory', each territory has a unique 'TerritoryID'). We'll do a join on TerritoryID.



In [17]:

    
help(pd.merge)









    



Help on function merge in module pandas.tools.merge:

merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)
    Merge DataFrame objects by performing a database-style join operation by
    columns or indexes.
    
    If joining columns on columns, the DataFrame indexes *will be
    ignored*. Otherwise if joining indexes on indexes or indexes on a column or
    columns, the index will be passed on.
    
    Parameters
    ----------
    left : DataFrame
    right : DataFrame
    how : {'left', 'right', 'outer', 'inner'}, default 'inner'
        * left: use only keys from left frame (SQL: left outer join)
        * right: use only keys from right frame (SQL: right outer join)
        * outer: use union of keys from both frames (SQL: full outer join)
        * inner: use intersection of keys from both frames (SQL: inner join)
    on : label or list
        Field names to join on. Must be found in both DataFrames. If on is
        None and not merging on indexes, then it merges on the intersection of
        the columns by default.
    left_on : label or list, or array-like
        Field names to join on in left DataFrame. Can be a vector or list of
        vectors of the length of the DataFrame to use a particular vector as
        the join key instead of columns
    right_on : label or list, or array-like
        Field names to join on in right DataFrame or vector/list of vectors per
        left_on docs
    left_index : boolean, default False
        Use the index from the left DataFrame as the join key(s). If it is a
        MultiIndex, the number of keys in the other DataFrame (either the index
        or a number of columns) must match the number of levels
    right_index : boolean, default False
        Use the index from the right DataFrame as the join key. Same caveats as
        left_index
    sort : boolean, default False
        Sort the join keys lexicographically in the result DataFrame
    suffixes : 2-length sequence (tuple, list, ...)
        Suffix to apply to overlapping column names in the left and right
        side, respectively
    copy : boolean, default True
        If False, do not copy data unnecessarily
    indicator : boolean or string, default False
        If True, adds a column to output DataFrame called "_merge" with
        information on the source of each row.
        If string, column with information on source of each row will be added to
        output DataFrame, and column will be named value of string.
        Information column is Categorical-type and takes on a value of "left_only"
        for observations whose merge key only appears in 'left' DataFrame,
        "right_only" for observations whose merge key only appears in 'right'
        DataFrame, and "both" if the observation's merge key is found in both.
    
        .. versionadded:: 0.17.0
    
    Examples
    --------
    
    >>> A              >>> B
        lkey value         rkey value
    0   foo  1         0   foo  5
    1   bar  2         1   bar  6
    2   baz  3         2   qux  7
    3   foo  4         3   bar  8
    
    >>> A.merge(B, left_on='lkey', right_on='rkey', how='outer')
       lkey  value_x  rkey  value_y
    0  foo   1        foo   5
    1  foo   4        foo   5
    2  bar   2        bar   6
    3  bar   2        bar   8
    4  baz   3        NaN   NaN
    5  NaN   NaN      qux   7
    
    Returns
    -------
    merged : DataFrame
        The output type will the be same as 'left', if it is a subclass
        of DataFrame.



In [18]:

    
Ans = pd.merge(Employees.loc[:,["EmployeeID","FirstName","MiddleName","LastName","JobTitle","TerritoryID"]], 
               Territory, 
               how='left', on='TerritoryID')
Ans.head()









    Out[18]:






  
    
      
      EmployeeID
      FirstName
      MiddleName
      LastName
      JobTitle
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      0
      259
      Ben
      T
      Miller
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1
      278
      Garrett
      R
      Vargas
      Sales Representative
      6.0
      Canada
      CA
      North America
      6771829.14
      5693988.86
    
    
      2
      204
      Gabe
      B
      Mares
      Production Technician - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      3
      78
      Reuben
      H
      D'sa
      Production Supervisor - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4
      255
      Gordon
      L
      Hee
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN



In [19]:

    
# Overachiever answer:
Ans['EmployeeName'] = Ans[["FirstName","MiddleName","LastName"]].apply(lambda x: x.LastName+", "+x.FirstName+" "+str(x.MiddleName), axis=1)
Ans = Ans[['EmployeeName', 'EmployeeID', 'JobTitle', 'TerritoryID', 'Name', 'CountryCode', 'Region', 'SalesYTD', 'SalesLastYear']]
Ans









    Out[19]:






  
    
      
      EmployeeName
      EmployeeID
      JobTitle
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      0
      Miller, Ben T
      259
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1
      Vargas, Garrett R
      278
      Sales Representative
      6.0
      Canada
      CA
      North America
      6771829.14
      5693988.86
    
    
      2
      Mares, Gabe B
      204
      Production Technician - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      3
      D'sa, Reuben H
      78
      Production Supervisor - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4
      Hee, Gordon L
      255
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      5
      Khanna, Karan R
      66
      Production Technician - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      6
      Ajenstat, François P
      270
      Database Administrator
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      7
      Harnpadoungsataya, Sariya E
      22
      Marketing Specialist
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      8
      Koenigsbauer, Kirk J
      161
      Production Technician - WC45
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      9
      Ralls, Kim T
      124
      Stocker
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      10
      Raheem, Michael nan
      10
      Research and Development Manager
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      11
      Seamans, Mike K
      248
      Accountant
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      12
      Koch, Reed T
      175
      Production Technician - WC30
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      13
      Fakhouri, Fadi K
      155
      Production Technician - WC20
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      14
      Singh, Paul R
      144
      Production Technician - WC20
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      15
      Diaz, Brenda M
      192
      Production Supervisor - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      16
      Richins, Jack S
      166
      Production Supervisor - WC30
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      17
      Evans, John P
      112
      Production Technician - WC50
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      18
      Myer, Ken L
      203
      Production Technician - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      19
      Moreland, Barbara C
      245
      Accountant
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      20
      Kurjan, Eric S
      257
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      21
      Walton, Bryan A
      244
      Accounts Receivable Specialist
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      22
      Masters, Steve F
      69
      Production Technician - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      23
      Newman, Belinda M
      210
      Production Technician - WC45
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      24
      Nayberg, Alex M
      163
      Production Technician - WC45
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      25
      Hall, Don L
      118
      Production Technician - WC50
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      26
      Lawrence, David Oliver
      176
      Production Technician - WC30
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      27
      Reátegui Alayo, Sandra nan
      147
      Production Technician - WC30
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      28
      Smith, Samantha H
      138
      Production Technician - WC20
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      29
      Scardelis, Jim H
      95
      Production Technician - WC50
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      261
      Ortiz, David J
      70
      Production Technician - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      262
      Chai, Sean N
      219
      Document Control Assistant
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      263
      Watters, Jason M
      148
      Production Technician - WC30
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      264
      Samarawickrama, Prasanna E
      140
      Production Technician - WC20
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      265
      Varkey Chudukatil, Ranjit R
      290
      Sales Representative
      7.0
      France
      FR
      Europe
      4772398.31
      2396539.76
    
    
      266
      Erickson, Gail A
      5
      Design Engineer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      267
      Rao, Arvind B
      252
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      268
      Connelly, Peter I
      266
      Network Administrator
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      269
      Hedlund, Magnus E
      233
      Facilities Administrative Assistant
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      270
      Cook, Patrick M
      83
      Production Technician - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      271
      Sandberg, Mikael Q
      251
      Buyer
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      272
      Liu, David J
      241
      Accounts Manager
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      273
      Hamilton, James R
      25
      Vice President of Production
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      274
      Liu, Jinghao K
      108
      Production Supervisor - WC50
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      275
      Martin, Mindy C
      239
      Benefits Specialist
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      276
      Brown, Jo A
      27
      Production Supervisor - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      277
      Kim, Shane S
      186
      Production Supervisor - WC45
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      278
      Abercrombie, Kim B
      38
      Production Technician - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      279
      Sam, Raymond K
      154
      Production Technician - WC20
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      280
      Krapauskas, Mindaugas J
      114
      Production Technician - WC50
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      281
      Kuppa, Vamsi N
      123
      Shipping and Receiving Clerk
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      282
      McArthur, Mark K
      29
      Production Technician - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      283
      Pournasseh, Houman N
      101
      Production Technician - WC50
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      284
      Norred, Chris K
      221
      Control Specialist
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      285
      Wood, John L
      18
      Marketing Specialist
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      286
      Kogan, Eugene O
      46
      Production Technician - WC60
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      287
      Nusbaum, Tawana G
      202
      Production Technician - WC40
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      288
      Pak, Jae B
      289
      Sales Representative
      10.0
      United Kingdom
      GB
      Europe
      5012905.37
      1635823.40
    
    
      289
      Valdez, Rachel B
      288
      Sales Representative
      8.0
      Germany
      DE
      Europe
      3805202.35
      1307949.79
    
    
      290
      Smith, Chadwick nan
      999
      BI Professor
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

291 rows × 9 columns

"For the list above, limit the results to just salespeople"



In [20]:

    
Ans2 = Ans[Ans.JobTitle=='Sales Representative']
Ans2









    Out[20]:






  
    
      
      EmployeeName
      EmployeeID
      JobTitle
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      1
      Vargas, Garrett R
      278
      Sales Representative
      6.0
      Canada
      CA
      North America
      6771829.14
      5693988.86
    
    
      54
      Campbell, David R
      283
      Sales Representative
      1.0
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      63
      Mitchell, Linda C
      276
      Sales Representative
      4.0
      Southwest
      US
      North America
      10510853.87
      5366575.71
    
    
      93
      Tsoflias, Lynn N
      286
      Sales Representative
      9.0
      Australia
      AU
      Pacific
      5977814.92
      2278548.98
    
    
      114
      Mensa-Annan, Tete A
      284
      Sales Representative
      1.0
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      163
      Ito, Shu K
      281
      Sales Representative
      4.0
      Southwest
      US
      North America
      10510853.87
      5366575.71
    
    
      176
      Ansman-Wolfe, Pamela O
      280
      Sales Representative
      1.0
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      230
      Carson, Jillian nan
      277
      Sales Representative
      3.0
      Central
      US
      North America
      3072175.12
      3205014.08
    
    
      235
      Blythe, Michael G
      275
      Sales Representative
      2.0
      Northeast
      US
      North America
      2402176.85
      3607148.94
    
    
      240
      Reiter, Tsvi Michael
      279
      Sales Representative
      5.0
      Southeast
      US
      North America
      2538667.25
      3925071.43
    
    
      256
      Saraiva, José Edvaldo
      282
      Sales Representative
      6.0
      Canada
      CA
      North America
      6771829.14
      5693988.86
    
    
      265
      Varkey Chudukatil, Ranjit R
      290
      Sales Representative
      7.0
      France
      FR
      Europe
      4772398.31
      2396539.76
    
    
      288
      Pak, Jae B
      289
      Sales Representative
      10.0
      United Kingdom
      GB
      Europe
      5012905.37
      1635823.40
    
    
      289
      Valdez, Rachel B
      288
      Sales Representative
      8.0
      Germany
      DE
      Europe
      3805202.35
      1307949.79



In [21]:

    
# Overachiever: What about *all* employees associated with sales?
Ans2 = Ans[Ans["JobTitle"].apply(lambda x: 'Sales' in x)]
Ans2









    Out[21]:






  
    
      
      EmployeeName
      EmployeeID
      JobTitle
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
    
  
  
    
      1
      Vargas, Garrett R
      278
      Sales Representative
      6.0
      Canada
      CA
      North America
      6771829.14
      5693988.86
    
    
      54
      Campbell, David R
      283
      Sales Representative
      1.0
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      59
      Jiang, Stephen Y
      274
      North American Sales Manager
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      63
      Mitchell, Linda C
      276
      Sales Representative
      4.0
      Southwest
      US
      North America
      10510853.87
      5366575.71
    
    
      93
      Tsoflias, Lynn N
      286
      Sales Representative
      9.0
      Australia
      AU
      Pacific
      5977814.92
      2278548.98
    
    
      114
      Mensa-Annan, Tete A
      284
      Sales Representative
      1.0
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      135
      Alberts, Amy E
      287
      European Sales Manager
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      163
      Ito, Shu K
      281
      Sales Representative
      4.0
      Southwest
      US
      North America
      10510853.87
      5366575.71
    
    
      176
      Ansman-Wolfe, Pamela O
      280
      Sales Representative
      1.0
      Northwest
      US
      North America
      7887186.79
      3298694.49
    
    
      212
      Abbas, Syed E
      285
      Pacific Sales Manager
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      222
      Welcker, Brian S
      273
      Vice President of Sales
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      230
      Carson, Jillian nan
      277
      Sales Representative
      3.0
      Central
      US
      North America
      3072175.12
      3205014.08
    
    
      235
      Blythe, Michael G
      275
      Sales Representative
      2.0
      Northeast
      US
      North America
      2402176.85
      3607148.94
    
    
      240
      Reiter, Tsvi Michael
      279
      Sales Representative
      5.0
      Southeast
      US
      North America
      2538667.25
      3925071.43
    
    
      256
      Saraiva, José Edvaldo
      282
      Sales Representative
      6.0
      Canada
      CA
      North America
      6771829.14
      5693988.86
    
    
      265
      Varkey Chudukatil, Ranjit R
      290
      Sales Representative
      7.0
      France
      FR
      Europe
      4772398.31
      2396539.76
    
    
      288
      Pak, Jae B
      289
      Sales Representative
      10.0
      United Kingdom
      GB
      Europe
      5012905.37
      1635823.40
    
    
      289
      Valdez, Rachel B
      288
      Sales Representative
      8.0
      Germany
      DE
      Europe
      3805202.35
      1307949.79

"Give me a list of our customers, and also tell me which sales territory they fall in."

This looks like another question for "merge"! We have a list of customers with their addresses, and we have a list of territories, but they are in separate tables.

Let's recover a list of customer names and IDs, together with corresponding sales territory names.

This time, we have to be careful, because "TerritoryID" in the Territory table matches "SalesTerritoryID" in the table Customers. So, we'll have to specify different columns names to merge on for the two tables.



In [22]:

    
Ans3 = pd.merge(Customers[["CustomerID","FirstName","LastName","SalesTerritoryID"]], 
                Territory[["TerritoryID","Name"]], 
                how='left', 
                left_on='SalesTerritoryID', right_on='TerritoryID', )
Ans3









    Out[22]:






  
    
      
      CustomerID
      FirstName
      LastName
      SalesTerritoryID
      TerritoryID
      Name
    
  
  
    
      0
      10101
      John
      Gray
      1
      1
      Northwest
    
    
      1
      10298
      Leroy
      Brown
      4
      4
      Southwest
    
    
      2
      10299
      Elroy
      Keller
      1
      1
      Northwest
    
    
      3
      10315
      Lisa
      Jones
      3
      3
      Central
    
    
      4
      10325
      Ginger
      Schultz
      1
      1
      Northwest
    
    
      5
      10329
      Kelly
      Mendoza
      5
      5
      Southeast
    
    
      6
      10330
      Shawn
      Dalton
      1
      1
      Northwest
    
    
      7
      10338
      Michael
      Howell
      1
      1
      Northwest
    
    
      8
      10339
      Anthony
      Sanchez
      4
      4
      Southwest
    
    
      9
      10408
      Elroy
      Cleaver
      4
      4
      Southwest
    
    
      10
      10410
      Mary Ann
      Howell
      5
      5
      Southeast
    
    
      11
      10413
      Donald
      Davids
      4
      4
      Southwest
    
    
      12
      10419
      Linda
      Sakahara
      4
      4
      Southwest
    
    
      13
      10429
      Sarah
      Graham
      5
      5
      Southeast
    
    
      14
      10438
      Kevin
      Smith
      3
      3
      Central
    
    
      15
      10439
      Conrad
      Giles
      3
      3
      Central
    
    
      16
      10449
      Isabela
      Moore
      4
      4
      Southwest

"Give me a list of all sales territories, also show what customers fall under them"



In [23]:

    
Ans = pd.merge(Territory, Customers, how="inner", left_on="TerritoryID", right_on="SalesTerritoryID")



In [24]:

    
Ans









    Out[24]:






  
    
      
      TerritoryID
      Name
      CountryCode
      Region
      SalesYTD
      SalesLastYear
      CustomerID
      SalesTerritoryID
      FirstName
      LastName
      City
      StateName
    
  
  
    
      0
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
      10101
      1
      John
      Gray
      Lynden
      Washington
    
    
      1
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
      10299
      1
      Elroy
      Keller
      Snoqualmie
      Washington
    
    
      2
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
      10325
      1
      Ginger
      Schultz
      Pocatello
      Idaho
    
    
      3
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
      10330
      1
      Shawn
      Dalton
      Cannon Beach
      Oregon
    
    
      4
      1
      Northwest
      US
      North America
      7887186.79
      3298694.49
      10338
      1
      Michael
      Howell
      Tillamook
      Oregon
    
    
      5
      3
      Central
      US
      North America
      3072175.12
      3205014.08
      10315
      3
      Lisa
      Jones
      Oshkosh
      Wisconsin
    
    
      6
      3
      Central
      US
      North America
      3072175.12
      3205014.08
      10438
      3
      Kevin
      Smith
      Durango
      Colorado
    
    
      7
      3
      Central
      US
      North America
      3072175.12
      3205014.08
      10439
      3
      Conrad
      Giles
      Telluride
      Colorado
    
    
      8
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      10298
      4
      Leroy
      Brown
      Pinetop
      Arizona
    
    
      9
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      10339
      4
      Anthony
      Sanchez
      Winslow
      Arizona
    
    
      10
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      10408
      4
      Elroy
      Cleaver
      Globe
      Arizona
    
    
      11
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      10413
      4
      Donald
      Davids
      Gila Bend
      Arizona
    
    
      12
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      10419
      4
      Linda
      Sakahara
      Nogales
      Arizona
    
    
      13
      4
      Southwest
      US
      North America
      10510853.87
      5366575.71
      10449
      4
      Isabela
      Moore
      Yuma
      Arizona
    
    
      14
      5
      Southeast
      US
      North America
      2538667.25
      3925071.43
      10329
      5
      Kelly
      Mendoza
      Kailua
      Hawaii
    
    
      15
      5
      Southeast
      US
      North America
      2538667.25
      3925071.43
      10410
      5
      Mary Ann
      Howell
      Charleston
      South Carolina
    
    
      16
      5
      Southeast
      US
      North America
      2538667.25
      3925071.43
      10429
      5
      Sarah
      Graham
      Greensboro
      North Carolina



In [ ]:



In [ ]:



In [ ]:



In [ ]:

"Give me a list of the customers we have in North Carolina, and tell me how many there are."



In [25]:

    
# In-class exercise! :)



In [34]:

    
Customers[Customers.StateName=="North Carolina"].CustomerID.count()









    Out[34]:





1



In [ ]:



In [ ]:

"For each of the items ordered, show the total price (sometimes they ordered more than 1 item)"



In [26]:

    
# We'll use the Orders table for this! In-class exercise :)



In [41]:

    
Orders['TotalItemPrice'] = Orders.Quantity * Orders.Price



In [ ]:



In [ ]:

"Show a list of customers, and the total amount of money they have spent with AdventureWorks. I want the highest spenders to appear first!"



In [27]:

    
# In-class exercise! :)



In [52]:

    
pd.merge(Customers[["FirstName","LastName","CustomerID"]],Orders[["CustomerID","TotalItemPrice"]], how="inner", on="CustomerID").groupby(["CustomerID","FirstName","LastName"]).sum()









    Out[52]:






  
    
      
      
      
      TotalItemPrice
    
    
      CustomerID
      FirstName
      LastName
      
    
  
  
    
      10101
      John
      Gray
      813.95
    
    
      10298
      Leroy
      Brown
      147.88
    
    
      10299
      Elroy
      Keller
      1288.00
    
    
      10315
      Lisa
      Jones
      8.00
    
    
      10330
      Shawn
      Dalton
      156.75
    
    
      10339
      Anthony
      Sanchez
      4.50
    
    
      10410
      Mary Ann
      Howell
      281.72
    
    
      10413
      Donald
      Davids
      128.00
    
    
      10438
      Kevin
      Smith
      95.24
    
    
      10439
      Conrad
      Giles
      139.00
    
    
      10449
      Isabela
      Moore
      970.79



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:



In [ ]:

Another side note:



In [28]:

    
help(pd.DataFrame.combine_first)









    



Help on function combine_first in module pandas.core.frame:

combine_first(self, other)
    Combine two DataFrame objects and default to non-null values in frame
    calling the method. Result index columns will be the union of the
    respective indexes and columns
    
    Parameters
    ----------
    other : DataFrame
    
    Examples
    --------
    a's values prioritized, use values from b to fill holes:
    
    >>> a.combine_first(b)
    
    
    Returns
    -------
    combined : DataFrame



In [29]:

    
help(pd.DataFrame.update)









    



Help on function update in module pandas.core.frame:

update(self, other, join='left', overwrite=True, filter_func=None, raise_conflict=False)
    Modify DataFrame in place using non-NA values from passed
    DataFrame. Aligns on indices
    
    Parameters
    ----------
    other : DataFrame, or object coercible into a DataFrame
    join : {'left'}, default 'left'
    overwrite : boolean, default True
        If True then overwrite values for common keys in the calling frame
    filter_func : callable(1d-array) -> 1d-array<boolean>, default None
        Can choose to replace values other than NA. Return True for values
        that should be updated
    raise_conflict : boolean
        If True, will raise an error if the DataFrame and other both
        contain data in the same place.



In [30]:

    
Customers









    Out[30]:






  
    
      
      CustomerID
      SalesTerritoryID
      FirstName
      LastName
      City
      StateName
    
  
  
    
      0
      10101
      1
      John
      Gray
      Lynden
      Washington
    
    
      1
      10298
      4
      Leroy
      Brown
      Pinetop
      Arizona
    
    
      2
      10299
      1
      Elroy
      Keller
      Snoqualmie
      Washington
    
    
      3
      10315
      3
      Lisa
      Jones
      Oshkosh
      Wisconsin
    
    
      4
      10325
      1
      Ginger
      Schultz
      Pocatello
      Idaho
    
    
      5
      10329
      5
      Kelly
      Mendoza
      Kailua
      Hawaii
    
    
      6
      10330
      1
      Shawn
      Dalton
      Cannon Beach
      Oregon
    
    
      7
      10338
      1
      Michael
      Howell
      Tillamook
      Oregon
    
    
      8
      10339
      4
      Anthony
      Sanchez
      Winslow
      Arizona
    
    
      9
      10408
      4
      Elroy
      Cleaver
      Globe
      Arizona
    
    
      10
      10410
      5
      Mary Ann
      Howell
      Charleston
      South Carolina
    
    
      11
      10413
      4
      Donald
      Davids
      Gila Bend
      Arizona
    
    
      12
      10419
      4
      Linda
      Sakahara
      Nogales
      Arizona
    
    
      13
      10429
      5
      Sarah
      Graham
      Greensboro
      North Carolina
    
    
      14
      10438
      3
      Kevin
      Smith
      Durango
      Colorado
    
    
      15
      10439
      3
      Conrad
      Giles
      Telluride
      Colorado
    
    
      16
      10449
      4
      Isabela
      Moore
      Yuma
      Arizona

	CustomerID	OrderDate	Item	Quantity	Price
0	10330	2004-06-30	Pogo stick	1	28.0
1	10101	2004-06-30	Raft	1	58.0
2	10298	2004-07-01	Skateboard	1	33.0
3	10101	2004-07-01	Life Vest	4	125.0
4	10299	2004-07-06	Parachute	1	1250.0

	EmployeeID	ManagerID	TerritoryID	Title	FirstName	MiddleName	LastName	Suffix	JobTitle	NationalIDNumber	...	SickLeaveHours	PhoneNumber	PhoneNumberType	EmailAddress	AddressLine1	AddressLine2	City	StateProvinceName	PostalCode	CountryName
0	259	250.0	NaN	NaN	Ben	T	Miller	NaN	Buyer	20269531	...	47	151-555-0113	Work	ben0@adventure-works.com	101 Candy Rd.	NaN	Redmond	Washington	98052	United States
1	278	274.0	6.0	NaN	Garrett	R	Vargas	NaN	Sales Representative	234474252	...	36	922-555-0165	Work	garrett1@mapleleafmail.ca	10203 Acorn Avenue	NaN	Calgary	Alberta	T2P 2G8	Canada
2	204	26.0	NaN	NaN	Gabe	B	Mares	NaN	Production Technician - WC40	440379437	...	48	310-555-0117	Work	gabe0@adventure-works.com	1061 Buskrik Avenue	NaN	Edmonds	Washington	98020	United States
3	78	26.0	NaN	NaN	Reuben	H	D'sa	NaN	Production Supervisor - WC40	370989364	...	56	191-555-0112	Work	reuben0@adventure-works.com	1064 Slow Creek Road	NaN	Seattle	Washington	98104	United States
4	255	250.0	NaN	NaN	Gordon	L	Hee	NaN	Buyer	466142721	...	46	230-555-0144	Cell	gordon0@adventure-works.com	108 Lakeside Court	NaN	Bellevue	Washington	98004	United States

	TerritoryID	Name	CountryCode	Region	SalesYTD	SalesLastYear
0	1	Northwest	US	North America	7887186.79	3298694.49
1	2	Northeast	US	North America	2402176.85	3607148.94
2	3	Central	US	North America	3072175.12	3205014.08
3	4	Southwest	US	North America	10510853.87	5366575.71
4	5	Southeast	US	North America	2538667.25	3925071.43

	CustomerID	SalesTerritoryID	FirstName	LastName	City	StateName
0	10101	1	John	Gray	Lynden	Washington
1	10298	4	Leroy	Brown	Pinetop	Arizona
2	10299	1	Elroy	Keller	Snoqualmie	Washington
3	10315	3	Lisa	Jones	Oshkosh	Wisconsin
4	10325	1	Ginger	Schultz	Pocatello	Idaho

	TerritoryID	Name	CountryCode	Region	SalesYTD	SalesLastYear
5	6	Canada	CA	North America	6771829.14	5693988.860
6	7	France	FR	Europe	4772398.31	2396539.760
7	8	Germany	DE	Europe	3805202.35	1307949.790
8	9	Australia	AU	Pacific	5977814.92	2278548.980
9	10	United Kingdom	GB	Europe	5012905.37	1635823.400
10	11	Brazil	BR	South America	0.00	261589.958
11	12	Mexico	MX	North America	0.00	0.000

	CountryCode	Name	Region	RepID	SalesLastYear	SalesYTD	TerritoryID
0	US	Northwest	North America	445.0	3298694.490	7887186.79	1
1	US	Northeast	North America	717.0	3607148.940	2402176.85	2
2	US	Central	North America	328.0	3205014.080	3072175.12	3
3	US	Southwest	North America	206.0	5366575.710	10510853.87	4
4	US	Southeast	North America	886.0	3925071.430	2538667.25	5
5	CA	Canada	North America	NaN	5693988.860	6771829.14	6
6	FR	France	Europe	NaN	2396539.760	4772398.31	7
7	DE	Germany	Europe	NaN	1307949.790	3805202.35	8
8	AU	Australia	Pacific	NaN	2278548.980	5977814.92	9
9	GB	United Kingdom	Europe	NaN	1635823.400	5012905.37	10
10	BR	Brazil	South America	NaN	261589.958	0.00	11
11	MX	Mexico	North America	NaN	0.000	0.00	12

	EmployeeName	EmployeeID	JobTitle	TerritoryID	Name	CountryCode	Region	SalesYTD	SalesLastYear
0	Miller, Ben T	259	Buyer	NaN	NaN	NaN	NaN	NaN	NaN
1	Vargas, Garrett R	278	Sales Representative	6.0	Canada	CA	North America	6771829.14	5693988.86
2	Mares, Gabe B	204	Production Technician - WC40	NaN	NaN	NaN	NaN	NaN	NaN
3	D'sa, Reuben H	78	Production Supervisor - WC40	NaN	NaN	NaN	NaN	NaN	NaN
4	Hee, Gordon L	255	Buyer	NaN	NaN	NaN	NaN	NaN	NaN
5	Khanna, Karan R	66	Production Technician - WC60	NaN	NaN	NaN	NaN	NaN	NaN
6	Ajenstat, François P	270	Database Administrator	NaN	NaN	NaN	NaN	NaN	NaN
7	Harnpadoungsataya, Sariya E	22	Marketing Specialist	NaN	NaN	NaN	NaN	NaN	NaN
8	Koenigsbauer, Kirk J	161	Production Technician - WC45	NaN	NaN	NaN	NaN	NaN	NaN
9	Ralls, Kim T	124	Stocker	NaN	NaN	NaN	NaN	NaN	NaN
10	Raheem, Michael nan	10	Research and Development Manager	NaN	NaN	NaN	NaN	NaN	NaN
11	Seamans, Mike K	248	Accountant	NaN	NaN	NaN	NaN	NaN	NaN
12	Koch, Reed T	175	Production Technician - WC30	NaN	NaN	NaN	NaN	NaN	NaN
13	Fakhouri, Fadi K	155	Production Technician - WC20	NaN	NaN	NaN	NaN	NaN	NaN
14	Singh, Paul R	144	Production Technician - WC20	NaN	NaN	NaN	NaN	NaN	NaN
15	Diaz, Brenda M	192	Production Supervisor - WC40	NaN	NaN	NaN	NaN	NaN	NaN
16	Richins, Jack S	166	Production Supervisor - WC30	NaN	NaN	NaN	NaN	NaN	NaN
17	Evans, John P	112	Production Technician - WC50	NaN	NaN	NaN	NaN	NaN	NaN
18	Myer, Ken L	203	Production Technician - WC40	NaN	NaN	NaN	NaN	NaN	NaN
19	Moreland, Barbara C	245	Accountant	NaN	NaN	NaN	NaN	NaN	NaN
20	Kurjan, Eric S	257	Buyer	NaN	NaN	NaN	NaN	NaN	NaN
21	Walton, Bryan A	244	Accounts Receivable Specialist	NaN	NaN	NaN	NaN	NaN	NaN
22	Masters, Steve F	69	Production Technician - WC60	NaN	NaN	NaN	NaN	NaN	NaN
23	Newman, Belinda M	210	Production Technician - WC45	NaN	NaN	NaN	NaN	NaN	NaN
24	Nayberg, Alex M	163	Production Technician - WC45	NaN	NaN	NaN	NaN	NaN	NaN
25	Hall, Don L	118	Production Technician - WC50	NaN	NaN	NaN	NaN	NaN	NaN
26	Lawrence, David Oliver	176	Production Technician - WC30	NaN	NaN	NaN	NaN	NaN	NaN
27	Reátegui Alayo, Sandra nan	147	Production Technician - WC30	NaN	NaN	NaN	NaN	NaN	NaN
28	Smith, Samantha H	138	Production Technician - WC20	NaN	NaN	NaN	NaN	NaN	NaN
29	Scardelis, Jim H	95	Production Technician - WC50	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...
261	Ortiz, David J	70	Production Technician - WC60	NaN	NaN	NaN	NaN	NaN	NaN
262	Chai, Sean N	219	Document Control Assistant	NaN	NaN	NaN	NaN	NaN	NaN
263	Watters, Jason M	148	Production Technician - WC30	NaN	NaN	NaN	NaN	NaN	NaN
264	Samarawickrama, Prasanna E	140	Production Technician - WC20	NaN	NaN	NaN	NaN	NaN	NaN
265	Varkey Chudukatil, Ranjit R	290	Sales Representative	7.0	France	FR	Europe	4772398.31	2396539.76
266	Erickson, Gail A	5	Design Engineer	NaN	NaN	NaN	NaN	NaN	NaN
267	Rao, Arvind B	252	Buyer	NaN	NaN	NaN	NaN	NaN	NaN
268	Connelly, Peter I	266	Network Administrator	NaN	NaN	NaN	NaN	NaN	NaN
269	Hedlund, Magnus E	233	Facilities Administrative Assistant	NaN	NaN	NaN	NaN	NaN	NaN
270	Cook, Patrick M	83	Production Technician - WC40	NaN	NaN	NaN	NaN	NaN	NaN
271	Sandberg, Mikael Q	251	Buyer	NaN	NaN	NaN	NaN	NaN	NaN
272	Liu, David J	241	Accounts Manager	NaN	NaN	NaN	NaN	NaN	NaN
273	Hamilton, James R	25	Vice President of Production	NaN	NaN	NaN	NaN	NaN	NaN
274	Liu, Jinghao K	108	Production Supervisor - WC50	NaN	NaN	NaN	NaN	NaN	NaN
275	Martin, Mindy C	239	Benefits Specialist	NaN	NaN	NaN	NaN	NaN	NaN
276	Brown, Jo A	27	Production Supervisor - WC60	NaN	NaN	NaN	NaN	NaN	NaN
277	Kim, Shane S	186	Production Supervisor - WC45	NaN	NaN	NaN	NaN	NaN	NaN
278	Abercrombie, Kim B	38	Production Technician - WC60	NaN	NaN	NaN	NaN	NaN	NaN
279	Sam, Raymond K	154	Production Technician - WC20	NaN	NaN	NaN	NaN	NaN	NaN
280	Krapauskas, Mindaugas J	114	Production Technician - WC50	NaN	NaN	NaN	NaN	NaN	NaN
281	Kuppa, Vamsi N	123	Shipping and Receiving Clerk	NaN	NaN	NaN	NaN	NaN	NaN
282	McArthur, Mark K	29	Production Technician - WC60	NaN	NaN	NaN	NaN	NaN	NaN
283	Pournasseh, Houman N	101	Production Technician - WC50	NaN	NaN	NaN	NaN	NaN	NaN
284	Norred, Chris K	221	Control Specialist	NaN	NaN	NaN	NaN	NaN	NaN
285	Wood, John L	18	Marketing Specialist	NaN	NaN	NaN	NaN	NaN	NaN
286	Kogan, Eugene O	46	Production Technician - WC60	NaN	NaN	NaN	NaN	NaN	NaN
287	Nusbaum, Tawana G	202	Production Technician - WC40	NaN	NaN	NaN	NaN	NaN	NaN
288	Pak, Jae B	289	Sales Representative	10.0	United Kingdom	GB	Europe	5012905.37	1635823.40
289	Valdez, Rachel B	288	Sales Representative	8.0	Germany	DE	Europe	3805202.35	1307949.79
290	Smith, Chadwick nan	999	BI Professor	NaN	NaN	NaN	NaN	NaN	NaN

			TotalItemPrice
CustomerID	FirstName	LastName
10101	John	Gray	813.95
10298	Leroy	Brown	147.88
10299	Elroy	Keller	1288.00
10315	Lisa	Jones	8.00
10330	Shawn	Dalton	156.75
10339	Anthony	Sanchez	4.50
10410	Mary Ann	Howell	281.72
10413	Donald	Davids	128.00
10438	Kevin	Smith	95.24
10439	Conrad	Giles	139.00
10449	Isabela	Moore	970.79