Rename this file before you work on it!

Exercise on Joins :: Part 1

Use the Adventure Works dataset to create the following reports. The dataset is availablt for download in it's original format at https://msdn.microsoft.com/en-us/library/hh403424.aspx (follow instructions to download).

Task

  1. write the Python Pandas expression to produce a table as described in the problem statements.
  2. The SQL expression may give you a hint. It also allows you to see both systems side-by-side.
  3. If you don't know SQL just ignore the SQL code.

In [3]:
import pandas as pd
import numpy as np

In [4]:
Empoyees = pd.read_excel('/home/data/AdventureWorks/Employees.xls')
Territory = pd.read_excel('/home/data/AdventureWorks/SalesTerritory.xls')
Customers = pd.read_excel('/home/data/AdventureWorks/Customers.xls')
Orders = pd.read_excel('/home/data/AdventureWorks/ItemsOrdered.xls')

1a. Show me all the employees, and if any are salespeople then show me the details about their sales territory

SELECT e.EmployeeID ,e.FirstName + ' ' + e.LastName AS EmployeeName ,st.* FROM dbo.Employees AS e INNER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ;

In [ ]:

1b. Change the columns above

From Employees table use: EmployeeID, FirstName, LastName, TerritoryID From the SalesTerritory table use: show all the columns except for the TerritoryID

SELECT e.EmployeeID ,e.FirstName + ' ' + e.LastName AS EmployeeName ,e.TerritoryID ,st.Name ,st.CountryCode ,st.Region ,st.SalesYTD ,st.SalesLastYear FROM dbo.Employees AS e LEFT OUTER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ;

In [ ]:

1c. For the list above, limit the results to just salespeople

SELECT e.EmployeeID ,e.FirstName + ' ' + e.LastName AS EmployeeName ,e.TerritoryID ,st.Name ,st.CountryCode ,st.Region ,st.SalesYTD ,st.SalesLastYear FROM dbo.Employees AS e INNER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ;

In [ ]:

2a. Give me a list of customers

SELECT * FROM dbo.Customers as c ;

In [ ]:

2b. For those customers, also tell me which sales territory they fall in

SELECT * FROM dbo.Customers as c INNER JOIN dbo.SalesTerritory as st ON c.SalesTerritoryID = st.TerritoryID ;

In [ ]:

3a. Give me a list of sales territories

SELECT * FROM dbo.SalesTerritory as st ;

In [ ]:

3b. For those sales territories, also show what customers fall under them

SELECT * FROM dbo.SalesTerritory AS st LEFT OUTER JOIN dbo.Customers AS c ON c.SalesTerritoryID = st.TerritoryID ;

In [ ]:

3c. Make sure, if you didn't already, that any sales territories with 0 customers are also shown in the list

SELECT * FROM dbo.SalesTerritory AS st LEFT OUTER JOIN dbo.Customers AS c ON c.SalesTerritoryID = st.TerritoryID ;

In [ ]:

4a. Give me a list of the customers we have in North Carolina, and note how many there are.

SELECT c.StateName, COUNT(*) FROM dbo.Customers as c GROUP BY c.StateName ;
SELECT * FROM dbo.Customers as c WHERE c.StateName = 'North Carolina' ;

In [ ]:

4b. For our North Carolina customers, show me the items they have ordered

SELECT * FROM dbo.Customers as c LEFT OUTER JOIN dbo.ItemsOrdered as ords ON c.CustomerID = ords.CustomerID WHERE c.StateName = 'North Carolina' ;

In [ ]:

4c. Change it so we're only looking at customers who are in Arizona

SELECT * FROM dbo.Customers as c LEFT OUTER JOIN dbo.ItemsOrdered as ords ON c.CustomerID = ords.CustomerID WHERE c.StateName = 'Arizona' ;

In [ ]:

4d. For each of the items ordered, show the total price of the order (sometimes they ordered more than 1 item)

SELECT *, ords.Quantity*ords.Price AS Total FROM dbo.Customers as c LEFT OUTER JOIN dbo.ItemsOrdered as ords ON c.CustomerID = ords.CustomerID ;

In [ ]:

4e. For the Arizona customers above, show the maximum total price per customer. Choose the columns wisely

SELECT c.CustomerID, c.FirstName+' '+c.LastName AS Name, MAX(ords.Quantity*ords.Price) AS MaxTotalOrder FROM dbo.Customers as c INNER JOIN dbo.ItemsOrdered as ords ON c.CustomerID = ords.CustomerID WHERE c.StateName = 'Arizona' GROUP BY c.CustomerID, c.FirstName+' '+c.LastName ;

In [ ]:

4f. For the results above, only show the ones where the MaxTotalPrice is more than $5.00

SELECT c.CustomerID, c.FirstName+' '+c.LastName AS Name, MAX(ords.Quantity*ords.Price) AS MaxTotalOrder FROM dbo.Customers as c INNER JOIN dbo.ItemsOrdered as ords ON c.CustomerID = ords.CustomerID WHERE c.StateName = 'Arizona' GROUP BY c.CustomerID, c.FirstName+' '+c.LastName HAVING MAX(ords.Quantity*ords.Price)>5.0 ;

In [ ]:

BONUS 1: Take the query from #3c and add a column called "CityRegion" that combines dbo.Customers.City and

dbo.SalesTerritory.Region (in the format "City - Region") SELECT * ,COALESCE(c.City, 'n/a') -- , CONCAT(c.City, ' - ', st.Region) AS CityRegion FROM dbo.SalesTerritory AS st LEFT OUTER JOIN dbo.Customers AS c ON c.SalesTerritoryID = st.TerritoryID ;

In [ ]:

BONUS 2: Fix the problem above where CityRegion is NULL sometimes; change it to, for example, "n/a - North America"

SELECT *, CONCAT(COALESCE(c.City, 'n/a'), ' - ', st.Region) AS CityRegion FROM dbo.SalesTerritory AS st LEFT OUTER JOIN dbo.Customers AS c ON c.SalesTerritoryID = st.TerritoryID ;

In [ ]:

OVERACHIEVER 1: Why do you think a database is designed so the data is split up into different tables? Why not just put it all in one table?


In [ ]:

OVERACHIEVER 2a: Take the query from 1b and sort it by the sales region so that all the salespeople are the top of the list

SELECT e.EmployeeID ,e.FirstName + ' ' + e.LastName AS EmployeeName ,e.TerritoryID ,st.Name ,st.CountryCode ,st.Region ,st.SalesYTD ,st.SalesLastYear FROM dbo.Employees AS e LEFT OUTER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ORDER BY COALESCE(e.TerritoryID, 99999) ;

In [ ]:

OVERACHIEVER 2b: From the query above, if the sales territory is outside Europe, do not display it (but we still want to see all the employees in the company!)

SELECT e.EmployeeID ,e.FirstName + ' ' + e.LastName AS EmployeeName ,e.TerritoryID ,st.Name ,st.CountryCode ,st.Region ,st.SalesYTD ,st.SalesLastYear FROM dbo.Employees AS e LEFT OUTER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID AND st.Region = 'Europe' ORDER BY COALESCE(e.TerritoryID, 99999) ;

In [ ]: