Rename this file before you work on it!
Exercise on Joins :: Part 1
Use the Adventure Works dataset to create the following reports. The dataset is availablt for download in it's original format at https://msdn.microsoft.com/en-us/library/hh403424.aspx (follow instructions to download).
Task
- write the Python Pandas expression to produce a table as described in the problem statements.
- The SQL expression may give you a hint. It also allows you to see both systems side-by-side.
- If you don't know SQL just ignore the SQL code.
1a. Show me all the employees, and if any are salespeople then show me the details about their sales territory
SELECT e.EmployeeID
,e.FirstName + ' ' + e.LastName AS EmployeeName
,st.*
FROM dbo.Employees AS e
INNER JOIN dbo.SalesTerritory AS st
ON e.TerritoryID = st.TerritoryID
;
1b. Change the columns above
From Employees table use: EmployeeID, FirstName, LastName, TerritoryID
From the SalesTerritory table use: show all the columns except for the TerritoryID
SELECT e.EmployeeID
,e.FirstName + ' ' + e.LastName AS EmployeeName
,e.TerritoryID
,st.Name
,st.CountryCode
,st.Region
,st.SalesYTD
,st.SalesLastYear
FROM dbo.Employees AS e
LEFT OUTER JOIN dbo.SalesTerritory AS st
ON e.TerritoryID = st.TerritoryID
;
1c. For the list above, limit the results to just salespeople
SELECT e.EmployeeID
,e.FirstName + ' ' + e.LastName AS EmployeeName
,e.TerritoryID
,st.Name
,st.CountryCode
,st.Region
,st.SalesYTD
,st.SalesLastYear
FROM dbo.Employees AS e
INNER JOIN dbo.SalesTerritory AS st
ON e.TerritoryID = st.TerritoryID
;
2a. Give me a list of customers
SELECT *
FROM dbo.Customers as c
;
2b. For those customers, also tell me which sales territory they fall in
SELECT *
FROM dbo.Customers as c
INNER JOIN dbo.SalesTerritory as st
ON c.SalesTerritoryID = st.TerritoryID
;
3a. Give me a list of sales territories
SELECT *
FROM dbo.SalesTerritory as st
;
3b. For those sales territories, also show what customers fall under them
SELECT *
FROM dbo.SalesTerritory AS st
LEFT OUTER JOIN dbo.Customers AS c
ON c.SalesTerritoryID = st.TerritoryID
;
3c. Make sure, if you didn't already, that any sales territories with 0 customers are also shown in the list
SELECT *
FROM dbo.SalesTerritory AS st
LEFT OUTER JOIN dbo.Customers AS c
ON c.SalesTerritoryID = st.TerritoryID
;
4a. Give me a list of the customers we have in North Carolina, and note how many there are.
SELECT c.StateName, COUNT(*)
FROM dbo.Customers as c
GROUP BY c.StateName
;
SELECT *
FROM dbo.Customers as c
WHERE c.StateName = 'North Carolina'
;
4b. For our North Carolina customers, show me the items they have ordered
SELECT *
FROM dbo.Customers as c
LEFT OUTER JOIN dbo.ItemsOrdered as ords
ON c.CustomerID = ords.CustomerID
WHERE c.StateName = 'North Carolina'
;
4c. Change it so we're only looking at customers who are in Arizona
SELECT *
FROM dbo.Customers as c
LEFT OUTER JOIN dbo.ItemsOrdered as ords
ON c.CustomerID = ords.CustomerID
WHERE c.StateName = 'Arizona'
;
4d. For each of the items ordered, show the total price of the order (sometimes they ordered more than 1 item)
SELECT *, ords.Quantity*ords.Price AS Total
FROM dbo.Customers as c
LEFT OUTER JOIN dbo.ItemsOrdered as ords
ON c.CustomerID = ords.CustomerID
;
4e. For the Arizona customers above, show the maximum total price per customer. Choose the columns wisely
SELECT c.CustomerID, c.FirstName+' '+c.LastName AS Name, MAX(ords.Quantity*ords.Price) AS MaxTotalOrder
FROM dbo.Customers as c
INNER JOIN dbo.ItemsOrdered as ords
ON c.CustomerID = ords.CustomerID
WHERE c.StateName = 'Arizona'
GROUP BY c.CustomerID, c.FirstName+' '+c.LastName
;
4f. For the results above, only show the ones where the MaxTotalPrice is more than $5.00
SELECT c.CustomerID, c.FirstName+' '+c.LastName AS Name, MAX(ords.Quantity*ords.Price) AS MaxTotalOrder
FROM dbo.Customers as c
INNER JOIN dbo.ItemsOrdered as ords
ON c.CustomerID = ords.CustomerID
WHERE c.StateName = 'Arizona'
GROUP BY c.CustomerID, c.FirstName+' '+c.LastName
HAVING MAX(ords.Quantity*ords.Price)>5.0
;
BONUS 1: Take the query from #3c and add a column called "CityRegion" that combines dbo.Customers.City and
dbo.SalesTerritory.Region (in the format "City - Region")
SELECT *
,COALESCE(c.City, 'n/a') -- , CONCAT(c.City, ' - ', st.Region) AS CityRegion
FROM dbo.SalesTerritory AS st
LEFT OUTER JOIN dbo.Customers AS c
ON c.SalesTerritoryID = st.TerritoryID
;
BONUS 2: Fix the problem above where CityRegion is NULL sometimes; change it to, for example, "n/a - North America"
SELECT *, CONCAT(COALESCE(c.City, 'n/a'), ' - ', st.Region) AS CityRegion
FROM dbo.SalesTerritory AS st
LEFT OUTER JOIN dbo.Customers AS c
ON c.SalesTerritoryID = st.TerritoryID
;
OVERACHIEVER 1: Why do you think a database is designed so the data is split up into different tables? Why not just put it all in one table?
OVERACHIEVER 2a: Take the query from 1b and sort it by the sales region so that all the salespeople are the top of the list
SELECT e.EmployeeID
,e.FirstName + ' ' + e.LastName AS EmployeeName
,e.TerritoryID
,st.Name
,st.CountryCode
,st.Region
,st.SalesYTD
,st.SalesLastYear
FROM dbo.Employees AS e
LEFT OUTER JOIN dbo.SalesTerritory AS st
ON e.TerritoryID = st.TerritoryID
ORDER BY COALESCE(e.TerritoryID, 99999)
;
OVERACHIEVER 2b: From the query above, if the sales territory is outside Europe, do not display it (but we still want to see all the employees in the company!)
SELECT e.EmployeeID
,e.FirstName + ' ' + e.LastName AS EmployeeName
,e.TerritoryID
,st.Name
,st.CountryCode
,st.Region
,st.SalesYTD
,st.SalesLastYear
FROM dbo.Employees AS e
LEFT OUTER JOIN dbo.SalesTerritory AS st
ON e.TerritoryID = st.TerritoryID AND st.Region = 'Europe'
ORDER BY COALESCE(e.TerritoryID, 99999)
;