Rename this file before you work on it!

Exercise on Joins :: Part 2

Use the Adventure Works dataset to create the following reports. The dataset is availablt for download in it's original format at https://msdn.microsoft.com/en-us/library/hh403424.aspx (follow instructions to download).

Task

  1. write the Python Pandas expression to produce a table as described in the problem statements.
  2. The SQL expression may give you a hint. It also allows you to see both systems side-by-side.
  3. If you don't know SQL just ignore the SQL code.

In [1]:
import pandas as pd
import numpy as np

In [2]:
Empoyees = pd.read_excel('/home/data/AdventureWorks/Employees.xls')
Territory = pd.read_excel('/home/data/AdventureWorks/SalesTerritory.xls')
Customers = pd.read_excel('/home/data/AdventureWorks/Customers.xls')
Orders = pd.read_excel('/home/data/AdventureWorks/ItemsOrdered.xls')

1a. Provide a count of the number of salespeople in the company

SELECT COUNT(*) FROM dbo.Employees AS e INNER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ;

In [ ]:

1b. Instead of a count, list the salespeople along with their territories

Show columns: EmployeeID, FirstName, LastName, TerritoryName, CountryCode, Region

SELECT e.EmployeeID, e.FirstName, e.LastName, st.Name, st.CountryCode, st.Region FROM dbo.Employees AS e INNER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ;

In [ ]:

1c. Now include in the result set all other non-sales employees

SELECT e.EmployeeID, e.FirstName, e.LastName, st.Name, st.CountryCode, st.Region FROM dbo.Employees AS e LEFT OUTER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ;

In [ ]:

1d. Sort the results with the salespeople at the top

SELECT e.EmployeeID, e.FirstName, e.LastName, st.Name, st.CountryCode, st.Region FROM dbo.Employees AS e LEFT OUTER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID ORDER BY COALESCE(st.Region, 'ZZZZZZZZZZZZZZ') ;

In [ ]:

1e. It turns out we don't need territory information for our American salespeople

Only show information for territories outside the U.S.

SELECT e.EmployeeID, e.FirstName, e.LastName, st.Name, st.CountryCode, st.Region FROM dbo.Employees AS e LEFT OUTER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID AND st.CountryCode != 'US' ORDER BY COALESCE(st.Region, 'ZZZZZZZZZZZZZZ') ;

In [ ]:

2a. Take the query from #1b above, but add information for each employee's manager

Show additional columns: ManagerFirstName, ManagerLastName, ManagerJobTitle

SELECT e.EmployeeID, e.FirstName, e.LastName, st.Name, st.CountryCode, st.Region, mgr.FirstName, mgr.LastName, mgr.JobTitle FROM dbo.Employees AS e INNER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID LEFT OUTER JOIN dbo.Employees as mgr on e.ManagerID = mgr.EmployeeID ;

In [ ]:

2b. Only show salespeople who report to Amy Alberts

SELECT e.EmployeeID, e.FirstName, e.LastName, st.Name, st.CountryCode, st.Region, mgr.FirstName, mgr.LastName, mgr.JobTitle FROM dbo.Employees AS e INNER JOIN dbo.SalesTerritory AS st ON e.TerritoryID = st.TerritoryID LEFT OUTER JOIN dbo.Employees as mgr ON e.ManagerID = mgr.EmployeeID WHERE CONCAT(mgr.FirstName, mgr.LastName) = 'AmyAlberts' ;

In [ ]:

3a. Give a list of all customers, and the average price of items ordered for each one

Show columns: CustomerID, FirstName, LastName, City, StateName, AvgPrice

SELECT c.CustomerID ,c.FirstName ,c.LastName ,c.City ,c.StateName ,AVG(io.Price) AS AvgPrice FROM dbo.Customers AS c LEFT OUTER JOIN dbo.ItemsOrdered AS io ON c.CustomerID = io.CustomerID GROUP BY c.CustomerID ,c.FirstName ,c.LastName ,c.City ,c.StateName ;

In [ ]:

3b. For those customers who have never ordered anything, put 0 as the average price

SELECT c.CustomerID ,c.FirstName ,c.LastName ,c.City ,c.StateName ,ISNULL(AVG(io.Price),0) AS AvgPrice FROM dbo.Customers AS c LEFT OUTER JOIN dbo.ItemsOrdered AS io ON c.CustomerID = io.CustomerID GROUP BY c.CustomerID ,c.FirstName ,c.LastName ,c.City ,c.StateName ;

In [ ]:

3c. Show the same results, but instead of average price per customer, tell me the average "Total Price" per customer

Make sure you change the alias of the AvgPrice column accordingly

SELECT c.CustomerID ,c.FirstName ,c.LastName ,c.City ,c.StateName ,ISNULL(AVG(io.Price*io.Quantity),0) AS AvgTotalPrice FROM dbo.Customers AS c LEFT OUTER JOIN dbo.ItemsOrdered AS io ON c.CustomerID = io.CustomerID GROUP BY c.CustomerID ,c.FirstName ,c.LastName ,c.City ,c.StateName ;

In [ ]:

4a. Someone has requested a report of all employees and their managers (if they don't have a manager, still display them)

Show columns: EmployeeID, EmployeeFullName, EmployeeTitle, BossFullName, BossTitle

SELECT emp.EmployeeID ,emp.Firstname + ' ' + emp.LastName AS EmployeeFullNameAlt ,CONCAT(emp.FirstName,' ',emp.LastName) AS EmployeeFullName ,emp.JobTitle AS EmployeeJobTitle ,CONCAT(mgr.FirstName,' ',mgr.LastName) AS BossFullName ,mgr.JobTitle AS BossJobTitle FROM dbo.Employees AS emp LEFT OUTER JOIN dbo.Employees AS mgr ON emp.ManagerID = mgr.EmployeeID ;

In [ ]:

4b. Another request from someone else has just come in, asking for a list of managers and their employees

Columns: BossFullName, BossTitle, EmployeeFullName, EmployeeTitle Sort by BossTitle then EmployeeTitle

SELECT CONCAT(mgr.FirstName,' ',mgr.LastName) AS BossFullName ,mgr.JobTitle AS BossJobTitle ,CONCAT(emp.FirstName,' ',emp.LastName) AS EmployeeFullName ,emp.JobTitle AS EmployeeJobTitle FROM dbo.Employees AS mgr INNER JOIN dbo.Employees AS emp ON emp.ManagerID = mgr.EmployeeID ORDER BY BossFullName ;

In [ ]:

4c. Provide a list of managers in the company

SELECT DISTINCT CONCAT(mgr.FirstName,' ',mgr.LastName) AS BossFullName ,mgr.JobTitle AS BossJobTitle FROM dbo.Employees AS mgr INNER JOIN dbo.Employees AS emp ON mgr.EmployeeID = emp.ManagerID ORDER BY BossFullName ;

In [ ]:

5a. Show me only the salespeople who live in the U.S. and their hire dates

Columns: FullName, HireDate

SELECT e.FirstName + ' ' + e.LastName AS FullName ,e.HireDate FROM dbo.Employees e INNER JOIN dbo.SalesTerritory s ON e.TerritoryID = s.TerritoryID WHERE e.CountryName = 'United States'

In [ ]:

5b. Show me only the salespeople who sell to U.S. customers and their hire dates, plus how much revenue their territory has generated this year

Columns: FullName, HireDate, TerritoryName, Revenue

SELECT e.FirstName + ' ' + e.LastName AS FullName ,e.HireDate ,s.Name AS TerritoryName ,s.SalesYTD AS Revenue FROM dbo.Employees e INNER JOIN dbo.SalesTerritory s ON e.TerritoryID = s.TerritoryID WHERE s.CountryCode = 'US'

In [ ]:

6. I need a list of all the salespeople and their territories, but also show the territories that don't have any salespeople assigned.

Columns: EmployeeID, FirstName, LastName, TerritoryName, TerritoryCountry, TerritoryRegion

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,s.Name AS TerritoryName ,s.CountryCode AS TerritoryCountry ,s.Region AS TerritoryRegion FROM dbo.SalesTerritory AS s LEFT OUTER JOIN dbo.Employees AS e ON s.TerritoryID = e.TerritoryID ;

In [ ]:

7. Give me a list of all the customers, and if they ordered anything less than $30 show me those orders too

SELECT c.CustomerID ,c.FirstName ,c.LastName ,io.Item ,io.Price FROM dbo.Customers AS c LEFT OUTER JOIN dbo.ItemsOrdered AS io ON c.CustomerID = io.CustomerID AND io.Price < 30 ;

In [ ]:

9a. We need a list of each employee's boss's boss (i.e., 2 levels up)

Columns: EmployeeID, EmployeeFullName, EmployeeTitle, BossBossFullName, BossBossTitle
Sort by BossBossTitle and then EmployeeTitle

SELECT emp.EmployeeID ,emp.FirstName + ' ' + emp.LastName AS EmployeeFullName ,emp.JobTitle AS EmployeeTitle ,mgr2.FirstName + ' ' + mgr2.LastName AS BossBossFullName ,mgr2.JobTitle AS BossBossTitle FROM dbo.Employees emp INNER JOIN dbo.Employees mgr ON emp.ManagerID = mgr.EmployeeID INNER JOIN dbo.Employees mgr2 ON mgr.ManagerID = mgr2.EmployeeID ORDER BY BossBossTitle, EmployeeTitle ;

In [ ]:

9b. Same as above, except show all employees in the company

SELECT emp.EmployeeID ,emp.FirstName + ' ' + emp.LastName AS EmployeeFullName ,emp.JobTitle AS EmployeeTitle ,mgr2.FirstName + ' ' + mgr2.LastName AS BossBossFullName ,mgr2.JobTitle AS BossBossTitle FROM dbo.Employees emp LEFT OUTER JOIN dbo.Employees mgr ON emp.ManagerID = mgr.EmployeeID LEFT OUTER JOIN dbo.Employees mgr2 ON mgr.ManagerID = mgr2.EmployeeID ORDER BY BossBossTitle, EmployeeTitle ;

In [ ]:

9c. For those employees who don't have a boss's boss, display "n/a" for BossBossFullName and BossBossTitle

SELECT emp.EmployeeID ,emp.FirstName + ' ' + emp.LastName AS EmployeeFullName ,emp.JobTitle AS EmployeeTitle ,COALESCE(mgr2.FirstName + ' ' + mgr2.LastName, 'n/a') AS BossBossFullName ,COALESCE(mgr2.JobTitle, 'n/a') AS BossBossTitle FROM dbo.Employees emp LEFT OUTER JOIN dbo.Employees mgr ON emp.ManagerID = mgr.EmployeeID LEFT OUTER JOIN dbo.Employees mgr2 ON mgr.ManagerID = mgr2.EmployeeID ORDER BY BossBossTitle, EmployeeTitle ;

In [ ]:

9d. Change the sort so that the ones with "n/a" are at the top of the list

SELECT emp.EmployeeID ,emp.FirstName + ' ' + emp.LastName AS EmployeeFullName ,emp.JobTitle AS EmployeeTitle ,COALESCE(mgr2.FirstName + ' ' + mgr2.LastName, 'n/a') AS BossBossFullName ,COALESCE(mgr2.JobTitle, 'n/a') AS BossBossTitle FROM dbo.Employees emp LEFT OUTER JOIN dbo.Employees mgr ON emp.ManagerID = mgr.EmployeeID LEFT OUTER JOIN dbo.Employees mgr2 ON mgr.ManagerID = mgr2.EmployeeID ORDER BY mgr2.JobTitle, EmployeeTitle ;

In [ ]:

10a. We need a list of each manager's indirect reports (i.e., 2 levels down)

Columns: ManagerFullName, ManagerTitle, IndirectReportFullName, IndirectReportTitle
Sort by ManagerTitle and IndirectReportTitle

SELECT mgr.FirstName + ' ' + mgr.LastName AS ManagerFullName ,mgr.JobTitle AS ManagerTitle ,COUNT(*) AS NumIndirectReports FROM dbo.Employees mgr INNER JOIN dbo.Employees emp ON mgr.EmployeeID = emp.ManagerID INNER JOIN dbo.Employees emp2 ON emp.EmployeeID = emp2.ManagerID GROUP BY mgr.EmployeeID ,mgr.FirstName ,mgr.LastName ,mgr.JobTitle ORDER BY ManagerTitle ;

In [ ]:

10b. Please tell us how many indirect reports each manager has

Columns: ManagerFullName, ManagerTitle, NumIndirectReports

SELECT mgr.FirstName + ' ' + mgr.LastName AS ManagerFullName ,mgr.JobTitle AS ManagerTitle ,COUNT(*) AS NumIndirectReports FROM dbo.Employees mgr INNER JOIN dbo.Employees emp ON mgr.EmployeeID = emp.ManagerID INNER JOIN dbo.Employees emp2 ON emp.EmployeeID = emp2.ManagerID GROUP BY mgr.EmployeeID ,mgr.FirstName ,mgr.LastName ,mgr.JobTitle ORDER BY ManagerTitle ;

In [ ]:

11. Start with the query from #4c and change it so it returns just one number in the result: the count of managers in the company

SELECT COUNT(DISTINCT mgr.EmployeeID) AS NumManagers FROM dbo.Employees emp INNER JOIN dbo.Employees mgr ON emp.ManagerID = mgr.EmployeeID

In [ ]:

BONUS 1a. The HR people have come to you with a problem, they realized that some of the employee names were entered into the system with 2 spaces instead of 1

Find anyone with a FirstName, MiddleName, or LastName that has 2 spaces
Columns: EmployeeID, FirstName, MiddleName, LastName

SELECT e.EmployeeID ,e.FirstName ,e.MiddleName ,e.LastName FROM dbo.Employees e WHERE e.FirstName LIKE '% %' OR e.MiddleName LIKE '% %' OR e.LastName LIKE '% %'

In [ ]:

BONUS 1b. Display the same list but correct the names to display what they should look like

SELECT e.EmailAddress ,SUBSTRING(e.EmailAddress, CHARINDEX('@', e.EmailAddress)+1, 1000) FROM dbo.Employees as e ;

In [ ]: