Rename this file before you work on it!

Pandas Exercise

Use the Adventure Works dataset to create the following reports. The dataset is availablt for download in it's original format at https://msdn.microsoft.com/en-us/library/hh403424.aspx (follow instructions to download).

Task

  1. write the Python Pandas expression to produce a table as described in the problem statements.
  2. The SQL expression may give you a hint. It also allows you to see both systems side-by-side.
  3. If you don't know SQL just ignore the SQL code.

In [1]:
import pandas as pd
import numpy as np

In [3]:
%%time
Empoyees = pd.read_excel('/home/data/AdventureWorks/Employees.xls')


CPU times: user 69.4 ms, sys: 1.73 ms, total: 71.2 ms
Wall time: 69.8 ms

In [4]:
%%time
Territory = pd.read_excel('/home/data/AdventureWorks/SalesTerritory.xls')


CPU times: user 8.81 ms, sys: 4.56 ms, total: 13.4 ms
Wall time: 12.5 ms

In [5]:
%%time
Customers = pd.read_excel('/home/data/AdventureWorks/Customers.xls')


CPU times: user 8.05 ms, sys: 3.07 ms, total: 11.1 ms
Wall time: 10.3 ms

In [6]:
%%time
Orders = pd.read_excel('/home/data/AdventureWorks/ItemsOrdered.xls')


CPU times: user 11.2 ms, sys: 438 µs, total: 11.7 ms
Wall time: 10.3 ms

Filtering (with)

1. Provide a list of employees that are married.

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.MaritalStatus = 'M';

In [ ]:

2a. Show me a list of employees that have a lastname that begins with "R".

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.LastName LIKE 'R%';

In [ ]:

2b. Show me a list of employees that have a lastname that ends with "r"

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.LastName LIKE '%r';

In [ ]:

2c. Provide a list of employees that have a hyphenated lastname.

SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.LastName LIKE '%-%';

In [ ]:

3a. Provide a list of employees that are on salary and have more than 35 vacation hours left.

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag FROM dbo.Employees AS e WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 35);

In [ ]:

3b. Show the same as above but limit it to American employees.

SELECT DISTINCT CountryName FROM dbo.Employees; SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag ,e.CountryName FROM dbo.Employees AS e WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 5) AND e.CountryName = 'United States';

In [ ]:

3c. Show the same as above but limit it to non-American employees.

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag ,e.CountryName FROM dbo.Employees AS e WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 5) AND e.CountryName != 'United States';

In [ ]:

4a. List the married employees with more than 35 vacation hours, only ones living in Washington state.

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.VacationHours ,e.SalariedFlag ,e.StateProvinceName ,e.CountryName FROM dbo.Employees AS e WHERE e.MaritalStatus = 'M' AND e.VacationHours > 35 AND e.StateProvinceName = 'Washington' ;

In [ ]:

4b. Change the logic to include anyone who meets any of the 3 conditions (i.e., people who are either married, live in Washington state, or have more than 35 vacation hours left)

SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.MaritalStatus ,e.VacationHours ,e.SalariedFlag ,e.StateProvinceName ,e.CountryName FROM dbo.Employees AS e WHERE e.MaritalStatus = 'M' OR e.VacationHours > 35 OR e.StateProvinceName = 'Washington' ;

In [ ]:

4c. Show the same as above, but only for Production Technicians

SELECT DISTINCT JobTitle FROM dbo.Employees; --- look at job titles SELECT e.EmployeeID ,e.FirstName ,e.LastName ,e.MaritalStatus ,e.JobTitle ,e.VacationHours ,e.SalariedFlag ,e.StateProvinceName ,e.CountryName FROM dbo.Employees AS e WHERE e.JobTitle LIKE 'Production Technician%' AND (e.MaritalStatus = 'M' OR e.VacationHours > 35 OR e.StateProvinceName = 'Washington') ;

In [ ]:

5a. List all employees living in Redmond, Seattle, and Bellevue, showing EmployeeID, FirstName, LastName, and City. Sort the list alphabetically by city.

SELECT e.EmployeeID, e.FirstName, e.LastName, e.City FROM dbo.Employees AS e WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') ORDER BY e.City ;

In [ ]:

5b. For the list above, make sure these are only in the state of Washington, just to be careful.

SELECT DISTINCT e.City, e.StateProvinceName, e.CountryName FROM dbo.Employees AS e WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') ; SELECT e.EmployeeID, e.FirstName, e.LastName, e.City, e.StateProvinceName FROM dbo.Employees AS e WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') AND e.StateProvinceName = 'Washington' ORDER BY e.City ;

In [ ]:

6. Provide a list of employees who have no title, whether it's a NULL or empty string.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName FROM dbo.Employees AS e WHERE e.Title IS NULL OR e.Title = '' ;

In [ ]:

7a. Provide a list of employees who have at least 60 vacation hours left.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours FROM dbo.Employees AS e WHERE e.VacationHours >= 60 ;

In [ ]:

7b. Provide a list of employees who have less than 60 vacation hours left.


In [ ]:
SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours
	FROM dbo.Employees AS e
	WHERE e.VacationHours < 60
	;

In [ ]:

7c. Show me employees who have more than 20 and less than 60 vacation hours left.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours FROM dbo.Employees AS e WHERE e.VacationHours BETWEEN 21 AND 59 ;

In [ ]:

7d. If you did not use BETWEEN for 7c, do the same but use BETWEEN. If you did use BETWEEN for 7c, do it another way.

SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours FROM dbo.Employees AS e WHERE e.VacationHours > 20 AND e.VacationHours < 60 ;

In [ ]:

Grouping

1a. What is the earliest birthdate for all employees?

SELECT MIN(e.BirthDate) FROM dbo.Employees AS e;

In [ ]:

1b. Add to the above, the most recent birthdate for all employees

SELECT MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday' FROM dbo.Employees AS e;

In [ ]:

1c. Show the above results broken down by gender

SELECT e.Gender, MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday' FROM dbo.Employees AS e GROUP BY e.Gender ;

In [ ]:

1d. Show the above results broken down by gender, and salaried/hourly

SELECT e.Gender, e.SalariedFlag, MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday' FROM dbo.Employees AS e GROUP BY e.Gender, e.SalariedFlag ;

In [ ]:

2a. What are the average vacation hours for all employees?

SELECT AVG(e.VacationHours) FROM dbo.Employees AS e ;

In [ ]:

2b. Add to the above, the minimum vacation hours for all employees

SELECT AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minumum Vacation' FROM dbo.Employees AS e ;

In [ ]:

2c. Show the above results broken down and ordered by job title

SELECT e.JobTitle, AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minimum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle ;

In [ ]:

2d. Show the above results broken down by job title, and married/single employees

SELECT e.JobTitle, e.MaritalStatus, AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minimum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle, e.MaritalStatus ;

In [ ]:

2e. Add to the above, the maximum vacation hours per group

SELECT e.JobTitle, e.MaritalStatus , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle, e.MaritalStatus ;

In [ ]:

2f. Show the above results broken down by job title, married/single employees, and State

SELECT e.JobTitle, e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e GROUP BY e.JobTitle, e.MaritalStatus, e.StateProvinceName ;

In [ ]:

2g. Show the above results but only for American employees

SELECT e.JobTitle, e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.JobTitle, e.MaritalStatus, e.StateProvinceName ;

In [ ]:

2h. Change the grouping above so it's broken down by married/single and State, no more job title

SELECT e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.MaritalStatus, e.StateProvinceName ;

In [ ]:

2i. Limit the results above to States where the average vacation hours is greater than 30

SELECT e.MaritalStatus, e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.MaritalStatus, e.StateProvinceName HAVING AVG(e.VacationHours) > 30 ;
SELECT e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.StateProvinceName HAVING AVG(e.VacationHours) > 30 ;

In [ ]:

2j. Limit the results above to States where the average vacation hours is greater than 30 and the maximum vacation hours is less than 50

SELECT e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName = 'United States' GROUP BY e.StateProvinceName HAVING AVG(e.VacationHours) BETWEEN 31 AND 49 ;

In [ ]:

2k. Show the same results but only for non-American employees

SELECT e.StateProvinceName , AVG(e.VacationHours) AS 'Average Vacation' , MIN(e.VacationHours) AS 'Minimum Vacation' , MAX(e.VacationHours) AS 'Maximum Vacation' FROM dbo.Employees AS e WHERE e.CountryName != 'United States' GROUP BY e.StateProvinceName HAVING AVG(e.VacationHours) BETWEEN 31 AND 49 ;

In [ ]:

3a. Report how many employees are in the company

SELECT COUNT(1) FROM dbo.Employees;

In [ ]:

3b. For the above report, show the number of employees per manager (hint: use ManagerID)

SELECT e.ManagerID, COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.ManagerID ;

In [ ]:

3c. Remove any manager ID's that are NULL from the results above

SELECT e.ManagerID, COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e WHERE e.ManagerID IS NOT NULL GROUP BY e.ManagerID ;

In [ ]:

3d. Show the same results as above, but only for managers who have at least 5 employees

SELECT e.ManagerID, COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e WHERE e.ManagerID IS NOT NULL GROUP BY e.ManagerID HAVING COUNT(1) >= 5 ;

In [ ]:

4a. List the average vacation hours of all employees

SELECT AVG(e.VacationHours) FROM dbo.Employees AS e ;

In [ ]:

4b. Break down the results by State

SELECT e.StateProvinceName, AVG(e.VacationHours) AS 'Average Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName ;

In [ ]:

4c. Break down the results by city and State

SELECT e.StateProvinceName, e.City, AVG(e.VacationHours) AS 'Average Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ;

In [ ]:

4d. Add something that shows the number of employees per city

SELECT e.StateProvinceName, e.City , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ;

In [ ]:

4e. Sort the results by the city and state

SELECT e.StateProvinceName, e.City , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY e.StateProvinceName, e.City ;

In [ ]:

4f. Make city and State a single column in the format of "City, State"

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY e.StateProvinceName, e.City ;

In [ ]:

4g. Add a column that shows the difference between the maximum vacation hours and minimum vacation hours for each city

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY e.StateProvinceName, e.City ;

In [ ]:

4h. Now sort the results by the new column created above

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City ORDER BY 'Difference Vacation Hours' DESC ;

In [ ]:

4i. Limit the results to cities that have more than 1 employee

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e GROUP BY e.StateProvinceName, e.City HAVING COUNT(1) > 1 ORDER BY 'Difference Vacation Hours' DESC ;

In [ ]:

4j. Limit the results to non-U.S. cities

SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location' , AVG(e.VacationHours) AS 'Average Vacation Hours' , COUNT(1) AS 'NumEmployees' , MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours' FROM dbo.Employees AS e WHERE e.CountryName != 'United States' GROUP BY e.StateProvinceName, e.City HAVING COUNT(1) > 1 ORDER BY 'Difference Vacation Hours' DESC ;

In [ ]: