Rename this file before you work on it!
Pandas Exercise
Use the Adventure Works dataset to create the following reports. The dataset is availablt for download in it's original format at https://msdn.microsoft.com/en-us/library/hh403424.aspx (follow instructions to download).
Task
- write the Python Pandas expression to produce a table as described in the problem statements.
- The SQL expression may give you a hint. It also allows you to see both systems side-by-side.
- If you don't know SQL just ignore the SQL code.
CPU times: user 69.4 ms, sys: 1.73 ms, total: 71.2 ms
Wall time: 69.8 ms
CPU times: user 8.81 ms, sys: 4.56 ms, total: 13.4 ms
Wall time: 12.5 ms
CPU times: user 8.05 ms, sys: 3.07 ms, total: 11.1 ms
Wall time: 10.3 ms
CPU times: user 11.2 ms, sys: 438 µs, total: 11.7 ms
Wall time: 10.3 ms
1. Provide a list of employees that are married.
SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e
WHERE e.MaritalStatus = 'M';
2a. Show me a list of employees that have a lastname that begins with "R".
SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e
WHERE e.LastName LIKE 'R%';
2b. Show me a list of employees that have a lastname that ends with "r"
SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e
WHERE e.LastName LIKE '%r';
2c. Provide a list of employees that have a hyphenated lastname.
SELECT e.EmployeeID, e.FirstName, e.LastName FROM dbo.Employees AS e
WHERE e.LastName LIKE '%-%';
3a. Provide a list of employees that are on salary and have more than 35 vacation hours left.
SELECT e.EmployeeID
,e.FirstName
,e.LastName
,e.VacationHours
,e.SalariedFlag
FROM dbo.Employees AS e
WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 35);
3b. Show the same as above but limit it to American employees.
SELECT DISTINCT CountryName FROM dbo.Employees;
SELECT e.EmployeeID
,e.FirstName
,e.LastName
,e.VacationHours
,e.SalariedFlag
,e.CountryName
FROM dbo.Employees AS e
WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 5) AND e.CountryName = 'United States';
3c. Show the same as above but limit it to non-American employees.
SELECT e.EmployeeID
,e.FirstName
,e.LastName
,e.VacationHours
,e.SalariedFlag
,e.CountryName
FROM dbo.Employees AS e
WHERE (e.SalariedFlag = 1) AND (e.VacationHours > 5) AND e.CountryName != 'United States';
4a. List the married employees with more than 35 vacation hours, only ones living in Washington state.
SELECT e.EmployeeID
,e.FirstName
,e.LastName
,e.VacationHours
,e.SalariedFlag
,e.StateProvinceName
,e.CountryName
FROM dbo.Employees AS e
WHERE e.MaritalStatus = 'M' AND e.VacationHours > 35 AND e.StateProvinceName = 'Washington'
;
4b. Change the logic to include anyone who meets any of the 3 conditions (i.e., people who are either married, live in Washington state, or have more than 35 vacation hours left)
SELECT e.EmployeeID
,e.FirstName
,e.LastName
,e.MaritalStatus
,e.VacationHours
,e.SalariedFlag
,e.StateProvinceName
,e.CountryName
FROM dbo.Employees AS e
WHERE e.MaritalStatus = 'M' OR e.VacationHours > 35 OR e.StateProvinceName = 'Washington'
;
4c. Show the same as above, but only for Production Technicians
SELECT DISTINCT JobTitle FROM dbo.Employees; --- look at job titles
SELECT e.EmployeeID
,e.FirstName
,e.LastName
,e.MaritalStatus
,e.JobTitle
,e.VacationHours
,e.SalariedFlag
,e.StateProvinceName
,e.CountryName
FROM dbo.Employees AS e
WHERE e.JobTitle LIKE 'Production Technician%'
AND (e.MaritalStatus = 'M' OR e.VacationHours > 35 OR e.StateProvinceName = 'Washington')
;
5a. List all employees living in Redmond, Seattle, and Bellevue, showing EmployeeID, FirstName, LastName, and City. Sort the list alphabetically by city.
SELECT e.EmployeeID, e.FirstName, e.LastName, e.City
FROM dbo.Employees AS e
WHERE e.City in ('Redmond', 'Seattle', 'Bellevue')
ORDER BY e.City
;
5b. For the list above, make sure these are only in the state of Washington, just to be careful.
SELECT DISTINCT e.City, e.StateProvinceName, e.CountryName
FROM dbo.Employees AS e
WHERE e.City in ('Redmond', 'Seattle', 'Bellevue')
;
SELECT e.EmployeeID, e.FirstName, e.LastName, e.City, e.StateProvinceName
FROM dbo.Employees AS e
WHERE e.City in ('Redmond', 'Seattle', 'Bellevue') AND e.StateProvinceName = 'Washington'
ORDER BY e.City
;
6. Provide a list of employees who have no title, whether it's a NULL or empty string.
SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName
FROM dbo.Employees AS e
WHERE e.Title IS NULL OR e.Title = ''
;
7a. Provide a list of employees who have at least 60 vacation hours left.
SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours
FROM dbo.Employees AS e
WHERE e.VacationHours >= 60
;
7b. Provide a list of employees who have less than 60 vacation hours left.
7c. Show me employees who have more than 20 and less than 60 vacation hours left.
SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours
FROM dbo.Employees AS e
WHERE e.VacationHours BETWEEN 21 AND 59
;
7d. If you did not use BETWEEN for 7c, do the same but use BETWEEN. If you did use BETWEEN for 7c, do it another way.
SELECT e.EmployeeID, e.Title, e.FirstName, e.LastName, e.VacationHours, e.SickLeaveHours
FROM dbo.Employees AS e
WHERE e.VacationHours > 20 AND e.VacationHours < 60
;
1a. What is the earliest birthdate for all employees?
SELECT MIN(e.BirthDate) FROM dbo.Employees AS e;
1b. Add to the above, the most recent birthdate for all employees
SELECT MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday'
FROM dbo.Employees AS e;
1c. Show the above results broken down by gender
SELECT e.Gender, MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday'
FROM dbo.Employees AS e
GROUP BY e.Gender
;
1d. Show the above results broken down by gender, and salaried/hourly
SELECT e.Gender, e.SalariedFlag, MIN(e.BirthDate) AS 'Earliest Birthday', MAX(e.BirthDate) AS 'Most Reecent Birthday'
FROM dbo.Employees AS e
GROUP BY e.Gender, e.SalariedFlag
;
2a. What are the average vacation hours for all employees?
SELECT AVG(e.VacationHours)
FROM dbo.Employees AS e
;
2b. Add to the above, the minimum vacation hours for all employees
SELECT AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minumum Vacation'
FROM dbo.Employees AS e
;
2c. Show the above results broken down and ordered by job title
SELECT e.JobTitle, AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minimum Vacation'
FROM dbo.Employees AS e
GROUP BY e.JobTitle
;
2d. Show the above results broken down by job title, and married/single employees
SELECT e.JobTitle, e.MaritalStatus, AVG(e.VacationHours) AS 'Average Vacation', MIN(e.VacationHours) AS 'Minimum Vacation'
FROM dbo.Employees AS e
GROUP BY e.JobTitle, e.MaritalStatus
;
2e. Add to the above, the maximum vacation hours per group
SELECT e.JobTitle, e.MaritalStatus
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
GROUP BY e.JobTitle, e.MaritalStatus
;
2f. Show the above results broken down by job title, married/single employees, and State
SELECT e.JobTitle, e.MaritalStatus, e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
GROUP BY e.JobTitle, e.MaritalStatus, e.StateProvinceName
;
2g. Show the above results but only for American employees
SELECT e.JobTitle, e.MaritalStatus, e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
WHERE e.CountryName = 'United States'
GROUP BY e.JobTitle, e.MaritalStatus, e.StateProvinceName
;
2h. Change the grouping above so it's broken down by married/single and State, no more job title
SELECT e.MaritalStatus, e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
WHERE e.CountryName = 'United States'
GROUP BY e.MaritalStatus, e.StateProvinceName
;
2i. Limit the results above to States where the average vacation hours is greater than 30
SELECT e.MaritalStatus, e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
WHERE e.CountryName = 'United States'
GROUP BY e.MaritalStatus, e.StateProvinceName
HAVING AVG(e.VacationHours) > 30
;
SELECT e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
WHERE e.CountryName = 'United States'
GROUP BY e.StateProvinceName
HAVING AVG(e.VacationHours) > 30
;
2j. Limit the results above to States where the average vacation hours is greater than 30 and the maximum vacation hours is less than 50
SELECT e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
WHERE e.CountryName = 'United States'
GROUP BY e.StateProvinceName
HAVING AVG(e.VacationHours) BETWEEN 31 AND 49
;
2k. Show the same results but only for non-American employees
SELECT e.StateProvinceName
, AVG(e.VacationHours) AS 'Average Vacation'
, MIN(e.VacationHours) AS 'Minimum Vacation'
, MAX(e.VacationHours) AS 'Maximum Vacation'
FROM dbo.Employees AS e
WHERE e.CountryName != 'United States'
GROUP BY e.StateProvinceName
HAVING AVG(e.VacationHours) BETWEEN 31 AND 49
;
3a. Report how many employees are in the company
SELECT COUNT(1) FROM dbo.Employees;
3b. For the above report, show the number of employees per manager (hint: use ManagerID)
SELECT e.ManagerID, COUNT(1) AS 'NumEmployees'
FROM dbo.Employees AS e
GROUP BY e.ManagerID
;
3c. Remove any manager ID's that are NULL from the results above
SELECT e.ManagerID, COUNT(1) AS 'NumEmployees'
FROM dbo.Employees AS e
WHERE e.ManagerID IS NOT NULL
GROUP BY e.ManagerID
;
3d. Show the same results as above, but only for managers who have at least 5 employees
SELECT e.ManagerID, COUNT(1) AS 'NumEmployees'
FROM dbo.Employees AS e
WHERE e.ManagerID IS NOT NULL
GROUP BY e.ManagerID
HAVING COUNT(1) >= 5
;
4a. List the average vacation hours of all employees
SELECT AVG(e.VacationHours)
FROM dbo.Employees AS e
;
4b. Break down the results by State
SELECT e.StateProvinceName, AVG(e.VacationHours) AS 'Average Vacation Hours'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName
;
4c. Break down the results by city and State
SELECT e.StateProvinceName, e.City, AVG(e.VacationHours) AS 'Average Vacation Hours'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
;
4d. Add something that shows the number of employees per city
SELECT e.StateProvinceName, e.City
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
;
4e. Sort the results by the city and state
SELECT e.StateProvinceName, e.City
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
ORDER BY e.StateProvinceName, e.City
;
SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location'
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
ORDER BY e.StateProvinceName, e.City
;
4g. Add a column that shows the difference between the maximum vacation hours and minimum vacation hours for each city
SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location'
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
, MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
ORDER BY e.StateProvinceName, e.City
;
4h. Now sort the results by the new column created above
SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location'
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
, MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
ORDER BY 'Difference Vacation Hours' DESC
;
4i. Limit the results to cities that have more than 1 employee
SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location'
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
, MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours'
FROM dbo.Employees AS e
GROUP BY e.StateProvinceName, e.City
HAVING COUNT(1) > 1
ORDER BY 'Difference Vacation Hours' DESC
;
4j. Limit the results to non-U.S. cities
SELECT CONCAT(e.StateProvinceName, ', ', e.City) AS 'Location'
, AVG(e.VacationHours) AS 'Average Vacation Hours'
, COUNT(1) AS 'NumEmployees'
, MAX(e.VacationHours)-MIN(e.VacationHours) AS 'Difference Vacation Hours'
FROM dbo.Employees AS e
WHERE e.CountryName != 'United States'
GROUP BY e.StateProvinceName, e.City
HAVING COUNT(1) > 1
ORDER BY 'Difference Vacation Hours' DESC
;