In [ ]:
import pandas as pd
import numpy as np
Now, import your solutions from hw4_answers.py
. The following code looks a bit redundant. However, we do this to allow reloading the hw4_answers.py
in case you made some changes. Normally, Python assumes that modules don't change and therefore does not try to import them again.
In [ ]:
import hw4_answers
reload(hw4_answers)
from hw4_answers import *
In [ ]:
employees_df = load_employees()
print "Number of rows: %d\nNumber of cols: %d\n" % (employees_df.shape[0], employees_df.shape[1])
print "Head of index: %s\n" % (employees_df.index[:10])
print "Record of employee with ID=999\n"
print employees_df.loc[999]
The output should be
Number of rows: 291 Number of cols: 25 Head of index: Int64Index([259, 278, 204, 78, 255, 66, 270, 22, 161, 124], dtype='int64', name=u'EmployeeID') Record of employee with ID=999 ManagerID 1 TerritoryID NaN Title NaN FirstName Chadwick MiddleName NaN LastName Smith Suffix NaN JobTitle BI Professor NationalIDNumber 123456789 BirthDate 1967-07-05 MaritalStatus M Gender M HireDate 2003-12-31 23:59:59.997000 SalariedFlag 0 VacationHours 55 SickLeaveHours 47 PhoneNumber 555-887-9788 PhoneNumberType Work EmailAddress chadwick.smith@rentpath.com AddressLine1 565 Peachtree Rd. AddressLine2 NaN City Atlanta StateProvinceName Georgia PostalCode 30084 CountryName United States Name: 999, dtype: object
Define a function getFullName
which takes the employees table and a single employee ID as arguments, and returns a string with the full name of the employee in the format "LAST, FIRST MIDDLE".
If the given ID does not belong to any employee return the string "UNKNOWN" (in all caps)
If no middle name is given only return "LAST, FIRST". Make sure there are not trailing spaces!
If only the middle initial is given the return the full name in the format "LAST, FIRST M." with the middle initial followed by a '.'.
Arguments:
Returns:
In [ ]:
for eid in [274, 999, 102]:
print '%d, "%s"' %(eid, getFullName(employees_df, eid))
The output should be
274, "Jiang, Stephen Y." 999, "Smith, Chadwick" 102, "Mu, Zheng W."
In [ ]:
for jt in ['Chief Data Scientist', 'Sales Manager', 'Vice President of Sales']:
if isSales(jt):
print "The job title '%s' is part of the Sales Department." % jt
else:
print "The job title '%s' belongs to a different department." % jt
The output should be
The job title 'Chief Data Scientist' belongs to a different department. The job title 'Sales Manager' is part of the Sales Department. The job title 'Vice President of Sales' is part of the Sales Department.
Define a function filterSales
with the employee tables as an argument, that returns a new table of the same schema (i.e. columns and index) containing only row of sales people. You should use the isSales
function from the previous problem.
Arguments:
Returns:
In [ ]:
sales_df = filterSales(employees_df)
print "Number of rows: %d\nNumber of cols: %d\n" % (sales_df.shape[0], sales_df.shape[1])
print "Head of index: %s\n" % (sales_df.index[:10])
print "Record of sales employee with ID=280\n"
print sales_df.loc[280]
The output should be
Number of rows: 18 Number of cols: 25 Head of index: Int64Index([278, 283, 274, 276, 286, 284, 287, 281, 280, 285], dtype='int64', name=u'EmployeeID') Record of sales employee with ID=280 ManagerID 274 TerritoryID 1 Title NaN FirstName Pamela MiddleName O LastName Ansman-Wolfe Suffix NaN JobTitle Sales Representative NationalIDNumber 61161660 BirthDate 1969-01-06 MaritalStatus S Gender F HireDate 2005-10-01 00:00:00 SalariedFlag 1 VacationHours 22 SickLeaveHours 31 PhoneNumber 340-555-0193 PhoneNumberType Cell EmailAddress pamela0@yahoo.com AddressLine1 636 Vine Hill Way AddressLine2 NaN City Portland StateProvinceName Oregon PostalCode 97205 CountryName United States Name: 280, dtype: object
Define a function getEmailList
with that returns a Series of strings of all email addresses of employees in this state or province. The email addresses should be separated by a given character, usually a comma ',' or semicolon ';'.
Arguments:
Returns:
In [ ]:
emails = getEmailListByState(sales_df, ", ")
for state in sorted(emails.index):
print "%15s: %s" % (state, emails[state])
The output should be
Alberta: garrett1@mapleleafmail.ca California: shu0@adventure-works.com England: jae0@aol.co.uk Gironde: ranjit0@adventure-works.com Hamburg: rachel0@adventure-works.com Massachusetts: tete0@adventure-works.com Michigan: michael9@adventure-works.com Minnesota: jillian0@adventure-works.com Ontario: josé1@safe-mail.net Oregon: pamela0@yahoo.com Tennessee: tsvi0@adventure-works.com Utah: linda3@adventure-works.com Victoria: lynn0@adventure-works.com Washington: david8@adventure-works.com, stephen0@adventure-works.com, amy0@yahoo.com, syed0@yahoo.com, brian3@aol.com
Define a function managementCounts
which produces a Series of how many employees report to a manager. The Series is indexed by the ManagerID
, the count should be performed on the EmployeeID
because this is the only field that is guaranteed to be unique. The resulting Series should be order by the number of employees in descending order.
Arguments:
Returns:
ManagerID
In [ ]:
print managementCounts(employees_df)
The output should be
ManagerID 26.0 178 25.0 30 250.0 17 274.0 10 263.0 9 249.0 9 16.0 8 1.0 8 227.0 6 235.0 5 287.0 3 273.0 3 234.0 3 285.0 1 Name: EmployeeID, dtype: int64