SF Salaries Exercise



Welcome to a quick exercise for practicing your pandas skills. We will use the SF Salaries dataset from Kaggle. Follow along and complete each task below; the tasks get progressively harder.

Import pandas as pd.


In [1]:
import pandas as pd

Read Salaries.csv as a dataframe called sal.


In [2]:
sal = pd.read_csv('Salaries.csv')
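
As a minimal sketch of what read_csv does with missing fields, the snippet below parses a tiny in-memory CSV instead of the real file. The two rows and the names in them are hypothetical stand-ins, not data from Salaries.csv.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for Salaries.csv (not real data).
csv_text = """Id,EmployeeName,JobTitle,BasePay
1,ALICE EXAMPLE,ENGINEER,100000.0
2,BOB EXAMPLE,ANALYST,
"""

# read_csv accepts a file path or any file-like object.
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)                    # (rows, columns)
print(demo["BasePay"].isna().sum())  # empty fields are read as NaN
```

The empty BasePay field in the second row is loaded as NaN, which is why some columns in the real dataset report fewer non-null values than there are rows.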

Check the head of the DataFrame.


In [4]:
sal.head()


Out[4]:
   Id  EmployeeName       JobTitle                                        BasePay    OvertimePay  OtherPay   Benefits  TotalPay   TotalPayBenefits  Year  Notes  Agency         Status
0  1   NATHANIEL FORD     GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY  167411.18  0.00         400184.25  NaN       567595.43  567595.43         2011  NaN    San Francisco  NaN
1  2   GARY JIMENEZ       CAPTAIN III (POLICE DEPARTMENT)                 155966.02  245131.88    137811.38  NaN       538909.28  538909.28         2011  NaN    San Francisco  NaN
2  3   ALBERT PARDINI     CAPTAIN III (POLICE DEPARTMENT)                 212739.13  106088.18    16452.60   NaN       335279.91  335279.91         2011  NaN    San Francisco  NaN
3  4   CHRISTOPHER CHONG  WIRE ROPE CABLE MAINTENANCE MECHANIC            77916.00   56120.71     198306.90  NaN       332343.61  332343.61         2011  NaN    San Francisco  NaN
4  5   PATRICK GARDNER    DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)    134401.60  9737.00      182234.59  NaN       326373.19  326373.19         2011  NaN    San Francisco  NaN

In [14]:
sal.describe()


Out[14]:
                  Id        BasePay    OvertimePay       OtherPay       Benefits       TotalPay  TotalPayBenefits           Year  Notes  Status
count  148654.000000  148045.000000  148650.000000  148650.000000  112491.000000  148654.000000     148654.000000  148654.000000    0.0     0.0
mean    74327.500000   66325.448840    5066.059886    3648.767297   25007.893151   74768.321972      93692.554811    2012.522643    NaN     NaN
std     42912.857795   42764.635495   11454.380559    8056.601866   15402.215858   50517.005274      62793.533483       1.117538    NaN     NaN
min         1.000000    -166.010000      -0.010000   -7058.590000     -33.890000    -618.130000       -618.130000    2011.000000    NaN     NaN
25%     37164.250000   33588.200000       0.000000       0.000000   11535.395000   36168.995000      44065.650000    2012.000000    NaN     NaN
50%     74327.500000   65007.450000       0.000000     811.270000   28628.620000   71426.610000      92404.090000    2013.000000    NaN     NaN
75%    111490.750000   94691.050000    4658.175000    4236.065000   35566.855000  105839.135000     132876.450000    2014.000000    NaN     NaN
max    148654.000000  319275.010000  245131.880000  400184.250000   96570.660000  567595.430000     567595.430000    2014.000000    NaN     NaN

Use the .info() method to find out how many entries there are.


In [5]:
sal.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148654 entries, 0 to 148653
Data columns (total 13 columns):
Id                  148654 non-null int64
EmployeeName        148654 non-null object
JobTitle            148654 non-null object
BasePay             148045 non-null float64
OvertimePay         148650 non-null float64
OtherPay            148650 non-null float64
Benefits            112491 non-null float64
TotalPay            148654 non-null float64
TotalPayBenefits    148654 non-null float64
Year                148654 non-null int64
Notes               0 non-null float64
Agency              148654 non-null object
Status              0 non-null float64
dtypes: float64(8), int64(2), object(3)
memory usage: 14.7+ MB
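
.info() prints its summary rather than returning it, so the entry count cannot be captured from it directly. A minimal sketch of ways to get the same numbers as values, using a small hypothetical frame in place of sal:

```python
import pandas as pd

# Hypothetical three-row frame standing in for sal.
demo = pd.DataFrame({
    "EmployeeName": ["A EXAMPLE", "B EXAMPLE", "C EXAMPLE"],
    "BasePay": [100.0, None, 300.0],
})

n_rows = len(demo)               # number of entries (rows)
n_rows_again, n_cols = demo.shape
non_null = demo.count()          # non-null values per column, as in .info()

print(n_rows, n_cols)
print(non_null)
```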

What is the average BasePay?


In [7]:
sal.BasePay.mean()


Out[7]:
66325.448840487705
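
BasePay has missing values (148045 non-null out of 148654 entries), and mean() skips NaN by default. The sketch below, on hypothetical numbers, shows how that differs from dividing the sum by the total row count:

```python
import pandas as pd

# Hypothetical values; one BasePay is missing.
demo = pd.DataFrame({"BasePay": [100.0, None, 300.0]})

mean_skipping_nan = demo["BasePay"].mean()            # averages the 2 known values
mean_over_all_rows = demo["BasePay"].sum() / len(demo)  # treats the gap as 0

print(mean_skipping_nan)
print(mean_over_all_rows)
```

Bracket access (demo["BasePay"]) and attribute access (demo.BasePay) return the same Series; bracket access also works for column names that contain spaces or clash with DataFrame methods.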

What is the highest amount of OvertimePay in the dataset?


In [8]:
sal.OvertimePay.max()


Out[8]:
245131.88

What is the job title of JOSEPH DRISCOLL? Note: Use all caps; otherwise you may get an answer that doesn't match (there is also a lowercase Joseph Driscoll).


In [9]:
sal[sal['EmployeeName']=='JOSEPH DRISCOLL'].JobTitle


Out[9]:
24    CAPTAIN, FIRE SUPPRESSION
Name: JobTitle, dtype: object

How much does JOSEPH DRISCOLL make (including benefits)?


In [10]:
sal[sal['EmployeeName']=='JOSEPH DRISCOLL'].TotalPayBenefits


Out[10]:
24    270324.91
Name: TotalPayBenefits, dtype: float64
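
The two lookups above filter rows with a boolean mask and then pick a column. A minimal sketch of doing both in one step with .loc, on a hypothetical frame (the names and figures are made up), plus a case-insensitive variant for when both spellings are wanted:

```python
import pandas as pd

# Hypothetical rows; note the same name in two different cases.
demo = pd.DataFrame({
    "EmployeeName": ["JANE DOE", "Jane Doe", "JOHN ROE"],
    "JobTitle": ["CAPTAIN", "Captain", "CLERK"],
    "TotalPayBenefits": [2000.0, 1500.0, 900.0],
})

# Exact, case-sensitive match: rows and columns selected together.
mask = demo["EmployeeName"] == "JANE DOE"
exact = demo.loc[mask, ["JobTitle", "TotalPayBenefits"]]

# Case-insensitive match: normalise the column before comparing.
ci_mask = demo["EmployeeName"].str.upper() == "JANE DOE"
either_case = demo.loc[ci_mask, ["JobTitle", "TotalPayBenefits"]]

print(exact)
print(either_case)
```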

What is the name of the highest-paid person (including benefits)?


In [11]:
sal[sal['TotalPayBenefits'] == sal.TotalPayBenefits.max()]


Out[11]:
   Id  EmployeeName    JobTitle                                        BasePay    OvertimePay  OtherPay   Benefits  TotalPay   TotalPayBenefits  Year  Notes  Agency         Status
0  1   NATHANIEL FORD  GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY  167411.18  0.0          400184.25  NaN       567595.43  567595.43         2011  NaN    San Francisco  NaN
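
The mask above returns the whole row. If only the name is wanted, one alternative is idxmax() (or idxmin() for the lowest value), sketched here on hypothetical data:

```python
import pandas as pd

# Hypothetical frame standing in for sal.
demo = pd.DataFrame({
    "EmployeeName": ["A EXAMPLE", "B EXAMPLE", "C EXAMPLE"],
    "TotalPayBenefits": [500.0, 900.0, -10.0],
})

# idxmax/idxmin return the index label of the first extreme value.
top_name = demo.loc[demo["TotalPayBenefits"].idxmax(), "EmployeeName"]
bottom_name = demo.loc[demo["TotalPayBenefits"].idxmin(), "EmployeeName"]

print(top_name)
print(bottom_name)
```

idxmax() and idxmin() return only the first matching label, whereas the mask comparison keeps every row tied for the extreme value, so the two approaches can differ when there are ties.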

What is the name of the lowest-paid person (including benefits)? Do you notice something strange about how much he or she is paid?


In [12]:
sal[sal['TotalPayBenefits'] == sal.TotalPayBenefits.min()]


Out[12]:
        Id      EmployeeName  JobTitle                    BasePay  OvertimePay  OtherPay  Benefits  TotalPay  TotalPayBenefits  Year  Notes  Agency         Status
148653  148654  Joe Lopez     Counselor, Log Cabin Ranch  0.0      0.0          -618.13   0.0       -618.13   -618.13           2014  NaN    San Francisco  NaN
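
The lowest total shown above is negative, which is unusual for a pay record. A small sketch, on hypothetical values, of how one might list every row with a negative total to see whether it is an isolated case:

```python
import pandas as pd

# Hypothetical rows; only the last one has a negative total.
demo = pd.DataFrame({
    "EmployeeName": ["A EXAMPLE", "B EXAMPLE", "C EXAMPLE"],
    "OtherPay": [50.0, 0.0, -25.0],
    "TotalPayBenefits": [500.0, 0.0, -25.0],
})

negative = demo[demo["TotalPayBenefits"] < 0]
n_negative = len(negative)

print(n_negative)
print(negative)
```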

What was the average (mean) BasePay of all employees per year (2011-2014)?


In [15]:
sal.groupby('Year').BasePay.mean()


Out[15]:
Year
2011    63595.956517
2012    65436.406857
2013    69630.030216
2014    66564.421924
Name: BasePay, dtype: float64
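
A per-year mean is easier to interpret next to the number of values behind it. The sketch below, on hypothetical data, uses agg to compute both in one groupby; like mean(), count ignores NaN.

```python
import pandas as pd

# Hypothetical frame; one 2012 BasePay is missing.
demo = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "BasePay": [100.0, 200.0, 300.0, None],
})

# One row per year, one column per aggregation.
per_year = demo.groupby("Year")["BasePay"].agg(["mean", "count"])

print(per_year)
```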

How many unique job titles are there?


In [16]:
sal['JobTitle'].nunique()


Out[16]:
2159

What are the top 5 most common jobs?


In [19]:
sal['JobTitle'].value_counts()[:5]


Out[19]:
Transit Operator                7036
Special Nurse                   4389
Registered Nurse                3736
Public Svc Aide-Public Works    2518
Police Officer 3                2421
Name: JobTitle, dtype: int64
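
value_counts() sorts by frequency in descending order, which is why slicing the first five entries gives the most common titles. A minimal sketch with hypothetical titles, using head() as an equivalent to the [:n] slice:

```python
import pandas as pd

# Hypothetical job titles.
titles = pd.Series(
    ["Nurse", "Nurse", "Clerk", "Nurse", "Clerk", "Pilot"], name="JobTitle"
)

counts = titles.value_counts()  # most frequent first
top2 = counts.head(2)           # same rows as counts[:2]

print(top2)
```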

How many job titles were represented by only one person in 2013 (that is, job titles with only one occurrence in 2013)?


In [24]:
one_person_jobs = sum(sal[sal['Year']==2013]['JobTitle'].value_counts()==1)
one_person_jobs


Out[24]:
202
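
The count above works because True counts as 1 when a boolean Series is summed. A sketch of the same idea on hypothetical data, split into steps and using the Series' own .sum() instead of the built-in sum:

```python
import pandas as pd

# Hypothetical rows across two years.
demo = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2014],
    "JobTitle": ["Nurse", "Nurse", "Clerk", "Pilot"],
})

# 1. Keep only 2013 rows. 2. Count each title. 3. Count titles seen once.
counts_2013 = demo.loc[demo["Year"] == 2013, "JobTitle"].value_counts()
n_single = int((counts_2013 == 1).sum())

print(n_single)
```

Here "Pilot" appears once overall but not in 2013, so it is excluded; only "Clerk" is counted.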

How many people have the word Chief in their job title? (This is pretty tricky)


In [28]:
def chief_in_title(title):
    # True when 'chief' appears anywhere in the title, ignoring case.
    return 'chief' in title.lower()

In [29]:
sum(sal['JobTitle'].apply(chief_in_title))


Out[29]:
627

In [30]:
sum(sal['JobTitle'].apply(lambda x : chief_in_title(x)))


Out[30]:
627
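
An alternative to apply is the vectorized string method str.contains. The sketch below uses hypothetical titles and also shows a caveat of plain substring matching: it counts words that merely contain "chief". The word-boundary pattern is an illustrative variation, not the method used in the exercise.

```python
import pandas as pd

# Hypothetical titles; "Mischief Manager" contains 'chief' as a substring only.
titles = pd.Series(
    ["POLICE CHIEF", "Deputy Chief of Staff", "Clerk", "Mischief Manager"]
)

# Case-insensitive substring match, same logic as chief_in_title.
n_substring = int(titles.str.contains("chief", case=False, na=False).sum())

# Stricter variant: match 'chief' only as a whole word.
n_whole_word = int(titles.str.contains(r"\bchief\b", case=False, na=False).sum())

print(n_substring)
print(n_whole_word)
```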

Bonus: Is there a correlation between the length of the job title string and salary?


In [31]:
sal['title_len'] = sal['JobTitle'].apply(len)

In [32]:
sal[['title_len', 'TotalPayBenefits']].corr()


Out[32]:
                  title_len  TotalPayBenefits
title_len          1.000000         -0.036878
TotalPayBenefits  -0.036878          1.000000
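
When only the single coefficient is needed rather than the full matrix, Series.corr between the two columns returns it directly (Pearson by default). A minimal sketch on hypothetical data constructed so that pay falls linearly as the title gets longer:

```python
import pandas as pd

# Hypothetical data: pay decreases by 10 for each extra character.
demo = pd.DataFrame({
    "JobTitle": ["A", "BB", "CCC", "DDDD"],
    "TotalPayBenefits": [40.0, 30.0, 20.0, 10.0],
})

# str.len() is a vectorized equivalent of apply(len).
demo["title_len"] = demo["JobTitle"].str.len()

r = demo["title_len"].corr(demo["TotalPayBenefits"])
print(r)
```

A coefficient near -1 or 1, as in this constructed example, indicates a strong linear relationship; the value of about -0.037 in the exercise output is close to 0, so it indicates essentially no linear relationship.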

Great Job!