White House Employee Data

The *2015_white_house.csv** file contains data on White House employees in 2015, and their salares. Here are the columns:

  • Name -- the name of the employee.
  • Status -- whether the employee was a White hHuse employee, or detailed to the White House.
  • Salary -- the employee salary, in USD.
  • Pay Basis -- the time period the salary is expressed over.
  • Position Title -- the title of the employee.

In [1]:
import pandas as pd
white_house = pd.read_csv("../data/2015_white_house.csv")
print(white_house.shape)


(474, 5)

In [2]:
print(white_house.iloc[-1])


Name                                             Zients, Jeffrey D.
Status                                                     Employee
Salary                                                       173922
Pay Basis                                                 Per Annum
Position Title    ASSISTANT TO THE PRESIDENT FOR ECONOMIC POLICY...
Name: 473, dtype: object

In [3]:
white_house


Out[3]:
Name Status Salary Pay Basis Position Title
0 Abdullah, Hasan A. Detailee 105960 Per Annum POLICY ADVISOR
1 Abraham, Sabey M. Employee 55000 Per Annum ENERGY AND ENVIRONMENT DIRECTOR FOR PRESIDENTI...
2 Abraham, Yohannes A. Employee 121200 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND CHIEF O...
3 Abramson, Jerry E. Employee 155035 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT AND DIRECTOR...
4 Adler, Caroline E. Employee 114000 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND DIRECTO...
5 Aiyer, Vikrum D. Detailee 134662 Per Annum SENIOR POLICY ADVISOR
6 Alcantara, Elias Employee 65650 Per Annum ASSOCIATE DIRECTOR OF INTERGOVERNMENTAL AFFAIRS
7 Ali, Mohammed I. Employee 42000 Per Annum STAFF ASSISTANT
8 Allen, Angelica P. Employee 50000 Per Annum SPECIAL ASSISTANT TO THE DIRECTOR OF THE OFFIC...
9 Allen, Elizabeth M. Employee 103000 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT FOR MESSAGE...
10 Allen, Jessica L. Employee 42844 Per Annum PRESS ASSISTANT
11 Allison, Ashley R. Employee 97000 Per Annum DEPUTY DIRECTOR OF PUBLIC ENGAGEMENT
12 Amendolare, Vincent C. Employee 42420 Per Annum ANALYST
13 Amuluru, Uma M. Detailee 116804 Per Annum ASSOCIATE COUNSEL
14 Anderson, Amanda D. Employee 126250 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND HOUSE L...
15 Anderson, Charles D. Employee 101000 Per Annum SENIOR ADVISOR FOR THE NATIONAL ECONOMIC COUNCIL
16 Aniskoff, Paulette L. Employee 155035 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT AND DIRECTOR...
17 Ashton, Nathaniel R. Employee 42000 Per Annum STAFF ASSISTANT
18 Austin, Jr., Roy L. Employee 160085 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT FOR THE OFFI...
19 Axios, Ashleigh T. Employee 73225 Per Annum DIGITAL CREATIVE DIRECTOR
20 Babajide, Ayotunde T. Detailee 114480 Per Annum SENIOR POLICY ADVISOR
21 Bae, Yena Employee 44000 Per Annum SENIOR ANALYST AND PROJECT MANAGER
22 Baker, Sarah E. Employee 131805 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND ASSOCIA...
23 Bansal, Gaurab Employee 120000 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT AND DEPUTY C...
24 Barnes, Desiree N. Employee 42844 Per Annum PRESS ASSISTANT
25 Bartoloni, Kristen A. Employee 70000 Per Annum DEPUTY DIRECTOR OF RESEARCH
26 Beckford, Kevin F. Employee 42000 Per Annum ANALYST
27 Beliveau, Emmett S. Employee 160085 Per Annum ASSISTANT TO THE PRESIDENT AND DIRECTOR OF THE...
28 Benenati, Frank J. Employee 84840 Per Annum ASSISTANT PRESS SECRETARY
29 Bennett, Tabitha R. Employee 50000 Per Annum PRESS LEAD
... ... ... ... ... ...
444 Vignarajah, Krishanti Employee 90000 Per Annum DIRECTOR OF INTERNATIONAL AFFAIRS AND SENIOR P...
445 Vilfer, Ryan E. Employee 42420 Per Annum STAFF ASSISTANT
446 Vorhaus, David A. Employee 111100 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND ADVISOR...
447 Vrazilek, Lauren S. Employee 55000 Per Annum DEPUTY PRESS SECRETARY FOR THE FIRST LADY
448 Wagstaff, Jesica E. Employee 45905 Per Annum LEGISLATIVE ASSISTANT AND ASSISTANT TO THE SEN...
449 Waheed, Manar Employee 80800 Per Annum DEPUTY POLICY DIRECTOR FOR IMMIGRATION
450 Wainscott, Kip F. Detailee 134662 Per Annum SENIOR DIRECTOR OF CABINET AFFAIRS
451 Waldo, Katherine A. Employee 47470 Per Annum TRIP COORDINATOR
452 Wall, Alexander B. Employee 63125 Per Annum DEPUTY DIRECTOR OF ONLINE ENGAGEMENT
453 Walsh, James D. Detailee 171871 Per Annum ASSOCIATE COUNSEL
454 Walsh, Joan L. Employee 160085 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT AND STAFF SE...
455 Wang, Ya W. Employee 85000 Per Annum SPECIAL ASSISTANT AND ADVISOR TO THE CHIEF OF ...
456 Whisenant, Addie M. Employee 78780 Per Annum DIRECTOR OF AFRICAN-AMERICAN MEDIA
457 Wild, Clayton S. Employee 44000 Per Annum PRESIDENTIAL WRITER
458 Wilkinson, Jr., David E. Employee 100000 Per Annum DIRECTOR, OFFICE OF SOCIAL INNOVATION AND CIVI...
459 Williams, Sherman A. Employee 80716 Per Annum ASSISTANT TO THE EXECUTIVE CLERK
460 Winter, Melissa E. Employee 133320 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT AND SENIOR A...
461 Wong, Jacqueline Employee 58075 Per Annum SENIOR POLICY ANALYST
462 Wright, Frank B. Employee 65000 Per Annum DEPUTY DIRECTOR FOR FINANCE
463 Wu, Alexander P. Employee 52520 Per Annum ASSOCIATE DIRECTOR FOR THE MANAGEMENT AND ADMI...
464 Xharda, Klevis Employee 42420 Per Annum EXECUTIVE ASSISTANT
465 Yaros, Stephen G. Employee 42000 Per Annum STAFF ASSISTANT
466 Young, Caitlin E. Employee 63630 Per Annum STENOGRAPHER
467 Young, Jr., Reginald D. Employee 69597 Per Annum SENIOR MANAGEMENT ANALYST
468 Young, Kimberly E. Employee 45000 Per Annum LEGISLATIVE ASSISTANT
469 Young, Stephanie L. Employee 87870 Per Annum ASSOCIATE COMMUNICATIONS DIRECTOR
470 Young, Valerie A. Employee 52667 Per Annum RECORDS MANAGEMENT ANALYST
471 Yudelson, Alex R. Employee 42000 Per Annum STAFF ASSISTANT
472 Zaid, Zaid A. Detailee 158700 Per Annum ASSOCIATE COUNSEL
473 Zients, Jeffrey D. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT FOR ECONOMIC POLICY...

474 rows × 5 columns


In [13]:
%matplotlib notebook
import matplotlib.pyplot as plt

plt.hist(white_house["Salary"])
plt.show()


So far we have imported a dataset from a CSV file into a Pandas DataFrame using the read_csv() function. Then we displayed the data, first as a table, and secondly as a historgram.

Questions About the Data

There are a near infinite number of questions we could possibly ask about this data. But to get started, here are a few example questions that could be asked:

  • How does length of employee titles correlate to salary?
  • How much does the White House pay in total salary?
  • Who are the highest and lowest paid staffers?
  • What words are the most common in titles?

How does the length of employee titles correlate to salary?

Steps for figuring this out may look like the following:

  1. Calculate the length of each employee title - should be able to use apply() to get this
  2. Add a column to the DataFrame containing the length of the employee title
  3. Plot length of employee title versus employee salary (could also use direct correlation, but visual plot is good)

In [5]:
# Calculate the length of each employee's title and add to the DataFrame
white_house['LengthOfTitle'] = white_house['Position Title'].apply(len)
white_house.head()


Out[5]:
Name Status Salary Pay Basis Position Title LengthOfTitle
0 Abdullah, Hasan A. Detailee 105960 Per Annum POLICY ADVISOR 14
1 Abraham, Sabey M. Employee 55000 Per Annum ENERGY AND ENVIRONMENT DIRECTOR FOR PRESIDENTI... 58
2 Abraham, Yohannes A. Employee 121200 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND CHIEF O... 119
3 Abramson, Jerry E. Employee 155035 Per Annum DEPUTY ASSISTANT TO THE PRESIDENT AND DIRECTOR... 75
4 Adler, Caroline E. Employee 114000 Per Annum SPECIAL ASSISTANT TO THE PRESIDENT AND DIRECTO... 84

In [6]:
# Plot the length of employee title versus salary to look for correlation
plt.plot(white_house['LengthOfTitle'], white_house['Salary'])
plt.title('How does length of employee titles correlate to salary?')
plt.xlabel('Length of Employee Title')
plt.ylabel('Salary ($)')


Out[6]:
<matplotlib.text.Text at 0x115438f60>

Uh ok, maybe I was wrong about visuallizing being great for detecting correlation ;-)

It looks like there may be a weak positive correlation. But it is really hard to tell.

Maybe we should just numerically calculate the correlation.

Also, it looks like there are some low salary outliers. Should we check to make sure we aren't mixing in monthly salaries with yearly ones?


In [7]:
# Get the values in Pay Basis and figure out how many unique ones there are
types_of_pay_basis = set(white_house['Pay Basis'])
types_of_pay_basis


Out[7]:
{'Per Annum'}

Ok, only one pay basis, annually. So that wasn't an issue.


In [8]:
# Compute pairwise correlation of columns, excluding NA/null values
correlations = white_house.corr()
correlations


Out[8]:
Salary LengthOfTitle
Salary 1.000000 0.532837
LengthOfTitle 0.532837 1.000000

In [9]:
# Linear Regression using ordinary least squares
import statsmodels.api as sm
model = sm.OLS(white_house['Salary'], white_house['LengthOfTitle'])
residuals = model.fit()
print(residuals.summary())


                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Salary   R-squared:                       0.795
Model:                            OLS   Adj. R-squared:                  0.794
Method:                 Least Squares   F-statistic:                     1829.
Date:                Thu, 08 Sep 2016   Prob (F-statistic):          1.22e-164
Time:                        22:53:10   Log-Likelihood:                -5725.2
No. Observations:                 474   AIC:                         1.145e+04
Df Residuals:                     473   BIC:                         1.146e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
LengthOfTitle  1737.3820     40.626     42.765      0.000    1657.552    1817.212
==============================================================================
Omnibus:                       16.051   Durbin-Watson:                   1.693
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               34.738
Skew:                          -0.081   Prob(JB):                     2.86e-08
Kurtosis:                       4.316   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

So yea, there is a real positive correlation between length of employee title and salary!

How much does the White House pay in total salary?


In [10]:
total_salary = sum(white_house['Salary'])
total_salary


Out[10]:
40225595

The white house pays about $40 Million per year in total salary.

Who are the highest and lowest paid staffers?


In [11]:
highest_paid = white_house[white_house['Salary'] == max(white_house['Salary'])]
highest_paid


Out[11]:
Name Status Salary Pay Basis Position Title LengthOfTitle
52 Breckenridge, Anita J. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DEPUTY CHIEF OF... 67
70 Canegallo, Kristie A. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DEPUTY CHIEF OF... 71
90 Cushman, Chase M. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DIRECTOR OF SCH... 65
117 Earnest, Joshua R. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND PRESS SECRETARY 46
121 Eggleston, Warren N. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND COUNSEL TO THE ... 55
130 Fallon, Katherine B. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DIRECTOR OF THE... 76
200 Jarrett, Valerie B. Employee 173922 Per Annum SENIOR ADVISOR AND ASSISTANT TO THE PRESIDENT ... 97
201 Johnson, Broderick D. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND CABINET SECRETARY 48
210 Kale, Katy A. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT FOR MANAGEMENT AND ... 60
216 Keenan, Cody S. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DIRECTOR OF SPE... 56
280 McDonough, Denis R. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND CHIEF OF STAFF 45
297 Monaco, Lisa O. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT FOR HOMELAND SECURI... 106
304 Muñoz, Cecilia Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DIRECTOR OF THE... 70
358 Rhodes, Benjamin J. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DEPUTY NATIONAL... 110
359 Rice, Susan E. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND NATIONAL SECURI... 56
403 Simas, David M. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND DIRECTOR OF THE... 88
426 Tchen, Christina M. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT AND CHIEF OF STAFF ... 63
473 Zients, Jeffrey D. Employee 173922 Per Annum ASSISTANT TO THE PRESIDENT FOR ECONOMIC POLICY... 92

In [12]:
lowest_paid = white_house[white_house['Salary'] == min(white_house['Salary'])]
lowest_paid


Out[12]:
Name Status Salary Pay Basis Position Title LengthOfTitle
243 Leary, Kimberlyn R. Employee 0 Per Annum ADVISOR TO THE COUNCIL ON WOMEN AND GIRLS 41
326 Park, Todd Y. Employee 0 Per Annum ADVISOR FOR TECHNOLOGY 22

Wow, who are these poor unpaid schmucks?

What words are the most common in titles?

This is another multi-step one that is a bit more involved. One approach to solving it might go like the following:

  1. Create an empty dictionary or Series
  2. Parse all words in each title, splititng at whitespace, possibly adding them to one bit list of words
  3. Increment count for a word each time you see it

In [ ]: