Looking at Payscale's Job Satisfaction Data



In [1]:

    
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [2]:

    
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import pandas as pd

Get the data

Let us use requests and BeautifulSoup to look at the data available on the Payscale website.



In [3]:

    
import requests
from bs4 import BeautifulSoup
# soup = BeautifulSoup(html_doc)



In [4]:

    
r = requests.get('http://www.payscale.com/data-packages/most-and-least-meaningful-jobs/full-list')



In [5]:

    
r.status_code









    Out[5]:





200



In [6]:

    
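# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r.text, 'html.parser')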



In [8]:

    
#print(soup.prettify())
#soup.find_all('table')
soup.table
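
To check how many tables a page actually contains before choosing how to import it, the tables can be counted directly. This is a minimal sketch, assuming a small hypothetical HTML string (`sample_html`) in place of `r.text` and the standard-library `html.parser` backend; the names `sample_soup`, `n_tables` and `header_cells` are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text; the real page is far larger.
sample_html = """
<html><body>
<table>
  <tr><th>Detailed Occupation</th><th>Median Pay</th></tr>
  <tr><td>Clergy</td><td>$45,400</td></tr>
</table>
</body></html>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# find_all returns every <table> element in document order
tables = sample_soup.find_all('table')
n_tables = len(tables)

# Header cells of the first table give a quick preview of the columns
header_cells = [th.get_text(strip=True) for th in tables[0].find_all('th')]
print(n_tables, header_cells)
```

If `n_tables` is 1, taking the first element of whatever list a table reader returns is unambiguous.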

Using pandas.io

On inspection, the page contains only one table, so we can import it directly with pandas.io.



In [3]:

    
url = 'http://www.payscale.com/data-packages/most-and-least-meaningful-jobs/full-list'



In [4]:

    
pd.io.html.read_html?



In [5]:

    
# infer_types and tupleize_cols are no longer accepted by read_html; the other arguments were defaults
dflist = pd.read_html(url, match='.+', header=0, thousands=',')



In [6]:

    
df = dflist[0]



In [7]:

    
df.dtypes









    Out[7]:





Detailed Occupation        object
Median Pay                 object
% High Meaning             object
% High Satisfaction        object
% High Stress              object
Typical Education Level    object
% Female                   object
% Male                     object
Job Level                  object
dtype: object



In [7]:

    
df.head()









    Out[7]:






  
    
      
   Detailed Occupation                             Median Pay  % High Meaning  % High Satisfaction  % High Stress  Typical Education Level   % Female  % Male  Job Level
0  Clergy                                          $45,400     97%             89%                  68%            Masters (non-MBA)         14%       86%     Mid-Career
1  Directors, Religious Activities & Education     $35,900     97%             88%                  61%            Bachelors                 50%       50%     Mid-Career
2  Surgeons                                        $299,600    94%             82%                  79%            Doctors of Medicine (MD)  18%       82%     Mid-Career
3  Education Administrators, Elementary/Secondary  $75,900     93%             87%                  85%            Masters (non-MBA)         55%       45%     Senior
4  Chiropractors                                   $58,700     93%             65%                  52%            Doctorate (Ph.D.)         27%       73%     Mid-Career
Data Munging



In [8]:

    
for c in df.columns:
    df[c] = df[c].str.replace('%','')



In [9]:

    
# regex=False makes '$' a literal character rather than a regular-expression anchor
df[df.columns[1]] = df[df.columns[1]].str.replace('$', '', regex=False)



In [10]:

    
df[df.columns[1]] = df[df.columns[1]].str.replace(',','')
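
The three cleaning cells above can be folded into one helper. The sketch below is one possible arrangement, not the notebook's own code: `clean_payscale` and `sample` are hypothetical names, and `sample` holds only three of the rows shown by `df.head()` with three of the columns. Passing `regex=False` keeps `$` a literal character.

```python
import pandas as pd

# Three rows and three columns taken from the df.head() output above
sample = pd.DataFrame({
    'Detailed Occupation': ['Clergy',
                            'Directors, Religious Activities & Education',
                            'Surgeons'],
    'Median Pay': ['$45,400', '$35,900', '$299,600'],
    '% High Stress': ['68%', '61%', '79%'],
})

def clean_payscale(frame, pay_col='Median Pay'):
    """Return a copy with '%' removed everywhere and '$' and ',' removed from the pay column."""
    out = frame.copy()
    for col in out.columns:
        out[col] = out[col].str.replace('%', '', regex=False)
    for symbol in ('$', ','):
        # Only the pay column is touched, so commas in occupation names survive
        out[pay_col] = out[pay_col].str.replace(symbol, '', regex=False)
    return out

cleaned = clean_payscale(sample)
print(cleaned)
```

Working on a copy leaves the scraped frame untouched, which makes it easy to rerun the cleaning step after changing it.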



In [11]:

    
df.head()









    Out[11]:






  
    
      
   Detailed Occupation                             Median Pay  % High Meaning  % High Satisfaction  % High Stress  Typical Education Level   % Female  % Male  Job Level
0  Clergy                                          45400       97              89                   68             Masters (non-MBA)         14        86      Mid-Career
1  Directors, Religious Activities & Education     35900       97              88                   61             Bachelors                 50        50      Mid-Career
2  Surgeons                                        299600      94              82                   79             Doctors of Medicine (MD)  18        82      Mid-Career
3  Education Administrators, Elementary/Secondary  75900       93              87                   85             Masters (non-MBA)         55        45      Senior
4  Chiropractors                                   58700       93              65                   52             Doctorate (Ph.D.)         27        73      Mid-Career



In [12]:

    
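num_cols = ['Median Pay', '% High Meaning', '% High Satisfaction', '% High Stress', '% Female', '% Male']  # convert_objects was removed from pandas
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')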



In [13]:

    
df.dtypes









    Out[13]:





Detailed Occupation         object
Median Pay                   int64
% High Meaning               int64
% High Satisfaction        float64
% High Stress                int64
Typical Education Level     object
% Female                     int64
% Male                       int64
Job Level                   object
dtype: object
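
`% High Satisfaction` comes back as `float64` while the other percentage columns are `int64`. A plausible explanation, not confirmed by the output shown here, is that at least one entry in that column could not be parsed as a number: pandas stores the resulting missing value as `NaN`, which forces a float column. The sketch below reproduces the effect with made-up values; the series names and the `'-'` placeholder are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values; '-' stands in for an entry that is not a number
complete = pd.Series(['89', '88', '82'], dtype=object)
with_gap = pd.Series(['89', '-', '82'], dtype=object)

# errors='coerce' turns anything unparsable into NaN instead of raising
complete_num = pd.to_numeric(complete, errors='coerce')
with_gap_num = pd.to_numeric(with_gap, errors='coerce')

print(complete_num.dtype)         # integer dtype: every value parsed
print(with_gap_num.dtype)         # float dtype: the unparsable entry became NaN
print(with_gap_num.isna().sum())  # number of entries that failed to parse
```

Running `df['% High Satisfaction'].isna().sum()` on the real frame would show whether this is what happened.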

Exploring the data

EDA



In [14]:

    
plt.style.use('ggplot')  # pd.options.display.mpl_style was removed from pandas; use a matplotlib style sheet instead



In [15]:

    
figsize(10, 10)



In [20]:

    
plt.scatter(x=df['% High Stress'],y=df['% High Satisfaction'], s=df['Median Pay']*.001, alpha=0.5)
plt.gca().set_xlabel('% High Stress')
plt.gca().set_ylabel('% High Satisfaction')
plt.gca().set_title('High Satisfaction vs High Stress Jobs', fontsize=16)
plt.show()
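
The same bubble chart can be drawn through matplotlib's object-oriented interface, which does not rely on the `%pylab` namespace. This is a sketch under stated assumptions: `sample` holds only the five rows shown by `df.head()`, and the `Agg` backend is selected so the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown by df.head(), already converted to numbers
sample = pd.DataFrame({
    '% High Stress': [68, 61, 79, 85, 52],
    '% High Satisfaction': [89, 88, 82, 87, 65],
    'Median Pay': [45400, 35900, 299600, 75900, 58700],
})

fig, ax = plt.subplots(figsize=(10, 10))

# s is the marker area in points squared, so pay / 1000 keeps the bubbles a readable size
points = ax.scatter(sample['% High Stress'], sample['% High Satisfaction'],
                    s=sample['Median Pay'] / 1000, alpha=0.5)
ax.set_xlabel('% High Stress')
ax.set_ylabel('% High Satisfaction')
ax.set_title('High Satisfaction vs High Stress Jobs', fontsize=16)

# The sizes actually used for the markers, one per row
sizes = points.get_sizes()
```

Because `s` sets area rather than radius, a job paying four times as much gets a bubble with four times the area and twice the diameter.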
