H-2B visas are nonimmigrant visas that allow foreign nationals to enter the U.S. temporarily and take up nonagricultural employment that is seasonal, intermittent, a peak-load need, or a one-time occurrence.
Summary: It turns out that Texas has the highest need for foreign unskilled employees. However, it is a salmon processing company in Alaska that has requested the most workers, while offering them a wage of only $10 an hour.
In [296]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [297]:
df = pd.read_excel("H-2B_Disclosure_Data_FY15_Q4.xlsx")
In [298]:
df.head()
Out[298]:
In [299]:
#df.info()
1. How many requests did the Office of Foreign Labor Certification (OFLC) receive in 2015?
In [300]:
df['CASE_NUMBER'].count()
Out[300]:
2. How many workers did those requests cover in total? And how many of them were full-time positions?
In [301]:
df['NBR_WORKERS_REQUESTED'].sum()
Out[301]:
In [302]:
df.groupby('FULL_TIME_POSITION')['NBR_WORKERS_REQUESTED'].sum()
Out[302]:
3. How many jobs did the ETA National Processing Center actually certify?
In [303]:
df['NBR_WORKERS_CERTIFIED'].sum()
Out[303]:
In [304]:
df.groupby('FULL_TIME_POSITION')['NBR_WORKERS_CERTIFIED'].sum()
Out[304]:
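Not shown above, but easy to derive from the two sums, is the share of requested workers that ended up certified. A minimal sketch, assuming the two columns are directly comparable:
In [ ]:
# Sketch: overall certification rate, i.e. certified workers as a share of requested workers.
df['NBR_WORKERS_CERTIFIED'].sum() / df['NBR_WORKERS_REQUESTED'].sum()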
4. What was the average pay?
In [306]:
df.groupby('BASIC_UNIT_OF_PAY')['PREVAILING_WAGE'].mean()
Out[306]:
In [307]:
df.groupby('BASIC_UNIT_OF_PAY')['BASIC_UNIT_OF_PAY'].count()
Out[307]:
The majority of the jobs are paid hourly, at an average rate of $12.65 an hour.
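The two cells above show the counts and the means separately; a small sketch that puts them side by side in one table:
In [ ]:
# Sketch: number of applications and mean prevailing wage per unit of pay.
df.groupby('BASIC_UNIT_OF_PAY')['PREVAILING_WAGE'].agg(['count', 'mean'])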
5. Who earned the least? And what are these people actually doing, and where?
In [308]:
worst_wage = df[df['BASIC_UNIT_OF_PAY'] == 'Hour'].sort_values(by='PREVAILING_WAGE', ascending=True).head()
This table displays the lowest-paid jobs, for which no workers were certified.
In [309]:
worst_wage[['BASIC_UNIT_OF_PAY', 'PREVAILING_WAGE', 'EMPLOYER_NAME', 'JOB_TITLE', 'WORKSITE_CITY', 'NBR_WORKERS_REQUESTED', 'NBR_WORKERS_CERTIFIED']]
Out[309]:
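To make the "no workers certified" condition explicit rather than relying on the sorted head, one could filter on it directly; a minimal sketch:
In [ ]:
# Sketch: hourly applications where no workers were certified, lowest wages first.
not_certified = df[(df['BASIC_UNIT_OF_PAY'] == 'Hour') & (df['NBR_WORKERS_CERTIFIED'] == 0)]
not_certified.sort_values(by='PREVAILING_WAGE').head()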
In [310]:
lowest_wages_accepted = df[df['NBR_WORKERS_CERTIFIED'] != 0].sort_values(by='PREVAILING_WAGE', ascending=True).head()
And this table shows that, among the jobs for which workers were actually certified, landscape laborers are the ones earning the least.
In [311]:
lowest_wages_accepted[['BASIC_UNIT_OF_PAY', 'PREVAILING_WAGE', 'EMPLOYER_NAME', 'JOB_TITLE', 'WORKSITE_CITY', 'NBR_WORKERS_REQUESTED', 'NBR_WORKERS_CERTIFIED']]
Out[311]:
6. What was the most common unit of pay (daily, weekly, monthly)?
In [312]:
df.groupby('BASIC_UNIT_OF_PAY')['BASIC_UNIT_OF_PAY'].count()
Out[312]:
7. Work out the total amount paid to H-2B laborers.
In [313]:
#df.groupby('BASIC_UNIT_OF_PAY')['PREVAILING_WAGE'].describe()
#df.groupby('PREVAILING_WAGE').count()
Rough estimate: multiplying the count by the mean wage for each unit of pay (Year, Month, Bi-Weekly, Week, and Hour, assuming 8-hour days) gives roughly $33 million for the hourly jobs and about $180,000 for the bi-weekly ones, for a total of approximately $40 million.
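A sketch of one way to compute such an estimate (not necessarily the exact calculation behind the figure above): it multiplies the number of certified workers per unit of pay by the mean prevailing wage, scales hourly rates to an assumed 8-hour day, and ignores how long each job lasts.
In [ ]:
# Back-of-the-envelope sketch only: certified workers times mean prevailing wage
# per unit of pay, with hourly wages scaled to an assumed 8-hour day.
# Job duration is ignored entirely.
estimates = {}
for unit, group in df.groupby('BASIC_UNIT_OF_PAY'):
    estimate = group['NBR_WORKERS_CERTIFIED'].sum() * group['PREVAILING_WAGE'].mean()
    if unit == 'Hour':
        estimate *= 8  # assumption: 8-hour workday
    estimates[unit] = estimate
pd.Series(estimates)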
8. Were there any foreign companies hiring foreign workers in the US? If yes, work out averages by nation.
In [314]:
df.groupby('EMPLOYER_COUNTRY')['EMPLOYER_COUNTRY'].count()
Out[314]:
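The cell above only counts employer countries; for the second half of the question, a hedged sketch that keeps only employers based outside the United States and averages their offered wages by country (the exact country label, and reading "averages" as average prevailing wage, are assumptions):
In [ ]:
# Sketch: non-US employers and their mean offered wage by country.
# The country label 'UNITED STATES OF AMERICA' is an assumption; check it
# against the counts above.
foreign = df[df['EMPLOYER_COUNTRY'] != 'UNITED STATES OF AMERICA']
foreign.groupby('EMPLOYER_COUNTRY')['PREVAILING_WAGE'].mean()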
9. Most common job title. Graph this.
In [315]:
#x = df.groupby('JOB_TITLE')['JOB_TITLE'].value_counts()
df['JOB_TITLE'].value_counts().head(10)
Out[315]:
In [316]:
plt.style.use('ggplot')
df['JOB_TITLE'].value_counts(ascending=True).tail(10).plot(kind='barh')
plt.savefig("Top_Jobs.svg")
In [317]:
##Is there an efficient way for Pandas to clean the data? Merge "Landscape Laborer" with "LANDSCAPE LABORER" etc.?
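One possible answer, sketched below: normalize the case (and stray whitespace) of the job titles before counting, so that "Landscape Laborer" and "LANDSCAPE LABORER" collapse into one category.
In [ ]:
# Sketch: case- and whitespace-normalized job titles before counting.
df['JOB_TITLE'].str.strip().str.title().value_counts().head(10)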
10. Which US states have the largest need for unskilled workers? Make a graph of this.
In [318]:
#x = df['EMPLOYER_STATE'].value_counts(ascending=False).head(10) * df['NBR_WORKERS_REQUESTED'].sum()
df['EMPLOYER_STATE'].value_counts(ascending=False).head(10).plot(kind='bar')
plt.savefig("demand_in_states.svg")
In [320]:
Workers_in_state_count = df.groupby('EMPLOYER_STATE')['NBR_WORKERS_REQUESTED'].sum()
Workers_in_state_count.sort_values(ascending=True).tail(10).plot(kind='barh', legend='NBR_WORKERS_REQUESTED')
plt.savefig("workers_requestet_in_states.svg")
11. Which industries had the largest need?
In [321]:
#Changing df['NAIC_CODE'] from a non-null object column into int
In [322]:
#This .fillna(0.0) is magic. I found it here:
#http://stackoverflow.com/questions/21291259/convert-floats-to-ints-in-pandas
#df['NAIC_CODE'] = df['NAIC_CODE'].fillna(0.0).astype(int)
#But it turns out it only works on one of my columns, not on the other. Why?
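One likely reason, hedged: even after fillna, .astype(int) still fails if the column contains entries that cannot be parsed as numbers. pd.to_numeric with errors='coerce' turns such entries into NaN first, which .fillna(0) can then replace. A minimal, non-mutating sketch:
In [ ]:
# Hedged sketch: coerce non-numeric entries to NaN, fill them, then cast.
# A plain .astype(int) fails as soon as non-numeric entries remain.
pd.to_numeric(df['NAIC_CODE'], errors='coerce').fillna(0).astype(int).head()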
Importing the 2012 NAICS codes from here.
In [323]:
NAIC_CODEs = pd.read_excel("6-digit_2012_Code.xls")
In [324]:
NAIC_CODEs.info()
In [325]:
#Changing the NAICS codes from a non-null object column into float64
In [326]:
#NAIC_CODEs['NAICS2012'] = NAIC_CODEs['NAICS2012'].fillna(0.0).astype(int)
In [327]:
NAIC_CODEs.head()
Out[327]:
In [328]:
#And now reimporting the original file.
In [329]:
df = pd.read_excel("H-2B_Disclosure_Data_FY15_Q4.xlsx")
In [330]:
#Now NAIC_CODE is a float64 in the column we want to merge on.
In [331]:
df_merged = df.merge(NAIC_CODEs, how='left', left_on='NAIC_CODE', right_on='NAICS2012')
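An alternative that avoids re-reading the Excel file, sketched with hedges: cast both merge keys to the same nullable integer dtype before joining, so matches are not lost to a float-vs-int (or text) mismatch. The helper names below (left, right, df_merged_alt) are illustrative only.
In [ ]:
# Sketch only, not applied above: align both merge keys on pandas' nullable
# 'Int64' dtype (which tolerates the NaNs in NAIC_CODE) before joining.
left = df.assign(NAIC_CODE=pd.to_numeric(df['NAIC_CODE'], errors='coerce').astype('Int64'))
right = NAIC_CODEs.assign(NAICS2012=pd.to_numeric(NAIC_CODEs['NAICS2012'], errors='coerce').astype('Int64'))
df_merged_alt = left.merge(right, how='left', left_on='NAIC_CODE', right_on='NAICS2012')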
In [332]:
#df_merged.info()
In [333]:
df_merged['Industry'].value_counts().head(10)
Out[333]:
In [334]:
workers_by_industry = df_merged.groupby('Industry')['NBR_WORKERS_REQUESTED'].sum()
workers_by_industry.sort_values(ascending=True).tail(10).plot(kind='barh', legend='NBR_WORKERS_REQUESTED')
plt.savefig("workers_by_industry.svg")
12. Which companies had the largest need? Compare acceptance/denials of each company.
In [335]:
df['EMPLOYER_NAME'].value_counts().head(5)
Out[335]:
In [336]:
company_workers_demand = df.groupby('EMPLOYER_NAME')['NBR_WORKERS_REQUESTED'].sum()
company_workers_demand.sort_values(ascending=True).tail(10).plot(kind='barh')
plt.savefig("company_workers_demand.svg")
In [337]:
company_workers_certified = df.groupby('EMPLOYER_NAME')['NBR_WORKERS_CERTIFIED'].sum()
company_workers_certified.sort_values(ascending=True).tail(10).plot(kind='barh')
plt.savefig("company_workers_certified.svg")
BONUS: Looking into Silver Bay Seafoods and UK International Soccer Camps.
Silver Bay's claim: its "...primary strength is in its combination of having a state of the art processing plant and favorable logistics to support its operations; competent management and key personnel; an established fish buying system; and ownership by fishermen who represent over 80% of the committed fishing effort." How much does the company pay its workers on average?
In [338]:
SILVER_BAY_SEAFOODS = df[df['EMPLOYER_NAME'] == 'SILVER BAY SEAFOODS, LLC']
In [339]:
SILVER_BAY_SEAFOODS[['JOB_TITLE', 'PREVAILING_WAGE', 'HOURLY_WORK_SCHEDULE_AM', 'HOURLY_WORK_SCHEDULE_PM', 'OVERTIME_RATE_FROM', 'OVERTIME_RATE_TO', 'NATURE_OF_TEMPORARY_NEED', 'NBR_WORKERS_REQUESTED', 'NBR_WORKERS_CERTIFIED']]
Out[339]:
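The average itself isn't computed above; a minimal sketch, grouped by unit of pay so that hourly and non-hourly rates are not mixed together:
In [ ]:
# Sketch: Silver Bay's mean offered wage per unit of pay.
SILVER_BAY_SEAFOODS.groupby('BASIC_UNIT_OF_PAY')['PREVAILING_WAGE'].mean()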
In [340]:
SOCCER_CAMPS = df[df['EMPLOYER_NAME'] == 'UK International Soccer Camps']
SOCCER_CAMPS[['JOB_TITLE', 'PREVAILING_WAGE', 'HOURLY_WORK_SCHEDULE_AM', 'HOURLY_WORK_SCHEDULE_PM', 'OVERTIME_RATE_FROM', 'OVERTIME_RATE_TO', 'NATURE_OF_TEMPORARY_NEED', 'NBR_WORKERS_REQUESTED', 'NBR_WORKERS_CERTIFIED']]
Out[340]: