The Python code below finds the employers that, for WHISARD investigations concluded between 2010 and 2014, have been found to owe the greatest back wages to H-2 guest workers.
Load all violations, and limit them to those that meet all of the following critera: (a) Classified as having an ACT_ID
of H2A
or H2B
; (b) has an E
(employee) record flag, as opposed to an R
(employer) record flag; and (c) DATE_CONCLUDED
is between 2010 and 2014.
Identify violations that pertain to U.S. workers, rather than guest workers, and exclude them from the analysis.
Group the back wages by employers, based on the first available of the following: federal EIN, legal name, trade name.
Find all employers with at least \$100,000 in back wages for violations that fit the criteria above.
In [1]:
import pandas as pd
import sys
sys.path.append("../utils")
import loaders
Note: loaders
is a custom module to handle most common data-loading operations in these analyses. It is available here.
In [2]:
# Restrict cases to those concluded 2010–2015
_cases = loaders.load_cases()
cases = _cases[
(_cases["DATE_CONCLUDED_YEAR"] >= 2010) &
(_cases["DATE_CONCLUDED_YEAR"] <= 2014)
]
In [3]:
_employers = loaders.load_employers()
employer_basics = _employers.set_index("CASE_ID")[[
"employer_id",
"ER_LEGAL_NAME"
]].copy()
In [4]:
violations = loaders.load_violations()
In [5]:
# Get H-2A and H-2B violations from those cases, excluding the violation types listed above
h2_employee_violations = violations[
violations["ACT_ID"].isin([ "H2A", "H2B" ]) &
violations["CASE_ID"].isin(cases["CASE_ID"]) &
(violations["violation_found"] == True) &
(violations["ER_EE_VIOL"] == "E") # E = "Employee"
].set_index("CASE_ID")\
.join(employer_basics)\
.reset_index()
In [6]:
# A list of back wage–associated violations that correspond to U.S. (rather than H-2) workers
non_guestworker_descs = [
"17 Preferential treatment given to H-2A workers",
"02 Unlawful rejection of US workers (2008 & 2010 Rules)",
"Requirement to Hire U.S. Workers - ER failed to properly hire or rehire U.S. workers",
"Layoff- ER improperly laid off similarly employed U.S. workers within 120 days of date of need, unless employee refused or was lawfully rejected",
"Job Opportunity - (U.S. workers) - ER failed to offer U.S. workers bona fide, full-time temp. position due to inequitable qualification requirements",
"Terms and Working Conditions for U.S. Workers - ER failed to offer terms and working conditions as required"
]
In [7]:
# Make sure we haven't mis-transcribed the violation descriptions above
assert(len(set(non_guestworker_descs) - set(h2_employee_violations["VIOLATION_DESC"])) == 0)
In [8]:
h2_guestworker_violations = h2_employee_violations[
~h2_employee_violations["VIOLATION_DESC"].isin(non_guestworker_descs)
]
In [9]:
def join_unique_strings(series):
strings = sorted(set(series.fillna("")))
return " | ".join(strings)
In [10]:
# Group the H-2 violations by employer ID
by_employer = h2_guestworker_violations.groupby("employer_id")
In [11]:
# Calculate employer aggregates
employer_aggregates = pd.DataFrame({
"h2_backwages": by_employer["AMT_BW_ASSESSED"].sum(),
"names": by_employer["ER_LEGAL_NAME"].apply(join_unique_strings)
})
In [12]:
# Get employer IDs associated with $100,000+ in H-2 back wages
top_total_h2_backwages = employer_aggregates[
employer_aggregates["h2_backwages"] >= 100000
].sort("h2_backwages", ascending=False)
In [13]:
top_total_h2_backwages
Out[13]:
In [14]:
# Save a copy as a CSV
top_total_h2_backwages.to_csv("../output/top-total-h2-backwages-2010-2014.csv")
In [15]:
peri_violations = h2_guestworker_violations[
h2_guestworker_violations["employer_id"] == "88-0175412"
]
pd.DataFrame({
"count": peri_violations.groupby([
"CASE_ID",
"ACT_ID",
"VIOLATION_DESC"
]).size()
})
Out[15]: