The Python code below calculates the number of WHD cases concluded each fiscal year that examined some aspect of H-2 regulations, and the number of distinct employer IDs associated with those cases. In addition, it provides a rough estimate of the number of employers certified for H-2 per fiscal year, to support the statement that the "vast majority" of H-2-certified employers are not inspected.
Note: The visa certification data published by the OFLC do not include unique identifiers for employers. Additionally, the data contain multiple alternate spellings and misspellings of employer names, making it difficult to determine the number of distinct employers. For these reasons, this analysis intentionally does not provide an overall inspection rate. It should be clear from the numbers below, however, that the number of H-2 employers WHD inspects each year amounts to a small fraction of the number of employers that DOL certifies for H-2 visas.
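As a contrived illustration (these employer names are invented, not drawn from the data), even near-identical spellings register as separate employers in a naive distinct count:
variants = ["Acme Farms, Inc.", "ACME FARMS INC", "Acme Farm Inc"]
len(set(variants))  # -> 3 distinct strings, though all three likely name one employer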
The case counts take the CASE_ACT_SUMMARY rows for each case, and select those for the "H2A" and "H2B" ACT_IDs. This includes any case with H-2 findings, regardless of whether WHD identified any violations. The counts tally distinct CASE_IDs (since some employers have summaries for both H-2A and H-2B findings).
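As a toy illustration of that last step (the case IDs below are invented): counting summary rows would double-count a case examined under both acts, while counting distinct CASE_IDs does not.
summaries = [("case-1", "H2A"), ("case-1", "H2B"), ("case-2", "H2A")]  # hypothetical
len(summaries)                                   # 3 summary rows
len({case_id for case_id, act_id in summaries})  # 2 distinct cases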
In [1]:
import pandas as pd
import sys
import re
sys.path.append("../utils")
import loaders
Note: loaders
is a custom module that handles the most common data-loading operations in these analyses; it lives in the repository's ../utils directory, added to the path above.
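The module itself is not reproduced in this notebook. A minimal sketch of the interface the cells below assume might look like the following; the file paths and any cleaning steps are assumptions, not the actual implementation:
# loaders.py (sketch; paths and column handling are assumptions)
import pandas as pd

def load_cases():
    # WHD enforcement cases, one row per CASE_ID
    return pd.read_csv("../data/whd/cases.csv")

def load_employers():
    # Employer records keyed by CASE_ID, including a derived employer_id
    return pd.read_csv("../data/whd/employers.csv")

def load_act_summaries():
    # CASE_ACT_SUMMARY rows: one row per case per act examined
    return pd.read_csv("../data/whd/case_act_summaries.csv")

def date_to_fy(date):
    # Federal fiscal years run Oct. 1 - Sep. 30, so October-December
    # dates belong to the *next* fiscal year
    if pd.isnull(date):
        return None
    return date.year + 1 if date.month >= 10 else date.year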
In [2]:
employers = loaders.load_employers().set_index("CASE_ID")
In [3]:
cases = loaders.load_cases().set_index("CASE_ID")
In [4]:
cases_basics = cases[[ "DATE_CONCLUDED_FY", "INVEST_TOOL_DESC" ]]\
    .join(employers[ "employer_id" ])\
    .reset_index()
In [5]:
act_summaries = loaders.load_act_summaries()
In [6]:
h2_summaries = act_summaries[
    act_summaries["ACT_ID"].isin([ "H2A", "H2B" ])
]
In [7]:
matching_cases = cases_basics[
    cases_basics["CASE_ID"].isin(h2_summaries["CASE_ID"]) &
    (cases_basics["DATE_CONCLUDED_FY"] >= 2010) &
    (cases_basics["DATE_CONCLUDED_FY"] <= 2014)
]
In [8]:
# Tally of cases by investigation tool; its index is reused below to order the columns
invest_tool_counts = matching_cases["INVEST_TOOL_DESC"].value_counts()
In [9]:
# Count distinct cases per fiscal year and investigation tool, pivot the
# tools into columns, and order those columns by overall frequency
case_counts = matching_cases.groupby([
    "DATE_CONCLUDED_FY",
    "INVEST_TOOL_DESC"
])["CASE_ID"].nunique()\
    .unstack()\
    .fillna(0)\
    [invest_tool_counts.index.tolist()]
case_counts["[total]"] = case_counts.sum(axis=1)
In [10]:
employer_counts = pd.DataFrame({
    "n_employer_ids": matching_cases.groupby("DATE_CONCLUDED_FY")["employer_id"].nunique()
})
In [11]:
case_counts
Out[11]:
Note: The counts above and below include all cases with at least some H-2 aspect indicated, regardless of whether H-2 was the primary focus or whether investigators found any H-2 violations.
In [12]:
employer_counts
Out[12]:
In [13]:
# Parse dates in YYYY-MM-DD format, converting unparseable values to NaT
date_parser = lambda x: pd.to_datetime(x, format="%Y-%m-%d", errors="coerce")
oflc_decisions = pd.read_csv("../data/oflc-decisions/processed/oflc-decisions.csv",
    parse_dates=["last_event_date"],
    date_parser=date_parser)
In [14]:
oflc_decisions["last_event_date_fy"] = oflc_decisions["last_event_date"].apply(loaders.date_to_fy)
In [15]:
certifications = oflc_decisions[
    (oflc_decisions["is_certified"] == True) &
    (oflc_decisions["is_expired"] == False) &
    (oflc_decisions["is_duplicate"] == False) &
    (oflc_decisions["last_event_date_fy"] >= 2010) &
    (oflc_decisions["last_event_date_fy"] <= 2014)
].copy()
In [16]:
non_alphanum_pat = re.compile(r"[^A-Z0-9 ]+")
In [17]:
def standardize_name(x):
    return re.sub(non_alphanum_pat, "", x.upper().strip())
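For example, applied to a hypothetical name (not drawn from the data), the function uppercases, trims, and strips punctuation:
standardize_name("  Acme Farms, Inc. ")  # -> "ACME FARMS INC"
This collapses case and punctuation variants but not true misspellings, which is why the note above cautions against treating the distinct-name counts as exact.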
In [18]:
certifications["employer_name_standard"] = certifications["employer_name"]\
.fillna("")\
.apply(standardize_name)
In [19]:
certifications[["employer_name", "employer_name_standard"]].tail()
Out[19]:
In [20]:
certs_by_fy = certifications.groupby("last_event_date_fy")
employer_cert_counts = pd.DataFrame({
    "n_employer_names": certs_by_fy["employer_name_standard"].nunique()
})
In [21]:
employer_cert_counts
Out[21]: