In [229]:
import pandas as pd
import numpy as np
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))


This munging uses data scraped on 3/25/2017.

The Department of Human Services provided us with seven spreadsheets that each have information about substantiated complaints against assisted living, residential care, and nursing facilities in Oregon. The purpose of this notebook is to munge them: standardize the column names, remove unnecessary columns, and clean some fields. Its second purpose is to get the initial ownership date for each facility from the owner_history table and assign it to the facility.
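The ownership step is not shown in this section. As a rough illustration of the idea only, here is a minimal sketch that assumes a hypothetical owner_history file with facility_id and ownership_start_date columns (the real file path and column names may differ):

# Sketch only -- the path and column names are assumptions, not the actual owner_history schema.
owner_history = pd.read_csv('../../data/raw/owner_history.csv', parse_dates=['ownership_start_date'])
# The earliest ownership date per facility becomes that facility's initial ownership date.
initial_ownership = (owner_history.groupby('facility_id')['ownership_start_date']
                     .min()
                     .rename('initial_ownership_date')
                     .reset_index())
# Assign it to each facility with a left join (facilities is loaded later in this notebook).
# facilities = facilities.merge(initial_ownership, on='facility_id', how='left')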

Complaints munging

Start with the ten-year data, which does not have narratives.

Import, clean, concat.


In [230]:
#Five years of detailed complaint data for all four kinds of facilities (Residential Care, Assisted Living, Nursing, and Adult Foster Home)
detailed = pd.read_excel('../../data/raw/Oregonian Abuse records 5 years May 2016.xlsx', header=3)
#Ten years of non-detailed complaints for Nursing Facilities
NF_complaints = pd.read_excel('../../data/raw/Copy of Oregonian Data Request Facility Abuse Records April 2016   Reviewed.xlsx',sheetname='NF Complaints')
#Ten years of non-detailed complaints for Assisted Living Facilities
ALF_complaints = pd.read_excel('../../data/raw/Copy of Oregonian Data Request Facility Abuse Records April 2016   Reviewed.xlsx',sheetname='ALF Complaints')
#Ten years of non-detailed complaints for Residential Care Facilities
RCF_complaints = pd.read_excel('../../data/raw/Copy of Oregonian Data Request Facility Abuse Records April 2016   Reviewed.xlsx',sheetname='RCF Complaints')

In [231]:
#NF has an inconsistently named column
NF_complaints.rename(columns={'Abuse_CbcAbuse': 'CbcAbuse'}, inplace=True)

In [232]:
ten_year_complaints = pd.concat([RCF_complaints,ALF_complaints,NF_complaints], ignore_index=True).reset_index().drop('index',1)

In [233]:
ten_year_complaints.rename(columns={'Abuse_Number':'abuse_number', 'Facility ID':'facility_id','Incident Date':'incident_date','Fac Type': 'facility_type',
'Investigation Results':'results_1','FacilityInvestResultsAbuse':'results_2','FacilityInvestResultsRule':'results_3','OutcomeCode':'outcome_code',
'CbcAbuse':'abuse_type'}, inplace=True)

In [234]:
ten_year_complaints = ten_year_complaints[['abuse_number','facility_id','incident_date','results_1',
                                           'results_2','results_3','outcome_code','abuse_type']][ten_year_complaints['abuse_number'].notnull()]

There are 52 complaints that have been mislabelled as unsubstantiated.


In [235]:
sub_comps = pd.read_excel('../../data/raw/52 mislabelled as unsubstantiated.xlsx', header=None, names=['abuse_number'])

In [236]:
# Pull the full ten-year records for the 52 mislabelled complaints.
miss_comps = sub_comps.merge(ten_year_complaints, how='left', left_on='abuse_number', right_on='abuse_number')

This dataset contains unsubstantiated complaints, which we don't need. There are three columns that indicate substantiation. A DHS person explained that if any one of them has the word 'substantiated,' then the complaint was substantiated.


In [237]:
ten_year_complaints = ten_year_complaints[(ten_year_complaints['results_1']=='Substantiated')|
                   (ten_year_complaints['results_2']=='Substantiated')|
                   (ten_year_complaints['results_3']=='Substantiated')]
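An equivalent, more compact way to express the same "any of the three columns" rule (a sketch, not part of the original pipeline):

results_cols = ['results_1', 'results_2', 'results_3']
# Keep a row if any of the three results columns says 'Substantiated'.
is_substantiated = ten_year_complaints[results_cols].eq('Substantiated').any(axis=1)
ten_year_complaints = ten_year_complaints[is_substantiated]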

In [238]:
ten_year_complaints = pd.concat([ten_year_complaints,miss_comps]).reset_index().drop('index',1)

In [239]:
ten_year_ready = ten_year_complaints[['abuse_number','facility_id','incident_date','outcome_code','abuse_type']].reset_index().drop('index',1)

Now we prepare the five-year, detailed data.

The 'detailed' data is a five-year set of substantiated complaints against all facilities, including adult foster homes, which we don't want.


In [240]:
detailed.rename(columns={'Abuse_Number':'abuse_number','Facility ID':'facility_id',
                'Incident Date':'incident_date','Investigation Results':'results_1',
                'Facility Invest Results Abuse':'results_2','Facility Invest Results Rule':'results_3',
               'Outcome Code':'outcome_code','Action Notes':'action_notes','Outcome Notes':'outcome_notes',
               'Cbc Abuse Indicator':'abuse_type', 'Facility Type':'facility_type'}, inplace=True)

Drop Adult Foster Homes and select columns.


In [241]:
five_year_complaints = detailed[['abuse_number','facility_id','facility_type','incident_date','outcome_code',
                      'action_notes','outcome_notes','abuse_type']][detailed['facility_type']!='AFH']

No longer need the facility_type field.


In [242]:
five_year_ready = five_year_complaints.drop('facility_type',1)

There are thousands of complaints that appear in both datasets. If a complaint is a duplicate, we want to keep the one that is in the five-year set, because that one has richer data. To do this, we will add a 'source' column to each dataframe, value '1' for the five-year data and '2' for the ten-year data. We will then sort based on that column, then de-duplicate on the abuse_number field, telling pandas to keep the first instance of the duplicate that it finds.
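A quick way to check that overlap before de-duplicating (a sketch; the case normalization mirrors the uppercase step below):

# Count abuse_numbers that appear in both the five-year and ten-year sets.
five_ids = five_year_ready['abuse_number'].str.upper()
ten_ids = ten_year_ready['abuse_number'].str.upper()
len(set(five_ids) & set(ten_ids))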


In [243]:
five_year_ready['source']=1

In [244]:
ten_year_ready['source']=2

In [245]:
five_ten_concat = pd.concat([five_year_ready,ten_year_ready])

Set abuse_numbers to uppercase (three abuse numbers in the ten-year data are lowercase).


In [246]:
five_ten_concat['abuse_number'] = five_ten_concat['abuse_number'].apply(lambda x:x.upper())

In [247]:
five_ten_concat = five_ten_concat.sort_values('source')

In [248]:
complaints = five_ten_concat.drop_duplicates(subset='abuse_number', keep='first').reset_index().drop('index',1)
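A sanity check that the de-duplication worked (sketch, not part of the original pipeline):

# After keep='first', every abuse_number should appear exactly once.
assert complaints['abuse_number'].is_unique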

Add a 'year' column based on incident date.


In [249]:
complaints['year']=complaints['incident_date'].dt.year.astype(int)

In [250]:
complaints.count()


Out[250]:
abuse_number     13705
abuse_type       12478
action_notes      6574
facility_id      13705
incident_date    13705
outcome_code     13704
outcome_notes     6544
source           13705
year             13705
dtype: int64

In [251]:
complaints['abuse_type'].fillna('',inplace=True)

Clean the abuse_type column


In [252]:
complaints['abuse_type'] = complaints['abuse_type'].apply(lambda x: x.upper())

In [253]:
complaints["abuse_type"] = complaints["abuse_type"].apply(dict([
    ('0', ''),  
    ('1', ''),  
    ('2', ''),  
    ('363', ''),  
    ('I', ''),
    ('A', 'A'),
    ('L', 'L'),
]).get).fillna('')
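An optional check (sketch) that the cleanup left only the expected values:

# Should show only '', 'A', and 'L' after the mapping above.
complaints['abuse_type'].value_counts(dropna=False)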

Join with scraped complaints

Complaints were scraped from https://apps.state.or.us/cf2/spd/facility_complaints/ using the script in ..scraper/DHS_scraper.py


In [254]:
scraped_comp = pd.read_csv('../../data/scraped/scraped_complaints_3_25.csv')

Set all abuse numbers to upper case.


In [255]:
scraped_comp['abuse_number'] = scraped_comp['abuse_number'].apply(lambda x: x.upper())

In [256]:
scraped_comp = scraped_comp.drop_duplicates(subset='abuse_number').drop(['fac_type','inv_comp_date','city_name'],1)

In [257]:
merged = complaints.merge(scraped_comp, how = 'left',on = 'abuse_number')

In [258]:
merged['outcome_code'] = merged['outcome_code'].fillna(0)

Add a column that tells us whether the complaint has an equivalent online, based on the presence of the facility name from the scraped data.


In [259]:
merged['public'] = np.where(merged['fac_name'].notnull(),'online','offline')

Join to a lookup table that translates each outcome code number into display text. Both sides are cast to strings so the join keys match.


In [260]:
codes = pd.read_excel('../../data/raw/OLRO Outcome Codes.xlsx', header=3)
codes.rename(columns = {'Code':'outcome_code','Display Text':'outcome'}, inplace = True)
codes['outcome_code'] = codes['outcome_code'].astype(str)
codes = codes.drop('Definition',1)

In [261]:
merged['outcome_code'] = merged['outcome_code'].astype(int).astype(str)

In [262]:
merged = merged.merge(codes, how = 'left')

In [263]:
merged.groupby('abuse_type').count()


Out[263]:
abuse_type    abuse_number  action_notes  facility_id  incident_date  outcome_code  outcome_notes  source  year  fac_name  online_incident_date  public  outcome
(blank)               1231          1219         1231           1231          1231           1166    1231  1231       426                   426    1231     1230
A                     3836          2101         3836           3836          3836           2120    3836  3836      3759                  3759    3836     3836
L                     8638          3254         8638           8638          8638           3258    8638  8638      1232                  1232    8638     8636

In [264]:
merged['fac_name'].fillna('',inplace=True)

Join with facilities

First, prep the facilities.


In [265]:
facilities = pd.read_csv('../../data/raw/APD_FacilityRecords.csv')

In [266]:
facilities.rename(columns={'FACID':'facid','Facility ID':'facility_id','FAC_CCMUNumber':'fac_ccmunumber','FAC_Type':'facility_type',
                          'FAC_Capacity':'fac_capacity','Facility Name':'facility_name','Facility Address':'street',
                          'Other Service':'other_service','Owner':'owner','Operator':'operator'}, inplace=True)

Select the columns we need and drop the one duplicate facility record.


In [267]:
facilities = facilities[['facility_id','fac_ccmunumber','facility_type','fac_capacity','facility_name']].drop_duplicates(subset='facility_id', keep='last')

Churchill Estates Residential Care has blank facility_type and capacity fields. The facility is an RCF with a capacity of 108; info obtained from the DHS PIO.


In [268]:
# Row 318 is Churchill Estates Residential Care; fill in the missing values.
facilities.loc[318,'facility_type']='RCF'
facilities.loc[318,'fac_capacity']=108

Left join the complaints onto the facilities table.

Because the join is keyed on the facilities table, complaints whose facility_id does not appear there are eliminated.


In [269]:
merged_comp_fac = facilities.merge(merged, on = 'facility_id',how = 'left')
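An optional check (sketch) of how many complaints are dropped because their facility_id has no match in the facilities table:

# Complaints with an unmatched facility_id disappear from the facility-keyed left join above.
(~merged['facility_id'].isin(facilities['facility_id'])).sum()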

The analysis covers only complaints from 2005 onward.


In [270]:
merged_comp_fac = merged_comp_fac[['abuse_number','facility_id','facility_type','facility_name','abuse_type','action_notes','incident_date','outcome','outcome_notes',
                                   'year','fac_name','public']][merged_comp_fac['year']>2004]

merged_comp_fac has all the complaints we need for the complaints analysis.

Aggregate data by facility


In [271]:
complaint_pivot = merged_comp_fac.pivot_table(values='abuse_number',index='facility_id',columns='public', aggfunc='count').reset_index()

Next, left join the complaint counts onto the facilities table, so every facility appears even if it has no complaints.


In [272]:
fac_pivot_merge = facilities.merge(complaint_pivot, how='left',on='facility_id')
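Facilities with no matching complaints end up with NaN in the count columns after this join (the 'online'/'offline' column names come from the 'public' values above). A sketch, not part of the original pipeline, of how those could be treated as zeros downstream:

# Make a copy with missing complaint counts filled as zero.
fac_pivot_filled = fac_pivot_merge.fillna({'online': 0, 'offline': 0})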

Add our own outcome category, collapsing DHS's outcome descriptions into broader groups.


In [273]:
merged_comp_fac["omg_outcome"] = merged_comp_fac["outcome"].apply(dict([
    ('No Negative Outcome', 'Potential harm'),
    ('Exposed to Potential Harm', 'Potential harm'),
            
    ('Fall Without Injury', 'Fall, no injury'),
            
    ('Left facility without assistance without injury', 'Left facility without attendant, no injury'),
            
    ('Loss of Dignity', 'Loss of Dignity'),
            
    ('Fall with Injury', 'Fracture or other injury'),
    ('Injury During Self-Transfer', 'Fracture or other injury'),
    ('Fall Resulting In Fractured Bone(s)', 'Fracture or other injury'),
    ('Fall Resulting In Fractured Hip', 'Fracture or other injury'),
    ('Transfer Resulting In Skin Injury or Bruise', 'Fracture or other injury'),
    ('Fractured Bone', 'Fracture or other injury'),
    ('Fractured Hip', 'Fracture or other injury'),
    ('Burned', 'Fracture or other injury'),
    ('Transfer Resulting In Fractured Hip', 'Fracture or other injury'),
    ('Transfer Resulting In Fracture Bone(s)', 'Fracture or other injury'),
    ('Left Facility Without Assistance With Injury', 'Fracture or other injury'),
    ('Bruised', 'Fracture or other injury'),
    ('Skin Injury', 'Fracture or other injury'),
            
    ('Negative Behavior Escalated, Affected Other Resident(s)', 'Failure to address resident aggression'),
            
    ('Medical Condition Developed or Worsened', 'Medical condition developed or worsened'),
    ('Decubitus Ulcer(s) Developed', 'Medical condition developed or worsened'),
    ('Decubitus Ulcer(s) Worsened', 'Medical condition developed or worsened'),
    ('Urinary Tract Infection Worsened', 'Medical condition developed or worsened'),
    ('Transfer To Hospital For Treatment', 'Medical condition developed or worsened'),
            
    ('Received Incorrect or Wrong Dose of Medication(s)', 'Medication error'),
    ('The resident did not receive an ordered medication', 'Medication error'),
            
    ('Loss of Resident Property', 'Loss of property, theft or financial exploitation'),
    ('Loss of Medication', 'Loss of property, theft or financial exploitation'),
    ('Financially Exploited', 'Loss of property, theft or financial exploitation'),
            
    ('Unreasonable Discomfort', 'Unreasonable discomfort or continued pain'),
    ('Pain And Suffering Continued', 'Unreasonable discomfort or continued pain'),
            
    ('Undesirable Weight Loss', 'Weight loss'),
            
    ('Poor Continuity Of Care', 'Inadequate care'),
    ('Failed To Have Quality of Life Maintained or Enhanced', 'Inadequate care'),
    ('Failed to Receive Needed Services', 'Inadequate care'),
    ('Denied Choice In Treatment', 'Inadequate care'),
            
    ('Incontinence', 'Inadequate hygiene'),
    ('Inadequate Hygiene', 'Inadequate hygiene'),
            
    ('Physically Abused', 'Physical abuse'),
    ('Corporally Punished', 'Physical abuse'),
            
    ('Verbally Abused', 'Verbal or emotional abuse'),
    ('Mentally or Emotionally Abused', 'Verbal or emotional abuse'),
            
    ('Involuntarily Secluded', 'Involuntary seclusion'),
            
    ('Raped', 'Sexual abuse'),
    ('Sexually Abused', 'Sexual abuse'),
            
    ('Deceased', 'Death'),
    ('Facility was understaffed with no negative outcome', 'Staffing issues'),
    ('Unable to timely assess adequacy of staffing', 'Staffing issues'),
            
    ('Improperly Transferred Out of Facility, Denied Readmission or Inappropriate Move Within Facility', 'Denied readmission or moved improperly'),
]).get).fillna('')
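An optional check (sketch) for outcome values that did not map to one of our categories (they become blank via the fillna above):

# List DHS outcome texts that fell through the mapping.
merged_comp_fac.loc[merged_comp_fac['omg_outcome'] == '', 'outcome'].value_counts()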

Export the munged facility and complaints data.


In [274]:
merged_comp_fac.to_csv('../../data/processed/complaints-3-25-scrape.csv',index=False)

In [275]:
fac_pivot_merge.to_csv('../../data/processed/facilities-3-25-scrape.csv',index=False)

DONE