This is an analysis of complaints data that was munged in ../notebooks/mung.ipynb. The raw data is in ../data/raw.

The fields are:

  1. abuse_number: A unique number assigned to each complaint.
  2. facility_id: A unique number assigned to each facility building. It stays the same if ownership changes.
  3. facility_name: Name of facility as of January 2017, when DHS provided the facility data to The Oregonian.
  4. abuse_type: A – facility abuse; L – licensing. Note: This does not apply to nursing facilities. All of their complaints are either blank in this field or marked licensing.
  5. action_notes: DHS determination of what general acts constituted the abuse or rule violation.
  6. incident_date: Date the incident occurred.
  7. outcome: A very brief description of the consequences of the abuse or rule violation to the resident.
  8. outcome_notes: A detailed description of what happened.
  9. year: Year the incident occurred.
  10. fac_name: If the complaint is online, the name listed for the facility.
  11. public: Whether the complaint is online ('online' or 'offline').
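One quick way to confirm that a loaded frame carries the fields listed above is to compare its columns against the expected names. This is only a minimal sketch: `EXPECTED_FIELDS`, `missing_fields`, and the one-row sample frame are illustrative names and data, not part of the notebook.

```python
import pandas as pd

# Field names as documented in the list above.
EXPECTED_FIELDS = [
    "abuse_number", "facility_id", "facility_name", "abuse_type",
    "action_notes", "incident_date", "outcome", "outcome_notes",
    "year", "fac_name", "public",
]

def missing_fields(frame, expected=EXPECTED_FIELDS):
    """Return the expected column names that are absent from frame."""
    return [name for name in expected if name not in frame.columns]

# Hypothetical one-row frame that lacks the last two fields.
sample = pd.DataFrame([{name: None for name in EXPECTED_FIELDS[:9]}])
print(missing_fields(sample))
```

An empty list would mean every documented field is present.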

In [17]:
import pandas as pd
import numpy as np
import analysis_data_loader as loader
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
# None means "do not truncate"; the older -1 value is deprecated since pandas 1.0
pd.set_option('display.max_colwidth', None)



In [18]:
df = loader.load_facilities()

In [20]:
df = pd.read_csv('/Users/fzarkhin/OneDrive - Advance Central Services, Inc/fproj/github/database-story/data/processed/complaints.csv')

How many complaints do not appear in the state's public database?


In [21]:
df[df['public']=='offline'].count().iloc[0]


Out[21]:
7846

How many complaints do appear in the state's public database?


In [22]:
df[df['public']=='online'].count().iloc[0]


Out[22]:
5186

What percent of complaints are missing?


In [23]:
df[df['public']=='offline'].count().iloc[0]/df.count().iloc[0]*100


Out[23]:
60.205647636586868
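The counts and the percentage above can also be read off in one step with `value_counts`. The sketch below uses a made-up four-row frame (`toy`) rather than the complaints data. Note that `value_counts` counts rows by their `public` value, whereas the cells above count non-null entries in the first column, so the two approaches agree only if that column has no missing values.

```python
import pandas as pd

# Made-up rows; only the 'public' column matters here.
toy = pd.DataFrame({"public": ["offline", "online", "offline", "offline"]})

counts = toy["public"].value_counts()                      # rows per status
shares = toy["public"].value_counts(normalize=True) * 100  # percent per status

print(counts["offline"], counts["online"], shares["offline"])
```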

How many complaints were labelled 'Exposed to Potential Harm' or 'No Negative Outcome'?


In [24]:
df[(df['outcome']=='Exposed to Potential Harm') | (df['outcome']=='No Negative Outcome')].count().iloc[0]


Out[24]:
2509
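When a filter matches several labels, `Series.isin` can stand in for a chain of `|` comparisons. A minimal sketch on invented rows follows; `LOW_HARM` and the 'Some Other Outcome' label are hypothetical names used only for illustration.

```python
import pandas as pd

# The two outcome labels counted in the cell above.
LOW_HARM = ["Exposed to Potential Harm", "No Negative Outcome"]

# Invented rows for illustration.
toy = pd.DataFrame({
    "outcome": [
        "Exposed to Potential Harm",
        "Some Other Outcome",
        "No Negative Outcome",
        "Some Other Outcome",
    ]
})

low_harm_mask = toy["outcome"].isin(LOW_HARM)  # True where the label matches
n_low_harm = int(low_harm_mask.sum())          # True counts as 1
print(n_low_harm)
```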

Of all missing complaints, what percent are in the above two categories?


In [25]:
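# Numerator: all complaints in the two categories (online and offline combined);
# denominator: offline complaints only.
df[(df['outcome']=='Exposed to Potential Harm') |
   (df['outcome']=='No Negative Outcome')].count().iloc[0]/df[df['public']=='offline'].count().iloc[0]*100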


Out[25]:
31.978078001529443
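The cell above divides the count of all complaints in the two categories, online and offline, by the number of offline complaints. If the intent is the share of offline complaints that fall in those categories, the numerator needs the offline filter as well. The sketch below contrasts the two calculations on invented rows; `toy`, `LOW_HARM`, and the 'Other' label are illustrative names.

```python
import pandas as pd

LOW_HARM = ["Exposed to Potential Harm", "No Negative Outcome"]

# Invented rows: four offline complaints, two online.
toy = pd.DataFrame({
    "public": ["offline", "offline", "offline", "offline", "online", "online"],
    "outcome": [
        "Exposed to Potential Harm", "No Negative Outcome", "Other", "Other",
        "Exposed to Potential Harm", "Other",
    ],
})

offline = toy[toy["public"] == "offline"]

# Numerator limited to offline rows: share of offline complaints in the categories.
pct_restricted = offline["outcome"].isin(LOW_HARM).mean() * 100

# Numerator over all rows, as in the cell above.
pct_unrestricted = toy["outcome"].isin(LOW_HARM).sum() / len(offline) * 100

print(pct_restricted, pct_unrestricted)
```

On the real data, if the 'Potential harm' row of the breakdown table at the end of this notebook covers the same two categories (its total, 2,509, matches), the restricted figure would be 2,361 / 7,846, or about 30.1 percent.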

How many complaints are labelled 'A', which stands for abuse, but are offline?


In [26]:
df[(df['abuse_type']=='A') & (df['public']=='offline')].count().iloc[0]


Out[26]:
65

In [27]:
#df.groupby('outcome').count().reset_index()[['outcome','abuse_number']].sort_values('abuse_number', ascending = False)

What's the online/offline breakdown by outcome?


In [32]:
totals = df.groupby(['omg_outcome','public']).count()['abuse_number'].unstack().reset_index()

In [39]:
totals.fillna(0, inplace = True)

In [40]:
totals['total'] = totals['online']+totals['offline']

In [41]:
totals['pct_offline'] = round(totals['offline']/totals['total']*100)

In [44]:
totals.sort_values('pct_offline',ascending=False)


Out[44]:
public  omg_outcome                                        offline  online   total  pct_offline
16      Staffing issues                                       12.0     0.0    12.0        100.0
1       Denied readmission or moved improperly                35.0     2.0    37.0         95.0
14      Potential harm                                      2361.0   148.0  2509.0         94.0
3       Fall, no injury                                      150.0    13.0   163.0         92.0
8       Left facility without attendant, no injury           207.0    18.0   225.0         92.0
9       Loss of Dignity                                      884.0    97.0   981.0         90.0
12      Medication error                                     983.0   217.0  1200.0         82.0
5       Inadequate care                                      496.0   170.0   666.0         74.0
6       Inadequate hygiene                                   138.0   104.0   242.0         57.0
10      Loss of property, theft or financial exploitation    809.0   737.0  1546.0         52.0
13      Physical abuse                                        89.0    92.0   181.0         49.0
18      Verbal or emotional abuse                             70.0    94.0   164.0         43.0
7       Involuntary seclusion                                  8.0    11.0    19.0         42.0
2       Failure to address resident aggression               395.0   622.0  1017.0         39.0
4       Fracture or other injury                             680.0  1185.0  1865.0         36.0
11      Medical condition developed or worsened              370.0  1046.0  1416.0         26.0
0       Death                                                  7.0    23.0    30.0         23.0
15      Sexual abuse                                          15.0    49.0    64.0         23.0
17      Unreasonable discomfort or continued pain            115.0   452.0   567.0         20.0
19      Weight loss                                           20.0   106.0   126.0         16.0
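The same kind of breakdown can be built in fewer steps with `pd.crosstab`, which fills missing outcome/status pairs with 0 and so removes the need for `fillna`. This is a sketch on invented rows, not a rerun of the analysis; the column names mirror those used above and the six rows of data are made up.

```python
import pandas as pd

# Invented rows for illustration.
toy = pd.DataFrame({
    "omg_outcome": [
        "Medication error", "Medication error", "Medication error",
        "Weight loss", "Weight loss", "Staffing issues",
    ],
    "public": ["offline", "offline", "online", "online", "online", "offline"],
})

# Rows are outcomes, columns are the 'public' values; absent pairs become 0.
table = pd.crosstab(toy["omg_outcome"], toy["public"])
table["total"] = table["offline"] + table["online"]
table["pct_offline"] = (table["offline"] / table["total"] * 100).round()
table = table.sort_values("pct_offline", ascending=False)

print(table)
```

Unlike the `groupby(...).count()['abuse_number']` approach, `crosstab` counts rows rather than non-null `abuse_number` values, so the two agree only when `abuse_number` is never missing.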
