Candidate party classification

By Ben Welsh

An analysis of which method ultimately assigns each scraped candidate record to a party.


In [45]:
import pandas as pd

In [46]:
df = pd.read_csv("/home/ben/Code/django-calaccess-processed-data/example/django.log", sep="|")

In [47]:
df.head()


Out[47]:
   DEBUG  19/Jul/2017 01:10:40  __init__  Flushing 578 Candidacy objects
0  DEBUG  19/Jul/2017 01:10:40  __init__  Flushing 342 CandidateContest objects
1  DEBUG  19/Jul/2017 01:10:40  __init__  Flushing 166 BallotMeasureContest objects
2  DEBUG  19/Jul/2017 01:10:40  __init__  Flushing 2 RetentionContest objects
3  DEBUG  19/Jul/2017 01:10:40  __init__  Flushing 55 Election objects
4  DEBUG  19/Jul/2017 01:10:40  __init__  Flushing 2 Membership objects

In [48]:
df.columns = ['level', 'time', 'logger', 'message']
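In the preview above, the first log line ("Flushing 578 Candidacy objects") appears in the header position, which suggests read_csv consumed it as column names before they were overwritten. A minimal sketch of one way to keep that row, assuming the log has no header line and exactly four pipe-delimited fields. The sample text and the name log_df are illustrative, not part of the original notebook:

```python
import io
import pandas as pd

# Hypothetical two-line sample in the same pipe-delimited layout as the log
sample_log = io.StringIO(
    "DEBUG|19/Jul/2017 01:10:40|__init__|Flushing 578 Candidacy objects\n"
    "DEBUG|19/Jul/2017 01:10:40|__init__|Flushing 342 CandidateContest objects\n"
)

# header=None tells pandas the first line is data, and names labels the
# columns up front, so no log line is lost to the header
log_df = pd.read_csv(
    sample_log,
    sep="|",
    header=None,
    names=["level", "time", "logger", "message"],
)
print(len(log_df))
```

With this approach the row index shifts by one relative to the outputs shown in this notebook, since the first line becomes row 0.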

In [49]:
df = df[df.logger == 'candidates'].copy()  # copy so a new column can be added without a SettingWithCopyWarning

In [50]:
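def method(message):
    """Return the trailing clause of a log message, which names how the party was set."""
    # Messages appear to end with either "based on <method>" or "after <step>"
    if 'based' in message:
        return message.split("based on")[-1]
    else:
        return message.split("after")[-1]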
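As a rough illustration of what this kind of parsing does, here is a slightly stricter variant. It is a sketch, not the function used in this notebook: the name extract_reason and both sample messages are hypothetical. It strips surrounding whitespace and returns None when neither marker is present, rather than returning the whole message.

```python
def extract_reason(message):
    """Return the text after the last 'based on' or 'after', or None if neither appears."""
    for marker in ("based on", "after"):
        if marker in message:
            # rsplit with maxsplit=1 keeps only the text after the final marker
            return message.rsplit(marker, 1)[-1].strip()
    return None


# Hypothetical messages in the style of the log, not taken from it
examples = [
    "DOE, JANE party set to DEMOCRATIC based on Form 501 party",
    "DOE, JOHN party set to UNKNOWN after failing to find a match",
    "Flushing 2 Membership objects",
]
reasons = [extract_reason(text) for text in examples]
print(reasons)
```

Stripping the whitespace would remove the leading space visible in the reason values in this notebook's output.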

In [51]:
df['reason'] = df.message.apply(method)

In [52]:
df.reason.value_counts()


Out[52]:
 Form 501 party                   2192
 failing to find a match            71
 office                             19
 checking its scraped filer id       9
 Form 501 filer id                   9
 correction                          4
Name: reason, dtype: int64
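To read the counts as shares of all classified records, they can be divided by their total. The sketch below rebuilds the counts from the output above as a standalone Series (the names counts and shares are illustrative). On the real frame, df.reason.value_counts(normalize=True) should give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series(
    {
        "Form 501 party": 2192,
        "failing to find a match": 71,
        "office": 19,
        "checking its scraped filer id": 9,
        "Form 501 filer id": 9,
        "correction": 4,
    },
    name="reason",
)

# Share of all records attributed to each method
shares = (counts / counts.sum()).round(3)
print(shares)
```

Of the 2,304 records, roughly 95 percent were classified from the Form 501 party.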

In [53]:
df[df.reason.str.contains("correction")]


Out[53]:
      level  time                  logger      message                                             reason
1707  DEBUG  19/Jul/2017 01:14:58  candidates  WALLS, JIMELLE L. party set to NO PARTY PREFER...   correction
1715  DEBUG  19/Jul/2017 01:14:59  candidates  WINSTON, ALMA MARIE party set to REPUBLICAN ba...   correction
1775  DEBUG  19/Jul/2017 01:15:07  candidates  WAHL, BERNT party set to NO PARTY PREFERENCE b...   correction
2037  DEBUG  19/Jul/2017 01:15:43  candidates  RODRIGUEZ, GREG party set to DEMOCRATIC based ...   correction
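The message column is truncated in the display above. One simple way to read the full text is to pull the column out as a plain list and print each entry. This is a sketch on a hypothetical stand-in frame: df_example and its single message are invented for illustration. On the real data the same two lines would apply to df.

```python
import pandas as pd

# Hypothetical stand-in for the filtered log frame; the message is invented
df_example = pd.DataFrame(
    {
        "message": ["DOE, JANE party set to NO PARTY PREFERENCE based on correction"],
        "reason": [" correction"],
    }
)

corrections = df_example[df_example.reason.str.contains("correction")]

# Printing plain strings avoids the column-width truncation of the table view
full_messages = corrections.message.tolist()
for text in full_messages:
    print(text)
```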