Predicting the probability that a set of blight tickets will be paid on time
Supervised Learning. Classification
Source: Applied Machine Learning in Python | Coursera. Solved here with classical machine-learning classifiers and a small neural network
Data provided by the Michigan Data Science Team (MDST), the Michigan Student Symposium for Interdisciplinary Statistical Sciences (MSSISS), and the City of Detroit Open Data Portal.
Each row of the dataset corresponds to a single blight ticket and includes information about when, why, and to whom each ticket was issued. The target variable is compliance, which is True if the ticket was paid early, on time, or within one month of the hearing date, False if the ticket was paid after the hearing date or not at all, and Null if the violator was found not responsible (a sketch of this labeling rule follows the label list below).
Features
ticket_id - unique identifier for tickets
agency_name - Agency that issued the ticket
inspector_name - Name of inspector that issued the ticket
violator_name - Name of the person/organization that the ticket was issued to
violation_street_number, violation_street_name, violation_zip_code - Address where the violation occurred
mailing_address_str_number, mailing_address_str_name, city, state, zip_code, non_us_str_code, country - Mailing address of the violator
ticket_issued_date - Date and time the ticket was issued
hearing_date - Date and time the violator's hearing was scheduled
violation_code, violation_description - Type of violation
disposition - Judgment and judgment type
fine_amount - Violation fine amount, excluding fees
admin_fee - $20 fee assigned to responsible judgments
state_fee - $10 fee assigned to responsible judgments
late_fee - 10% fee assigned to responsible judgments
discount_amount - discount applied, if any
clean_up_cost - DPW clean-up or graffiti removal cost
judgment_amount - Sum of all fines and fees
grafitti_status - Flag for graffiti violations
Labels
payment_amount - Amount paid, if any
payment_date - Date payment was made, if it was received
payment_status - Current payment status as of Feb 1 2017
balance_due - Fines and fees still owed
collection_status - Flag for payments in collections
compliance [target variable for prediction]
Null = Not responsible
0 = Responsible, non-compliant
1 = Responsible, compliant
compliance_detail - More information on why each ticket was marked compliant or non-compliant
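The dataset ships with compliance precomputed, but the rule above is easy to state in code. A minimal sketch of the labeling logic, assuming responsibility can be read from the disposition text and taking "one month" as 31 days (both are illustrative assumptions, not MDST's exact definition):

import pandas as pd

def label_compliance(row):
    """Return 1 (compliant), 0 (non-compliant) or None (not responsible)."""
    if 'Not responsible' in str(row['disposition']):   # assumption: responsibility read from disposition
        return None
    if pd.isnull(row['payment_date']):                 # never paid
        return 0
    deadline = pd.Timestamp(row['hearing_date']) + pd.Timedelta(days=31)
    return int(pd.Timestamp(row['payment_date']) <= deadline)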
In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import helper
import keras
helper.info_gpu()
#sns.set_palette("GnBu_d")
#helper.reproducible(seed=0) # Setup reproducible results from run to run using Keras
%matplotlib inline
In [2]:
data_path = 'data/property_maintenance_fines_data.csv'
target = ['compliance']
df_original = pd.read_csv(data_path, encoding='iso-8859-1', dtype='unicode')
print("{} rows \n{} columns \ntarget: {}".format(*df_original.shape, target))
In [3]:
print(df_original[target].squeeze().value_counts(dropna=False))
In [4]:
# Remove rows with NULL targets
df_original = df_original.dropna(subset=target)
print(df_original[target].squeeze().value_counts())
print(df_original.shape)
The target is imbalanced: plain accuracy would be misleading, so the evaluation metric used in this problem is the Area Under the ROC Curve (ROC AUC)
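As a quick reference, scikit-learn computes this metric directly from predicted probabilities; a toy example:

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0]                 # toy labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2]     # toy predicted probabilities for class 1
print(roc_auc_score(y_true, y_score))    # ~0.83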
In [5]:
from sklearn.model_selection import train_test_split
df, df_test = train_test_split(
    df_original, test_size=0.2, stratify=df_original[target], random_state=0)
To avoid data leakage, only the training dataframe, df, will be explored and processed here
In [6]:
df.head(2)
Out[6]:
In [7]:
helper.missing(df)
In [8]:
def remove_features(df):
    relevant_col = ['agency_name', 'violation_street_name', 'city', 'state', 'violator_name',
                    'violation_code', 'late_fee', 'discount_amount', 'judgment_amount',
                    'disposition', 'fine_amount', 'compliance']
    df = df[relevant_col]
    return df
df = remove_features(df)
print(df.shape)
In [9]:
num = ['late_fee', 'discount_amount', 'judgment_amount', 'fine_amount']
df = helper.classify_data(df, target, numerical=num)
pd.DataFrame(dict(df.dtypes), index=["Type"])[df.columns].head() # show data types
Out[9]:
In [10]:
df, dict_categories = helper.remove_categories(df, target=target, ratio=0.001, show=False)
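helper.remove_categories is defined in the accompanying helper module, which is not shown here. A rough sketch of the idea it implements — keeping only category levels above a frequency ratio and remembering them so the test set can be filtered identically — might look like this (names and signature are illustrative, not the helper's actual code):

def remove_rare_categories(df, target, ratio=0.001):
    # Keep category levels whose relative frequency >= ratio;
    # rarer levels become NaN and are filled with 'Other' in a later step
    kept = {}
    for col in df.select_dtypes(include='category').columns:
        if col in target:
            continue
        freq = df[col].value_counts(normalize=True)
        kept[col] = freq[freq >= ratio].index.tolist()
        df[col] = df[col].where(df[col].isin(kept[col]))
    return df, kept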
In [11]:
df = helper.fill_simple(df, target, missing_categorical='Other')
In [12]:
helper.missing(df);
In [13]:
for i in ['state', 'disposition']:
    helper.show_categorical(df[[i]])
In [14]:
for i in ['state', 'disposition']:
    helper.show_target_vs_categorical(df[[i, target[0]]], target)
In [15]:
helper.show_numerical(df, kde=True)
In [16]:
helper.show_target_vs_numerical(df, target, point_size=10, jitter=0.3, fit_reg=True)
plt.ylim(-0.2, 1.2)
Out[16]:
In [17]:
helper.show_correlation(df, target, figsize=(6,3))
In [18]:
droplist = []  # features to drop from the model

# Work on a copy called 'data' and keep 'df' for exploration
data = df.copy()
data.drop(droplist, axis='columns', inplace=True)
data.head(2)
Out[18]:
In [19]:
data, scale_param = helper.scale(data)
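helper.scale returns the fitted scaling parameters alongside the data so that exactly the same transform can be replayed on the test set in cell In [31]. A minimal sketch of that pattern with z-score scaling (the actual scaler used by the helper is not shown and may differ):

def scale_numerical(df, num_cols, params=None):
    # Fit mean/std on the training data only; reuse them for the test set
    if params is None:
        params = {c: (df[c].mean(), df[c].std()) for c in num_cols}
    for c in num_cols:
        mean, std = params[c]
        df[c] = (df[c] - mean) / std
    return df, params

Fitting the parameters on df alone and reusing them on df_test is what keeps the held-out data leak-free.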
In [20]:
data, dict_dummies = helper.replace_by_dummies(data, target)
model_features = [f for f in data if f not in target]  # model input features, in training column order
data.head(3)
Out[20]:
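helper.replace_by_dummies plays the same trick for categoricals: the dummy columns produced from the training data are recorded so the test set can later be aligned to the identical column layout. With plain pandas the idea is roughly (a sketch, not the helper's code):

def to_dummies(df, target, train_columns=None):
    # One-hot encode categoricals; align to the training column layout if given
    cat_cols = df.select_dtypes(include='category').columns.difference(target)
    df = pd.get_dummies(df, columns=cat_cols)
    if train_columns is not None:
        df = df.reindex(columns=train_columns, fill_value=0)
    return df, list(df.columns)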
In [21]:
val_size = 0.2
random_state = 0

def validation_split(data, val_size=0.25):
    train, test = train_test_split(
        data, test_size=val_size, random_state=random_state, stratify=data[target])
    # Separate each split into features (x) and target (y)
    x_train, y_train = train.drop(target, axis=1).values, train[target].values
    x_val, y_val = test.drop(target, axis=1).values, test[target].values
    return x_train, y_train, x_val, y_val

x_train, y_train, x_val, y_val = validation_split(data, val_size=val_size)

# x_train = x_train.astype(np.float16)
y_train = y_train.astype(np.float16)
# x_val = x_val.astype(np.float16)
y_val = y_val.astype(np.float16)
In [22]:
def one_hot_output(y_train, y_val):
    num_classes = len(np.unique(y_train))
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_val = keras.utils.to_categorical(y_val, num_classes)
    return y_train, y_val
y_train, y_val = one_hot_output(y_train, y_val)
In [23]:
print("train size \t X:{} \t Y:{}".format(x_train.shape, y_train.shape))
print("val size \t X:{} \t Y:{}".format(x_val.shape, y_val.shape))
In [24]:
from sklearn.dummy import DummyClassifier
clf = DummyClassifier(strategy='most_frequent').fit(x_train, y_train[:, 1])
# The dummy 'most_frequent' classifier always predicts class 0
y_pred = clf.predict(x_val).reshape([-1, 1])
helper.binary_classification_scores(y_val[:, 1], y_pred);
In [25]:
from sklearn.ensemble import RandomForestClassifier
%time clf_random_forest_opt = RandomForestClassifier(n_estimators=30, max_features=150, \
    max_depth=13, class_weight='balanced', n_jobs=-1, \
    random_state=0).fit(x_train, np.ravel(y_train[:, 1]))
In [26]:
y_pred = clf_random_forest_opt.predict(x_val).reshape([-1, 1])
helper.binary_classification_scores(y_val[:, 1], y_pred);
In [27]:
cw = helper.get_class_weight(y_train[:, 1])  # class weights for the imbalanced target

import keras
from keras.models import Sequential
from keras.layers import Dense

def build_nn(input_size, output_size, summary=False):
    input_nodes = input_size // 8
    model = Sequential()
    model.add(Dense(input_nodes, input_dim=input_size, activation='relu'))
    model.add(Dense(output_size, activation='softmax'))
    if summary:
        model.summary()
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = build_nn(x_train.shape[1], y_train.shape[1], summary=True)
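helper.get_class_weight is likewise opaque here; scikit-learn can produce equivalent inverse-frequency weights in the dictionary form Keras expects (a sketch, assuming 'balanced' weighting):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

labels = y_train[:, 1]
classes = np.unique(labels)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=labels)
cw_sketch = {int(c): w for c, w in zip(classes, weights)}  # e.g. {0: w0, 1: w1}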
In [28]:
import os
from time import time
model_path = os.path.join("models", "detroit.h5")
def train_nn(model, x_train, y_train, validation_data=None, path=None, show=True):
    """
    Train the neural network model. If no validation_data is provided,
    a fraction of the training set is split off for validation.
    """
    if show:
        print('Training ....')
    callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=1)]
    t0 = time()
    history = model.fit(
        x_train,
        y_train,
        epochs=100,
        batch_size=2048,
        class_weight=cw,
        verbose=1,
        validation_split=0.3,
        validation_data=validation_data,
        callbacks=callbacks)
    if show:
        print("time: \t {:.1f} s".format(time() - t0))
        helper.show_training(history)
    if path:
        model.save(path)
        print("\nModel saved at", path)
    return history
model = None
model = build_nn(x_train.shape[1], y_train.shape[1], summary=False)
train_nn(model, x_train, y_train, path=None);
from sklearn.metrics import roc_auc_score
y_pred_train = model.predict(x_train, verbose=1)
print('\n\n ROC_AUC train:\t{:.2f} \n'.format(roc_auc_score(y_train, y_pred_train)))
y_pred_val = model.predict(x_val, verbose=1)
print('\n\n ROC_AUC val:\t{:.2f}'.format(roc_auc_score(y_val, y_pred_val)))
In [29]:
helper.binary_classification_scores(y_val[:, 1], y_pred_val[:, 1]);
In [30]:
df_test.head(2)
Out[30]:
In [31]:
df_test = remove_features(df_test)
df_test = helper.classify_data(df_test, target, numerical=num)
df_test, _ = helper.remove_categories(
    df_test, target=target, show=False, dict_categories=dict_categories)
df_test = helper.fill_simple(df_test, target, missing_categorical='Other')
df_test, _ = helper.scale(df_test, scale_param)
df_test, _ = helper.replace_by_dummies(df_test, target, dict_dummies)
df_test = df_test[model_features + target]  # reorder columns to match the training feature order
In [32]:
def separate_x_y(data):
    """ Separate the data into features and target (x=features, y=target) """
    x, y = data.drop(target, axis=1).values, data[target].values
    x = x.astype(np.float16)
    y = y.astype(np.float16)
    return x, y
x_test, y_test = separate_x_y(df_test)
y_test = keras.utils.to_categorical(y_test, 2)
In [33]:
y_pred = clf_random_forest_opt.predict_proba(x_test)[:,1]
helper.binary_classification_scores(y_test[:,1], y_pred);
In [34]:
helper.show_feature_importances(model_features, clf_random_forest_opt)
In [35]:
y_pred = model.predict(x_test, verbose=1)[:,1]
helper.binary_classification_scores(y_test[:,1], y_pred);
In [36]:
helper.ml_classification(x_train, y_train[:, 1], x_test, y_test[:, 1])
Out[36]:
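helper.ml_classification presumably fits a battery of classical scikit-learn classifiers on the same split and reports their scores. A minimal version of such a comparison loop (the classifier list here is an illustrative choice, not the helper's actual roster):

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

for clf in (LogisticRegression(max_iter=1000),
            GradientBoostingClassifier(random_state=0)):
    clf.fit(x_train, y_train[:, 1])
    auc = roc_auc_score(y_test[:, 1], clf.predict_proba(x_test)[:, 1])
    print("{}: ROC AUC = {:.2f}".format(type(clf).__name__, auc))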