Decision Trees in Practice

  • Implement binary decision trees with different early stopping methods.
  • Compare models with different stopping parameters.
  • Visualize the concept of overfitting in decision trees.

Fire up GraphLab Create

In [1]:
import graphlab

Load LendingClub Dataset

In [2]:
loans = graphlab.SFrame('')

Reassign the labels so that +1 marks a safe loan and -1 marks a risky (bad) loan.

In [3]:
loans['safe_loans'] = loans['bad_loans'].apply(lambda x : +1 if x==0 else -1)
loans = loans.remove_column('bad_loans')

The model uses 4 categorical features:

  1. grade of the loan
  2. the length of the loan term
  3. the home ownership status: own, mortgage, rent
  4. number of years of employment.

Each of these features is categorical in the dataset. I convert them to binary features in a later section using one-hot encoding.

In [4]:
features = ['grade',              # grade of the loan
            'term',               # the term of the loan
            'home_ownership',     # home_ownership status: own, mortgage or rent
            'emp_length',         # number of years of employment
           ]
target = 'safe_loans'
loans = loans[features + [target]]

Subsample dataset to make sure classes are balanced

Undersample the larger class (safe loans) to balance the dataset.

In [5]:
safe_loans_raw = loans[loans[target] == 1]
risky_loans_raw = loans[loans[target] == -1]

percentage = len(risky_loans_raw)/float(len(safe_loans_raw))
safe_loans = safe_loans_raw.sample(percentage, seed = 1)
risky_loans = risky_loans_raw
loans_data = risky_loans.append(safe_loans)

print "Percentage of safe loans                 :", len(safe_loans) / float(len(loans_data))
print "Percentage of risky loans                :", len(risky_loans) / float(len(loans_data))
print "Total number of loans in our new dataset :", len(loans_data)

Percentage of safe loans                 : 0.502236174422
Percentage of risky loans                : 0.497763825578
Total number of loans in our new dataset : 46508
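
The undersampling step can be sketched in plain Python without GraphLab. This is only an illustration under stated assumptions: `undersample_majority` is a hypothetical helper, and it draws an exact number of majority rows. The `sample(percentage, ...)` call above appears to keep each row with the given probability instead, which would explain why the printed class shares are close to, but not exactly, 50%.

```python
import random

def undersample_majority(labels, majority=+1, seed=1):
    """Return row indices that keep every minority row and an equally sized
    random subset of the majority rows (hypothetical helper, not GraphLab)."""
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    rng = random.Random(seed)
    n_keep = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, n_keep)
    return sorted(minority_idx + kept_majority)

labels = [+1] * 8 + [-1] * 2          # toy data: 8 safe loans, 2 risky loans
kept = undersample_majority(labels)
balanced = [labels[i] for i in kept]  # 2 safe loans and 2 risky loans
```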

Transform categorical data into binary features

In [6]:
loans_data = risky_loans.append(safe_loans)
for feature in features:
    loans_data_one_hot_encoded = loans_data[feature].apply(lambda x: {x: 1})    
    loans_data_unpacked = loans_data_one_hot_encoded.unpack(column_name_prefix=feature)
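    # Change None's to 0's
    for column in loans_data_unpacked.column_names():
        loans_data_unpacked[column] = loans_data_unpacked[column].fillna(0)
    # Swap the categorical column for its binary columns
    loans_data.remove_column(feature)
    loans_data.add_columns(loans_data_unpacked)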
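
To make the transformation concrete, here is a minimal sketch of one-hot encoding on a list of plain dictionaries. The helper `one_hot_encode` and the toy rows are assumptions for illustration only; the column naming (`grade.A`, `grade.B`, ...) mirrors the names that appear in the feature list of this notebook.

```python
def one_hot_encode(rows, feature):
    """Replace row[feature] with binary columns named '<feature>.<value>'
    (hypothetical helper working on plain dicts, not on an SFrame)."""
    values = sorted(set(row[feature] for row in rows))
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being encoded
        new_row = dict((k, v) for k, v in row.items() if k != feature)
        for value in values:
            new_row['%s.%s' % (feature, value)] = 1 if row[feature] == value else 0
        encoded.append(new_row)
    return encoded

rows = [{'grade': 'A', 'safe_loans': 1},
        {'grade': 'B', 'safe_loans': -1},
        {'grade': 'A', 'safe_loans': -1}]
encoded = one_hot_encode(rows, 'grade')
# encoded[0] is {'safe_loans': 1, 'grade.A': 1, 'grade.B': 0}
```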

The feature columns now look like this:

In [7]:
features = loans_data.column_names()
features.remove('safe_loans')  # Remove the response variable

 'term. 36 months',
 'term. 60 months',
 'emp_length.1 year',
 'emp_length.10+ years',
 'emp_length.2 years',
 'emp_length.3 years',
 'emp_length.4 years',
 'emp_length.5 years',
 'emp_length.6 years',
 'emp_length.7 years',
 'emp_length.8 years',
 'emp_length.9 years',
 'emp_length.< 1 year',

Train-Validation split

I split the data into a training set with 80% of the data and a validation set with the remaining 20%.

In [10]:
train_data, validation_set = loans_data.random_split(.8, seed=1)
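
For readers without GraphLab, a random split can be sketched as below. This is an assumption-laden illustration: `random_split_rows` is a hypothetical helper that assigns each row to the first part with probability `fraction`, so the resulting sizes are only approximately 80/20. That is consistent with the 37224 training rows out of 46508 seen in the logs further down, but it is not a description of how GraphLab implements `random_split`.

```python
import random

def random_split_rows(rows, fraction, seed=1):
    """Send each row to the first list with probability `fraction`,
    otherwise to the second list (hypothetical helper)."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

train_rows, validation_rows = random_split_rows(list(range(1000)), 0.8)
```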

Early stopping methods for decision trees

This notebook implements 3 early stopping methods:

  1. Stop when the tree reaches a maximum depth (set by parameter max_depth).
  2. Stop when a node reaches a minimum node size (set by parameter min_node_size).
  3. Don't split if the gain in error reduction is too small (set by parameter min_error_reduction).
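
The three conditions can be summarized in a single check. The sketch below is illustrative only: `early_stopping_reason` is a hypothetical helper that is not used elsewhere in this notebook, and it assumes the errors before and after the candidate split have already been computed.

```python
def early_stopping_reason(current_depth, node_size, error_before_split,
                          error_after_split, max_depth=10, min_node_size=1,
                          min_error_reduction=0.0):
    """Return the name of the first early stopping condition that applies,
    or None if the node may be split (hypothetical helper)."""
    if current_depth >= max_depth:
        return 'max_depth'
    if node_size <= min_node_size:
        return 'min_node_size'
    if error_before_split - error_after_split <= min_error_reduction:
        return 'min_error_reduction'
    return None

reason = early_stopping_reason(2, 96, 0.4, 0.3, min_node_size=100)
# reason is 'min_node_size' because 96 <= 100
```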

Early stopping condition 1: Maximum depth

Early stopping condition 2: Minimum node size

The function reached_minimum_node_size takes 2 arguments:

  1. The data (from a node)
  2. The minimum number of data points that a node is allowed to split on, min_node_size.

This function simply calculates whether the number of data points at a given node is less than or equal to the specified minimum node size. This function will be used to detect this early stopping condition in the decision_tree_create function.

In [11]:
def reached_minimum_node_size(data, min_node_size):
    # True when the node holds min_node_size data points or fewer
    if len(data)<=min_node_size:
        return True
    else:
        return False
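
As a quick check of the boundary behaviour, the sketch below applies the same comparison to two placeholder nodes. The node sizes (101 and 96) are taken from the training log later in this notebook, where min_node_size is 100; the lists of `None` are stand-ins for real data.

```python
def reached_minimum_node_size(data, min_node_size):
    # True when the node holds min_node_size data points or fewer
    return len(data) <= min_node_size

node_with_101_points = [None] * 101   # placeholder rows
node_with_96_points = [None] * 96

keeps_splitting = not reached_minimum_node_size(node_with_101_points, 100)
stops_here = reached_minimum_node_size(node_with_96_points, 100)
```

With a threshold of 100, a node of 101 points can still be split, while a node of 96 points becomes a leaf.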

Early stopping condition 3: Minimum gain in error reduction

The function error_reduction takes 2 arguments:

  1. The error before a split, error_before_split.
  2. The error after a split, error_after_split.

This function computes the gain in error reduction, i.e., the difference between the error before the split and that after the split. This function will be used to detect this early stopping condition in the decision_tree_create function.

In [12]:
def error_reduction(error_before_split, error_after_split):
    return error_before_split-error_after_split
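
A small worked example, using made-up counts, shows how the gain is computed from mistake counts. Because every node predicts its majority class, a split never increases the classification error, so the gain is never negative. This is why a threshold of -1 (used for some trees later in this notebook) effectively disables this stopping condition.

```python
def error_reduction(error_before_split, error_after_split):
    return error_before_split - error_after_split

num_points = 10                                   # toy node with 10 data points
error_before = 4 / float(num_points)              # 4 mistakes before the split
error_after = (1 + 2) / float(num_points)         # 1 mistake on the left, 2 on the right
gain = error_reduction(error_before, error_after) # about 0.1

stops_with_zero_threshold = error_reduction(0.4, 0.4) <= 0.0   # no gain: stop
stops_with_minus_one = error_reduction(0.4, 0.4) <= -1         # never stops
```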

Binary decision tree helper functions

In [1]:
def intermediate_node_num_mistakes(labels_in_node):
    # An empty node makes no mistakes
    if len(labels_in_node) == 0:
        return 0
    no_safe_loans = (labels_in_node == 1).sum()
    no_risky_loans = (labels_in_node == -1).sum()
    # A majority-class prediction misclassifies every point of the minority class
    if no_safe_loans > no_risky_loans :
        return no_risky_loans
    else:
        return no_safe_loans
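
The same count can be expressed on plain Python lists: a majority-class prediction gets every minority-class point wrong, so the number of mistakes is the size of the smaller class. The helper name `majority_class_mistakes` is an assumption for this sketch.

```python
def majority_class_mistakes(labels):
    """Number of points misclassified when predicting the majority class
    (hypothetical list-based counterpart of the function above)."""
    num_safe = sum(1 for y in labels if y == +1)
    num_risky = sum(1 for y in labels if y == -1)
    return min(num_safe, num_risky)

mistakes_mostly_safe = majority_class_mistakes([+1, +1, -1])            # 1
mistakes_mostly_risky = majority_class_mistakes([-1, -1, -1, +1, +1])   # 2
mistakes_empty = majority_class_mistakes([])                            # 0
```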

In [14]:
def best_splitting_feature(data, features, target):
    best_feature = None
    best_error = 10 
    num_data_points = float(len(data))  
    for feature in features:
        left_split = data[data[feature] == 0]
        right_split =  data[data[feature] == 1]
        left_mistakes = intermediate_node_num_mistakes(left_split[target])             
        right_mistakes = intermediate_node_num_mistakes(right_split[target])
        error = (left_mistakes+right_mistakes)/num_data_points
        if error < best_error:
            best_error = error
            best_feature = feature
    return best_feature
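
To see the selection rule in action, the sketch below runs the same search on four hand-made rows stored as dictionaries. Everything here (the `best_split` helper, the toy rows) is an assumption for illustration; it also returns the error so the result is easier to inspect.

```python
def mistakes(labels):
    # Mistakes made by a majority-class prediction
    return min(labels.count(+1), labels.count(-1))

def best_split(rows, features, target):
    """Return (feature, error) for the binary feature with the lowest
    classification error after splitting (hypothetical helper)."""
    best_feature, best_error = None, float('inf')
    for feature in features:
        left = [r[target] for r in rows if r[feature] == 0]
        right = [r[target] for r in rows if r[feature] == 1]
        error = (mistakes(left) + mistakes(right)) / float(len(rows))
        if error < best_error:   # strict '<': ties keep the earlier feature
            best_feature, best_error = feature, error
    return best_feature, best_error

rows = [
    {'grade.A': 1, 'term. 36 months': 1, 'safe_loans': +1},
    {'grade.A': 1, 'term. 36 months': 0, 'safe_loans': +1},
    {'grade.A': 0, 'term. 36 months': 1, 'safe_loans': -1},
    {'grade.A': 0, 'term. 36 months': 0, 'safe_loans': -1},
]
feature, error = best_split(rows, ['term. 36 months', 'grade.A'], 'safe_loans')
```

Splitting on `term. 36 months` leaves one mistake on each side (error 0.5), while `grade.A` separates the toy labels perfectly (error 0.0), so `grade.A` is chosen.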

In [15]:
def create_leaf(target_values):
    leaf = {'splitting_feature' : None,
            'left' : None,
            'right' : None,
            'is_leaf': None,
            'prediction' : None}
    num_ones = len(target_values[target_values == +1])
    num_minus_ones = len(target_values[target_values == -1])
    # Predict the majority class of the node (ties go to -1)
    if num_ones > num_minus_ones:
        leaf['prediction'] = 1          
    else:
        leaf['prediction'] = -1     
    leaf['is_leaf'] = True     
    return leaf

Incorporating new early stopping conditions in binary decision tree implementation

In [18]:
def decision_tree_create(data, features, target, current_depth = 0, 
                         max_depth = 10, min_node_size=1, 
                         min_error_reduction=0.0):
    remaining_features = features[:]
    target_values = data[target]
    print "--------------------------------------------------------------------"
    print "Subtree, depth = %s (%s data points)." % (current_depth, len(target_values))
    # Stopping condition 1: All nodes are of the same type.
    if intermediate_node_num_mistakes(target_values) == 0:
        print "Stopping condition 1 reached. All data points have the same target value."                
        return create_leaf(target_values)
    # Stopping condition 2: No more features to split on.
    if remaining_features == []:
        print "Stopping condition 2 reached. No remaining features."                
        return create_leaf(target_values)    
    # Early stopping condition 1: Reached max depth limit.
    if current_depth >= max_depth:
        print "Early stopping condition 1 reached. Reached maximum depth."
        return create_leaf(target_values)
    # Early stopping condition 2: Reached the minimum node size.
    if reached_minimum_node_size(data,min_node_size):   
        print "Early stopping condition 2 reached. Reached minimum node size."
        return create_leaf(target_values)
    splitting_feature = best_splitting_feature(data, features, target) 
    left_split = data[data[splitting_feature] == 0]
    right_split = data[data[splitting_feature] == 1]
    error_before_split = intermediate_node_num_mistakes(target_values) / float(len(data))
    left_mistakes = intermediate_node_num_mistakes(left_split[target])
    right_mistakes = intermediate_node_num_mistakes(right_split[target])
    error_after_split = (left_mistakes + right_mistakes) / float(len(data))
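    # Early stopping condition 3: Minimum error reduction.
    if error_reduction(error_before_split,error_after_split)<=min_error_reduction:    
        print "Early stopping condition 3 reached. Minimum error reduction."
        return  create_leaf(target_values)    
    # Remove the chosen feature so that the subtrees cannot split on it again
    remaining_features.remove(splitting_feature)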
    print "Split on feature %s. (%s, %s)" % (\
                      splitting_feature, len(left_split), len(right_split))
    left_tree = decision_tree_create(left_split, remaining_features, target, 
                                     current_depth + 1, max_depth, min_node_size, min_error_reduction)        
    right_tree = decision_tree_create(right_split, remaining_features, target, 
                                     current_depth + 1, max_depth, min_node_size, min_error_reduction) 
    return {'is_leaf'          : False, 
            'prediction'       : None,
            'splitting_feature': splitting_feature,
            'left'             : left_tree, 
            'right'            : right_tree}

In [19]:
def count_nodes(tree):
    if tree['is_leaf']:
        return 1
    return 1 + count_nodes(tree['left']) + count_nodes(tree['right'])
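
The node count is a simple measure of model complexity. As a sanity check, the sketch below counts the nodes of a small hand-built tree that uses the same dictionary layout as the trees in this notebook. The `leaf` helper and `toy_tree` are assumptions made up for this example.

```python
def count_nodes(tree):
    if tree['is_leaf']:
        return 1
    return 1 + count_nodes(tree['left']) + count_nodes(tree['right'])

def leaf(prediction):
    # Hypothetical shortcut for building a leaf by hand
    return {'is_leaf': True, 'prediction': prediction,
            'splitting_feature': None, 'left': None, 'right': None}

toy_tree = {'is_leaf': False, 'prediction': None,
            'splitting_feature': 'term. 36 months',
            'left': leaf(-1),
            'right': {'is_leaf': False, 'prediction': None,
                      'splitting_feature': 'grade.D',
                      'left': leaf(+1),
                      'right': leaf(-1)}}

num_nodes = count_nodes(toy_tree)   # 2 internal nodes + 3 leaves = 5
```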

Build a tree.

In [21]:
my_decision_tree_new = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6, 
                                min_node_size = 100, min_error_reduction=0.0)

Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 3 (5 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 5 (11 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 3 (1276 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 2 (4701 data points).
Early stopping condition 3 reached. Minimum error reduction.

In [22]:
my_decision_tree_old = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6, 
                                min_node_size = 0, min_error_reduction=-1)

Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.

Making predictions

In [23]:
def classify(tree, x, annotate = False):   
    # At a leaf, return the stored prediction
    if tree['is_leaf']:
        if annotate: 
            print "At leaf, predicting %s" % tree['prediction']
        return tree['prediction'] 
    else:
        # Otherwise follow the branch that matches the value of the splitting feature
        split_feature_value = x[tree['splitting_feature']]
        if annotate: 
            print "Split on %s = %s" % (tree['splitting_feature'], split_feature_value)
        if split_feature_value == 0:
            return classify(tree['left'], x, annotate)
        else:
            return classify(tree['right'], x, annotate)
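
The traversal can also be written as a loop. The sketch below is an iterative variant (not the recursive function used in this notebook) applied to a small hand-built tree; `classify_iterative`, `leaf`, and `toy_tree` are assumptions for illustration. A feature value of 0 follows the left branch and any other value follows the right branch, matching the convention above.

```python
def classify_iterative(tree, x):
    """Walk from the root to a leaf and return its prediction
    (iterative variant of the recursive classify, for illustration)."""
    while not tree['is_leaf']:
        if x[tree['splitting_feature']] == 0:
            tree = tree['left']
        else:
            tree = tree['right']
    return tree['prediction']

def leaf(prediction):
    # Hypothetical shortcut for building a leaf by hand
    return {'is_leaf': True, 'prediction': prediction,
            'splitting_feature': None, 'left': None, 'right': None}

toy_tree = {'is_leaf': False, 'prediction': None,
            'splitting_feature': 'term. 36 months',
            'left': leaf(-1),
            'right': {'is_leaf': False, 'prediction': None,
                      'splitting_feature': 'grade.D',
                      'left': leaf(+1),
                      'right': leaf(-1)}}

long_term_loan = {'term. 36 months': 0, 'grade.D': 1}
short_term_good_grade = {'term. 36 months': 1, 'grade.D': 0}
prediction_a = classify_iterative(toy_tree, long_term_loan)         # -1
prediction_b = classify_iterative(toy_tree, short_term_good_grade)  # +1
```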

In [24]:
validation_set[0]

{'emp_length.1 year': 0,
 'emp_length.10+ years': 0,
 'emp_length.2 years': 1,
 'emp_length.3 years': 0,
 'emp_length.4 years': 0,
 'emp_length.5 years': 0,
 'emp_length.6 years': 0,
 'emp_length.7 years': 0,
 'emp_length.8 years': 0,
 'emp_length.9 years': 0,
 'emp_length.< 1 year': 0,
 'emp_length.n/a': 0,
 'grade.A': 0,
 'grade.B': 0,
 'grade.C': 0,
 'grade.D': 1,
 'grade.E': 0,
 'grade.F': 0,
 'grade.G': 0,
 'home_ownership.MORTGAGE': 0,
 'home_ownership.OTHER': 0,
 'home_ownership.OWN': 0,
 'home_ownership.RENT': 1,
 'safe_loans': -1,
 'term. 36 months': 0,
 'term. 60 months': 1}

In [25]:
print 'Predicted class: %s ' % classify(my_decision_tree_new, validation_set[0])

Predicted class: -1 

In [26]:
classify(my_decision_tree_new, validation_set[0], annotate = True)

Split on term. 36 months = 0
Split on grade.A = 0
At leaf, predicting -1

In [27]:
classify(my_decision_tree_old, validation_set[0], annotate = True)

Split on term. 36 months = 0
Split on grade.A = 0
Split on grade.B = 0
Split on grade.C = 0
Split on grade.D = 1
Split on grade.E = 0
At leaf, predicting -1

Evaluating the model

In [28]:
def evaluate_classification_error(tree, data):
    prediction = data.apply(lambda x: classify(tree, x))
    mistakes = (prediction!=data['safe_loans']).sum()
    return mistakes/float(len(data))
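
Classification error is the fraction of data points whose prediction differs from the true label. A minimal list-based sketch, with made-up predictions and labels and a hypothetical helper name:

```python
def classification_error(predictions, labels):
    """Fraction of predictions that differ from the true labels
    (hypothetical list-based helper)."""
    mistakes = sum(1 for p, y in zip(predictions, labels) if p != y)
    return mistakes / float(len(labels))

predictions = [+1, -1, +1, +1]
labels      = [+1, +1, +1, -1]
error = classification_error(predictions, labels)   # 2 mistakes out of 4 = 0.5
```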

In [29]:
evaluate_classification_error(my_decision_tree_new, validation_set)


In [30]:
evaluate_classification_error(my_decision_tree_old, validation_set)


Exploring the effect of max_depth

In [31]:
model_1 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 2,
                               min_node_size = 0, min_error_reduction=-1)
model_2 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 0, min_error_reduction=-1)
model_3 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 14,
                               min_node_size = 0, min_error_reduction=-1)

Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 2 (101 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 2 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
Subtree, depth = 6 (1693 data points).
Split on feature home_ownership.OTHER. (1692, 1)
Subtree, depth = 7 (1692 data points).
Split on feature grade.F. (339, 1353)
Subtree, depth = 8 (339 data points).
Split on feature grade.G. (0, 339)
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (339 data points).
Split on feature term. 60 months. (0, 339)
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (339 data points).
Split on feature home_ownership.MORTGAGE. (175, 164)
Subtree, depth = 11 (175 data points).
Split on feature home_ownership.OWN. (142, 33)
Subtree, depth = 12 (142 data points).
Split on feature emp_length.6 years. (133, 9)
Subtree, depth = 13 (133 data points).
Split on feature home_ownership.RENT. (0, 133)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (133 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (9 data points).
Split on feature home_ownership.RENT. (0, 9)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (33 data points).
Split on feature emp_length.n/a. (31, 2)
Subtree, depth = 13 (31 data points).
Split on feature emp_length.2 years. (30, 1)
Subtree, depth = 14 (30 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (164 data points).
Split on feature emp_length.2 years. (159, 5)
Subtree, depth = 12 (159 data points).
Split on feature emp_length.3 years. (148, 11)
Subtree, depth = 13 (148 data points).
Split on feature home_ownership.OWN. (148, 0)
Subtree, depth = 14 (148 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (11 data points).
Split on feature home_ownership.OWN. (11, 0)
Subtree, depth = 14 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (5 data points).
Split on feature home_ownership.OWN. (5, 0)
Subtree, depth = 13 (5 data points).
Split on feature home_ownership.RENT. (5, 0)
Subtree, depth = 14 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (1353 data points).
Split on feature grade.G. (1353, 0)
Subtree, depth = 9 (1353 data points).
Split on feature term. 60 months. (0, 1353)
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1353 data points).
Split on feature home_ownership.MORTGAGE. (710, 643)
Subtree, depth = 11 (710 data points).
Split on feature home_ownership.OWN. (602, 108)
Subtree, depth = 12 (602 data points).
Split on feature home_ownership.RENT. (0, 602)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (602 data points).
Split on feature emp_length.1 year. (565, 37)
Subtree, depth = 14 (565 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (37 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (108 data points).
Split on feature home_ownership.RENT. (108, 0)
Subtree, depth = 13 (108 data points).
Split on feature emp_length.1 year. (100, 8)
Subtree, depth = 14 (100 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (8 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (643 data points).
Split on feature home_ownership.OWN. (643, 0)
Subtree, depth = 12 (643 data points).
Split on feature home_ownership.RENT. (643, 0)
Subtree, depth = 13 (643 data points).
Split on feature emp_length.1 year. (602, 41)
Subtree, depth = 14 (602 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (41 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (2133 data points).
Split on feature grade.F. (2133, 0)
Subtree, depth = 7 (2133 data points).
Split on feature grade.G. (2133, 0)
Subtree, depth = 8 (2133 data points).
Split on feature term. 60 months. (0, 2133)
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (2133 data points).
Split on feature home_ownership.MORTGAGE. (1045, 1088)
Subtree, depth = 10 (1045 data points).
Split on feature home_ownership.OTHER. (1044, 1)
Subtree, depth = 11 (1044 data points).
Split on feature home_ownership.OWN. (879, 165)
Subtree, depth = 12 (879 data points).
Split on feature home_ownership.RENT. (0, 879)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (879 data points).
Split on feature emp_length.1 year. (809, 70)
Subtree, depth = 14 (809 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (70 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (165 data points).
Split on feature emp_length.9 years. (157, 8)
Subtree, depth = 13 (157 data points).
Split on feature home_ownership.RENT. (157, 0)
Subtree, depth = 14 (157 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (8 data points).
Split on feature home_ownership.RENT. (8, 0)
Subtree, depth = 14 (8 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1088 data points).
Split on feature home_ownership.OTHER. (1088, 0)
Subtree, depth = 11 (1088 data points).
Split on feature home_ownership.OWN. (1088, 0)
Subtree, depth = 12 (1088 data points).
Split on feature home_ownership.RENT. (1088, 0)
Subtree, depth = 13 (1088 data points).
Split on feature emp_length.1 year. (1035, 53)
Subtree, depth = 14 (1035 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (53 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
Subtree, depth = 6 (2058 data points).
Split on feature grade.F. (2058, 0)
Subtree, depth = 7 (2058 data points).
Split on feature grade.G. (2058, 0)
Subtree, depth = 8 (2058 data points).
Split on feature term. 60 months. (0, 2058)
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (2058 data points).
Split on feature home_ownership.MORTGAGE. (923, 1135)
Subtree, depth = 10 (923 data points).
Split on feature home_ownership.OTHER. (922, 1)
Subtree, depth = 11 (922 data points).
Split on feature home_ownership.OWN. (762, 160)
Subtree, depth = 12 (762 data points).
Split on feature home_ownership.RENT. (0, 762)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (762 data points).
Split on feature emp_length.1 year. (704, 58)
Subtree, depth = 14 (704 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (58 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (160 data points).
Split on feature home_ownership.RENT. (160, 0)
Subtree, depth = 13 (160 data points).
Split on feature emp_length.1 year. (154, 6)
Subtree, depth = 14 (154 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1135 data points).
Split on feature home_ownership.OTHER. (1135, 0)
Subtree, depth = 11 (1135 data points).
Split on feature home_ownership.OWN. (1135, 0)
Subtree, depth = 12 (1135 data points).
Split on feature home_ownership.RENT. (1135, 0)
Subtree, depth = 13 (1135 data points).
Split on feature emp_length.1 year. (1096, 39)
Subtree, depth = 14 (1096 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (39 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
Subtree, depth = 6 (2190 data points).
Split on feature grade.F. (2190, 0)
Subtree, depth = 7 (2190 data points).
Split on feature grade.G. (2190, 0)
Subtree, depth = 8 (2190 data points).
Split on feature term. 60 months. (0, 2190)
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (2190 data points).
Split on feature home_ownership.MORTGAGE. (803, 1387)
Subtree, depth = 10 (803 data points).
Split on feature emp_length.4 years. (746, 57)
Subtree, depth = 11 (746 data points).
Split on feature home_ownership.OTHER. (746, 0)
Subtree, depth = 12 (746 data points).
Split on feature home_ownership.OWN. (598, 148)
Subtree, depth = 13 (598 data points).
Split on feature home_ownership.RENT. (0, 598)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (598 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (148 data points).
Split on feature emp_length.< 1 year. (137, 11)
Subtree, depth = 14 (137 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (57 data points).
Split on feature home_ownership.OTHER. (57, 0)
Subtree, depth = 12 (57 data points).
Split on feature home_ownership.OWN. (49, 8)
Subtree, depth = 13 (49 data points).
Split on feature home_ownership.RENT. (0, 49)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (49 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (8 data points).
Split on feature home_ownership.RENT. (8, 0)
Subtree, depth = 14 (8 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1387 data points).
Split on feature emp_length.6 years. (1313, 74)
Subtree, depth = 11 (1313 data points).
Split on feature home_ownership.OTHER. (1313, 0)
Subtree, depth = 12 (1313 data points).
Split on feature home_ownership.OWN. (1313, 0)
Subtree, depth = 13 (1313 data points).
Split on feature home_ownership.RENT. (1313, 0)
Subtree, depth = 14 (1313 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (74 data points).
Split on feature home_ownership.OTHER. (74, 0)
Subtree, depth = 12 (74 data points).
Split on feature home_ownership.OWN. (74, 0)
Subtree, depth = 13 (74 data points).
Split on feature home_ownership.RENT. (74, 0)
Subtree, depth = 14 (74 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
Subtree, depth = 6 (969 data points).
Split on feature grade.E. (969, 0)
Subtree, depth = 7 (969 data points).
Split on feature grade.F. (969, 0)
Subtree, depth = 8 (969 data points).
Split on feature grade.G. (969, 0)
Subtree, depth = 9 (969 data points).
Split on feature term. 60 months. (0, 969)
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (969 data points).
Split on feature home_ownership.MORTGAGE. (367, 602)
Subtree, depth = 11 (367 data points).
Split on feature home_ownership.OTHER. (367, 0)
Subtree, depth = 12 (367 data points).
Split on feature home_ownership.OWN. (291, 76)
Subtree, depth = 13 (291 data points).
Split on feature home_ownership.RENT. (0, 291)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (291 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (76 data points).
Split on feature emp_length.9 years. (71, 5)
Subtree, depth = 14 (71 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (602 data points).
Split on feature emp_length.9 years. (580, 22)
Subtree, depth = 12 (580 data points).
Split on feature emp_length.3 years. (545, 35)
Subtree, depth = 13 (545 data points).
Split on feature emp_length.4 years. (506, 39)
Subtree, depth = 14 (506 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (39 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (35 data points).
Split on feature home_ownership.OTHER. (35, 0)
Subtree, depth = 14 (35 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (22 data points).
Split on feature home_ownership.OTHER. (22, 0)
Subtree, depth = 13 (22 data points).
Split on feature home_ownership.OWN. (22, 0)
Subtree, depth = 14 (22 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
Subtree, depth = 6 (34 data points).
Split on feature grade.D. (34, 0)
Subtree, depth = 7 (34 data points).
Split on feature grade.E. (34, 0)
Subtree, depth = 8 (34 data points).
Split on feature grade.F. (34, 0)
Subtree, depth = 9 (34 data points).
Split on feature grade.G. (34, 0)
Subtree, depth = 10 (34 data points).
Split on feature term. 60 months. (0, 34)
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (34 data points).
Split on feature home_ownership.OTHER. (34, 0)
Subtree, depth = 12 (34 data points).
Split on feature home_ownership.OWN. (25, 9)
Subtree, depth = 13 (25 data points).
Split on feature home_ownership.RENT. (0, 25)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (25 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (9 data points).
Split on feature home_ownership.RENT. (9, 0)
Subtree, depth = 14 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
Subtree, depth = 6 (45 data points).
Split on feature grade.D. (45, 0)
Subtree, depth = 7 (45 data points).
Split on feature grade.E. (45, 0)
Subtree, depth = 8 (45 data points).
Split on feature grade.F. (45, 0)
Subtree, depth = 9 (45 data points).
Split on feature grade.G. (45, 0)
Subtree, depth = 10 (45 data points).
Split on feature term. 60 months. (0, 45)
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (45 data points).
Split on feature home_ownership.OTHER. (45, 0)
Subtree, depth = 12 (45 data points).
Split on feature home_ownership.OWN. (45, 0)
Subtree, depth = 13 (45 data points).
Split on feature home_ownership.RENT. (45, 0)
Subtree, depth = 14 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
Subtree, depth = 6 (85 data points).
Split on feature grade.D. (85, 0)
Subtree, depth = 7 (85 data points).
Split on feature grade.E. (85, 0)
Subtree, depth = 8 (85 data points).
Split on feature grade.F. (85, 0)
Subtree, depth = 9 (85 data points).
Split on feature grade.G. (85, 0)
Subtree, depth = 10 (85 data points).
Split on feature term. 60 months. (0, 85)
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (85 data points).
Split on feature home_ownership.MORTGAGE. (26, 59)
Subtree, depth = 12 (26 data points).
Split on feature emp_length.3 years. (24, 2)
Subtree, depth = 13 (24 data points).
Split on feature home_ownership.OTHER. (24, 0)
Subtree, depth = 14 (24 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (59 data points).
Split on feature home_ownership.OTHER. (59, 0)
Subtree, depth = 13 (59 data points).
Split on feature home_ownership.OWN. (59, 0)
Subtree, depth = 14 (59 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
Subtree, depth = 6 (11 data points).
Split on feature grade.D. (11, 0)
Subtree, depth = 7 (11 data points).
Split on feature grade.E. (11, 0)
Subtree, depth = 8 (11 data points).
Split on feature grade.F. (11, 0)
Subtree, depth = 9 (11 data points).
Split on feature grade.G. (11, 0)
Subtree, depth = 10 (11 data points).
Split on feature term. 60 months. (0, 11)
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (11 data points).
Split on feature home_ownership.MORTGAGE. (8, 3)
Subtree, depth = 12 (8 data points).
Split on feature home_ownership.OTHER. (8, 0)
Subtree, depth = 13 (8 data points).
Split on feature home_ownership.OWN. (6, 2)
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (2 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (3 data points).
Split on feature home_ownership.OTHER. (3, 0)
Subtree, depth = 13 (3 data points).
Split on feature home_ownership.OWN. (3, 0)
Subtree, depth = 14 (3 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
Subtree, depth = 6 (5 data points).
Split on feature grade.E. (5, 0)
Subtree, depth = 7 (5 data points).
Split on feature grade.F. (5, 0)
Subtree, depth = 8 (5 data points).
Split on feature grade.G. (5, 0)
Subtree, depth = 9 (5 data points).
Split on feature term. 60 months. (0, 5)
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (5 data points).
Split on feature home_ownership.MORTGAGE. (2, 3)
Subtree, depth = 11 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (3 data points).
Split on feature home_ownership.OTHER. (3, 0)
Subtree, depth = 12 (3 data points).
Split on feature home_ownership.OWN. (3, 0)
Subtree, depth = 13 (3 data points).
Split on feature home_ownership.RENT. (3, 0)
Subtree, depth = 14 (3 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Split on feature grade.A. (15839, 4799)
Subtree, depth = 7 (15839 data points).
Split on feature home_ownership.OTHER. (15811, 28)
Subtree, depth = 8 (15811 data points).
Split on feature grade.B. (6894, 8917)
Subtree, depth = 9 (6894 data points).
Split on feature home_ownership.MORTGAGE. (4102, 2792)
Subtree, depth = 10 (4102 data points).
Split on feature emp_length.4 years. (3768, 334)
Subtree, depth = 11 (3768 data points).
Split on feature emp_length.9 years. (3639, 129)
Subtree, depth = 12 (3639 data points).
Split on feature emp_length.2 years. (3123, 516)
Subtree, depth = 13 (3123 data points).
Split on feature grade.C. (0, 3123)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (3123 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (516 data points).
Split on feature home_ownership.OWN. (458, 58)
Subtree, depth = 14 (458 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (58 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (129 data points).
Split on feature home_ownership.OWN. (113, 16)
Subtree, depth = 13 (113 data points).
Split on feature grade.C. (0, 113)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (113 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (16 data points).
Split on feature grade.C. (0, 16)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (16 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 11 (334 data points).
Split on feature grade.C. (0, 334)
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (334 data points).
Split on feature term. 60 months. (334, 0)
Subtree, depth = 13 (334 data points).
Split on feature home_ownership.OWN. (286, 48)
Subtree, depth = 14 (286 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (48 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (2792 data points).
Split on feature emp_length.2 years. (2562, 230)
Subtree, depth = 11 (2562 data points).
Split on feature emp_length.5 years. (2335, 227)
Subtree, depth = 12 (2335 data points).
Split on feature grade.C. (0, 2335)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (2335 data points).
Split on feature term. 60 months. (2335, 0)
Subtree, depth = 14 (2335 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (227 data points).
Split on feature grade.C. (0, 227)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (227 data points).
Split on feature term. 60 months. (227, 0)
Subtree, depth = 14 (227 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (230 data points).
Split on feature grade.C. (0, 230)
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (230 data points).
Split on feature term. 60 months. (230, 0)
Subtree, depth = 13 (230 data points).
Split on feature home_ownership.OWN. (230, 0)
Subtree, depth = 14 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (8917 data points).
Split on feature grade.C. (8917, 0)
Subtree, depth = 10 (8917 data points).
Split on feature term. 60 months. (8917, 0)
Subtree, depth = 11 (8917 data points).
Split on feature home_ownership.MORTGAGE. (4748, 4169)
Subtree, depth = 12 (4748 data points).
Split on feature home_ownership.OWN. (4089, 659)
Subtree, depth = 13 (4089 data points).
Split on feature home_ownership.RENT. (0, 4089)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (4089 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (659 data points).
Split on feature home_ownership.RENT. (659, 0)
Subtree, depth = 14 (659 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (4169 data points).
Split on feature home_ownership.OWN. (4169, 0)
Subtree, depth = 13 (4169 data points).
Split on feature home_ownership.RENT. (4169, 0)
Subtree, depth = 14 (4169 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (28 data points).
Split on feature grade.B. (11, 17)
Subtree, depth = 9 (11 data points).
Split on feature emp_length.6 years. (10, 1)
Subtree, depth = 10 (10 data points).
Split on feature grade.C. (0, 10)
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (10 data points).
Split on feature term. 60 months. (10, 0)
Subtree, depth = 12 (10 data points).
Split on feature home_ownership.MORTGAGE. (10, 0)
Subtree, depth = 13 (10 data points).
Split on feature home_ownership.OWN. (10, 0)
Subtree, depth = 14 (10 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (17 data points).
Split on feature emp_length.1 year. (16, 1)
Subtree, depth = 10 (16 data points).
Split on feature emp_length.3 years. (15, 1)
Subtree, depth = 11 (15 data points).
Split on feature emp_length.4 years. (14, 1)
Subtree, depth = 12 (14 data points).
Split on feature emp_length.< 1 year. (13, 1)
Subtree, depth = 13 (13 data points).
Split on feature grade.C. (13, 0)
Subtree, depth = 14 (13 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (4799 data points).
Split on feature grade.B. (4799, 0)
Subtree, depth = 8 (4799 data points).
Split on feature grade.C. (4799, 0)
Subtree, depth = 9 (4799 data points).
Split on feature term. 60 months. (4799, 0)
Subtree, depth = 10 (4799 data points).
Split on feature home_ownership.MORTGAGE. (2163, 2636)
Subtree, depth = 11 (2163 data points).
Split on feature home_ownership.OTHER. (2154, 9)
Subtree, depth = 12 (2154 data points).
Split on feature home_ownership.OWN. (1753, 401)
Subtree, depth = 13 (1753 data points).
Split on feature home_ownership.RENT. (0, 1753)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (1753 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (401 data points).
Split on feature home_ownership.RENT. (401, 0)
Subtree, depth = 14 (401 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (9 data points).
Split on feature emp_length.3 years. (8, 1)
Subtree, depth = 13 (8 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (2636 data points).
Split on feature home_ownership.OTHER. (2636, 0)
Subtree, depth = 12 (2636 data points).
Split on feature home_ownership.OWN. (2636, 0)
Subtree, depth = 13 (2636 data points).
Split on feature home_ownership.RENT. (2636, 0)
Subtree, depth = 14 (2636 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (96 data points).
Split on feature grade.A. (96, 0)
Subtree, depth = 7 (96 data points).
Split on feature grade.B. (96, 0)
Subtree, depth = 8 (96 data points).
Split on feature grade.C. (96, 0)
Subtree, depth = 9 (96 data points).
Split on feature term. 60 months. (96, 0)
Subtree, depth = 10 (96 data points).
Split on feature home_ownership.MORTGAGE. (44, 52)
Subtree, depth = 11 (44 data points).
Split on feature emp_length.3 years. (43, 1)
Subtree, depth = 12 (43 data points).
Split on feature emp_length.7 years. (42, 1)
Subtree, depth = 13 (42 data points).
Split on feature emp_length.8 years. (41, 1)
Subtree, depth = 14 (41 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (52 data points).
Split on feature emp_length.2 years. (47, 5)
Subtree, depth = 12 (47 data points).
Split on feature home_ownership.OTHER. (47, 0)
Subtree, depth = 13 (47 data points).
Split on feature home_ownership.OWN. (47, 0)
Subtree, depth = 14 (47 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (5 data points).
Split on feature home_ownership.OTHER. (5, 0)
Subtree, depth = 13 (5 data points).
Split on feature home_ownership.OWN. (5, 0)
Subtree, depth = 14 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Split on feature home_ownership.OTHER. (701, 1)
Subtree, depth = 7 (701 data points).
Split on feature grade.B. (317, 384)
Subtree, depth = 8 (317 data points).
Split on feature grade.C. (1, 316)
Subtree, depth = 9 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (316 data points).
Split on feature grade.G. (316, 0)
Subtree, depth = 10 (316 data points).
Split on feature term. 60 months. (316, 0)
Subtree, depth = 11 (316 data points).
Split on feature home_ownership.MORTGAGE. (189, 127)
Subtree, depth = 12 (189 data points).
Split on feature home_ownership.OWN. (139, 50)
Subtree, depth = 13 (139 data points).
Split on feature home_ownership.RENT. (0, 139)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (139 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (50 data points).
Split on feature home_ownership.RENT. (50, 0)
Subtree, depth = 14 (50 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (127 data points).
Split on feature home_ownership.OWN. (127, 0)
Subtree, depth = 13 (127 data points).
Split on feature home_ownership.RENT. (127, 0)
Subtree, depth = 14 (127 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (384 data points).
Split on feature grade.C. (384, 0)
Subtree, depth = 9 (384 data points).
Split on feature grade.G. (384, 0)
Subtree, depth = 10 (384 data points).
Split on feature term. 60 months. (384, 0)
Subtree, depth = 11 (384 data points).
Split on feature home_ownership.MORTGAGE. (210, 174)
Subtree, depth = 12 (210 data points).
Split on feature home_ownership.OWN. (148, 62)
Subtree, depth = 13 (148 data points).
Split on feature home_ownership.RENT. (0, 148)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (148 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (62 data points).
Split on feature home_ownership.RENT. (62, 0)
Subtree, depth = 14 (62 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (174 data points).
Split on feature home_ownership.OWN. (174, 0)
Subtree, depth = 13 (174 data points).
Split on feature home_ownership.RENT. (174, 0)
Subtree, depth = 14 (174 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (230 data points).
Split on feature grade.B. (230, 0)
Subtree, depth = 7 (230 data points).
Split on feature grade.C. (230, 0)
Subtree, depth = 8 (230 data points).
Split on feature grade.G. (230, 0)
Subtree, depth = 9 (230 data points).
Split on feature term. 60 months. (230, 0)
Subtree, depth = 10 (230 data points).
Split on feature home_ownership.MORTGAGE. (119, 111)
Subtree, depth = 11 (119 data points).
Split on feature home_ownership.OTHER. (119, 0)
Subtree, depth = 12 (119 data points).
Split on feature home_ownership.OWN. (71, 48)
Subtree, depth = 13 (71 data points).
Split on feature home_ownership.RENT. (0, 71)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (71 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (48 data points).
Split on feature home_ownership.RENT. (48, 0)
Subtree, depth = 14 (48 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (111 data points).
Split on feature home_ownership.OTHER. (111, 0)
Subtree, depth = 12 (111 data points).
Split on feature home_ownership.OWN. (111, 0)
Subtree, depth = 13 (111 data points).
Split on feature home_ownership.RENT. (111, 0)
Subtree, depth = 14 (111 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
Subtree, depth = 6 (347 data points).
Split on feature grade.B. (347, 0)
Subtree, depth = 7 (347 data points).
Split on feature grade.C. (347, 0)
Subtree, depth = 8 (347 data points).
Split on feature grade.G. (347, 0)
Subtree, depth = 9 (347 data points).
Split on feature term. 60 months. (347, 0)
Subtree, depth = 10 (347 data points).
Split on feature home_ownership.MORTGAGE. (237, 110)
Subtree, depth = 11 (237 data points).
Split on feature home_ownership.OTHER. (235, 2)
Subtree, depth = 12 (235 data points).
Split on feature home_ownership.OWN. (203, 32)
Subtree, depth = 13 (203 data points).
Split on feature home_ownership.RENT. (0, 203)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (203 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (32 data points).
Split on feature home_ownership.RENT. (32, 0)
Subtree, depth = 14 (32 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (110 data points).
Split on feature home_ownership.OTHER. (110, 0)
Subtree, depth = 12 (110 data points).
Split on feature home_ownership.OWN. (110, 0)
Subtree, depth = 13 (110 data points).
Split on feature home_ownership.RENT. (110, 0)
Subtree, depth = 14 (110 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
Subtree, depth = 6 (9 data points).
Split on feature grade.A. (9, 0)
Subtree, depth = 7 (9 data points).
Split on feature grade.B. (9, 0)
Subtree, depth = 8 (9 data points).
Split on feature grade.C. (9, 0)
Subtree, depth = 9 (9 data points).
Split on feature grade.G. (9, 0)
Subtree, depth = 10 (9 data points).
Split on feature term. 60 months. (9, 0)
Subtree, depth = 11 (9 data points).
Split on feature home_ownership.MORTGAGE. (6, 3)
Subtree, depth = 12 (6 data points).
Split on feature home_ownership.OTHER. (6, 0)
Subtree, depth = 13 (6 data points).
Split on feature home_ownership.RENT. (0, 6)
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (3 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
Subtree, depth = 6 (1276 data points).
Split on feature grade.F. (1276, 0)
Subtree, depth = 7 (1276 data points).
Split on feature grade.G. (1276, 0)
Subtree, depth = 8 (1276 data points).
Split on feature term. 60 months. (1276, 0)
Subtree, depth = 9 (1276 data points).
Split on feature home_ownership.MORTGAGE. (855, 421)
Subtree, depth = 10 (855 data points).
Split on feature home_ownership.OTHER. (849, 6)
Subtree, depth = 11 (849 data points).
Split on feature home_ownership.OWN. (737, 112)
Subtree, depth = 12 (737 data points).
Split on feature home_ownership.RENT. (0, 737)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (737 data points).
Split on feature emp_length.1 year. (670, 67)
Subtree, depth = 14 (670 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (67 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (112 data points).
Split on feature home_ownership.RENT. (112, 0)
Subtree, depth = 13 (112 data points).
Split on feature emp_length.1 year. (102, 10)
Subtree, depth = 14 (102 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (10 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (6 data points).
Split on feature home_ownership.OWN. (6, 0)
Subtree, depth = 12 (6 data points).
Split on feature home_ownership.RENT. (6, 0)
Subtree, depth = 13 (6 data points).
Split on feature emp_length.1 year. (6, 0)
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (421 data points).
Split on feature emp_length.6 years. (408, 13)
Subtree, depth = 11 (408 data points).
Split on feature home_ownership.OTHER. (408, 0)
Subtree, depth = 12 (408 data points).
Split on feature home_ownership.OWN. (408, 0)
Subtree, depth = 13 (408 data points).
Split on feature home_ownership.RENT. (408, 0)
Subtree, depth = 14 (408 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (13 data points).
Split on feature home_ownership.OTHER. (13, 0)
Subtree, depth = 12 (13 data points).
Split on feature home_ownership.OWN. (13, 0)
Subtree, depth = 13 (13 data points).
Split on feature home_ownership.RENT. (13, 0)
Subtree, depth = 14 (13 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
Subtree, depth = 6 (4701 data points).
Split on feature grade.F. (4701, 0)
Subtree, depth = 7 (4701 data points).
Split on feature grade.G. (4701, 0)
Subtree, depth = 8 (4701 data points).
Split on feature term. 60 months. (4701, 0)
Subtree, depth = 9 (4701 data points).
Split on feature home_ownership.MORTGAGE. (3047, 1654)
Subtree, depth = 10 (3047 data points).
Split on feature home_ownership.OTHER. (3037, 10)
Subtree, depth = 11 (3037 data points).
Split on feature home_ownership.OWN. (2633, 404)
Subtree, depth = 12 (2633 data points).
Split on feature home_ownership.RENT. (0, 2633)
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (2633 data points).
Split on feature emp_length.1 year. (2392, 241)
Subtree, depth = 14 (2392 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (241 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 12 (404 data points).
Split on feature home_ownership.RENT. (404, 0)
Subtree, depth = 13 (404 data points).
Split on feature emp_length.1 year. (374, 30)
Subtree, depth = 14 (374 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (30 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (10 data points).
Split on feature home_ownership.OWN. (10, 0)
Subtree, depth = 12 (10 data points).
Split on feature home_ownership.RENT. (10, 0)
Subtree, depth = 13 (10 data points).
Split on feature emp_length.1 year. (9, 1)
Subtree, depth = 14 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 10 (1654 data points).
Split on feature emp_length.5 years. (1532, 122)
Subtree, depth = 11 (1532 data points).
Split on feature emp_length.3 years. (1414, 118)
Subtree, depth = 12 (1414 data points).
Split on feature emp_length.9 years. (1351, 63)
Subtree, depth = 13 (1351 data points).
Split on feature home_ownership.OTHER. (1351, 0)
Subtree, depth = 14 (1351 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (63 data points).
Split on feature home_ownership.OTHER. (63, 0)
Subtree, depth = 14 (63 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (118 data points).
Split on feature home_ownership.OTHER. (118, 0)
Subtree, depth = 13 (118 data points).
Split on feature home_ownership.OWN. (118, 0)
Subtree, depth = 14 (118 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 11 (122 data points).
Split on feature home_ownership.OTHER. (122, 0)
Subtree, depth = 12 (122 data points).
Split on feature home_ownership.OWN. (122, 0)
Subtree, depth = 13 (122 data points).
Split on feature home_ownership.RENT. (122, 0)
Subtree, depth = 14 (122 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.

Evaluating the models

In [32]:
print "Training data, classification error (model 1):", evaluate_classification_error(model_1, train_data)
print "Training data, classification error (model 2):", evaluate_classification_error(model_2, train_data)
print "Training data, classification error (model 3):", evaluate_classification_error(model_3, train_data)

Training data, classification error (model 1): 0.400037610144
Training data, classification error (model 2): 0.381850419084
Training data, classification error (model 3): 0.374462712229

In [34]:
print "Validation data, classification error (model 1):", evaluate_classification_error(model_1, validation_set)
print "Validation data, classification error (model 2):", evaluate_classification_error(model_2, validation_set)
print "Validation data, classification error (model 3):", evaluate_classification_error(model_3, validation_set)

Validation data, classification error (model 1): 0.398104265403
Validation data, classification error (model 2): 0.383778543731
Validation data, classification error (model 3): 0.380008616975
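
The numbers above are classification errors, i.e. the fraction of examples whose predicted label differs from the true label. As a minimal sketch of that computation, assuming plain Python lists instead of SFrame columns (the helper name classification_error and the toy data are hypothetical and are not the notebook's evaluate_classification_error):

```python
def classification_error(predictions, labels):
    """Return the fraction of predictions that differ from the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    # Count the positions where the predicted label is wrong.
    mistakes = sum(1 for p, y in zip(predictions, labels) if p != y)
    return mistakes / float(len(labels))


# Toy example (made-up data) using the notebook's label convention:
# +1 for a safe loan, -1 for a risky loan.
toy_labels = [+1, -1, +1, +1, -1]
toy_predictions = [+1, +1, +1, -1, -1]

# Two of the five predictions are wrong.
toy_error = classification_error(toy_predictions, toy_labels)
```

Comparing this quantity on the training data and on held-out data is what shows overfitting: for model 3 the error is 0.374 on train_data and 0.380 on validation_set.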

Measuring the complexity of the tree

  complexity(T) = number of leaves in the tree T

In [35]:
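def count_leaves(tree):
    # A leaf contributes exactly one to the count.
    if tree['is_leaf']:
        return 1
    # An internal node has as many leaves as its two subtrees combined.
    return count_leaves(tree['left']) + count_leaves(tree['right'])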

In [37]:
print count_leaves(model_1), count_leaves(model_2), count_leaves(model_3)

4 41 341
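
To make the leaf count concrete, here is a small self-contained sketch on a hand-built tree. It assumes the nested-dictionary layout used by count_leaves (keys 'is_leaf', 'left', 'right'). The helpers make_leaf and make_node, and the extra keys 'prediction' and 'splitting_feature', are hypothetical and only serve the illustration.

```python
def make_leaf(prediction):
    # Hypothetical helper: a leaf node that stores a class prediction.
    return {'is_leaf': True, 'prediction': prediction,
            'left': None, 'right': None}


def make_node(splitting_feature, left, right):
    # Hypothetical helper: an internal node with two children.
    return {'is_leaf': False, 'splitting_feature': splitting_feature,
            'left': left, 'right': right}


def count_leaves(tree):
    # Same recursion as in the notebook: one per leaf, summed over subtrees.
    if tree['is_leaf']:
        return 1
    return count_leaves(tree['left']) + count_leaves(tree['right'])


# A toy tree: the root splits once, and its right child splits again.
toy_tree = make_node(
    'term. 36 months',
    make_leaf(-1),
    make_node('grade.A', make_leaf(-1), make_leaf(+1)),
)

toy_complexity = count_leaves(toy_tree)  # three leaves in total
```

The same recursion gives the counts 4, 41 and 341 reported above for models 1, 2 and 3.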

Exploring the effect of min_error_reduction
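
The body of decision_tree_create is not shown in this part of the notebook, so the following is only a sketch of one common way such a threshold is applied. It assumes the error reduction of a split is the classification error before the split minus the error after it (both fractions between 0 and 1), and that a node becomes a leaf when this reduction is less than or equal to the threshold. The function names are hypothetical.

```python
def error_reduction(error_before_split, error_after_split):
    # Assumed definition: how much the split lowers the classification error.
    return error_before_split - error_after_split


def should_stop(error_before_split, error_after_split, min_error_reduction):
    # Assumed rule: stop splitting when the gain does not exceed the threshold.
    gain = error_reduction(error_before_split, error_after_split)
    return gain <= min_error_reduction


# The three thresholds used for models 4, 5 and 6 below.
thresholds = [-1, 0, 5]

# A split that lowers the error slightly (0.42 to 0.40, made-up numbers).
useful_split = [should_stop(0.42, 0.40, t) for t in thresholds]

# A split that does not change the error at all.
useless_split = [should_stop(0.42, 0.42, t) for t in thresholds]
```

Under these assumptions, a threshold of -1 never triggers the stop, a threshold of 0 only rejects splits that bring no improvement, and a threshold of 5 rejects every split because a difference of two error fractions cannot exceed 1.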

In [38]:
model_4 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 0, min_error_reduction=-1)
model_5 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 0, min_error_reduction=0)
model_6 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 0, min_error_reduction=5)

Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
Subtree, depth = 4 (85 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 4 (11 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 3 (5 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1276 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 2 (4701 data points).
Early stopping condition 3 reached. Minimum error reduction.
Subtree, depth = 0 (37224 data points).
Early stopping condition 3 reached. Minimum error reduction.

In [39]:
print "Validation data, classification error (model 4):", evaluate_classification_error(model_4, validation_set)
print "Validation data, classification error (model 5):", evaluate_classification_error(model_5, validation_set)
print "Validation data, classification error (model 6):", evaluate_classification_error(model_6, validation_set)

Validation data, classification error (model 4): 0.383778543731
Validation data, classification error (model 5): 0.383778543731
Validation data, classification error (model 6): 0.503446790177

In [40]:
print count_leaves(model_4),count_leaves(model_5),count_leaves(model_6)

41 13 1
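One way to read these leaf counts: if classification error is a fraction between 0 and 1, no split can reduce it by more than 1, so a threshold of 5 stops at the root (model 6), while a threshold of -1 never triggers (model 4). The sketch below illustrates that check. The helper `should_stop_on_error_reduction` is hypothetical, the `<=` comparison is an assumption about how `decision_tree_create` applies `min_error_reduction`, and the error values are made up for illustration.

```python
def should_stop_on_error_reduction(error_before, error_after, min_error_reduction):
    # Hypothetical helper: stop when the split does not reduce the error
    # by more than the threshold. Errors are assumed to be fractions in [0, 1].
    return (error_before - error_after) <= min_error_reduction

# Illustrative errors, not taken from the notebook.
useful_split = (0.42, 0.40)   # the split lowers the error a little
useless_split = (0.40, 0.40)  # the split leaves the error unchanged

for threshold in (-1, 0, 5):
    stop_useful = should_stop_on_error_reduction(useful_split[0], useful_split[1], threshold)
    stop_useless = should_stop_on_error_reduction(useless_split[0], useless_split[1], threshold)
    print("min_error_reduction=%2d  stop on useful split: %s  stop on useless split: %s"
          % (threshold, stop_useful, stop_useless))
```

Under these assumptions a threshold of 0 prunes only the splits that bring no improvement, which is consistent with model 5 matching model 4's validation error while using 13 leaves instead of 41.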

Exploring the effect of min_node_size

In [41]:
model_7 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 0, min_error_reduction=-1)
model_8 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 2000, min_error_reduction=-1)
model_9 = decision_tree_create(train_data, features, 'safe_loans', max_depth = 6,
                               min_node_size = 50000, min_error_reduction=-1)

Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (1048 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 2 (101 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 5 (932 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 4 (358 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 3 (1276 data points).
Early stopping condition 2 reached. Reached minimum node size.
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
Subtree, depth = 0 (37224 data points).
Early stopping condition 2 reached. Reached minimum node size.

In [42]:
print "Validation data, classification error (model 7):", evaluate_classification_error(model_7, validation_set)
print "Validation data, classification error (model 8):", evaluate_classification_error(model_8, validation_set)
print "Validation data, classification error (model 9):", evaluate_classification_error(model_9, validation_set)

Validation data, classification error (model 7): 0.383778543731
Validation data, classification error (model 8): 0.384532529082
Validation data, classification error (model 9): 0.503446790177

In [43]:
print count_leaves(model_7),count_leaves(model_8),count_leaves(model_9)

41 19 1
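The effect of `min_node_size` can be checked against the training logs above. The sketch assumes a node stops splitting when it holds at most `min_node_size` points; `reached_minimum_node_size` is a hypothetical helper with an assumed `<=` comparison, and the node sizes are copied from the logs.

```python
def reached_minimum_node_size(num_points, min_node_size):
    # Hypothetical helper: a node is too small to split once it holds
    # at most min_node_size data points (the <= is an assumption).
    return num_points <= min_node_size

# Node sizes that appear in the training logs above.
node_sizes = [37224, 4701, 2058, 1276, 101]

stopped_by_threshold = {}
for min_node_size in (0, 2000, 50000):
    stopped = [n for n in node_sizes if reached_minimum_node_size(n, min_node_size)]
    stopped_by_threshold[min_node_size] = stopped
    print("min_node_size=%5d stops nodes of size: %s" % (min_node_size, stopped))
```

With `min_node_size = 50000`, even the root (37224 points) is below the threshold, which matches the single-leaf tree reported for model 9.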