The objective of this notebook is to simulate a random-guess baseline for the custom error metric.

Define the error metric:


In [34]:
def ComputeErrorMetric(y_true, y_pred):
    """Combined error metric for outcomes coded 1 (home win), 0 (draw), -1 (away win)."""
    df = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # home win (class 1): one-vs-rest boolean masks
    hw_fp = ((df.y_true != 1) & (df.y_pred == 1))
    hw_tp = ((df.y_true == 1) & (df.y_pred == 1))
    hw_fn = ((df.y_true == 1) & (df.y_pred != 1))
    hw_tn = ((df.y_true != 1) & (df.y_pred != 1))
    # away win (class -1)
    aw_fp = ((df.y_true != -1) & (df.y_pred == -1))
    aw_tp = ((df.y_true == -1) & (df.y_pred == -1))
    aw_fn = ((df.y_true == -1) & (df.y_pred != -1))
    aw_tn = ((df.y_true != -1) & (df.y_pred != -1))
    # draw (class 0)
    dd_fp = ((df.y_true != 0) & (df.y_pred == 0))
    dd_tp = ((df.y_true == 0) & (df.y_pred == 0))
    dd_fn = ((df.y_true == 0) & (df.y_pred != 0))
    dd_tn = ((df.y_true != 0) & (df.y_pred != 0))

    # Combine the per-class masks with element-wise OR (the same result `+`
    # gives on boolean Series), so a row is counted at most once per category.
    true_positive = (hw_tp | aw_tp | dd_tp).sum()
    false_positive = (hw_fp | aw_fp | dd_fp).sum()
    true_negative = (hw_tn | aw_tn | dd_tn).sum()
    false_negative = (hw_fn | aw_fn | dd_fn).sum()

    # Weighted sum of the false-positive rate and the false-negative rate.
    combined_error_metric = (11.0/13.0 * false_positive / (false_positive + true_negative)
                             + 2.0/18.0 * false_negative / (false_negative + true_positive))

    #precision = true_positive / (true_positive + false_positive)
    #recall = true_positive / (true_positive + false_negative)

    return round(combined_error_metric, 2)
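
Written as a formula, the value computed by `ComputeErrorMetric` is a weighted sum of a false-positive rate and a false-negative rate. The symbol CEM is shorthand introduced here for the combined error metric; TP, FP, TN and FN are the counts computed in the function, and the weights are copied from the code as they stand.

```latex
\mathrm{CEM} \;=\; \frac{11}{13}\cdot\frac{FP}{FP + TN} \;+\; \frac{2}{18}\cdot\frac{FN}{FN + TP}
```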

In [35]:
import pandas as pd

Create an artificial set of true and predicted values.

The ratio of 1, 0 and -1 values is representative of the dataset: 10 home wins, 6 draws and 6 away wins.

The predicted values cycle through 1, 0, -1 to simulate picking outcomes at random.


In [41]:
true = [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,-1,-1,-1,-1,-1,-1]
pred = [1,0,-1,1,0,-1,1,0,-1,1,0,-1,1,0,-1,1,0,-1,1,0,-1,1]

In [42]:
cem = ComputeErrorMetric(true,pred)

print(round(cem, 2))


0.4

The simulation yields an error metric of 0.4, which can be used as a baseline value for 'random guessing'.
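
As a cross-check, the counts behind this value can be reproduced by hand without pandas. The sketch below is not part of the original notebook: `CLASSES` and `count_rows` are illustrative names, and it assumes a row counts toward a category whenever any of the three classes puts it there (element-wise OR over the per-class masks).

```python
# Hand count of the confusion-matrix entries for the artificial example.
# Assumption: a row counts toward a category if ANY of the three classes
# (1, 0, -1) places it there, mirroring an element-wise OR of per-class masks.
true = [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,-1,-1,-1,-1,-1,-1]
pred = [1,0,-1,1,0,-1,1,0,-1,1,0,-1,1,0,-1,1,0,-1,1,0,-1,1]
CLASSES = (1, 0, -1)  # hypothetical helper constant, not from the notebook


def count_rows(test):
    """Count rows for which test(t, p, c) holds for at least one class c."""
    return sum(1 for t, p in zip(true, pred)
               if any(test(t, p, c) for c in CLASSES))


tp = count_rows(lambda t, p, c: t == c and p == c)
fp = count_rows(lambda t, p, c: t != c and p == c)
fn = count_rows(lambda t, p, c: t == c and p != c)
tn = count_rows(lambda t, p, c: t != c and p != c)

# Same weighting as in ComputeErrorMetric.
metric = 11.0 / 13.0 * fp / (fp + tn) + 2.0 / 18.0 * fn / (fn + tp)
print(tp, fp, fn, tn, round(metric, 2))
```

Under that assumption, 8 of the 22 predictions are correct (tp = 8), the 14 wrong rows each count once as a false positive and once as a false negative, and all 22 rows count as true negatives, which gives 0.4 after rounding.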


In [39]:
import random

# Draw 5000 true and predicted outcomes uniformly from {-1, 0, 1}.
true = []
pred = []
for i in range(5000):
    true.append(random.randint(-1,1))
    pred.append(random.randint(-1,1))

cem = ComputeErrorMetric(true,pred)

print(round(cem, 2))


0.41

Randomly picking values from -1, 0 and 1 gives an error metric of 0.41. This agrees with the experiment above: an error metric of around 0.4 can be treated as the random-guess baseline.
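
The two baseline values can also be related to plain accuracy. The sketch below is an illustration rather than part of the original analysis: `expected_metric` is a hypothetical helper, and it assumes each row is counted at most once per category, so that TP is the number of correct rows, FP and FN both equal the number of wrong rows, and every row contributes one true negative.

```python
def expected_metric(accuracy):
    """Combined error metric as a function of the fraction of correct predictions.

    Assumption (not from the notebook): counts are normalised by the number
    of rows, with tp = accuracy, fp = fn = 1 - accuracy and tn = 1.
    """
    error = 1.0 - accuracy
    fp_rate = error / (error + 1.0)       # fp / (fp + tn)
    fn_rate = error / (error + accuracy)  # fn / (fn + tp), which equals error
    return 11.0 / 13.0 * fp_rate + 2.0 / 18.0 * fn_rate


uniform_guess = expected_metric(1.0 / 3.0)   # uniform picks over three outcomes
cycling_guess = expected_metric(8.0 / 22.0)  # 22-game example with 8 correct picks
print(round(uniform_guess, 2), round(cycling_guess, 2))
```

With an accuracy of 1/3 this gives about 0.41, and with the 8-out-of-22 accuracy of the artificial example it gives about 0.4, consistent with the two simulations.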

