In response to this issue on GitHub I have started doing stochastic evaluations — running each model instantiation and prediction 100 times.
As described in that thread, I'm doing more or less this:
import numpy as np
from sklearn.ensemble import RandomForestClassifier

y_pred = []
for seed in range(100):
    np.random.seed(seed)
    # <hyperparams> stands for the model's hyperparameters, elided here.
    clf = RandomForestClassifier(<hyperparams>, random_state=seed, n_jobs=-1)
    clf.fit(X, y)
    y_pred.append(clf.predict(X_test))
    print('.', end='')  # progress indicator

np.save('100_realizations.npy', y_pred)
I then evaluate against the blind data, which yields 100 F1 scores, one per realization; that is the .npy file loaded below.
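For completeness, this is roughly what that scoring step looks like. It's a minimal sketch rather than the exact code: the variable y_blind (the true labels for the blind data) and the micro averaging are assumptions on my part.

import numpy as np
from sklearn.metrics import f1_score

y_pred = np.load('100_realizations.npy')   # shape: (100, n_blind_samples)
# One F1 score per realization; 'micro' averaging is an assumption.
f1s = np.array([f1_score(y_blind, yp, average='micro') for yp in y_pred])
np.save('ar4_100_realizations_f1.npy', f1s)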
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
ls -l *_f1.npy
In [3]:
accs = np.load('ar4_100_realizations_f1.npy')
s = pd.Series(accs)
In [4]:
plt.hist(accs)
plt.show()
In [5]:
s.describe()
Out[5]:
In [6]:
np.median(s)
Out[6]: