We have a number of submission CSVs that score approximately equally well on the leaderboard but use different processing and features. Here we look at the differences in the predictions they make.


In [50]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams['figure.figsize'] = 8, 8
plt.rcParams['axes.grid'] = True
plt.set_cmap('brg')



In [1]:
cd ..


/home/gavin/repositories/hail-seizure

In [23]:
csvone = "output/forestselection_gavin_submission_using__v2_feats.csv"
# most recent best scoring
csvtwo = "best_from_batchall/output/SVC_best_for_each_subject_in_batchall_with_FS_submission_using__v3_feats.csv"
# expected good for patient 2
#csvtwo = "best_from_batchall/output/best_feats_combo_40_submission_using__v3_feats.csv"

In [24]:
import csv

In [25]:
dictone = {}
with open(csvone) as f:
    c = csv.reader(f)
    fl = next(c)  # header row (kept so we can reuse it when writing merged submissions)
    for l in c:
        dictone[l[0]] = float(l[1])

In [26]:
dicttwo = {}
with open(csvtwo) as f:
    c = csv.reader(f)
    fl = next(c)
    for l in c:
        dicttwo[l[0]] = float(l[1])

In [27]:
segments = list(dictone.keys())

Merging by mean for submission

Merging by mean is usually described as an incredibly simple but effective way to combine models: take the predictions of each model and average them.


In [20]:
with open("output/merged_by_mean.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        merged = np.mean([dictone[s],dicttwo[s]])
        c.writerow([s,merged])

In [21]:
!wc output/merged_by_mean.csv


  3936   3936 179834 output/merged_by_mean.csv

In [22]:
!head output/merged_by_mean.csv


Comparing differences

The above submission is currently the best scoring. Comparing the differences in predictions made by each model.


In [30]:
differences = {}
for segment in segments:
    difference = abs(dictone[segment]-dicttwo[segment])
    differences[segment] = difference

In [51]:
h=plt.hist(list(differences.values()), bins=25, log=True)


With a log-scaled count axis it is clear that most disagreements are minor. Splitting these by subject:


In [38]:
import json

In [39]:
with open("settings/forestselection_gavin.json") as f:
    settings = json.load(f)

In [44]:
subjdifferences = {}
for subject in settings['SUBJECTS']:
    subjdifferences[subject] = {}
for subject in settings['SUBJECTS']:
    for segment in segments:
        if subject in segment:
            difference = abs(dictone[segment]-dicttwo[segment])
            subjdifferences[subject][segment] = difference

In [56]:
spltinds = [(i,j) for i in range(3) for j in range(3)][:7]

In [58]:
f, axarr = plt.subplots(3,3)
for subject,(i,j) in zip(settings['SUBJECTS'],spltinds):
    axarr[i,j].set_title(subject)
    h=axarr[i,j].hist(list(subjdifferences[subject].values()), bins=25, log=True)
    meandiff = np.mean(list(subjdifferences[subject].values()))
    print(subject + " has average prediction difference of {0}".format(meandiff))


Dog_1 has average prediction difference of 0.02064752975320786
Dog_2 has average prediction difference of 0.010975418109321925
Dog_3 has average prediction difference of 0.02086565650608659
Dog_4 has average prediction difference of 0.03390148789065487
Dog_5 has average prediction difference of 0.0019821561473201445
Patient_1 has average prediction difference of 0.028013861925094356
Patient_2 has average prediction difference of 0.18082091683388

Looks like Patient 2 was changed the most by the mean merging.

Therefore, we could weight the merging based on differences in the reported AUC results. The results for the two models are:

Model                                           Dog_1           Dog_2           Dog_3           Dog_4           Dog_5           Patient_1       Patient_2       Overall
forestselection_gavin                           0.612280991736  0.991058806801  0.88025         0.760826709857  0.979630853994  0.911704716852  0.649256198347  0.834669401275
SVC_best_for_each_subject_in_batchall_with_FS   0.628723140496  0.995495161918  0.889429752066  0.772317200377  0.98061707989   0.916032496915  0.628533057851  0.872194085084

So, it looks like the second should be better for everything except Patient 2.
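
As an aside, one direct way to use those numbers (rather than hand-picking weights, as we do below) would be to weight each model per subject in proportion to its reported AUC. This is just a sketch that was not run in this notebook: it assumes the per-subject columns above follow the SUBJECTS order with the overall score last, and the aucs dictionary and auc_weighted helper are illustrative names, not part of the pipeline.

# Sketch only (not executed): per-subject AUC-proportional weighting of the two models.
# AUC values copied by hand from the table above; relies on dictone/dicttwo loaded earlier.
aucs = {
    'forest': {'Dog_1': 0.612280991736, 'Dog_2': 0.991058806801, 'Dog_3': 0.88025,
               'Dog_4': 0.760826709857, 'Dog_5': 0.979630853994,
               'Patient_1': 0.911704716852, 'Patient_2': 0.649256198347},
    'svc':    {'Dog_1': 0.628723140496, 'Dog_2': 0.995495161918, 'Dog_3': 0.889429752066,
               'Dog_4': 0.772317200377, 'Dog_5': 0.98061707989,
               'Patient_1': 0.916032496915, 'Patient_2': 0.628533057851},
}

def auc_weighted(segment, subject):
    # Convex combination of the two predictions, weighted by each model's reported AUC.
    w1, w2 = aucs['forest'][subject], aucs['svc'][subject]
    return (w1*dictone[segment] + w2*dicttwo[segment])/(w1 + w2)

For Patient_2 this only shifts the weights to roughly 0.51/0.49, which is why the hand-picked 0.9/0.1 splits tried below are a much stronger statement.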

To test this, we'll create two submissions: one which weights the recent model more heavily on Patient 2, and one which weights the older model more heavily on Patient 2 (all other subjects keep the plain mean).


In [59]:
with open("output/merged_by_mean_recent_weighted_P2.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        if "Patient_2" in s:
            merged = 0.1*dictone[s] + 0.9*dicttwo[s]
        else:
            merged = np.mean([dictone[s],dicttwo[s]])
        c.writerow([s,merged])

In [60]:
with open("output/merged_by_mean_old_weighted_P2.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        if "Patient_2" in s:
            merged = 0.9*dictone[s] + 0.1*dicttwo[s]
        else:
            merged = np.mean([dictone[s],dicttwo[s]])
        c.writerow([s,merged])

Finlay's theory is that Patient 2 is misleading: the newer features are less noisy, so the classifier is less confident, whereas with the older features the over-confident classifier makes more mistakes.

Theoretically, then, we might do better by simply moderating the predictions with a prior belief. To do that properly, though, we would need to define a conditional distribution for observing these predictions given our prior belief.
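
Short of a full Bayesian treatment, a cheap way to moderate the predictions would be simple shrinkage towards the prior: a convex combination of the model's prediction and the subject's class prior. A minimal sketch, not run here; shrink_towards_prior and its default weight are made-up names, and the weight would need tuning.

# Sketch only: pull a prediction part of the way towards a prior belief,
# rather than replacing it outright. The 0.5 weight is an arbitrary placeholder.
def shrink_towards_prior(prediction, prior, weight=0.5):
    return weight*prediction + (1.0 - weight)*prior

# e.g. for a Patient_2 segment s, using the 18/42 class prior used below:
# shrink_towards_prior(dictone[s], 18.0/42)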

As a sanity check, we might as well replace all of the predictions for Patient 2 with the class prior:


In [63]:
p2prior = 18.0/42
with open("output/merged_by_mean_prior_P2.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        if "Patient_2" in s:
            merged = p2prior
        else:
            merged = np.mean([dictone[s],dicttwo[s]])
        c.writerow([s,merged])

Submitted, and it improved our best score. The overfitting on Patient 2 must have been massive.

The question now is: what score do you get if you replace every subject's predictions with their respective class priors?


In [64]:
priors = {'Patient_2':18.0/42,
          'Patient_1':18.0/50,
          'Dog_5':30.0/450,
          'Dog_4':97.0/804,
          'Dog_3':72.0/1440,
          'Dog_2':42.0/500,
          'Dog_1':24.0/480}

In [66]:
with open("output/priors.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for subject in settings['SUBJECTS']:
        for s in segments:
           if subject in s:
                c.writerow([s,priors[subject]])

In [67]:
!head output/priors.csv


In [68]:
!wc -l output/priors.csv


3936 output/priors.csv

The priors-only submission scored just 0.52, so our real predictions are not completely useless.

Taking the average of the class-prior model and both of the CSVs defined above:


In [80]:
with open("output/merged_by_mean_and_priors.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        prior = [priors[subj] for subj in settings['SUBJECTS'] if subj in s][0]
        t = 1.0/3
        merged = t*dictone[s]+t*dicttwo[s]+t*prior
        c.writerow([s,merged])

Now, it might be interesting to see whether our best-performing model for Patient 2 does better than its prior alone, so we keep all other subjects at their priors.


In [81]:
bestp2 = {}
with open("output/SVC_ica_psd_logfBB_submission_using__v3_feats.csv") as f:
    c = csv.reader(f)
    fl = next(c)
    for l in c:
        bestp2[l[0]] = float(l[1])

In [83]:
with open("output/priors_and_bestp2.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        if 'Patient_2' in s:
            c.writerow([s,bestp2[s]])
        else:
            prior = [priors[subj] for subj in settings['SUBJECTS'] if subj in s][0]
            c.writerow([s,prior])

In [84]:
!wc -l output/priors_and_bestp2.csv


3936 output/priors_and_bestp2.csv

Merging many high-scoring models

Looking at the submissions we've made, sorted by score, we can take those which scored well and used a variety of different features. If we merge these, they should cover a variety of different predictions and hopefully make fewer mistakes.


In [134]:
outputcsvs = [csvone,
              csvtwo,
              "output/stoch_opt_2nd_submission_using__v2_feats.csv",
              "output/bbsubj_pg_submission_using__v2_feats.csv"]

In [135]:
dicts = {}
for cname in outputcsvs:
    dicts[cname] = {}
for cname in outputcsvs:
    with open(cname) as f:
        c = csv.reader(f)
        fl = next(c)
        for l in c:
            dicts[cname][l[0]] = float(l[1])

In [91]:
with open("output/merged_many_v1.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        many = []
        for cname in outputcsvs:
            many.append(dicts[cname][s])
        c.writerow([s,np.mean(many)])

In [92]:
!wc -l output/merged_many_v1.csv


3936 output/merged_many_v1.csv

In [93]:
!head output/merged_many_v1.csv


In [94]:
outputcsvs


Out[94]:
['output/forestselection_gavin_submission_using__v2_feats.csv',
 'best_from_batchall/output/SVC_best_for_each_subject_in_batchall_with_FS_submission_using__v3_feats.csv',
 'output/stoch_opt_2nd_submission_using__v2_feats.csv',
 'output/bbsubj_pg_submission_using__v2_feats.csv']

Comparing output of bagging tests

We have limited submissions, and I want to know whether changing the bagging settings has actually produced different output. Opening the most recent big bagging run and checking whether its predictions actually differ.


In [95]:
metabag = {}
with open("best_from_batchall/output/best_5_feats_per_subj_fs_bagging_submission_using__v3_feats.csv") as f:
    c = csv.reader(f)
    fl = next(c)
    for l in c:
        metabag[l[0]] = float(l[1])

In [100]:
differences = {}
for s in segments:
    many = []
    for cname in outputcsvs:
        many.append(dicts[cname][s])
    differences[s] = abs(np.mean(many)-metabag[s])

In [101]:
h=plt.hist(list(differences.values()), bins=25, log=True)



In [103]:
subjdifferences = {}
for subject in settings['SUBJECTS']:
    subjdifferences[subject] = {}
for s in differences.keys():
    subject = [subj for subj in settings['SUBJECTS'] if subj in s][0]
    subjdifferences[subject][s] = differences[s]

In [104]:
f, axarr = plt.subplots(3,3)
for subject,(i,j) in zip(settings['SUBJECTS'],spltinds):
    axarr[i,j].set_title(subject)
    h=axarr[i,j].hist(list(subjdifferences[subject].values()), bins=25, log=True)
    meandiff = np.mean(list(subjdifferences[subject].values()))
    print(subject + " has average prediction difference of {0}".format(meandiff))


Dog_1 has average prediction difference of 0.07399804467006905
Dog_2 has average prediction difference of 0.05089346716249404
Dog_3 has average prediction difference of 0.1548756828904873
Dog_4 has average prediction difference of 0.046560074382240184
Dog_5 has average prediction difference of 0.016466505323713715
Patient_1 has average prediction difference of 0.11772024479643448
Patient_2 has average prediction difference of 0.2315801992692685

It appears its predictions do differ substantially. However, its performance on Patient 2 was worse according to our tests, so I'm wary of how it will affect that subject. Still, it's probably worth a try.


In [131]:
dicts["best_from_batchall/output/best_5_feats_per_subj_fs_bagging_submission_using__v3_feats.csv"] = metabag

In [111]:
with open("output/merged_many_v2.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        many = []
        for cname in outputcsvs:
            many.append(dicts[cname][s])
        c.writerow([s,np.mean(many)])

In [108]:
!wc -l output/merged_many_v2.csv


3936 output/merged_many_v2.csv

Scored exactly the same. Was there a problem generating the file?
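
Before spending a submission on a suspect file, a quick programmatic check would have told us whether the two CSVs actually differ. A sketch along those lines (the load_submission helper is just for illustration):

# Sketch: compare two submission files directly instead of eyeballing head.
import csv

def load_submission(path):
    with open(path) as f:
        rows = csv.reader(f)
        next(rows)  # skip the header
        return {seg: float(pred) for seg, pred in rows}

v1 = load_submission("output/merged_many_v1.csv")
v2 = load_submission("output/merged_many_v2.csv")
print(max(abs(v1[s] - v2[s]) for s in v1))  # 0.0 means the submissions are identical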


In [109]:
!head output/merged_many_v2.csv


In [110]:
!head output/merged_many_v1.csv


Trying again; the previous cell iterated over outputcsvs, which doesn't include the newly added bagging CSV, so merged_many_v2 came out identical to merged_many_v1. This time iterating over dicts.keys(), which does include it:


In [132]:
with open("output/merged_many_v2.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        many = []
        for cname in dicts.keys():
            many.append(dicts[cname][s])
        c.writerow([s,np.mean(many)])

In [113]:
!head output/merged_many_v1.csv


In [114]:
!head output/merged_many_v2.csv


Merging in the results from Finlay's run over a range of different features. We'll average the results from his run, then merge that fifty-fifty with the average of our existing models and submit. Just going to assume that's a reasonable way to do it.


In [115]:
import glob

In [122]:
fcsvs = glob.glob("best_from_batchall/output/best_actual*")+glob.glob("best_from_batchall/output/best_predicted*")

In [124]:
fdicts = {}
for cn in fcsvs:
    fdicts[cn] = {}
for cn in fcsvs:
    with open(cn) as f:
        c = csv.reader(f)
        fl = next(c)
        for l in c:
            fdicts[cn][l[0]] = float(l[1])

In [141]:
dicts = {}
for cname in outputcsvs:
    dicts[cname] = {}
for cname in outputcsvs:
    with open(cname) as f:
        c = csv.reader(f)
        fl = next(c)
        for l in c:
            dicts[cname][l[0]] = float(l[1])

In [138]:
with open("output/merged_many_v3.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        many = []
        for cname in dicts.keys():
            many.append(dicts[cname][s])
        fmany = []
        for cname in fdicts.keys():
            fmany.append(fdicts[cname][s])
        m1 = np.mean(many)
        m2 = np.mean(fmany)
        c.writerow([s,np.mean([m1,m2])])

In [139]:
!head output/merged_many_v3.csv


In [140]:
!wc -l output/merged_many_v3.csv


3936 output/merged_many_v3.csv

Submitted with no improvement. Next, mirroring the heavier weighting on the forestselection and SVC models that performed well above, while still including the other feature sets at a lower weight:


In [143]:
outputcsvs


Out[143]:
['output/forestselection_gavin_submission_using__v2_feats.csv',
 'best_from_batchall/output/SVC_best_for_each_subject_in_batchall_with_FS_submission_using__v3_feats.csv',
 'output/stoch_opt_2nd_submission_using__v2_feats.csv',
 'output/bbsubj_pg_submission_using__v2_feats.csv']

In [144]:
with open("output/merged_many_v4.csv","w") as f:
    c = csv.writer(f)
    c.writerow(fl)
    for s in segments:
        many = []
        for cname in outputcsvs:
            many.append(dicts[cname][s])
        mnval = many[0]*0.3 + many[1]*0.4 + many[2]*0.15 + many[3]*0.15
        c.writerow([s,mnval])

In [145]:
!head output/merged_many_v4.csv


In [146]:
!wc -l output/merged_many_v4.csv


3936 output/merged_many_v4.csv