We will use the distance between test segments computed in 140926-test-signal-jump to find sequences of segments that were likely recorded together. Armed with this, we can take the individual probabilities of each segment in a sequence and combine them into a single probability, which is then used to update the probabilities of all segments in that sequence.
The sequences are found with a greedy algorithm that stops when a conflict is detected (see the code below).
In principle the segment probabilities should be combined by multiplying them; however, this did not work well, probably because the probabilities are not well calibrated. Taking the mean had a better effect.
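As a rough illustration (not part of the pipeline, and using made-up segment scores), the product of several uncalibrated probabilities quickly shrinks towards zero, while the mean stays on the same scale as the inputs:

import numpy as np

# hypothetical, uncalibrated per-segment preictal probabilities for one chain
P = np.array([0.6, 0.7, 0.55, 0.65])

print 'product:', np.prod(P)   # ~0.15 -- shrinks further with every extra segment
print 'mean   :', np.mean(P)   # 0.625 -- stays on the scale of the inputs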
Suppose you have a chain of segments $i \in 1 \ldots N$.
Each segment predicts a seizure with probability $P_i$, or not, with probability $Q_i = 1 - P_i$.
If a chain is negative, its probability is $\prod_i Q_i$. If a chain is positive the situation is more complex: there is a chance $U$ that a seizure detection event has happened in a given segment and $V = 1 - U$ that it has not. I estimate $U$ to be around $0.2$. So the probability of a positive chain is $\prod_i (U P_i + V Q_i)$,
or equivalently $\prod_i Q_i \times \prod_i \left( U \frac{P_i}{Q_i} + V \right)$.
The ratio of the positive to the negative probability is therefore $r = \prod_i \left( U \frac{P_i}{Q_i} + V \right)$, and the combined probability is $1/(1 + 1/r)$.
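For example, with the assumed $U = 0.2$ and a hypothetical chain of two segments scored $P_1 = 0.6$ and $P_2 = 0.7$, the ratio is $r = \left(0.2 \cdot \frac{0.6}{0.4} + 0.8\right)\left(0.2 \cdot \frac{0.7}{0.3} + 0.8\right) = 1.1 \times 1.27 \approx 1.39$, so the combined probability is $1/(1 + 1/r) \approx 0.58$.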
In [1]:
%matplotlib inline
from matplotlib import pylab as pl
import cPickle as pickle
import pandas as pd
import numpy as np
import os
individual segment probability file
In [2]:
FNAME_IN = '../submissions/141001-predict.2.csv'
updated probability file
In [3]:
FNAME_OUT = '../submissions/141001-predict.3.csv'
In [4]:
!head {FNAME_IN}
In [5]:
scores = pd.read_csv(FNAME_IN, index_col='clip', squeeze=True)
out_scores = scores.copy()
In [6]:
scores['Dog_2_test_segment_0004.mat']
Out[6]:
In [7]:
targets = set(['_'.join(f.split('_')[:2]) for f in scores.index.values])
targets
Out[7]:
In [8]:
for target in targets:
    print
    d = np.load('../data-cache/%s-test-jump-distance.npy'%target)
    N = d.shape[0]
    print target, N
    # pairs of segments ordered by increasing distance
    dord = np.unravel_index(d.ravel().argsort(), d.shape)
    Nsequences = N/6

    # find good pairs of segments that are likely to be paired in time
    next_segment = [-1]*N
    previous_segment = [-1]*N
    for i, (s1, s2) in enumerate(np.array(dord).T):
        dist = d[s1, s2]
        if next_segment[s1] != -1:
            print i, 'right conflict', dist
            break
        if previous_segment[s2] != -1:
            print i, 'left conflict', dist
            break
        next_segment[s1] = s2
        previous_segment[s2] = s1
        # if i < Nsequences:
        #     print 'skip'
        #     continue

    # check code
    for i in range(N):
        if next_segment[i] != -1:
            assert previous_segment[next_segment[i]] == i

    # find good sequences
    sequences = []
    for i in range(N):
        if previous_segment[i] == -1 and next_segment[i] != -1:
            j = i
            sequence = [j]
            while next_segment[j] != -1:
                j = next_segment[j]
                sequence.append(j)
            sequences.append(sequence)
    len_sequences = [len(sequence) for sequence in sequences]
    print '#sequences', len(sequences), '%segments that was sequenced', sum(len_sequences)/float(N), 'longest sequence', max(len_sequences)
    print sequences

    # compute probability for sequences
    sequences_prb = []
    for sequence in sequences:
        p0 = 1.   # running product of P (normalized against the product of Q below)
        q0 = 1.
        p1 = 0.   # running sum of P, turned into the mean below
        p2 = 0.   # maximum P seen in the sequence
        p3 = 1.   # running product for the ratio r = prod(U*P/Q + V)
        U = 0.2   # chance of seizure detection event in a preictal segment
        V = 1-U
        for s in sequence:
            P = scores['%s_test_segment_%04d.mat'%(target,s+1)]
            Q = 1.-P
            p0 *= P
            q0 *= Q
            p1 += P
            if P > p2:
                p2 = P
            p3 *= (U * P/Q + (1-U))
        p0 = p0 / (p0+q0)
        p1 = p1 / len(sequence)
        p2 = p2
        p3 = 1./(1+1./p3)
        # print p0, p1, p2, p3
        sequences_prb.append(p2)  # the maximum segment probability is used as the sequence score

    # fix probability for segments in sequences
    for p, sequence in zip(sequences_prb, sequences):
        # all segments in the same sequence will be assigned the same probability
        for s in sequence:
            out_scores['%s_test_segment_%04d.mat'%(target,s+1)] = p
In [9]:
out_scores.to_csv(FNAME_OUT, header=True)
In [10]:
!paste {FNAME_IN} {FNAME_OUT} | head
In [11]:
out_scores['Dog_2_test_segment_0004.mat']
Out[11]:
In [12]:
df = pd.DataFrame()
df['in'] = pd.read_csv(FNAME_IN, index_col='clip', squeeze=True) #64
df['out'] = pd.read_csv(FNAME_OUT, index_col='clip', squeeze=True)
In [13]:
pd.scatter_matrix(df,figsize=(6, 6), diagonal='kde');
In [ ]: