Winepi

A partial implementation of Winepi algorithm described by Mannila, Toivonen and Verkamo in Discovery of Frequent Episodes in Event Sequences, 1997.


In [1]:
from episode_miner import Event, EventSequence, EventSequences, Episode, Episodes
from pprint import pprint

In [2]:
sequence_of_events = (Event('a', 1), Event('b', 2), Event('a', 3), Event('a', 5), Event('b', 8))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=9)

frequent_episodes = event_sequences.find_serial_episodes(window_width=5, 
                                                         min_frequency=0.2, 
                                                         only_full_windows=False, 
                                                         allow_intermediate_events=True)
frequent_episodes


Out[2]:
[('b'), ('a'), ('b', 'a'), ('a', 'b'), ('a', 'a'), ('a', 'b', 'a')]

All episodes in output of find_serial_episodes are equipped with abs_support, rel_support and allow_intermediate_events attributes.


In [3]:
frequent_episodes[3].abs_support, frequent_episodes[3].rel_support, frequent_episodes[3].allow_intermediate_events


Out[3]:
(6, 0.46153846153846156, True)

Support

Equip episodes with support (frequency) information.


In [4]:
episodes = Episodes((Episode(('a',)), Episode(('a', 'b'))))
episodes = event_sequences.support(episodes=episodes,
                                   window_width=5,
                                   only_full_windows=False, 
                                   allow_intermediate_events=True)
episodes[0], episodes[0].abs_support, episodes[0].rel_support, episodes[0].allow_intermediate_events


Out[4]:
(('a'), 9, 0.6923076923076923, True)

The defaults are

only_full_windows = False
allow_intermediate_events = episodes[0].allow_intermediate_events

It is more efficient to find the supports for list of episodes than for each episode separetely.

Get the absolute support of episodes.


In [5]:
episodes.abs_support()


Out[5]:
[9, 6]

Get relative support of episodes.


In [6]:
episodes.rel_support()


Out[6]:
[0.6923076923076923, 0.46153846153846156]

Only full windows vs all windows

The default is

only_full_windows=False

If True, the start of the first window is at the start of the sequence of events and the end of the last window is at the end of the sequence of events.

If False, the end of the first window is at the start of the sequence of events and the start of the last window is at the end of the sequence of events.


In [7]:
sequence_of_events = (Event('a', 0), Event('b', 4), Event('c', 7))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=9)

frequent_episodes = event_sequences.find_serial_episodes(window_width=5, 
                                                         min_frequency=0.2, 
                                                         only_full_windows=True)
print('Full windows:', frequent_episodes)
frequent_episodes = event_sequences.find_serial_episodes(window_width=5, 
                                                         min_frequency=0.2, 
                                                         only_full_windows=False)
print('All windows: ', frequent_episodes)


Full windows: [('b',), ('c',), ('a',), ('b', 'c'), ('a', 'b')]
All windows:  [('b',), ('c',), ('a',)]

In case of full windows the Winepi frequency of episodes near the start or end of the event sequence is reduced, but the number of all windows is smaller and therefore the relative frequency of episodes is increased.

Allow intermediate events vs no intermediate events

An event B is an intermediate event for events A and C if

time of A < time of B < time of C.

The default is

allow_intermediate_events = True

In the next example 2nd and 3rd event have the same time and the only episode which has an intermediate event is ('a', 'd').


In [8]:
sequence_of_events = (Event('a', 0), Event('b', 2), Event('c', 2), Event('d', 3))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=6)

print('No intermediate events:')
frequent_episodes = event_sequences.find_serial_episodes(window_width=4, 
                                                         min_frequency=0.1, 
                                                         allow_intermediate_events=False)
pprint(frequent_episodes)
print('Allow intermediate events:')
frequent_episodes = event_sequences.find_serial_episodes(window_width=4, 
                                                         min_frequency=0.1, 
                                                         allow_intermediate_events=True)
pprint(frequent_episodes)


No intermediate events:
[('b',),
 ('d',),
 ('c',),
 ('a',),
 ('b', 'd'),
 ('c', 'd'),
 ('a', 'b'),
 ('a', 'c'),
 ('a', 'b', 'd'),
 ('a', 'c', 'd')]
Allow intermediate events:
[('b',),
 ('d',),
 ('c',),
 ('a',),
 ('b', 'd'),
 ('c', 'd'),
 ('a', 'b'),
 ('a', 'd'),
 ('a', 'c'),
 ('a', 'b', 'd'),
 ('a', 'c', 'd')]

In the next example "no intermediate events" version finds only episodes of length one, but "allow intermediate events" version discovers the pattern.


In [9]:
sequence_of_events = (Event('a', 1), Event('c',  2), Event('b',  3), Event('c',  4), 
                      Event('a', 5), Event('d',  6), Event('b',  7), Event('d',  8), 
                      Event('a', 9), Event('e', 10), Event('b', 11), Event('e', 12)) 
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=13)

frequent_episodes = event_sequences.find_serial_episodes(window_width=4, 
                                                         min_frequency=0.2, 
                                                         allow_intermediate_events=False)
print('No intermediate events:')
pprint(frequent_episodes)

frequent_episodes = event_sequences.find_serial_episodes(window_width=4, 
                                                         min_frequency=0.2, 
                                                         allow_intermediate_events=True)
print('\nAllow intermediate events:')
pprint(frequent_episodes)


No intermediate events:
[('e',), ('b',), ('d',), ('c',), ('a',)]

Allow intermediate events:
[('e',),
 ('b',),
 ('d',),
 ('c',),
 ('a',),
 ('b', 'e'),
 ('b', 'd'),
 ('b', 'a'),
 ('d', 'b'),
 ('c', 'b'),
 ('a', 'b')]