In [1]:
from episode_miner import Event, EventSequence, EventSequences, Episode, Episodes
from pprint import pprint
In [2]:
sequence_of_events = (Event('a', 1), Event('b', 2), Event('a', 3), Event('a', 5), Event('b', 8))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=9)
frequent_episodes = event_sequences.find_serial_episodes(window_width=5,
min_frequency=0.2,
only_full_windows=False,
allow_intermediate_events=True)
frequent_episodes
Out[2]:
All episodes in output of find_serial_episodes
are equipped with abs_support
, rel_support
and allow_intermediate_events
attributes.
In [3]:
frequent_episodes[3].abs_support, frequent_episodes[3].rel_support, frequent_episodes[3].allow_intermediate_events
Out[3]:
In [4]:
episodes = Episodes((Episode(('a',)), Episode(('a', 'b'))))
episodes = event_sequences.support(episodes=episodes,
window_width=5,
only_full_windows=False,
allow_intermediate_events=True)
episodes[0], episodes[0].abs_support, episodes[0].rel_support, episodes[0].allow_intermediate_events
Out[4]:
The defaults are
only_full_windows = False
allow_intermediate_events = episodes[0].allow_intermediate_events
It is more efficient to find the supports for list of episodes than for each episode separetely.
Get the absolute support of episodes.
In [5]:
episodes.abs_support()
Out[5]:
Get relative support of episodes.
In [6]:
episodes.rel_support()
Out[6]:
The default is
only_full_windows=False
If True, the start of the first window is at the start of the sequence of events and the end of the last window is at the end of the sequence of events.
If False, the end of the first window is at the start of the sequence of events and the start of the last window is at the end of the sequence of events.
In [7]:
sequence_of_events = (Event('a', 0), Event('b', 4), Event('c', 7))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=9)
frequent_episodes = event_sequences.find_serial_episodes(window_width=5,
min_frequency=0.2,
only_full_windows=True)
print('Full windows:', frequent_episodes)
frequent_episodes = event_sequences.find_serial_episodes(window_width=5,
min_frequency=0.2,
only_full_windows=False)
print('All windows: ', frequent_episodes)
In case of full windows the Winepi frequency of episodes near the start or end of the event sequence is reduced, but the number of all windows is smaller and therefore the relative frequency of episodes is increased.
An event B
is an intermediate event for events A
and C
if
time of A < time of B < time of C.
The default is
allow_intermediate_events = True
In the next example 2nd and 3rd event have the same time and the only episode which has an intermediate event is ('a', 'd')
.
In [8]:
sequence_of_events = (Event('a', 0), Event('b', 2), Event('c', 2), Event('d', 3))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=6)
print('No intermediate events:')
frequent_episodes = event_sequences.find_serial_episodes(window_width=4,
min_frequency=0.1,
allow_intermediate_events=False)
pprint(frequent_episodes)
print('Allow intermediate events:')
frequent_episodes = event_sequences.find_serial_episodes(window_width=4,
min_frequency=0.1,
allow_intermediate_events=True)
pprint(frequent_episodes)
In the next example "no intermediate events" version finds only episodes of length one, but "allow intermediate events" version discovers the pattern.
In [9]:
sequence_of_events = (Event('a', 1), Event('c', 2), Event('b', 3), Event('c', 4),
Event('a', 5), Event('d', 6), Event('b', 7), Event('d', 8),
Event('a', 9), Event('e', 10), Event('b', 11), Event('e', 12))
event_sequences = EventSequences(sequence_of_events=sequence_of_events, start=0, end=13)
frequent_episodes = event_sequences.find_serial_episodes(window_width=4,
min_frequency=0.2,
allow_intermediate_events=False)
print('No intermediate events:')
pprint(frequent_episodes)
frequent_episodes = event_sequences.find_serial_episodes(window_width=4,
min_frequency=0.2,
allow_intermediate_events=True)
print('\nAllow intermediate events:')
pprint(frequent_episodes)