In this example, we show how to analyse frequency counts and identify which part of the data causes a deviation. To plot the data in 3D, we use the open-source 3D plotting library Mayavi.
In [ ]:
import pyisc
import visisc
import numpy as np
import datetime
from scipy.stats import poisson, norm, multivariate_normal
%matplotlib wx
from pylab import plot, figure
First, we create an event data set from a set of Poisson-distributed frequency counts, which we later use to train an anomaly detector. Each row in the data consists of an event source (for instance, the identifier of a machine), an optional event source class (for instance, the machine type), a time stamp (a date), a measurement period (for instance, a number of days), and a set of different events with their frequency counts.
In [ ]:
n_sources = 10
n_events = 20
num_of_normal_days = 200
num_of_anomalous_days = 10
data = None
days_list = [num_of_normal_days, num_of_anomalous_days]
dates = []
for state in [0, 1]:  # 0 = normal data, 1 = anomalous data
    num_of_days = days_list[state]
    for i in range(n_sources):
        data0 = None
        for j in range(n_events):
            if state == 0:  # Normal
                po_dist = poisson(int((10+2*(n_sources-i))*(float(j)/n_events/2+0.75)))  # from 0.75 to 1.25
            else:  # Anomalous
                po_dist = poisson(int((20+2*(n_sources-i))*(float(j)/n_events+0.5)))  # from 0.5 to 1.5
            tmp = po_dist.rvs(num_of_days)
            if data0 is None:
                data0 = tmp
            else:
                data0 = np.c_[data0, tmp]
        tmp = np.c_[
            [i] * (num_of_days),  # Sources
            [  # Timestamp
                # 2 rather than 02: leading zeros are a syntax error in Python 3.
                # int(d) converts the NumPy integer to a plain int for timedelta.
                datetime.date(2015, 2, 24) + datetime.timedelta(int(d))
                for d in np.array(range(num_of_days)) + (0 if state == 0 else num_of_normal_days)
            ],
            [1] * (num_of_days),  # Measurement period
            data0,  # Event frequency counts
        ]
        if data is None:
            data = tmp
        else:
            data = np.r_[
                tmp,
                data
            ]
# Column index into the data
source_column = 0
date_column = 1
period_column = 2
first_event_column = 3
last_event_column = first_event_column + n_events
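To see what separates the two states, it can help to tabulate the Poisson rates that the loop above draws from. The following is a minimal sketch that recomputes those rates in vectorised form; the helper name `rate_table` and its parameters are hypothetical and only mirror the two expressions in the loop.

```python
import numpy as np

n_sources = 10
n_events = 20

def rate_table(base, slope_divisor, offset):
    """Integer Poisson rates for every (source, event) pair (hypothetical helper)."""
    i = np.arange(n_sources).reshape(-1, 1)  # source index, one row per source
    j = np.arange(n_events).reshape(1, -1)   # event index, one column per event
    scale = j / n_events / slope_divisor + offset
    # astype(int) truncates toward zero, like int() in the loop above
    return ((base + 2 * (n_sources - i)) * scale).astype(int)

normal_rates = rate_table(base=10, slope_divisor=2, offset=0.75)
anomalous_rates = rate_table(base=20, slope_divisor=1, offset=0.5)

print("normal rates, source 0:   ", normal_rates[0])
print("anomalous rates, source 0:", anomalous_rates[0])
print("summed rate per source (normal):   ", normal_rates.sum(axis=1))
print("summed rate per source (anomalous):", anomalous_rates.sum(axis=1))
```

Note that for the first events the anomalous rate is actually slightly lower than the normal one, while the summed rate per source is clearly higher in the anomalous state.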
First we create a flat model with a root element where all columns in the data are subelements:
In [ ]:
model = visisc.EventDataModel.flat_model(
    # list(...) gives an explicit list of column indices in both Python 2 and 3
    event_columns=list(range(first_event_column, last_event_column))
)
Second, we transform the NumPy array into a pyisc data object. The data object consists of the original event columns, the source column, the period column, and a root column containing the sum of all event frequency counts per day. In this case, the source and the class are the same. The source identifies the origin of the data, for instance, the user or machine that generates the data, while the class is the type of source. A reference to the last created data object is also kept in the model.
In [ ]:
data_object = model.data_object(
    data,
    source_column = source_column,
    class_column = source_column,
    period_column = period_column,
    date_column = date_column
)
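The root column described above can be illustrated with plain NumPy. This is only a sketch of the idea (one sum over the event columns per row), not of how pyisc builds its data object internally; the three rows and their column layout are made up for the illustration.

```python
import numpy as np

# Three hypothetical rows: source id, measurement period, then four event counts
rows = np.array([
    [0, 1, 3, 0, 2, 5],
    [0, 1, 1, 1, 0, 4],
    [1, 1, 0, 2, 2, 2],
])
first_event_column = 2

event_counts = rows[:, first_event_column:]
root = event_counts.sum(axis=1)  # one total per row (per source and day)

print(root)  # [10  6  6]
```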
Thereafter, we create an anomaly detector and fit a one-sided Poisson distribution for each event column. A reference to the last created and fitted anomaly detector is also kept in the model.
In [ ]:
anomaly_detector = model.fit_anomaly_detector(data_object, poisson_onesided=True)
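To give an intuition for what a one-sided Poisson detector measures, here is a minimal sketch using SciPy. It assumes that the anomaly score is the negative natural logarithm of the upper tail probability, P(X >= count); this is an assumption made for illustration and not a statement about pyisc's exact formula. The helper names `fit_rate` and `onesided_score` are hypothetical. Under this assumption, a threshold such as the 13.8 used for the visualisation later in this example would correspond to a tail probability of roughly 1e-6.

```python
import numpy as np
from scipy.stats import poisson

def fit_rate(counts, periods):
    """Maximum-likelihood Poisson rate: total events per unit of measurement period."""
    return np.sum(counts) / np.sum(periods)

def onesided_score(count, rate, period=1):
    """Assumed score: -ln P(X >= count) for X ~ Poisson(rate * period)."""
    # sf(k) is P(X > k), so P(X >= count) equals sf(count - 1)
    return -poisson.logsf(count - 1, rate * period)

# Synthetic training data: 200 daily counts of a single event type
rng = np.random.default_rng(0)
training_counts = rng.poisson(20, size=200)
rate = fit_rate(training_counts, np.ones(200))

# Only unusually high counts receive a high score; low counts do not
for observed in (20, 35, 50):
    print(observed, onesided_score(observed, rate))
```

Because the test is one-sided, a count far below the fitted rate gets a score close to zero rather than being flagged.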
Finally, we can visualize the event frequency data using the EventVisualization class. However, because the 3D engine used (Mayavi) is incompatible with Jupyter notebooks, we have to run the notebook as a script (if this does not work on Windows, try running it from the command prompt in the notebook directory):
vis = visisc.EventVisualization(model, 13.8, start_day=209)
In [ ]:
!ipython --matplotlib=wx --gui=wx -i visISC_simple_frequency_data_example.py
You will soon see a window similar to the picture below, which shows the frequency counts (z-axis) for the last 30 days (y-axis) and the different sources (x-axis). White means an anomaly score below 13.8, while red means an anomaly score above 13.8. As can be seen, only the last 10 days are anomalous. For more on interacting with the visualisation window, see the Mayavi documentation on Interaction with the scene.
If we click on a source label or a bar, we zoom into that source instance and can see the details of each event frequency count (x-axis). Below, the data for source 2 is shown. Here we see that this instance is detected as an anomaly only when we look at the root element.
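Why the root element stands out while the individual events do not can be sketched numerically. The example below takes the rates that the data generator uses for source 2 and scores a "typical" anomalous day (counts set equal to the anomalous rates) against the normal rates, once per event and once for the sum. It assumes the score is the negative log of the upper Poisson tail probability, which may differ from pyisc's actual computation, and real days are random draws, so actual scores will vary around these values.

```python
import numpy as np
from scipy.stats import poisson

n_sources, n_events = 10, 20
source = 2
j = np.arange(n_events)

# Rates used by the data generator for this source (same expressions as in the loop)
normal_rates = ((10 + 2 * (n_sources - source)) * (j / n_events / 2 + 0.75)).astype(int)
anomalous_rates = ((20 + 2 * (n_sources - source)) * (j / n_events + 0.5)).astype(int)

def score(count, rate):
    # Assumed score: -ln P(X >= count) under Poisson(rate)
    return -poisson.logsf(count - 1, rate)

# Each event on its own: a modest shift in a single Poisson count
event_scores = score(anomalous_rates, normal_rates)

# Root element: a sum of independent Poisson counts is Poisson with the summed rate
root_score = score(anomalous_rates.sum(), normal_rates.sum())

threshold = 13.8
print("largest single-event score:", event_scores.max())
print("root score:", root_score)
print("events above threshold:", int((event_scores > threshold).sum()))
```

The many small per-event shifts accumulate in the sum, so the root score crosses the threshold even though no single event does.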
In this example, we have used the same data for training the anomaly detector as for the visualization. However, we can easily replace the data set by calling model.data_object again with another data set and then creating a new instance of EventVisualization.