Anomaly Detection Overview

An anomaly detector is constructed by providing a set of component distributions that define the models it uses. To train the detector, call the $fit$ method with some training data; the anomaly scores are then computed with the $anomaly\_score$ method. Below, we show how to create and train an anomaly detector based on a bivariate Gaussian distribution and how to compute anomaly scores.


In [1]:
import numpy as np
import pyisc

# Get some data:
X = np.array([[20, 4], [1200, 130], [12, 8], [27, 8], [-9, 13], [2, -6]])

# Create an anomaly detector with a bivariate Gaussian component;
# the numbers are the column indices of X that it models:
anomaly_detector = pyisc.AnomalyDetector(
    pyisc.P_Gaussian([0, 1])
)

# Train the anomaly detector:
anomaly_detector.fit(X)

# Then, we can compute the anomaly scores for the data:
anomaly_detector.anomaly_score(X)

# The resulting anomaly scores, rounded to two decimals:
# array([0.10, 1.08, 0.10, 0.05, 0.67, 0.77])


Out[1]:
array([ 0.09595115,  1.07745075,  0.0999642 ,  0.05291047,  0.67480946,
        0.77318013])
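To give some intuition for what a Gaussian anomaly score can look like, here is a minimal NumPy sketch of one common scheme. It is an assumption for illustration, not pyisc's implementation: the hypothetical helper `bivariate_gaussian_scores` fits a mean and maximum-likelihood covariance, computes each point's squared Mahalanobis distance, and reports $-\log_{10}$ of the probability of a point lying at least that far from the mean. In two dimensions that distance follows a chi-square distribution with 2 degrees of freedom, whose tail is $e^{-t/2}$, so no statistics library is needed.

```python
import numpy as np

def bivariate_gaussian_scores(X_train, X_test):
    """Hypothetical helper, not part of pyisc.

    Fits a bivariate Gaussian (mean and maximum-likelihood covariance) to
    X_train and scores each row of X_test as -log10 of the probability that
    a point drawn from that Gaussian lies at least as far from the mean,
    measured by Mahalanobis distance.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    if X_train.shape[1] != 2 or X_test.shape[1] != 2:
        raise ValueError("this sketch only handles two columns")
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False, bias=True)  # divide by n, not n - 1
    inv_cov = np.linalg.inv(cov)
    diff = X_test - mean
    # Squared Mahalanobis distance of every row
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # In two dimensions d2 is chi-square with 2 degrees of freedom, whose tail
    # is P(D2 >= t) = exp(-t / 2); hence -log10(tail) = t / (2 ln 10).
    return d2 / (2.0 * np.log(10.0))

X = np.array([[20, 4], [1200, 130], [12, 8], [27, 8], [-9, 13], [2, -6]])
scores = bivariate_gaussian_scores(X, X)
print(np.round(scores, 2))
print("most anomalous row:", int(np.argmax(scores)))
```

On this data the sketch also ranks the second row, [1200, 130], highest, but its numbers are not expected to match pyisc's output exactly.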

Comparing the scores, the second data point, [1200, 130], clearly stands out as the most anomalous, with a score of about 1.08. Similarly, we can create Gamma or Poisson component distributions for an anomaly detector, where the arguments are column indices into the input data:


In [11]:
# Column 0 holds the frequency, column 1 the period
pyisc.P_Gamma(frequency_column=0, period_column=1)

pyisc.P_Poisson(frequency_column=0, period_column=1);
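As an illustration of what a frequency/period model can do, the sketch below assumes that the frequency column holds event counts and the period column the length of the interval over which they were counted. It is a hypothetical, self-contained example rather than pyisc's implementation: `poisson_rate_scores` only mirrors pyisc's keyword names, estimates a single rate as total counts divided by total time, and scores each row by the one-sided upper-tail probability of seeing at least that many events. Unusually low counts are therefore not flagged in this sketch.

```python
import math
import numpy as np

def poisson_tail_geq(k, mu):
    """P(K >= k) for K ~ Poisson(mu), computed as 1 - P(K <= k - 1)."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    # Sum the pmf in log space to avoid overflowing mu**i and i!
    cdf = math.fsum(
        math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k)
    )
    return 1.0 - cdf

def poisson_rate_scores(X, frequency_column=0, period_column=1):
    """Hypothetical helper: -log10 of the upper-tail Poisson probability."""
    X = np.asarray(X, dtype=float)
    counts = X[:, frequency_column]
    periods = X[:, period_column]
    # Maximum-likelihood rate: total events divided by total observation time
    rate = counts.sum() / periods.sum()
    scores = []
    for k, t in zip(counts, periods):
        tail = poisson_tail_geq(int(k), rate * t)
        # 1 - cdf loses precision below about 1e-16, so floor it there
        scores.append(-math.log10(max(tail, 1e-16)))
    return np.array(scores)

# Made-up data: [count, period]; the fourth row has far too many events
counts_and_periods = [[3, 1], [5, 2], [4, 1], [30, 1], [2, 1]]
scores = poisson_rate_scores(counts_and_periods, frequency_column=0, period_column=1)
print(np.round(scores, 2))
```

Dividing by the period lets rows observed over different time spans be compared on the same rate scale.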

If the data points belong to more than one known class, the detector can also be trained to build a separate model for each class. Given an array $y$ with one class label per data point, the detector is trained and the anomaly scores are computed the same way as before, with $y$ passed as an additional argument. In the example below all points share a single class, so the scores match the earlier ones up to small numerical differences:


In [28]:
# Create class labels (here all points belong to a single class, 0)
y = np.zeros(len(X))

# Train one model per class
anomaly_detector.fit(X, y)

# Score each point with the model of its class
anomaly_detector.anomaly_score(X, y)


Out[28]:
array([ 0.09595115,  1.07745081,  0.0999642 ,  0.05291047,  0.67480948,
        0.77318014])
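To show why a separate model per class can matter, here is a small sketch with made-up one-dimensional data. The helpers `fit_per_class` and `score_per_class` are hypothetical and do not reflect pyisc's internals; a univariate Gaussian per class is chosen for brevity, and each value is scored as $-\log_{10}$ of the two-sided tail probability under its own class's model. A value of 10 is ordinary in class 0 but unusual in class 1, which a single model over all points would not capture.

```python
import math
import numpy as np

def fit_per_class(values, labels):
    """Fit one univariate Gaussian (mean, std) per class label."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    models = {}
    for c in np.unique(labels):
        v = values[labels == c]
        # Floor the std so a constant class does not cause division by zero
        models[c] = (v.mean(), max(v.std(), 1e-12))
    return models

def score_per_class(models, values, labels):
    """Score each value with its own class model: -log10 P(|Z| >= |z|)."""
    scores = []
    for x, c in zip(np.asarray(values, dtype=float), np.asarray(labels)):
        mean, std = models[c]
        z = abs(x - mean) / std
        # Two-sided Gaussian tail probability, floored to avoid log10(0)
        tail = max(math.erfc(z / math.sqrt(2.0)), 1e-300)
        scores.append(-math.log10(tail))
    return np.array(scores)

values = [10, 11, 9, 10, 30, 100, 102, 98, 101, 10]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
models = fit_per_class(values, labels)
scores = score_per_class(models, values, labels)
print(np.round(scores, 2))
```

Here the value 30 in class 0 and the value 10 in class 1 receive the two highest scores, since each is judged only against the other members of its class.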
