Anomaly (outlier) Detection

outlier score:
- 1-dimensional: $O(x) = \displaystyle \frac{|x - \mu|}{\sigma}$
- multidimensional: use the squared Mahalanobis distance $O(x) = (x-\mu)^T \Sigma^{-1} (x - \mu)$, where $\mu$ is the mean vector and $\Sigma$ the covariance matrix
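
Both scores can be sketched in plain Python. This is only an illustration: the function names are made up for this example, $\mu$, $\sigma$ and $\Sigma$ are estimated from the data with population (divide-by-$n$) formulas, which is an assumption, and the multidimensional version is restricted to 2-D so that $\Sigma^{-1}$ can be written out by hand.

```python
import statistics

def z_score(x, values):
    """1-D outlier score |x - mu| / sigma, with mu and sigma estimated from values."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std (assumption)
    return abs(x - mu) / sigma

def mahalanobis_sq_2d(x, points):
    """(x - mu)^T Sigma^{-1} (x - mu) for 2-D points (hypothetical helper)."""
    n = len(points)
    mu0 = sum(p[0] for p in points) / n
    mu1 = sum(p[1] for p in points) / n
    # Entries of the population covariance matrix Sigma
    s00 = sum((p[0] - mu0) ** 2 for p in points) / n
    s11 = sum((p[1] - mu1) ** 2 for p in points) / n
    s01 = sum((p[0] - mu0) * (p[1] - mu1) for p in points) / n
    det = s00 * s11 - s01 ** 2
    d0, d1 = x[0] - mu0, x[1] - mu1
    # Sigma^{-1} = (1 / det) * [[s11, -s01], [-s01, s00]]
    return (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det
```

With uncorrelated, unit-variance attributes the second score reduces to the squared Euclidean distance from the mean, which is a quick sanity check. Both functions assume non-degenerate data (nonzero $\sigma$, invertible $\Sigma$).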

Compute the distance between every pair of data instances.
An anomaly is an instance that has fewer than $k$ neighbors within distance $\delta$.
- The user can define either the distance threshold $\delta$ or the number of neighbors $k$
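
A brute-force sketch of this rule, assuming Euclidean distance and treating "within distance $\delta$" as $\le \delta$ (both are assumptions; the function name is hypothetical):

```python
import math

def distance_outliers(points, k, delta):
    """Return indices of points with fewer than k neighbors within distance delta."""
    outliers = []
    for i, p in enumerate(points):
        # Count the other points that lie within delta of p
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= delta
        )
        if neighbors < k:
            outliers.append(i)
    return outliers
```

For example, `distance_outliers([(0, 0), (0, 1), (1, 0), (10, 10)], k=2, delta=1.5)` flags only the isolated point at index 3. Comparing every pair costs $O(n^2)$ distance computations, which is why a nearest-neighbor index is useful on larger data sets.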

A data structure to support nearest-neighbor search (for numeric-valued attributes)
Properties of K-D tree
Procedure
1. choose the dimension with largest variance
2. find the median along the dimension as split point
3. partition the space into two parts based on the median
4. choose the next dimension and repeat
Conventions: the median is always one of the points (not the average of the two middle points)

Alternate between the dimensions
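
The construction can be sketched as a short recursive function. This is one possible reading of the notes, not a definitive implementation: the first split uses the dimension with the largest variance, later levels cycle through the dimensions in order, and for an even number of points the upper of the two middle points is taken as the median (the notes only say the median must be one of the points). The dictionary layout and the name `build_kdtree` are made up for this example.

```python
from statistics import pvariance

def build_kdtree(points, depth=0, first_dim=None):
    """Build a K-D tree as nested dicts; points is a list of equal-length tuples."""
    if not points:
        return None
    n_dims = len(points[0])
    if first_dim is None:
        # Step 1: start with the dimension of largest variance
        first_dim = max(range(n_dims), key=lambda d: pvariance([p[d] for p in points]))
    # Step 4: alternate between the dimensions at each level
    dim = (first_dim + depth) % n_dims
    # Step 2: the median is always one of the points (upper middle, by assumption)
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    # Step 3: partition the remaining points around the median
    return {
        "point": ordered[mid],
        "dim": dim,
        "left": build_kdtree(ordered[:mid], depth + 1, first_dim),
        "right": build_kdtree(ordered[mid + 1:], depth + 1, first_dim),
    }
```

Calling `build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])` splits first on dimension 0 at the point `(7, 2)`, then on dimension 1 within each half.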

Evaluation

True Positive Rate (TPR)

False Positive Rate (FPR)
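
The notes only name the two metrics. Under the standard definitions, $TPR = \frac{TP}{TP + FN}$ and $FPR = \frac{FP}{FP + TN}$, with anomalies treated as the positive class. A minimal sketch, assuming labels are encoded as 1 for anomaly and 0 for normal and that both classes occur in the ground truth (the function name is hypothetical):

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 marks an anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals left alone
    return tp / (tp + fn), fp / (fp + tn)
```

A good detector pushes TPR toward 1 while keeping FPR near 0; lowering the detection threshold usually raises both rates together.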


