Online Anomaly Detection in Time Series using Merge Growing Neural Gas

Mario Tambos

Bachelor's Thesis Defense

December 2014

Objective

Develop a new online, unsupervised anomaly detection technique using Merge Growing Neural Gas.

Outline

  • Intro to Anomaly Detection
  • Merge Growing Neural Gas
  • Merge Growing Neural Gas for Anomaly Detection
  • Hierarchical Temporal Memory
  • Experiments
  • Conclusions

Outline

  • Intro to Anomaly Detection
    • Challenges
    • Classification of Anomaly Detectors
    • Characteristics of MGNG-AD
    • Questions
  • Merge Growing Neural Gas
  • MGNG for Anomaly Detection
  • Hierarchical Temporal Memory
  • Experiments
  • Conclusions

Intro to Anomaly Detection

Many names for the same concept:

  • Anomaly
  • Outlier
  • Noise
  • Discord
  • etc.

Intro to Anomaly Detection


An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. (Hawkins)

For some authors, outlier encompasses both anomalies and noise.

  • Anomalies are of interest to a data analyst, noise is not.
  • In this work, all are considered synonyms.

Intro to Anomaly Detection


Challenges

  • Difficult to separate normal from abnormal.
  • Malicious agents try to disguise their activities as normal.
  • What is "normal" changes over time.
  • Concept of "anomaly" varies across domains.
  • Training data difficult to obtain.
  • Data can contain noise (uninteresting outliers).

Solution:

  • Narrow the scope of the task.
    • The choices made when narrowing the scope affect the results obtained.
    • These restrictions give rise to a classification of detectors.
  • No silver bullet.

Intro to Anomaly Detection


Classification of Anomaly Detectors

  • Characteristics of the input data: data type, dimensionality, relationship (spatial, temporal, etc.), availability (online, offline).
  • Type of anomaly: point, contextual, collective.
  • Type of supervision: supervised, semi-supervised, unsupervised.
  • Type of output: score, label.

Intro to Anomaly Detection


Characteristics of MGNG-AD

  • Detects contextual anomalies in time series.
    • Inputs need to be sequentially related.
    • Detects anomalies wrt. their temporal context.


  • Context defined by position of instance in sequence.
  • Takes vectors in $\mathbb{R}^n$ as input.
  • Online method with an evolving model.
    • Can process the data as it is generated/observed.
    • If normal behavior changes, MGNG-AD should be able to adapt.
  • Unsupervised learning approach.
  • Is a score-based detector.
  • Assumes that the input can be modeled using MGNG.
    • If not, MGNG-AD cannot provide meaningful results.

Intro to Anomaly Detection


Questions?

Outline

  • Intro to Anomaly Detection
  • Merge Growing Neural Gas
    • How does it work?
    • Questions
  • MGNG for Anomaly Detection
  • Hierarchical Temporal Memory
  • Experiments
  • Conclusions

Merge Growing Neural Gas

  • Unsupervised neural network.
  • Part of the Self-organizing map family.
  • Used to model time series.

Merge Growing Neural Gas


How does it work?

  • Very similar to GNG.
    • Nodes and edges are generated/adapted to approximate the input.
  • GNG: prototype-based vector quantization method that approximates the input space topology.
  • Nodes have an additional context vector besides their weight vector.
  • A so-called Global Temporal Context (GTC) is kept.
  • The winner neurons are the ones nearest to the input and to the GTC.
  • Neurons are adapted by moving their weight vector towards the input and the context vector towards the GTC.
  • The GTC of each step is a linear combination of the last step's winner's weight and context vector, before adaptation.
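To make the winner selection and GTC update concrete, here is a minimal sketch (the array names `weights`, `contexts` and the helper names are illustrative; the full algorithm is listed in the appendix):

```python
import numpy as np

def find_winners(x_t, C_t, weights, contexts, alpha):
    """Winner and second winner: the neurons minimizing the blended
    distance to the input x_t and to the global temporal context C_t."""
    d = (1 - alpha) * np.sum((weights - x_t) ** 2, axis=1) \
        + alpha * np.sum((contexts - C_t) ** 2, axis=1)
    r, s = np.argsort(d)[:2]
    return int(r), int(s)

def next_gtc(w_r, c_r, beta):
    """GTC for the next step: linear combination of the previous winner's
    weight and context vectors, taken before adaptation."""
    return (1 - beta) * w_r + beta * c_r
```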

Merge Growing Neural Gas


Questions?

Outline

  • Intro to Anomaly Detection
  • Merge Growing Neural Gas
  • MGNG for Anomaly Detection
    • Algorithm
    • Questions
  • Hierarchical Temporal Memory
  • Experiments
  • Conclusions

MGNG for Anomaly Detection

  • Key idea: Compare two models of the same time series.


  • The anomaly level of a data point $a_t$ of a time series $S$ is the dissimilarity between two MGNG models of $S$:
    • A “present” model $M^Q$ with input $S$, which is fed the most up-to-date data.
    • A “past” model $M^P$ with input $P$, which is fed delayed data.
  • The more similar $M^Q$ is to $M^P$, the more normal $a_t$ is.

MGNG for Anomaly Detection


Algorithm

Given a delay $k$, for each time step $t$ do:


  1. Feed $a_t$ to $M^Q$.
  2. Feed $a_{t-k}$ to $M^P$.
  3. Calculate the distance $D_t$ between $M^Q$ and $M^P$.
  4. Return $D_t$ as the “anomaly level” $A_t$.
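A hedged sketch of this loop (it assumes the MGNG models expose a `time_step` method and a `model_distance` helper as sketched after the next slide; neither name comes from the thesis):

```python
def mgng_ad(series, M_Q, M_P, k, omega):
    """MGNG-AD main loop: score every step by comparing the two models."""
    anomaly_levels = []
    for t, a_t in enumerate(series):
        M_Q.time_step(a_t)                    # "present" model sees a_t
        if t >= k:
            M_P.time_step(series[t - k])      # "past" model sees delayed data
        anomaly_levels.append(model_distance(M_Q, M_P, omega))  # D_t = A_t
    return anomaly_levels
```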

MGNG for Anomaly Detection


How is the distance $D_t$ calculated?

  1. For each neuron $q \in \mathcal{K}^Q$, find the neuron $p \in \mathcal{K}^P$ closest to it:

$p := \arg\min_{p'\in\mathcal{K}^P} d_{q,p'}$

where

$d_{m,n}=(1−\omega)\cdot\|\mathbf{w}_m−\mathbf{w}_n\|^2+\omega\cdot\|\mathbf{c}_m−\mathbf{c}_n\|^2$


  2. The distance $D_t$ is the average of all $d_{q,p}$:

$D_t:=\frac{1}{|\mathcal{K}^Q|}\sum_{q\in\mathcal{K}^Q}{d_{q,p}}$
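A small sketch of this computation (it assumes each model exposes a `neurons` list with weight `w` and context `c` vectors; the names are illustrative):

```python
import numpy as np

def model_distance(M_Q, M_P, omega):
    """D_t: for every neuron q of M_Q, take the distance to its closest
    neuron in M_P (weights and contexts blended by omega), then average."""
    total = 0.0
    for q in M_Q.neurons:
        d_qp = min((1 - omega) * np.sum((q.w - p.w) ** 2)
                   + omega * np.sum((q.c - p.c) ** 2)
                   for p in M_P.neurons)
        total += d_qp
    return total / len(M_Q.neurons)
```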

MGNG for Anomaly Detection


Questions?

Outline

  • Intro to Anomaly Detection
  • Merge Growing Neural Gas
  • MGNG for Anomaly Detection
  • Hierarchical Temporal Memory
    • HTM Functions
    • Cortical Learning Algorithms and Online Prediction Framework
    • Questions
  • Experiments
  • Conclusions

Hierarchical Temporal Memory (HTM)

  • Type of neural network inspired by the neocortex.
  • Neurons are organized in arrays, called columns.
  • Columns are organized in two-dimensional arrays, called regions.
  • Regions form a hierarchy:
    • Lower levels learn basic concepts (edges, corners in an image).
    • Higher levels learn more abstract concepts (“car”, “flower”).

Hierarchical Temporal Memory


HTM Functions

  • Learning: build an internal representation of the inputs.
    • Spatial patterns, in each input signal.
    • Temporal patterns, in sequences of input signals.


Only the first three functions are important for anomaly detection.

  • Inference: match new inputs and sequences of new inputs to the patterns already stored.
  • Prediction: guess what input signals could come next, based upon the spatio-temporal patterns learned.
    • The anomaly score of an input is the difference between predicted and seen.
  • Behavior: give feedback to environment.
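For reference, the raw CLA anomaly score is usually described as the fraction of active columns that were not predicted at the previous step; a minimal, set-based sketch (not NuPIC's actual code):

```python
def anomaly_score(active_columns, predicted_columns):
    """0 = everything active was predicted, 1 = nothing was predicted."""
    active = set(active_columns)
    if not active:
        return 0.0
    return len(active - set(predicted_columns)) / len(active)
```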

Hierarchical Temporal Memory


Cortical Learning Algorithms and Online Prediction Framework

  • Cortical Learning Algorithms (CLA): mechanism used to implement the HTM's functions.
  • CLA is implemented in the Numenta Platform for Intelligent Computing (NuPIC)
  • Also part of NuPIC:
    • Online Prediction Framework (OPF): used to set up and run HTM models for anomaly detection.
    • Swarming: optimizes an HTM model's parameters to perform a certain task.

Hierarchical Temporal Memory


Questions?

Outline

  • Intro to Anomaly Detection
  • Merge Growing Neural Gas
  • MGNG for Anomaly Detection
  • Hierarchical Temporal Memory
  • Experiments
    • Methodology
    • Mackey-Glass Time Series
    • Randomly Generated Time Series
    • Questions
  • Conclusions

Experiments

Methodology

  • Two types of artificial 4-dimensional time series of length $150000$:
    • Mackey-Glass.
    • Randomly generated.

Mackey-Glass: defined by a differential equation; models physiological phenomena. Randomly generated: each instance of each component is drawn from a normal distribution.

  • Anomalies added in seven out of nine experiments to a subsequence of length $1800$ (from $74601$ to $76400$).
  • F1-Score was used to evaluate performance.
  • Mean and standard deviation over some sub-sequences were also calculated.
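For clarity, the F1-Score is the harmonic mean of precision and recall over binary anomaly labels; a minimal sketch (the label arrays and helper name are illustrative):

```python
def f1_score(predicted, actual):
    """F1 = 2PR / (P + R), with 1 marking an anomalous data point."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```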

Experiments

Methodology

Important to Note

  • Neither method is a classifier.
  • Anomaly detection is not a classification problem.
    • It was treated as one here only so that the F1-Score could be used.
  • Literature agrees that unsupervised methods cannot be quantitatively evaluated in the general case.

General case = any of the domains they're supposed to work in.

Experiments

Methodology

  • One MGNG-AD model trained in each experiment.
    • Output: anomaly_score.
    • $M^P$ and $M^Q$ set up with same parameters.
    • Parameters set as in the paper by Andreakis et al. for their Mackey-Glass experiment.
    • Exception: $\epsilon_w$ and $\epsilon_v$ were set to much higher values.


  • Four HTM models, one for each component of the time series, due to limitations in the OPF.
    • Parameters obtained by swarming over the first 400 data points.
    • Output: 4 anomaly_likelihoods.
    • The output was averaged to obtain a single likelihood per data point.
  • A threshold was set to determine whether a data point was classified as an anomaly.
    • The threshold was selected to maximize each model's F1-Score.
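The threshold selection can be pictured as a simple grid search; a hedged sketch using scikit-learn's `f1_score` (the thesis' exact selection procedure may differ):

```python
from sklearn.metrics import f1_score

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold whose induced labeling maximizes F1."""
    best_t, best_f1 = None, -1.0
    for thr in candidates:
        predicted = [1 if s >= thr else 0 for s in scores]
        f1 = f1_score(labels, predicted)
        if f1 > best_f1:
            best_t, best_f1 = thr, f1
    return best_t, best_f1
```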

Experiments

Methodology

Example random noise in time domain

Experiments

Methodology

Example incremental noise in time domain

Experiments

Methodology

Example random noise in frequency domain

(panels: with noise / without noise)

Experiments

Methodology

Example mean shift

Experiments


Mackey-Glass

  • Std of both models was lower in almost all cases for longer, later normal subsequences.


  • Random noise in the time domain:
    • Both models performed well. F1-Scores:
      • MGNG-AD: .92.
      • HTM: .98.
  • Incremental noise in the time domain
    • MGNG-AD clearly outperformed HTM. F1-Scores:
      • MGNG-AD: .83
      • HTM: .39
    • MGNG-AD provided a non-decreasing std.
  • Random noise in the frequency domain:
    • HTM obtained a high F1-Score (.82).
    • MGNG-AD was oblivious to the presence of anomalies (F1-Score: .04).

Experiments


Randomly Generated Time Series

  • HTM seemed unable to model the normal subsequences.
    • It performed poorly in both experiments (F1-Score ~ .02).
    • Stds were not stable.
  • MGNG-AD was worse than in the Mackey-Glass case.
    • However: its stds were decreasing.
  • Random noise in the time domain:
    • MGNG-AD F1-Score: .42.
  • Incremental noise in the time domain
    • MGNG-AD F1-Score: .65.

Experiments


Mixed Time Series

  • Random noise in time domain of Mackey-Glass component.
    • F1-Scores:
      • MGNG-AD: .62
      • HTM: .14
    • Stds were lower for longer, later normal subsequences in both cases.
  • Random noise in time domain of randomly generated component.
    • F1-Scores:
      • MGNG-AD: .56
      • HTM: .02
    • Stds decreased over normal subsequences in MGNG-AD case only.

Experiments


Mean Shift

  • Mackey-Glass time series:
    • Both models showed strong reaction at the moment of the shift.
    • Their values returned to normal or better in under:
      • MGNG-AD: 10000 time steps.
      • HTM: 1000 steps.
  • Randomly generated time series:
    • MGNG-AD performed similarly to how it did in the previous experiment.
    • HTM could also adapt to the shift, but its mean and std were higher than with the Mackey-Glass series.

Experiments


Questions?

Outline

  • Intro to Anomaly Detection
  • Merge Growing Neural Gas
  • MGNG for Anomaly Detection
  • Hierarchical Temporal Memory
  • Experiments
  • Conclusions

Conclusions

Summary

  • MGNG-AD obtained F1-Scores higher than .6 in 5 out of 7 experiments.
    • In 4 of those 5 it outperformed HTM.


  • HTM obtained F1-scores higher than .8 in 2 out of 7 experiments.
    • In both it outperformed MGNG-AD.
  • The stds provided by both models were lower for longer, later normal subsequences when using Mackey-Glass components.
  • Only the stds provided by MGNG-AD were lower for longer, later normal subsequences when using randomly generated components.
  • Both models could handle the mean shift.

Conclusions


Other aspects

  • Parameters:
    • 19 for MGNG-AD vs over 50 for HTM.
    • HTM's parameters can be swarmed using NuPIC (not reliable in all cases).
    • MGNG-AD's parameters must be adjusted manually.
  • Complexity and capabilities:
    • HTM is much more complex and much slower than MGNG-AD.
    • HTM has been applied successfully to several problems, MGNG-AD is brand new.
  • As with other unsupervised methods, it is not recommended to use MGNG-AD alone.
    • Use it under human supervision.
    • Combine it with other methods.

Thank You!

Bibliography

[Agg13] Aggarwal, Charu C.: Outlier analysis. New York, NY : Springer, 2013. http://dx.doi.org/10.1007/978-1-4614-6396-2. – ISBN 978-1-4614-6395-5

[AHHB09] Andreakis, Andreas ; Hoyningen-Huene, Nicolai v. ; Beetz, Michael: Incremental unsupervised time series analysis using merge growing neural gas. In: Advances in Self-Organizing Maps. Springer, 2009, S. 10–18

[APD05] Anstey, J ; Peters, D ; Dawson, Chris: Discovering novelty in time series data. In: Proceedings of the 15th annual newfoundland electrical and computer engineering conference Citeseer, 2005

[BCM +86] Baim, Donald S. ; Colucci, Wilson S. ; Monrad, E S. ; Smith, Harton S. ; Wright, Richard F. ; Lanoue, Alyce ; Gauthier, Diane F. ; Ransil, Bernard J. ; Grossman, William ; Braunwald, Eugene: Survival of patients with severe congestive heart failure treated with oral milrinone. In: Journal of the American College of Cardiology 7 (1986), Nr. 3, S. 661–670

[CBK09] Chandola, Varun ; Banerjee, Arindam ; Kumar, Vipin: Anomaly detection: A survey. In: ACM Computing Surveys (CSUR) 41 (2009), Nr. 3, S. 15

[Chi92] Chinchor, Nancy: MUC-4 Evaluation Metrics. In: Fourth Message Understanding Conference, 1992, 22–29

[CT93] Chappell, Geoffrey J. ; Taylor, John G.: The temporal Kohonen map. In: Neural networks 6 (1993), Nr. 3, S. 441–445

[DF] Dunning, Ted ; Friedman, Ellen: Practical Machine Learning: A New Look at Anomaly Detection. – ISBN 9781491904084

[EH11] Estévez, Pablo A. ; Hernández, Rodrigo: Gamma-filter self-organizing neural networks for time series analysis. In: Advances in Self-Organizing Maps. Springer, 2011, S. 151–159

[Fri95] Fritzke, Bernd: A growing neural gas network learns topologies. In: Advances in neural information processing systems 7 (1995), S. 625–632

[Fri97] Fritzke, Bernd: A self-organizing network that can follow non-stationary distributions. In: Artificial Neural Networks—ICANN’97. Springer, 1997, S. 613–618

[GAG+] Goldberger, A L. ; Amaral, L A N. ; Glass, L ; Hausdorff, J M. ; Ivanov, P C. ; Mark, R G. ; Mietus, J E. ; Moody, G B. ; Peng, C.-K. ; Stanley, H E.: PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. In: Circulation 101, Nr. 23, S. e215–e220

[GGAH14a] Gupta, Manish ; Gao, Jing ; Aggarwal, Charu C. ; Han, Jiawei: Outlier Detection for Temporal Data. In: IEEE Transactions on Knowledge and Data Engineering 25 (2014), Nr. 1

[GGAH14b] Gupta, Manish ; Gao, Jing ; Aggarwal, Charu C. ; Han, Jiawei: Outlier Detection for Temporal Data: A Survey. 25 (2014), Nr. 1, S. 1–20

[HAD11a] Hawkins, Jeff ; Ahmad, Subutai ; Dubinsky, Donna: Hierarchical Temporal Memory including HTM Cortical Learning Algorithms. In: Whitepaper, Numenta Inc (2011). http://numenta.org/resources/HTM_CorticalLearningAlgorithms.pdf

[HAD11b] Hawkins, Jeff ; Ahmad, Subutai ; Dubinsky, Donna: The Science of Anomaly Detection: How HTM Enables Anomaly Detection in Streaming Data. 2011

[Haw80] Hawkins, Douglas M.: Identification of outliers. Bd. 11. Springer, 1980

[HER08] Hernández Cárcamo, Rodrigo E.: Mapas temporales mediante redes neuronales auto-organizativas. (2008)

[HG06] Hawkins, Jeff ; George, Dileep: Hierarchical temporal memory: Concepts, theory and terminology. In: Whitepaper, Numenta Inc (2006)

[Hol02] Holmström, Jim: Growing neural gas: Experiments with GNG, GNG with utility and supervised GNG. Master's thesis, Uppsala University, 2002

[HST03] Hagenbuchner, Markus ; Sperduti, Alessandro ; Tsoi, Ah C.: A self-organizing map for adaptive processing of structured data. In: Neural Networks, IEEE Transactions on 14 (2003), Nr. 3, S. 491–505

[KLF05] Keogh, Eamonn ; Lin, Jessica ; Fu, Ada: HOT SAX: Efficiently finding the most unusual time series subsequence. In: Data Mining, Fifth IEEE International Conference on, IEEE, 2005, S. 8 pp.

[Koh90] Kohonen, Teuvo: The self-organizing map. In: Proceedings of the IEEE 78 (1990), Nr. 9, S. 1464–1480

[LKLC03] Lin, Jessica ; Keogh, Eamonn ; Lonardi, Stefano ; Chiu, Bill: A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery - DMKD ’03 (2003), 2. http://dx.doi.org/10.1145/882085.882086. – DOI 10.1145/882085.882086

[MGO77] Mackey, Michael C. ; Glass, Leon ; Others: Oscillation and chaos in physiological control systems. In: Science 197 (1977), Nr. 4300, S. 287–289

[MS91] Martinetz, Thomas ; Schulten, Klaus: A “neural-gas” network learns topologies. University of Illinois at Urbana-Champaign, 1991

[NuP] Numenta Platform for Intelligent Computing (NuPIC). http://numenta.org/nupic.html

[OPF] Online Prediction Framework. https://github.com/numenta/nupic/wiki/Online-Prediction-Framework

[PP07] Patcha, Animesh ; Park, Jung-Min: An overview of anomaly detection techniques: Existing solutions and latest technological trends. In: Computer Networks 51 (2007), Nr. 12, S. 3448–3470

[Sas07] Sasaki, Yutaka: The truth of the F-measure. (2007)

[SH] Strickert, Marc ; Hammer, Barbara: Merge SOM for temporal data. October 2004

[SH03] Strickert, Marc ; Hammer, Barbara: Neural gas for sequences. In: Proceedings of the Workshop on Self-Organizing Maps (WSOM’03) Citeseer, 2003, S. 53–57

[SHB05] Strickert, Marc ; Hammer, Barbara ; Blohm, Sebastian: Unsupervised recursive sequence processing. In: Neurocomputing 63 (2005), S. 69–97

[SNK12] Saaid, Fatimah ; Nur, Darfiana ; King, Robert: Change Points Detection of Vector Autoregressive Model using SDVAR Algorithm. (2012), Nr. February, S. 2–3

[TSK05] Tan, Pang-Ning ; Steinbach, Michael ; Kumar, Vipin: Introduction to Data Mining. Addison-Wesley, 2005

[Voe02] Voegtlin, Thomas: Recursive self-organizing maps. In: Neural Networks 15 (2002), Nr. 8, S. 979–991

[WKL+05] Wei, Li ; Kumar, Nitin ; Lolla, Venkata N. ; Keogh, Eamonn ; Lonardi, Stefano ; Ratanamahatana, Chotirat A.: Assumption-Free Anomaly Detection in Time Series. In: SSDBM (2005). http://www.cs.ucr.edu/~wli/publications/WeiL_AnomalyDetection.doc

[YT02] Yamanishi, Kenji ; Takeuchi, Jun-ichi: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining ACM, 2002, S. 676–681

Appendices

Classification of Anomaly Detectors

By Characteristics of the Input Data

  • Type of attributes:
    • Nominal, metric, ordinal, etc.
  • Dimensionality of instances:
    • One-dimensional vs. multi-dimensional.
  • Relationship among instances:
    • Spatial data, sequential data, graph data, etc.
  • Availability of instances:
    • Offline detectors
    • Online detectors

Classification of Anomaly Detectors


By Type of Anomaly

  • Point anomalies:
    • Data instances are considered anomalous with respect to the rest of the data.

Classification of Anomaly Detectors

By Type of Anomaly

  • Contextual anomalies:
    • Data instances are only considered anomalous in a specific context.
    • Requires each instance to be defined using:
      • Contextual attributes.
      • Behavioral attributes.

Classification of Anomaly Detectors

By Type of Anomaly

  • Collective anomalies:
    • A collection of instances is anomalous with respect to the rest of the data.

Classification of Anomaly Detectors


By Type of Supervision

  • Supervised: requires labeled normal and anomalous instances.
  • Semi-supervised: makes use of only one class, usually the normal one.
  • Unsupervised: assumes that normal behavior is more common than abnormal behavior.

Classification of Anomaly Detectors


By Type of Output

  • Scores:
    • Represent the “level of (ab)normality” of an instance.
    • Most common type of output, particularly among unsupervised detectors.
  • Labels:
    • Indicate whether an instance belongs to the normal or to the anomaly class.
    • Scores can be converted to labels with a threshold.

Offline vs online Anomaly Detectors

  • Offline detector example

Offline vs online Anomaly Detectors


  • Online detector example

Supervised vs. Unsupervised Detectors

  • Example: credit card payment sequence

    5, 6, 2, 3, 5, 89, 4, 6, 1, 2, 100, 90, 87, 88

  • Unsupervised detector assigns high scores to 89, 100, 90, 87, 88

    • Anything that deviates from normal is an anomaly.
  • Supervised detector assigns anomaly class to 100, 90, 87, 88

    • Only several consecutive high payments are anomalies.

Merge Growing Neural Gas

Model

  • Is a tuple $M = (\mathcal{K}, \mathcal{E}, \mathcal{C}, \alpha, \beta, \gamma, \delta, \theta, \eta, \lambda, \epsilon_w, \epsilon_v)$
    • $\mathcal{K}=\{k_1,...,k_n\}$ is a set of neurons, $k_i=(\mathbf{w}_i,\mathbf{c}_i,e_i)$
    • $\mathcal{E}$ is a set of undirected connections,
    • $\mathcal{C}$ is a sequence of global temporal contexts, and
    • $\alpha, \beta, \gamma, \delta, \theta, \eta, \lambda, \epsilon_w, \epsilon_v$ are learning parameters.
  • $\mathcal{K}$ and $\mathcal{E}$ build the model’s network.
  • The input of MGNG is a time series $\mathcal{X}$.
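A possible in-memory representation of this tuple (a sketch; the field names are illustrative, and only the current GTC is kept instead of the full sequence $\mathcal{C}$):

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Neuron:
    w: np.ndarray        # weight vector
    c: np.ndarray        # context vector
    e: float = 0.0       # winner counter

@dataclass
class MGNG:
    # learning parameters, named after the tuple above
    # ("lam" because "lambda" is a Python keyword)
    alpha: float
    beta: float
    gamma: int
    delta: float
    theta: int
    eta: float
    lam: int
    eps_w: float
    eps_v: float
    # network state
    neurons: list = field(default_factory=list)   # K
    edges: dict = field(default_factory=dict)     # E: frozenset of node ids -> age
    gtc: np.ndarray = None                        # current global temporal context
```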

Merge Growing Neural Gas


Algorithm

  1. Set the parameters $\alpha, \beta, \gamma, \delta, \theta, \eta, \lambda, \epsilon_w, \epsilon_v$.
  2. Set the time variable $t:=1$.
  3. Initialize the neuron set $\mathcal{K}$ with two neurons with counters $e_1:=0$, $e_2:=0$ and random weight and context vectors $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{c}_1$ and $\mathbf{c}_2$.
  4. Initialize the connection set $\mathcal{E} \subseteq \mathcal{K}\times\mathcal{K}:=\emptyset$.
  5. Initialize the global temporal context:
    $\mathbf{C}_1:=\mathbf{0}$
    $\mathcal{C}:=\mathcal{C}\dotplus(\mathbf{C}_1)$
  6. $M:=time\_step(t,\mathcal{X},M)$
  7. $t:=t+1$
  8. If more input signals are available, go to step 6; otherwise terminate.
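A sketch of this main loop, reusing the `Neuron`/`MGNG` classes sketched above and the `time_step` function sketched after the next slides:

```python
import numpy as np

def run_mgng(series, M, dim):
    """Steps 2-8: initialize two random neurons, then process each input."""
    M.neurons = [Neuron(np.random.rand(dim), np.random.rand(dim)),
                 Neuron(np.random.rand(dim), np.random.rand(dim))]
    M.edges = {}
    M.gtc = np.zeros(dim)              # C_1 := 0
    for t, x_t in enumerate(series, start=1):
        time_step(t, x_t, M)           # one MGNG step per input signal
    return M
```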

Merge Growing Neural Gas

Algorithm

The $time\_step$ function

  1. Find winner $r$ and second winner $s$ (the two closest neurons to $\mathbf{x}_t$ and $\mathbf{C}_t$):

$d_{\mathcal{X}, \mathcal{C}}(t,n) = (1 - \alpha) \cdot \|\mathbf{x}_t - \mathbf{w}_n\|^2 + \alpha \cdot \|\mathbf{C}_t - \mathbf{c}_n\|^2 \quad (\mathbf{x}_t\in\mathcal{X}, \mathbf{C}_t\in\mathcal{C})$

  2. Update the GTC:

$\mathbf{C}_{t+1}:=(1 - \beta) \cdot \mathbf{w}_r + \beta \cdot \mathbf{c}_r$

  3. Connect $r$ with $s$.

   Set $age_{(r,s)}:=0$.

Merge Growing Neural Gas

Algorithm

The $time\_step$ function (contd.)

  4. Update neuron $r$ and its direct topological neighbors $n\in\mathcal{N}_r$:
    1. Move $\mathbf{w}_r$ towards $\mathbf{x}_t$ using $\epsilon_w$.
    2. Move $\mathbf{c}_r$ towards $\mathbf{C}_t$ using $\epsilon_w$.
    3. $e_r:=e_r + 1$
    4. Move $\mathbf{w}_n$ towards $\mathbf{x}_t$ using $\epsilon_v$.
    5. Move $\mathbf{c}_n$ towards $\mathbf{C}_t$ using $\epsilon_v$.
  5. Increment the age of all connections of $r$.
  6. Remove old connections ($(a,b)$ with $age_{(a,b)} > \gamma$).
  7. Delete all neurons with no connections.

Merge Growing Neural Gas

Algorithm

The $time\_step$ function (contd.)

  8. If $t \bmod \lambda \equiv 0$ and $|\mathcal{K}| < \theta$:
    1. Find the neuron $q$ with the greatest $e_q$.
    2. Find the neighbor $f$ of $q$ with the greatest $e_f$.
    3. Create a new neuron $l$ between $q$ and $f$.
    4. Decrease the counters of $q$ and $f$ by the factor $\delta$.
  9. Decrease the counters of all neurons by the factor $\eta$.
  10. Return $M$.
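Putting the three slides together, a sketch of the whole $time\_step$ function (an illustrative reading of the steps, not the thesis implementation; the new neuron's placement and the counter decays are interpreted as multiplicative, which is an assumption of this sketch):

```python
import numpy as np

def time_step(t, x_t, M):
    """One MGNG step over input x_t, mutating the model M in place."""
    # 1. winner r and second winner s w.r.t. the input x_t and the GTC
    d = [(1 - M.alpha) * np.sum((x_t - n.w) ** 2)
         + M.alpha * np.sum((M.gtc - n.c) ** 2) for n in M.neurons]
    first, second = np.argsort(d)[:2]
    r, s = M.neurons[first], M.neurons[second]
    # 2. next GTC, computed from the winner's vectors before adaptation
    next_gtc = (1 - M.beta) * r.w + M.beta * r.c
    # 3. connect r and s and reset the edge's age
    M.edges[frozenset((id(r), id(s)))] = 0
    # 4. adapt the winner and its direct neighbours
    r.w += M.eps_w * (x_t - r.w)
    r.c += M.eps_w * (M.gtc - r.c)
    r.e += 1
    for n in M.neurons:
        if n is not r and frozenset((id(r), id(n))) in M.edges:
            n.w += M.eps_v * (x_t - n.w)
            n.c += M.eps_v * (M.gtc - n.c)
    # 5./6. age r's connections and drop those older than gamma
    for edge in list(M.edges):
        if id(r) in edge:
            M.edges[edge] += 1
            if M.edges[edge] > M.gamma:
                del M.edges[edge]
    # 7. delete neurons left without connections
    connected = {i for edge in M.edges for i in edge}
    M.neurons = [n for n in M.neurons if id(n) in connected]
    # 8. every lambda steps, insert a neuron between the two "busiest" ones
    if t % M.lam == 0 and len(M.neurons) < M.theta:
        q = max(M.neurons, key=lambda n: n.e)
        neigh = [n for n in M.neurons
                 if n is not q and frozenset((id(q), id(n))) in M.edges]
        if neigh:
            f = max(neigh, key=lambda n: n.e)
            new = Neuron((q.w + f.w) / 2, (q.c + f.c) / 2)
            M.neurons.append(new)
            M.edges[frozenset((id(q), id(new)))] = 0
            M.edges[frozenset((id(f), id(new)))] = 0
            M.edges.pop(frozenset((id(q), id(f))), None)
            q.e *= M.delta               # "decrease by the factor delta"
            f.e *= M.delta
    # 9. decay all counters by the factor eta
    for n in M.neurons:
        n.e *= M.eta
    # 10. publish the new GTC and return the model
    M.gtc = next_gtc
    return M
```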

MGNG for Anomaly Detection

Model

An MGNG-AD model $D$ with time series input $\mathcal{X}$ of dimensionality $m$ and size $s$ is a tuple $D=(f,\omega,\mathcal{A},M^P,M^Q,j)$, where:

  • $f$ is a neuron matching function (more on this later),
  • $\omega\in[0,1]$ is a scalar parameter for the function $f$,
  • $\mathcal{A}_{1,s;m}=(A_1,...,A_s)$ is a sequence of anomaly levels,
  • $j$ is the delay between $M^P$ and $M^Q$,
  • $M^P$ is an MGNG model (the “past” model) with time series input $P_{1,s-j-1;m}\subset\mathcal{X}$ of size $s - j - 1$, where:

$M^P=(\mathcal{K}^P,\mathcal{E}^P,\mathcal{C}^P,\alpha^P,\beta^P,\gamma^P,\delta^P,\theta^P,\eta^P,\lambda^P,\epsilon^P_w,\epsilon^P_v)$

  • $M^Q$ is an MGNG model (the “present” model) with time series input $\mathcal{X}$, where:

$M^Q=(\mathcal{K}^Q,\mathcal{E}^Q,\mathcal{C}^Q,\alpha^Q,\beta^Q,\gamma^Q,\delta^Q,\theta^Q,\eta^Q,\lambda^Q,\epsilon^Q_w,\epsilon^Q_v)$

Experiments Details

Mackey-Glass Time Series

  • Defined by the differential equation:

$\frac{dx}{dt}=\beta\frac{x_\tau}{1+x^n_\tau}-\gamma x;\quad \gamma, \beta, n > 0$

  • Four series $MG1$,...,$MG4$ were generated with parameters $\beta = 0.2$, $\gamma = 0.1$, $\tau = 17$ and $n = 10$ (the same as Andreakis et al.).
  • $MG2$ was multiplied by $-1$, $MG3$ by $2$ and $MG4$ by $-2$ to further differentiate them.
  • Each instance $mg_t \in MG$ is defined as:

$mg_t := (mg1_t,mg2_t,mg3_t,mg4_t)$
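As an illustration, such a series can be generated by Euler integration of the delay equation above; a sketch (the step size, the initial value, and whether each component uses an independent run are assumptions of this sketch):

```python
import numpy as np

def mackey_glass(length, beta=0.2, gamma=0.1, tau=17, n=10, x0=1.2):
    """Euler integration (step size 1) of the Mackey-Glass delay equation;
    the initial value x0 and the step size are assumptions of this sketch."""
    x = np.full(length + tau, x0)
    for t in range(tau, length + tau - 1):
        x_tau = x[t - tau]
        x[t + 1] = x[t] + beta * x_tau / (1 + x_tau ** n) - gamma * x[t]
    return x[tau:]

# four components scaled as in the slides (whether each component comes from
# an independent run or the same run is not specified here)
mg = mackey_glass(150000)
MG = np.stack([mg, -1 * mg, 2 * mg, -2 * mg], axis=1)  # mg_t = (mg1_t, ..., mg4_t)
```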

Experiments Details

Mackey-Glass Time Series

Random Noise in Time Domain - MGNG-AD

Experiments Details

Mackey-Glass Time Series

Random Noise in Time Domain - MGNG-AD

Experiments Details

Mackey-Glass Time Series

Random Noise in Time Domain - HTM

Experiments Details

Mackey-Glass Time Series

Random Noise in Time Domain - HTM

Experiments Details

Mackey-Glass Time Series

Incremental Noise in Time Domain - MGNG-AD

Experiments Details

Mackey-Glass Time Series

Incremental Noise in Time Domain - MGNG-AD

Experiments Details

Mackey-Glass Time Series

Incremental Noise in Time Domain - HTM

Experiments Details

Mackey-Glass Time Series

Incremental Noise in Time Domain - HTM

Experiments Details

Mackey-Glass Time Series

Random Noise in Frequency Domain - MGNG-AD

Experiments Details

Mackey-Glass Time Series

Random Noise in Frequency Domain - MGNG-AD

Experiments Details

Mackey-Glass Time Series

Random Noise in Frequency Domain - HTM

Experiments Details

Mackey-Glass Time Series

Random Noise in Frequency Domain - HTM

Experiments Details


Randomly generated Time Series

  • Generated by randomly sampling normal distributions:
    • $r1_t \sim \mathcal{N}(0,1)$
    • $r2_t \sim \mathcal{N}(0,-1)$
    • $r3_t \sim \mathcal{N}(0,2)$
    • $r4_t \sim \mathcal{N}(0,-2)$
    • $r_t := (r1_t,r2_t,r3_t,r4_t)$
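A small sketch of the generation (the second parameters in the slide are read here as scale factors applied to standard-normal samples, which is an assumption of this sketch):

```python
import numpy as np

length = 150000
rng = np.random.default_rng()
# (0, 1), (0, -1), (0, 2), (0, -2) interpreted as mean 0 with scale
# factors 1, -1, 2 and -2 (an assumption of this sketch)
R = rng.standard_normal((length, 4)) * np.array([1, -1, 2, -2])  # rows are r_t
```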

Experiments Details

Randomly generated Time Series

Random Noise in Time Domain - MGNG-AD

Experiments Details

Randomly generated Time Series

Random Noise in Time Domain - MGNG-AD

Experiments Details

Randomly generated Time Series

Random Noise in Time Domain - HTM

Experiments Details

Randomly generated Time Series

Random Noise in Time Domain - HTM

Experiments Details

Randomly generated Time Series

Incremental Noise in Time Domain - MGNG-AD

Experiments Details

Randomly generated Time Series

Incremental Noise in Time Domain - MGNG-AD

Experiments Details

Randomly generated Time Series

Incremental Noise in Time Domain - HTM

Experiments Details

Randomly generated Time Series

Incremental Noise in Time Domain - HTM

Experiments Details


Mixed Time Series

  • Combination of two Mackey-Glass and two randomly generated time series:
    • $MG1$: “normal” Mackey-Glass.
    • $MG2$: “normal” Mackey-Glass $\times -2$.
    • $r1_t \sim \mathcal{N}(0,1)$
    • $r2_t \sim \mathcal{N}(0,-2)$
    • $s_t := (mg1_t,mg2_t,r1_t,r2_t)$

Experiments Details

Mixed Time Series

Noise in Mackey-Glass Component

Experiments Details

Mixed Time Series

Noise in Mackey-Glass Component - MGNG-AD

Experiments Details

Mixed Time Series

Noise in Mackey-Glass Component - MGNG-AD

Experiments Details

Mixed Time Series

Noise in Mackey-Glass Component - HTM

Experiments Details

Mixed Time Series

Noise in Mackey-Glass Component - HTM

Experiments Details

Mixed Time Series

Noise in Randomly Generated Component

Experiments Details

Mixed Time Series

Noise in Randomly Generated Component - MGNG-AD

Experiments Details

Mixed Time Series

Noise in Randomly Generated Component - MGNG-AD

Experiments Details

Mixed Time Series

Noise in Randomly Generated Component - HTM

Experiments Details

Mixed Time Series

Noise in Randomly Generated Component - HTM

Experiments Details


Time Series Shift

  • No noise added.
  • Instead, the constant vector $\mathbf{x}=(5,-5,10,-10)$ was added to every data point from data point $74601$ onwards.
  • Since there are no anomalies, no F1-Score could be calculated.

Experiments Details

Time Series Shift

Mackey-Glass Time Series - MGNG-AD

Experiments Details

Time Series Shift

Mackey-Glass Time Series - MGNG-AD

Experiments Details

Time Series Shift

Mackey-Glass Time Series - HTM

Experiments Details

Time Series Shift

Mackey-Glass Time Series - HTM

Experiments Details

Time Series Shift

Randomly Generated Time Series - MGNG-AD

Experiments Details

Time Series Shift

Randomly Generated Time Series - MGNG-AD

Experiments Details

Time Series Shift

Randomly Generated Time Series - HTM

Experiments Details

Time Series Shift

Randomly Generated Time Series - HTM