Time Series and How to Detect Anomalies in Them: Part III

Hello there, my name is Artur.

You might be reading this intro for the third time — and if this is the case, I appreciate your sticking with this article series.

I am the head of the Machine Learning team at Akvelon-Kazan, and you are about to read the last part of our tutorial on anomaly detection in time series.

During our research, we managed to gather a lot of information from tiny useful pieces scattered all over the internet, and we don’t want this knowledge to be lost, so we are sharing it with you!

Eventually easier than it seemed

We already dove into the theory and data preparation in Part I, and defined and trained three models in Part II.

We reuse our code from those parts, so if something seems unclear, consider revisiting them.

Fantastic, let’s complete this series!

Just to briefly remind you of the tools that we use:

And here are the models we trained and what each of them does:

  1. ARIMA statistical model — predicts next value
  2. Convolutional Neural Network — predicts next value
  3. Long Short-Term Memory Neural Network — reconstructs current value

Anomaly Detection with Static and Dynamic Thresholds

Amazing, we have trained all three models! But every line of code so far was just preparation for anomaly detection.

So after just a few additional preparations, we will finally be able to detect anomalies.

What exactly lies behind these “additional preparations”? Two things (remember the “Saying what we want from our models out loud” section from Part I?):

  1. Calculation of the errors for each item in the datasets
  2. Threshold calculation based on the errors

Then we will be able to detect anomalies extremely fast, literally just by comparing the errors with the threshold.

What are we waiting for? We’re ready for this!

The error calculation code differs for each model because the Datasets are implemented differently, but the algorithm stays the same.

ARIMA’s error calculation

Just as a reminder, we use absolute error for ARIMA because it gives better results. And if you are wondering how we came to this, the answer is: we just tried it, and it worked.
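As a minimal sketch, assuming the ARIMA forecasts and the actual values are already aligned one-to-one (the function and argument names here are hypothetical, not the article's), the absolute error is the element-wise distance between the two arrays:

```python
import numpy as np

def arima_errors(actual, forecast):
    """Absolute error per time step for aligned 1-D arrays (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    if actual.shape != forecast.shape:
        raise ValueError("actual and forecast must have the same shape")
    # Absolute error keeps the error on the same scale as the data
    return np.abs(actual - forecast)

errors = arima_errors([1.0, 2.0, 10.0], [1.5, 2.0, 3.0])
```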

CNN’s error calculation
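A hedged sketch of the same idea for a next-value predictor such as our CNN. It assumes the model maps a window of window_size previous values to the next value and, like ARIMA, is scored with absolute error; make_windows, next_value_errors and the predict callable are hypothetical stand-ins for the article's Dataset-specific code:

```python
import numpy as np

def make_windows(series, window_size):
    """Split a 1-D series into (inputs, targets): each input holds window_size
    consecutive values, and its target is the value right after them."""
    series = np.asarray(series, dtype=float)
    inputs = np.stack([series[i:i + window_size]
                       for i in range(len(series) - window_size)])
    targets = series[window_size:]
    return inputs, targets

def next_value_errors(predict, series, window_size):
    """Absolute error of a next-value predictor for every predictable point.

    predict is any callable mapping windows of shape (n, window_size) to n
    predictions; a Keras model would typically need the windows reshaped to
    (n, window_size, 1) before calling model.predict.
    """
    inputs, targets = make_windows(series, window_size)
    predictions = np.asarray(predict(inputs), dtype=float).reshape(-1)
    return np.abs(targets - predictions)

# Usage with a naive stand-in "model" that repeats the last value of each window
series = [1.0, 2.0, 3.0, 10.0, 5.0]
errors = next_value_errors(lambda x: x[:, -1], series, window_size=2)
```

Note that error k belongs to series point k + window_size, because the first window_size points have no prediction.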

LSTM’s error calculation
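For a model that reconstructs its input, such as our LSTM, the error compares each window with its reconstruction. This sketch assumes a mean absolute reconstruction error per window; the exact error used in the article may differ, and reconstruction_errors and the reconstruct callable are hypothetical:

```python
import numpy as np

def reconstruction_errors(reconstruct, windows):
    """Mean absolute reconstruction error, one value per window.

    windows has shape (n, window_size); reconstruct is any callable returning
    an array with the same number of values (a Keras LSTM autoencoder would
    usually need a trailing feature axis added before model.predict).
    """
    windows = np.asarray(windows, dtype=float)
    reconstructed = np.asarray(reconstruct(windows), dtype=float).reshape(windows.shape)
    # Average the absolute differences over the time axis
    return np.mean(np.abs(windows - reconstructed), axis=1)

windows = np.array([[1.0, 2.0, 3.0],
                    [2.0, 3.0, 9.0]])
# Stand-in "autoencoder" that can only reproduce a straight ramp
fake_reconstruct = lambda w: w[:, :1] + np.arange(w.shape[1])
errors = reconstruction_errors(fake_reconstruct, windows)
```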

Threshold calculation — common for all three models

Static threshold

This threshold is calculated with the three-sigma rule: threshold = mean(errors) + 3 × std(errors).
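In code, the static threshold is a one-liner. This sketch assumes the threshold is fitted on the training errors only and uses NumPy's default population standard deviation (ddof=0); static_threshold is a hypothetical helper name:

```python
import numpy as np

def static_threshold(train_errors, std_coef=3.0):
    """Three-sigma rule: mean of the errors plus std_coef standard deviations."""
    train_errors = np.asarray(train_errors, dtype=float)
    return train_errors.mean() + std_coef * train_errors.std()

threshold = static_threshold([1.0, 1.0, 1.0, 1.0, 6.0])
```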

Dynamic threshold

For the dynamic threshold, we need two more parameters: window, the number of errors within which we calculate the threshold, and std_coef, which replaces the 3 in the static threshold formula.

  • For ARIMA window=40 and std_coef=5
  • For CNN and LSTM window=40 and std_coef=6

We chose these two parameters empirically for each model, using only the training data.

You may wonder: “Why does he always emphasize using only the training data? Why can’t I also use the validation data to choose better parameters?”
We choose parameters using only the training data because that is the only way to be sure our models will work on real-world data outside the training dataset. The validation part of the dataset imitates such real-world data and gives a better understanding of the models’ capabilities, precisely because we know it was not used to train our models.
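One possible way to choose window and std_coef on the training data alone is a small grid search. The sketch below assumes labeled training anomalies and uses the F2-score (which the article uses to rank models) as the selection criterion; the article does not say exactly how its values were found, and all helper names are hypothetical:

```python
import itertools
import numpy as np

def rolling_threshold(errors, window, std_coef):
    """Mean + std_coef * std of the window errors before each point (NaN if too early)."""
    thresholds = np.full(len(errors), np.nan)
    for i in range(window, len(errors)):
        past = errors[i - window:i]
        thresholds[i] = past.mean() + std_coef * past.std()
    return thresholds

def f2_score(y_true, y_pred):
    """F2-score from two boolean arrays of the same length."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 5 * precision * recall / (4 * precision + recall)

def choose_parameters(train_errors, train_labels, windows, std_coefs):
    """Return (best_f2, window, std_coef) using training data only."""
    train_errors = np.asarray(train_errors, dtype=float)
    best = None
    for window, std_coef in itertools.product(windows, std_coefs):
        thresholds = rolling_threshold(train_errors, window, std_coef)
        valid = ~np.isnan(thresholds)
        y_pred = np.zeros(len(train_errors), dtype=bool)
        y_pred[valid] = train_errors[valid] > thresholds[valid]
        score = f2_score(train_labels, y_pred)
        # Strict ">" keeps the first combination on ties
        if best is None or score > best[0]:
            best = (score, window, std_coef)
    return best
```

The search never touches the validation data, so the validation metrics remain an honest estimate of real-world performance.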

Let’s get down to business! Here is the code to calculate the dynamic threshold:
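A minimal sketch of such a rolling threshold. It assumes a trailing window of the window errors that precede each point, so a spike cannot raise its own threshold; whether the article's version includes the current point is not stated, and dynamic_threshold is a hypothetical name:

```python
import numpy as np

def dynamic_threshold(errors, window=40, std_coef=6.0):
    """Per-point threshold: mean + std_coef * std of the window errors before it.

    Points without a full window of history get NaN (no threshold yet).
    """
    errors = np.asarray(errors, dtype=float)
    thresholds = np.full(errors.shape, np.nan)
    for i in range(window, len(errors)):
        past = errors[i - window:i]  # trailing window, excludes the current error
        thresholds[i] = past.mean() + std_coef * past.std()
    return thresholds

thresholds = dynamic_threshold([1.0, 3.0, 1.0, 3.0, 10.0], window=4, std_coef=2.0)
```

The defaults mirror the CNN/LSTM values above (window=40, std_coef=6). A pandas version would use Series.shift(1).rolling(window), keeping in mind that pandas' rolling std defaults to the sample standard deviation (ddof=1).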

And the last piece of our puzzle is the metrics calculation. Which metrics? I am glad you asked. We calculate all the basic metrics to fully analyze the models’ performance:

  • Confusion matrix to see in detail how a model performs
  • Precision to see what share of the detected anomalies are true anomalies
  • Recall to see what share of the true anomalies a model detects
  • F2-score to combine precision and recall; we use F2 instead of F1 because detecting true anomalies matters more than avoiding false ones (recall is more important than precision)
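Precision, recall and the F-beta score can be computed directly from the confusion-matrix counts, with F_beta = (1 + beta²) · P · R / (beta² · P + R). This hypothetical helper shows why F2 favors recall; scikit-learn's fbeta_score(y_true, y_pred, beta=2) computes the same score from label arrays:

```python
def precision_recall_fbeta(tp, fp, fn, beta=2.0):
    """Precision, recall and F-beta from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # beta > 1 weights recall more heavily than precision
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

p, r, f2 = precision_recall_fbeta(tp=8, fp=8, fn=2)
_, _, f1 = precision_recall_fbeta(tp=8, fp=8, fn=2, beta=1.0)
```

With 8 true positives, 8 false positives and 2 false negatives, precision is 0.5 and recall is 0.8, giving F2 ≈ 0.714 versus F1 ≈ 0.615: the high recall is rewarded more.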

Excellent! Now we can move on to the code that actually detects anomalies.

ARIMA with static threshold

For each model, we compare the errors with the given threshold and return the indexes of the errors that exceed it. These are the points we consider detected anomalies!

And of course, we are going to visualize everything we detected (still using the same code from Part I)!
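With a single static threshold, this boils down to one NumPy comparison. A minimal sketch with a hypothetical detect_static helper:

```python
import numpy as np

def detect_static(errors, threshold):
    """Indexes of the points whose error exceeds a single static threshold."""
    errors = np.asarray(errors, dtype=float)
    return np.flatnonzero(errors > threshold)

anomaly_indexes = detect_static([0.1, 5.0, 0.2, 9.0], threshold=1.0)
```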

Detected anomalies on training data

Detected anomalies on validation data

We will save the metrics for the Results section, but here are the code and the printed confusion matrices:
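Since the detectors return anomaly indexes, one way to build a confusion matrix is to turn both index lists into boolean masks over the series. This sketch (confusion_matrix_from_indexes is a hypothetical helper) uses the same [[TN, FP], [FN, TP]] layout as scikit-learn's confusion_matrix:

```python
import numpy as np

def confusion_matrix_from_indexes(true_idx, detected_idx, n_points):
    """2x2 confusion matrix [[TN, FP], [FN, TP]] from anomaly index lists."""
    y_true = np.zeros(n_points, dtype=bool)
    y_pred = np.zeros(n_points, dtype=bool)
    y_true[np.asarray(true_idx, dtype=int)] = True
    y_pred[np.asarray(detected_idx, dtype=int)] = True
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tp = int(np.sum(y_true & y_pred))
    return np.array([[tn, fp], [fn, tp]])

cm = confusion_matrix_from_indexes(true_idx=[2, 5], detected_idx=[2, 3], n_points=8)
```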

Confusion matrix for training data

Confusion matrix for validation data

Yeah, this does not look so good (there are many falsely detected anomalies), but the model still catches every true anomaly.

Let’s do the same for the dynamic threshold and see if it can change the situation.

Detected anomalies on training data

Detected anomalies on validation data

The code for metrics is the same, so we can skip it and take a look at the confusion matrices.

Confusion matrix for training data

Confusion matrix for validation data

Well, these look much better (no more huge numbers of falsely detected anomalies)! A tough baseline for our neural nets!

NN’s anomaly detection

For both neural nets, we provide one generic anomaly detection function.

And that’s it! Now we can effortlessly process the results of the neural nets.
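A hedged sketch of what such a generic function could look like. It accepts either a scalar (static) threshold or a per-point array (dynamic), never flags points whose threshold is NaN, and shifts the result by a hypothetical offset so the indexes refer to the original series rather than to the windowed errors:

```python
import numpy as np

def detect_anomalies(errors, threshold, offset=0):
    """Indexes (in series coordinates) of errors above the threshold.

    threshold: scalar for a static threshold, or an array aligned with errors
               for a dynamic one; NaN entries never flag a point.
    offset:    number of leading series points without an error value.
    """
    errors = np.asarray(errors, dtype=float)
    # A scalar threshold is broadcast to one value per error
    threshold = np.broadcast_to(np.asarray(threshold, dtype=float), errors.shape)
    valid = ~np.isnan(threshold)
    flagged = np.zeros(errors.shape, dtype=bool)
    flagged[valid] = errors[valid] > threshold[valid]
    return np.flatnonzero(flagged) + offset

static_hits = detect_anomalies([0.1, 5.0, 0.2, 9.0], 1.0)
dynamic_hits = detect_anomalies([0.1, 5.0, 0.2, 9.0], [np.nan, 1.0, 1.0, 10.0], offset=2)
```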

CNN with static threshold

Detected anomalies on training data

Detected anomalies on validation data

Confusion matrix for training data

Confusion matrix for validation data

It seems that our CNN model has overfitted, as it produces an enormous number of false anomalies. But there is no need to make hasty decisions; it is better to look at the results with the dynamic threshold first.

And let’s do the same with the dynamic threshold:

Detected anomalies on training data

Detected anomalies on validation data

The metrics calculation is still the same.

Confusion matrix for training data

Confusion matrix for validation data

These results are better than ARIMA’s. We can already say that we didn’t waste our time!

And the last model (but certainly not the least) is LSTM.

LSTM with static threshold

Detected anomalies on training data

Detected anomalies on validation data

Once again, the metrics calculations are identical to CNN’s.

Confusion matrix for training data

Confusion matrix for validation data

Here we have the same situation as with CNN, but now we know that the dynamic threshold will reveal the truth!

LSTM with dynamic threshold

Detected anomalies on training data

Detected anomalies on validation data

Confusion matrix for training data

Confusion matrix for validation data

And the dynamic threshold certainly turned our LSTM model into a near-perfect detector.

Real-time evaluation with static/dynamic threshold

If it is hard to figure out from the code how to apply these models to real-life data (which is perfectly normal), here are some visualizations of the real-time evaluation:

Evaluation with the static threshold (gif)

The top chart shows the original data with the true and detected anomalies. The bottom chart shows the model’s error together with the static threshold (the purple line).

And here is the visualization of the same process with the dynamic threshold.

Evaluation with the dynamic threshold (gif)

As you can see, the dynamic threshold adapts to the dispersion of the error: it stays low when the error deviates only slightly and rises when the error deviates strongly.
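To mimic the real-time evaluation shown in the animations, the dynamic threshold can be maintained online with a fixed-size buffer of recent errors. This is a sketch with a hypothetical StreamingDetector class, using the same trailing-window rule (mean plus std_coef population standard deviations of the previous window errors) as a batch rolling threshold:

```python
import math
from collections import deque

class StreamingDetector:
    """Online dynamic-threshold check, one error value at a time."""

    def __init__(self, window=40, std_coef=6.0):
        self.std_coef = std_coef
        self.history = deque(maxlen=window)  # the most recent `window` errors

    def update(self, error):
        """Return (is_anomaly, threshold) for the new error, then store it."""
        if len(self.history) < self.history.maxlen:
            is_anomaly, threshold = False, math.nan  # not enough history yet
        else:
            n = len(self.history)
            mean = sum(self.history) / n
            std = math.sqrt(sum((e - mean) ** 2 for e in self.history) / n)
            threshold = mean + self.std_coef * std
            is_anomaly = error > threshold
        self.history.append(error)
        return is_anomaly, threshold

detector = StreamingDetector(window=4, std_coef=2.0)
results = [detector.update(e) for e in [1.0, 3.0, 1.0, 3.0, 10.0, 2.0]]
```

Here every error, including anomalous ones, enters the history; excluding flagged errors from the buffer is an alternative that keeps a spike from raising the threshold for the points right after it.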

Results of the models

Finally, we can compare the metrics to make sure we were right to put LSTM in first place. We use the F2-score to decide which model is best, and show precision and recall separately to reveal the strengths and weaknesses of each model.

Results with the static threshold

Results with the dynamic threshold

With the static threshold, ARIMA performs slightly better than the neural networks, but with the dynamic threshold the neural networks outperform it, especially LSTM.

Ultimate Conclusion

Lastly, I would like to emphasize that these models can already be taken to production without much effort.

Nevertheless, these models are far from their limits and can be enhanced via:

  1. Increasing the amount of training data
  2. Adding other metrics, such as memory, network, etc.
  3. Combining LSTM and CNN architectures
  4. Feature engineering

Thank you very much for your attention! I hope this tutorial has given you some understanding of the topic and useful hints on implementation.

And don’t stop looking for anomalies!

This article was written by Artur Khanin, a Technical Project Lead at Akvelon’s Kazan office, and was originally published on BecomingHuman.AI.


This project was designed and implemented with love by the following members of Akvelon’s team:

Team Lead: Artur Khanin
Delivery Manager: Sergei Volynkin
Technical Account Manager: Max Kostin
ML Engineers: Irina Nikolaeva, Rustem Saitgareev