This article explores various formulations and methodologies for log-based anomaly detection, including binary classification, prediction, masked log modeling, and clustering. It contrasts supervised and unsupervised approaches, highlighting trade-offs between labeled accuracy and real-world practicality. The paper reviews how contextual, sequential, temporal, and semantic information from log data influences detection accuracy and discusses empirical studies comparing traditional versus deep-learning methods. Ultimately, the research proposes a Transformer-based anomaly detection model capable of capturing richer log features, offering a more holistic understanding of how AI identifies system anomalies across diverse datasets.

An Overview of Log-Based Anomaly Detection Techniques


Abstract

1 Introduction

2 Background and Related Work

2.1 Different Formulations of the Log-based Anomaly Detection Task

2.2 Supervised vs. Unsupervised

2.3 Information within Log Data

2.4 Fixed-Window Grouping

2.5 Related Work

3 A Configurable Transformer-based Anomaly Detection Approach

3.1 Problem Formulation

3.2 Log Parsing and Log Embedding

3.3 Positional & Temporal Encoding

3.4 Model Structure

3.5 Supervised Binary Classification

4 Experimental Setup

4.1 Datasets

4.2 Evaluation Metrics

4.3 Generating Log Sequences of Varying Lengths

4.4 Implementation Details and Experimental Environment

5 Experimental Results

5.1 RQ1: How does our proposed anomaly detection model perform compared to the baselines?

5.2 RQ2: How much does the sequential and temporal information within log sequences affect anomaly detection?

5.3 RQ3: How much do the different types of information individually contribute to anomaly detection?

6 Discussion

7 Threats to validity

8 Conclusions and References


2 Background and Related Work

2.1 Different Formulations of the Log-based Anomaly Detection Task

Previous works formulate the log-based anomaly detection task in different ways. Generally, the common formulations fall into the following categories.

**Binary Classification.** The most common way to formulate the log-based anomaly detection task is to transform it into a binary classification task, where machine learning models classify logs or log sequences as anomalous or normal [1]. Both supervised [18–20] and unsupervised [8] classifiers can be used under this formulation. In unsupervised schemes, a threshold is usually applied to an anomaly score that reflects the degree of pattern violation: samples whose scores exceed the threshold are flagged as anomalies.
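
As a concrete illustration, the sketch below shows the thresholding step, assuming an unsupervised model that assigns each sample an anomaly score (e.g., a reconstruction error); the scores and percentile choice are purely illustrative, not from any particular approach.

```python
import numpy as np

def classify_by_threshold(scores: np.ndarray, threshold: float) -> np.ndarray:
    """Flag samples whose anomaly score exceeds the threshold as anomalies (1)."""
    return (scores > threshold).astype(int)

# Hypothetical usage: scores could be reconstruction errors from a model
# trained on normal logs only; the threshold is often set to a high
# percentile of the scores observed on normal validation data.
rng = np.random.default_rng(0)
scores = rng.random(1000)              # placeholder anomaly scores
threshold = np.percentile(scores, 99)  # e.g., the 99th percentile
labels = classify_by_threshold(scores, threshold)
print(labels.sum(), "samples flagged as anomalous")
```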

**Future Event Prediction.** Some approaches formulate the anomaly detection task as a prediction task [10]. Sequential models are trained to predict potential future events given the past few logs within a fixed window. In the prediction phase, the model generates the top-N most probable candidates for the next event. If the actual event is not among the predicted candidates, the unexpected log is considered an anomaly that violates the normal pattern of log sequences.
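
For illustration, a minimal sketch of the top-N check described above, assuming the model outputs a probability distribution over known log templates; the distribution and function names are placeholders.

```python
import numpy as np

def violates_pattern(probs: np.ndarray, actual_event: int, top_n: int) -> bool:
    """Return True if the actual event is not among the top-N predicted candidates."""
    candidates = np.argsort(probs)[::-1][:top_n]  # indices of the N most probable events
    return actual_event not in candidates

# Hypothetical usage: probs would come from a sequential model (e.g., an LSTM)
# given the preceding window of log events; here it is a placeholder.
probs = np.array([0.05, 0.60, 0.20, 0.10, 0.05])  # distribution over 5 templates
print(violates_pattern(probs, actual_event=3, top_n=2))  # True: event 3 not in top 2
```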

**Masked Log Prediction.** The log-based anomaly detection task can also be formulated as a masked log prediction task [21], where models trained on normal log sequences are expected to predict randomly masked log events in a sequence. Similar to future event prediction, a log sequence is considered normal if the actual masked events are among the predicted candidates.

**Others.** Some works formulate the anomaly detection task as a clustering task, where feature vectors of normal and abnormal log sequences are expected to fall into different clusters [22]. The label of a log sequence is determined by the distance between its feature vector and the cluster centroids. There are also approaches that use invariant mining [9], which identify anomalies by detecting violations of invariant relationships among the feature vectors of log sequences.
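
For illustration, a minimal sketch of the nearest-centroid decision, assuming centroids have already been learned (e.g., with k-means) from feature vectors of training log sequences; the centroid values here are placeholders.

```python
import numpy as np

def label_by_centroid(x: np.ndarray, centroids: dict) -> str:
    """Assign the label of the nearest centroid to the feature vector x."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

# Hypothetical usage: in practice the feature vectors would come from a
# log representation technique, and the centroids from a clustering step.
centroids = {"normal": np.array([0.0, 0.0]), "anomalous": np.array([5.0, 5.0])}
print(label_by_centroid(np.array([0.5, 0.2]), centroids))  # -> "normal"
```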

2.2 Supervised vs. Unsupervised

Another dimension along which anomaly detection approaches differ is the training mechanism. Supervised methods require labeled logs as training data to learn to discern abnormal samples from normal ones, while unsupervised methods learn the normal patterns from normal log data and do not require labels during training. Unsupervised methods offer greater practicality, as well-annotated log data is rarely available in practice. However, supervised methods usually achieve superior and more stable performance according to previous empirical studies.

2.3 Information within Log Data

Generally, log data, formed by sequences of log events, contains several types of information. Within a log sequence, the occurrences of logs from different templates serve as context and are a distinctive feature of the sequence. Similar to the Bag-of-Words model, a numerical representation based on the frequency of template occurrences can represent log sequences and be used for anomaly detection; various works [1] use the message count vector (MCV) to capture this information. Moreover, the sequential information within the log items provides richer information about the order of log occurrences, which often reflects the execution sequence of applications and services; DeepLog [10] uses an LSTM model to encode this sequential information. Furthermore, the temporal information in log data provides even richer detail: the time intervals between log events may offer valuable insights for anomaly detection and other log analysis tasks regarding system status, workload, and potential blocking. Du et al. [10] utilized this information in a parameter value anomaly detection model.
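
As a concrete illustration, the sketch below builds such a count-based representation from already-parsed template IDs; the vocabulary and sequence are placeholders rather than data from any of the studied datasets.

```python
from collections import Counter

def message_count_vector(sequence, vocabulary):
    """Count how often each known template occurs in the sequence (Bag-of-Words style)."""
    counts = Counter(sequence)
    return [counts[template] for template in vocabulary]

# Hypothetical usage: template IDs would come from a log parser such as Drain.
vocabulary = ["E1", "E2", "E3"]
sequence = ["E1", "E2", "E1", "E3", "E1"]
print(message_count_vector(sequence, vocabulary))  # -> [3, 1, 1]
```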

Besides, the textual or semantic information provided by log messages has garnered significant attention in recent studies [5, 11, 12]. Log messages written by developers articulate crucial information in natural language about a system's operations, errors, and events, making them valuable for troubleshooting and system analysis. Various natural language processing techniques are employed to extract textual features and generate embeddings for log messages, from basic numerical statistics such as TF-IDF, to word embedding techniques like Word2Vec, to contextual embedding methods like BERT. These advances aim to capture the semantic information in log messages more accurately, distinguishing unrelated logs and connecting similar ones, thereby supplying more informative and discriminative features for downstream models.
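
For illustration, a minimal sketch of the simplest of these techniques, using scikit-learn's TfidfVectorizer to embed log messages; treating each message as a document is one common choice, and the messages themselves are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical log messages; in practice these would be parsed log templates.
messages = [
    "Received block of size 67108864 from node",
    "Exception in receiveBlock for block",
    "PacketResponder terminating",
]

vectorizer = TfidfVectorizer()
embeddings = vectorizer.fit_transform(messages)  # sparse (n_messages, n_terms) matrix
print(embeddings.shape)
```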

In addition, the parameters carried by log messages offer more diverse information about the systems. However, as most parameters are system-specific and lack a consistent format or range, deciding how best to model the information from different parameters is a formidable challenge. In most previous works, the parameters, usually numbers and tokens, are removed in the pre-processing stage. In DeepLog [10], a parameter value anomaly detection model for each log key (i.e., log template) is used to detect anomalies associated with parameter values, as an auxiliary measure to the log key anomaly detection model. In a more recent study [12], a parameter encoding module produces character-level encodings for parameters; each output is then assigned a learnable scalar that functions as a bias term within the self-attention mechanism. Moreover, log data generated by various systems and applications often contains system-specific information that may require domain knowledge and tailored approaches to optimize downstream performance.
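
As a concrete illustration of the common pre-processing step that removes parameters, the sketch below masks variable tokens with a placeholder; the regular expressions are illustrative, and real log parsers (e.g., Drain) use more sophisticated template-extraction strategies.

```python
import re

def mask_parameters(message: str) -> str:
    """Replace common variable tokens (IPs, hex values, numbers) with a placeholder."""
    message = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<*>", message)  # IPv4 addresses
    message = re.sub(r"\b0x[0-9a-fA-F]+\b", "<*>", message)           # hex values
    message = re.sub(r"\b\d+\b", "<*>", message)                      # plain numbers
    return message

print(mask_parameters("Connection from 10.251.42.84 failed after 3 retries"))
# -> "Connection from <*> failed after <*> retries"
```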

2.4 Fixed-Window Grouping

Available public datasets for log-based anomaly detection have either sequence-level or event-level annotations. For datasets without a grouping identifier, fixed-length or fixed-time grouping is often employed during pre-processing to form log sequences that can be processed by log representation techniques and anomaly detection models. Previous studies have used various grouping settings for the public datasets [1]. Different grouping settings produce different numbers of samples and different contextual windows of log data, making direct performance comparisons across studies impossible. Moreover, logs are not generated at fixed rates or in fixed lengths, so using fixed-window grouped log sequences as training and testing samples does not align with real-world scenarios.
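
For illustration, a minimal sketch of both grouping strategies, assuming each log is either a bare event or a (timestamp, event) pair; the window sizes and events are placeholders.

```python
def group_by_length(events, window):
    """Split a log stream into consecutive fixed-length sequences."""
    return [events[i:i + window] for i in range(0, len(events), window)]

def group_by_time(logs, span):
    """Group (timestamp, event) pairs into fixed-duration windows."""
    if not logs:
        return []
    start = logs[0][0]
    windows = {}
    for ts, event in logs:
        windows.setdefault(int((ts - start) // span), []).append(event)
    return [windows[k] for k in sorted(windows)]

print(group_by_length(["E1", "E2", "E3", "E4", "E5"], window=2))
# -> [['E1', 'E2'], ['E3', 'E4'], ['E5']]
print(group_by_time([(0.0, "E1"), (1.2, "E2"), (5.5, "E3")], span=5.0))
# -> [['E1', 'E2'], ['E3']]
```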

2.5 Related Work

Recent empirical studies on log-based anomaly detection aim to deepen the understanding of existing models and the public datasets used for evaluation. Le et al. [15] conducted an in-depth analysis of recent deep-learning anomaly detection models across several aspects of model evaluation. Their findings suggest that the configuration of the different stages of the anomaly detection pipeline can greatly impact evaluation results; therefore, using diverse datasets and analyzing logical relationships between logs are important for assessing log-based anomaly detection approaches.

Wu et al. [7] conducted an empirical study on vectorization (i.e., representation) techniques for log-based anomaly detection, evaluating the effectiveness of existing classical and semantic-based techniques with different anomaly detection models. Their results suggest that classical ways of transforming textual logs into feature vectors can achieve results competitive with more complex semantic embeddings. A more recent work [23] compared classical and deep-learning log-based anomaly detection methods; their results likewise suggest that simple models with classical vectorization can outperform deep-learning approaches, and their work highlights the need to critically analyze the datasets used in evaluation. Moreover, Landauer et al. [16] critically reviewed the common log datasets used to evaluate anomaly detection techniques. Their analysis suggests that most anomalies in these datasets are not directly associated with sequential information within the log sequences, so sophisticated detection methods are unnecessary for attaining excellent detection performance. Their findings also highlight the need for new datasets that incorporate sequential anomalies for evaluating anomaly detection approaches.

In our work, we propose a Transformer-based anomaly detection model capable of capturing sequential and temporal information within log sequences, in addition to event occurrence and semantic information. Thanks to the flexibility of the proposed model, we can easily use various combinations of log features as input for our evaluations. Through a series of carefully designed experiments, we scrutinize four common public datasets and deepen our understanding of the roles of different types of information in identifying anomalies within log sequences. Our findings are generally in accordance with previous empirical studies; however, our analysis offers a more comprehensive and detailed understanding of the anomaly detection task and the studied public datasets.

:::info Authors:

  1. Xingfang Wu
  2. Heng Li
  3. Foutse Khomh

:::

:::info This paper is available on arXiv under the CC BY 4.0 DEED (Attribution 4.0 International) license.

:::


