Log-based vs Graph-based Approaches to Fault Diagnosis
Mathis Nguyen, Mohamed Ali Lajnef
TLDR
This paper compares log-based and graph-based fault diagnosis, finding that integrating log encoders into graph models achieves the best performance.
Key contributions
- Compares log-based encoder architectures (e.g., BERT) and graph-based models (e.g., GNNs) for fault diagnosis.
- Evaluates models on trace-oriented (TraceBench) and traditional system log (BGL) datasets.
- Finds graph-only models fail to outperform log encoder baselines in fault diagnosis.
- Shows integrating log encoder representations into graph models achieves the strongest overall performance.
Why it matters
This study offers practical guidance for fault diagnosis in distributed systems: graph-only models do not beat log-encoder baselines on their own, but hybrid architectures that feed learned log representations into graph models perform best. The findings point future research toward combining diverse log representations to improve system reliability.
Original Abstract
Modern distributed systems generate large volumes of logs that can be analyzed to support essential AIOps tasks such as fault diagnosis, which plays a crucial role in maintaining system reliability. Most existing approaches rely on log-based models that treat logs as linear sequences of events. However, such representations discard the structural context between events that are often present in execution logs, such as parent-child dependencies, fan-out (branching), or temporal features. To better capture these relationships, recent works on Graph Neural Networks (GNNs) suggest that representing logs as graphs offers a promising alternative. Building on these observations, this paper conducts a comparative study of log-based encoder architectures (e.g., BERT) and graph-based models (e.g., GNNs) for automated fault diagnosis. We evaluate our models on TraceBench, a trace-oriented log dataset, and on BGL, a more traditional system log dataset, covering both anomaly detection and fault type classification. Our results show that graph-only models fail to outperform encoder baselines. However, integrating learned representations from log encoders into graph-based models achieves the strongest overall performance. These findings highlight conditions under which graph-augmented architectures can outperform traditional log-based approaches.
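To make the hybrid idea concrete, here is a minimal sketch (not the paper's code) of combining a log encoder with a graph model: each log event becomes a node whose feature vector comes from a stand-in encoder (a hashed bag-of-tokens embedding in place of BERT), and a single mean-aggregation message-passing layer propagates information along parent-child edges before pooling into a trace-level representation. All names, the toy events, and the adjacency structure are illustrative assumptions.

```python
import numpy as np

def encode_event(message: str, dim: int = 16) -> np.ndarray:
    """Hypothetical log encoder: hash tokens into a fixed-size vector
    (a crude stand-in for a BERT-style embedding)."""
    vec = np.zeros(dim)
    for tok in message.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec

def gnn_layer(features: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One mean-aggregation message-passing step with self-loops and ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])           # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)       # node degrees
    return np.maximum(adj_hat @ features / deg, 0.0)

# Toy trace: a parent request fanning out to two child events.
events = ["client request received",
          "db query started",
          "db query failed timeout"]
X = np.stack([encode_event(e) for e in events])    # encoder features per node

# Adjacency (undirected): event 0 is the parent of events 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

H = gnn_layer(X, A)            # structure-aware event representations
graph_repr = H.mean(axis=0)    # pooled embedding for the whole trace
```

In the paper's setup, `graph_repr` would feed a classification head for anomaly detection or fault-type classification; the key design choice echoed here is that node features come from the log encoder rather than raw tokens, which is what lets the graph model exploit both textual and structural signal.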