ArXiv TLDR

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

2604.13024

Benzhao Tang, Shiyu Yang

cs.LG cs.DB

TLDR

CLAD is a deep learning framework for log anomaly detection that operates directly on compressed byte streams, eliminating decompression overhead.

Key contributions

  • Introduces CLAD, the first deep learning framework for LAD directly on compressed byte streams.
  • Proposes a purpose-built architecture combining a dilated convolutional byte encoder, a hybrid Transformer–mLSTM, and four-way aggregation pooling.
  • Employs a two-stage training strategy of masked pre-training followed by focal-contrastive fine-tuning to handle severe class imbalance.
  • Achieves state-of-the-art F1-score (0.9909) while eliminating decompression and parsing overheads.
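The dilated convolutional byte encoder listed above works on raw compressed bytes; the core idea can be sketched in plain Python. This is a hypothetical illustration of how stacked dilations grow the receptive field over a byte stream, not the authors' implementation — the kernel, dilation schedule, and input are all assumptions.

```python
# Minimal sketch: 1-D dilated convolution over a byte stream (pure Python).
# Illustrative only; CLAD's actual encoder, kernels, and dilations may differ.

def dilated_conv1d(xs, kernel, dilation):
    """Valid-mode 1-D convolution with the given dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation  # receptive span of one output position
    return [
        sum(kernel[j] * xs[i + j * dilation] for j in range(k))
        for i in range(len(xs) - span)
    ]

# Stacking layers with dilations 1, 2, 4 grows the receptive field
# exponentially, letting later layers see long-range byte patterns
# without ever decompressing or parsing the log.
stream = [float(b) for b in b"normal logs compress into regular patterns"]
kernel = [0.25, 0.5, 0.25]  # simple smoothing kernel for illustration
out = stream
for d in (1, 2, 4):
    out = dilated_conv1d(out, kernel, d)

# Each layer shrinks the sequence by (k - 1) * dilation positions.
print(len(stream), len(out))  # 42 28
```

Because normal logs compress into regular byte patterns, a multi-scale filter bank like this can surface the systematic deviations that anomalies introduce.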

Why it matters

Existing log anomaly detection methods are inefficient because they require full decompression and parsing before inference, a severe pre-processing overhead. CLAD offers a robust, accurate, and efficient alternative by processing logs directly in their compressed form, which is crucial for large-scale streaming systems.

Original Abstract

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer--mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
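The fine-tuning stage pairs a focal term with a contrastive term to handle the severe class imbalance the abstract mentions. A minimal sketch of the focal component (Lin et al.'s focal loss) is below; the hyper-parameter values and the omitted contrastive term are illustrative assumptions, not the paper's exact settings.

```python
import math

# Sketch of binary focal loss, the class-imbalance component of
# focal-contrastive fine-tuning. Hyper-parameters are assumptions.

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# (1 - p_t)^gamma down-weights well-classified easy examples, so
# gradient signal concentrates on the rare, hard anomalies.
easy = focal_loss(0.95, 1)  # confident, correct -> tiny loss
hard = focal_loss(0.30, 1)  # confident, wrong  -> large loss
print(easy < hard)  # True
```

With gamma = 0 the expression reduces to ordinary alpha-weighted cross-entropy, which is why focal loss is a natural drop-in when anomalies are vastly outnumbered by normal logs.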
