ArXiv TLDR

CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations

2604.13024

Benzhao Tang, Shiyu Yang

cs.LG cs.DB

TLDR

CLAD is a deep learning framework for log anomaly detection that operates directly on compressed byte streams, eliminating decompression overhead.

Key contributions

  • Introduces CLAD, the first deep learning framework for LAD directly on compressed byte streams.
  • Proposes a purpose-built architecture combining a dilated convolutional byte encoder, a hybrid Transformer–mLSTM, and four-way aggregation pooling.
  • Employs a two-stage training strategy of masked pre-training followed by focal-contrastive fine-tuning to handle severe class imbalance.
  • Achieves state-of-the-art F1-score (0.9909) while eliminating decompression and parsing overheads.
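The dilated convolutional byte encoder listed above works on raw compressed bytes; the core idea can be sketched in plain Python. This is a hypothetical illustration of how stacked dilations grow the receptive field over a byte stream, not the authors' implementation — the kernel, dilation schedule, and input are all assumptions.

```python
# Minimal sketch: 1-D dilated convolution over a byte stream (pure Python).
# Illustrative only; CLAD's actual encoder, kernels, and dilations may differ.

def dilated_conv1d(xs, kernel, dilation):
    """Valid-mode 1-D convolution with the given dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation  # receptive span of one output position
    return [
        sum(kernel[j] * xs[i + j * dilation] for j in range(k))
        for i in range(len(xs) - span)
    ]

# Stacking layers with dilations 1, 2, 4 grows the receptive field
# exponentially, letting later layers see long-range byte patterns
# without ever decompressing or parsing the log.
stream = [float(b) for b in b"normal logs compress into regular patterns"]
kernel = [0.25, 0.5, 0.25]  # simple smoothing kernel for illustration
out = stream
for d in (1, 2, 4):
    out = dilated_conv1d(out, kernel, d)

# Each layer shrinks the sequence by (k - 1) * dilation positions.
print(len(stream), len(out))  # 42 28
```

Because normal logs compress into regular byte patterns, a multi-scale filter bank like this can surface the systematic deviations that anomalies introduce.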

Why it matters

Existing log anomaly detection methods are inefficient because they require full decompression and parsing before inference, a severe pre-processing overhead. CLAD offers a robust, accurate, and efficient alternative by processing logs directly in their compressed form, which is crucial for large-scale streaming systems.

Original Abstract

The explosive growth of system logs makes streaming compression essential, yet existing log anomaly detection (LAD) methods incur severe pre-processing overhead by requiring full decompression and parsing. We introduce CLAD, the first deep learning framework to perform LAD directly on compressed byte streams. CLAD bypasses these bottlenecks by exploiting a key insight: normal logs compress into regular byte patterns, while anomalies systematically disrupt them. To extract these multi-scale deviations from opaque bytes, we propose a purpose-built architecture integrating a dilated convolutional byte encoder, a hybrid Transformer--mLSTM, and four-way aggregation pooling. This is coupled with a two-stage training strategy of masked pre-training and focal-contrastive fine-tuning to effectively handle severe class imbalance. Evaluated across five datasets, CLAD achieves a state-of-the-art average F1-score of 0.9909 and outperforms the best baseline by 2.72 percentage points. It delivers superior accuracy while completely eliminating decompression and parsing overheads, offering a robust solution that generalizes to structured streaming compressors.
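The fine-tuning stage pairs a focal term with a contrastive term to handle the severe class imbalance the abstract mentions. A minimal sketch of the focal component (Lin et al.'s focal loss) is below; the hyper-parameter values and the omitted contrastive term are illustrative assumptions, not the paper's exact settings.

```python
import math

# Sketch of binary focal loss, the class-imbalance component of
# focal-contrastive fine-tuning. Hyper-parameters are assumptions.

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# (1 - p_t)^gamma down-weights well-classified easy examples, so
# gradient signal concentrates on the rare, hard anomalies.
easy = focal_loss(0.95, 1)  # confident, correct -> tiny loss
hard = focal_loss(0.30, 1)  # confident, wrong  -> large loss
print(easy < hard)  # True
```

With gamma = 0 the expression reduces to ordinary alpha-weighted cross-entropy, which is why focal loss is a natural drop-in when anomalies are vastly outnumbered by normal logs.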
