Context-Aware Web Attack Detection in Open-Source SIEM Systems via MITRE ATT&CK-Enriched Behavioral Profiling
Badr Alboushy, Assef Jafar, Mohamad Aljnidi, Mohamad Bashar Disoki, Aref Shaheed
TLDR
Smart-SIEM adds an AI module to the open-source Wazuh SIEM for context-aware web attack detection, using per-source-IP behavioral profiling enriched with MITRE ATT&CK techniques.
Key contributions
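- Introduces a per-source-IP behavioral context vector built from HTTP response-status distributions, peak rule activation counts, and MITRE ATT&CK technique frequencies over the N most recent prior events.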
- Presents a two-stage hybrid AI cascade: LightGBM for binary attack detection, followed by XGBoost for six-class attack categorization.
- Shows that adding the context features raises macro F1 for every gradient boosting algorithm tested, by an average of +0.254 for binary detection and +0.324 for six-class categorization.
- Includes a self-adaptive retraining mechanism to counter concept drift from new attack types.
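The per-source-IP context vector in the first contribution can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class name `ContextTracker`, the event fields (`src_ip`, `status`, `rule_id`, `mitre`), the window size, and the tracked technique list are all assumptions made for illustration, and "peak rule activation count" is interpreted here as the highest number of times any single rule fired within the window.

```python
from collections import Counter, defaultdict, deque


class ContextTracker:
    """Keeps the N most recent prior events per source IP (hypothetical helper)."""

    def __init__(self, n_recent=5, techniques=("T1110", "T1190")):
        # n_recent and the technique list are assumed values, not the paper's.
        self.history = defaultdict(lambda: deque(maxlen=n_recent))
        self.techniques = techniques

    def context_vector(self, src_ip):
        """Build features from prior events only, excluding the current event."""
        prior = list(self.history[src_ip])
        total = len(prior)
        status_classes = Counter(e["status"] // 100 for e in prior)
        fired = Counter(e["rule_id"] for e in prior)
        techs = Counter(t for e in prior for t in e["mitre"])

        vec = {}
        # HTTP response-status distribution, bucketed into 2xx/3xx/4xx/5xx.
        for cls in (2, 3, 4, 5):
            vec[f"status_{cls}xx_frac"] = status_classes[cls] / total if total else 0.0
        # Assumed reading of "peak rule activation count".
        vec["peak_rule_count"] = max(fired.values(), default=0)
        # MITRE ATT&CK technique frequencies within the window.
        for t in self.techniques:
            vec[f"mitre_{t}"] = techs[t]
        return vec

    def update(self, event):
        """Record the event so it counts as history for later events."""
        self.history[event["src_ip"]].append(event)


# Usage: compute the vector before recording the event it will accompany.
tracker = ContextTracker(n_recent=3)
stream = [
    {"src_ip": "10.0.0.5", "status": 401, "rule_id": "r1", "mitre": ["T1110"]},
    {"src_ip": "10.0.0.5", "status": 401, "rule_id": "r1", "mitre": ["T1110"]},
    {"src_ip": "10.0.0.5", "status": 200, "rule_id": "r2", "mitre": []},
]
for event in stream:
    features = tracker.context_vector(event["src_ip"])
    tracker.update(event)
```

Computing the vector before calling `update` keeps the features restricted to prior events, which matches the "most recent prior events" wording in the abstract.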
Why it matters
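Traditional rule-based SIEM correlation struggles with multi-step web attacks because it examines each event without the behavioral history of the originating host. Smart-SIEM adds AI-based, context-aware profiling to the open-source Wazuh platform. In the paper's evaluation, Wazuh's native rule engine detects 0% of Brute Force and Broken Authentication events, while the AI module detects 100% and 98.3% respectively. Retraining on the combined corpus also recovers much of the performance lost when unseen attack types appear: F1 falls from 0.905 to 0.465, then returns to 0.814.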
Original Abstract
Security Information and Event Management (SIEM) systems aggregate log data from heterogeneous sources to detect coordinated attacks. Traditional rule-based correlation engines struggle to classify multi-step web application attacks because they examine each event without reference to the behavioural history of the originating host. We present Smart-SIEM, an AI module for the open-source Wazuh SIEM platform with two contributions: (1) a per-source-IP behavioural context vector encoding HTTP response-status distributions, peak rule activation counts, and MITRE ATT&CK technique frequencies from the N most recent prior events; (2) a two-stage hybrid cascade combining LightGBM for binary attack detection and XGBoost for six-class attack categorisation. Evaluated on 46,454 purpose-built Wazuh security events, context features improve all tested gradient boosting algorithms from ~0.705 macro F1 to 0.947-0.967 (Stage 1) and 0.876-0.914 (Stage 2), an average gain of +0.254 and +0.324 respectively. The hybrid cascade achieves F1 of 0.967 (binary) and 0.914 (six-class). Wazuh's native rule engine detects 0% of Brute Force and Broken Authentication events; the AI module detects 100% and 98.3% respectively. A self-adaptive retraining mechanism recovers from concept drift: F1 drops from 0.905 to 0.465 when unseen attack types emerge, recovering to 0.814 after retraining on the combined corpus.
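The two-stage cascade described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the paper uses LightGBM for stage 1 and XGBoost for stage 2, whereas the two stand-in classes below are hypothetical rule-of-thumb models so the example runs without either library. The wiring (stage 2 only sees events that stage 1 flags), the feature names, and the label strings are also assumptions.

```python
BENIGN = "benign"  # assumed label for events that stage 1 does not flag


class ThresholdDetector:
    """Hypothetical stand-in for the stage-1 binary detector."""

    def predict(self, rows):
        return [1 if r["status_4xx_frac"] > 0.5 else 0 for r in rows]


class RuleCategoriser:
    """Hypothetical stand-in for the stage-2 multi-class categoriser."""

    def predict(self, rows):
        return ["brute_force" if r["mitre_T1110"] > 0 else "other_attack" for r in rows]


def cascade_predict(detector, categoriser, rows):
    """Run stage 1 on every event, then stage 2 only on the flagged ones."""
    flags = detector.predict(rows)
    flagged = [r for r, f in zip(rows, flags) if f == 1]
    # Skip stage 2 entirely when nothing was flagged.
    categories = iter(categoriser.predict(flagged)) if flagged else iter(())
    # Stitch the results back together in the original event order.
    return [next(categories) if f == 1 else BENIGN for f in flags]


rows = [
    {"status_4xx_frac": 0.9, "mitre_T1110": 3},
    {"status_4xx_frac": 0.0, "mitre_T1110": 0},
    {"status_4xx_frac": 0.8, "mitre_T1110": 0},
]
labels = cascade_predict(ThresholdDetector(), RuleCategoriser(), rows)
```

Any pair of objects exposing a `predict` method could take the place of the stand-ins. Trained gradient boosting classifiers would be the natural choice, possibly after converting the feature dictionaries into numeric arrays.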