Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations

April 30, 2026 | arXiv:2604.27321

cs.CR, cs.AI, cs.IR

TLDR

An end-to-end LLM framework automates SOC threat detection, SIEM query generation, and incident resolution, cutting average incident triage time from hours to under 10 minutes.

Key contributions

Developed an ensemble LLM detection module achieving 82.8% accuracy and 0.120 FPR on SIEM logs.
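Introduced the SQM (Syntax Query Metadata) architecture for automated, syntax-constrained SIEM query generation, reaching a BLEU score of 0.384 and a ROUGE-L score of 0.731, more than twice the baseline LLM performance.
Improved incident resolution code prediction accuracy from 78.3% to 90.0% by integrating SQM-derived evidence.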
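
As a rough illustration of the detection numbers above, the sketch below combines three per-alert model verdicts and computes accuracy and false positive rate (FPR). The summary does not say how the three LLMs are combined, so majority voting is an assumption here, and the labels, votes, and function names are hypothetical toy values rather than anything from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among one alert's model votes.

    Majority voting is an assumed combination rule; the paper's actual
    ensembling method is not described in this summary.
    """
    return Counter(votes).most_common(1)[0][0]


def accuracy_and_fpr(y_true, y_pred, positive="malicious"):
    """Compute accuracy and false positive rate, FPR = FP / (FP + TN)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical toy data: one row of three model votes per alert.
model_votes = [
    ["malicious", "malicious", "benign"],
    ["benign", "benign", "benign"],
    ["malicious", "benign", "benign"],
    ["benign", "malicious", "malicious"],
]
y_true = ["malicious", "benign", "malicious", "benign"]

y_pred = [majority_vote(votes) for votes in model_votes]
accuracy, fpr = accuracy_and_fpr(y_true, y_pred)
print(f"accuracy={accuracy:.3f} fpr={fpr:.3f}")
```

With an odd number of voters and two labels, a tie cannot occur, which is one practical reason to ensemble three models rather than two.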

Why it matters

This paper addresses three SOC pain points: growing threat volumes, heterogeneous SIEM platforms, and slow manual triage. It shows that domain-constrained LLMs with retrieval augmentation can improve threat detection, evidence collection, and incident resolution, cutting average triage time in production SOC environments from hours to under 10 minutes.

Original Abstract

Security Operations Centers (SOCs) face mounting operational challenges. These challenges come from increasing threat volumes, heterogeneous SIEM platforms, and time-consuming manual triage workflows. We present an end-to-end threat management framework that integrates ensemble-based detection, syntax-constrained query generation, and retrieval-augmented resolution support to automate critical security workflows. Our detection module evaluates both traditional machine learning classifiers and large language models (LLMs), then combines the three best-performing LLMs to create an ensemble model, achieving 82.8% accuracy while maintaining 0.120 false positive rate on SIEM logs. We introduce the SQM (Syntax Query Metadata) architecture for automated evidence collection. It uses platform-specific syntax constraints, metadata-based retrieval, and documentation-grounded prompting to generate executable queries for IBM QRadar and Google SecOps. SQM achieves a BLEU score of 0.384 and a ROUGE-L score of 0.731. These results are more than twice as good as the baseline LLM performance. For incident resolution and recommendation generation, we demonstrate that integrating SQM-derived evidence improves resolution code prediction accuracy from 78.3% to 90.0%, with an overall recommendation quality score of 8.70. In production SOC environments, our framework reduces average incident triage time from hours to under 10 minutes. This work demonstrates that domain-constrained LLM architectures with retrieval augmentation can meet the strict reliability and efficiency requirements of operational security environments at scale.
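
To make the ROUGE-L figure in the abstract more concrete, here is a minimal sketch of a ROUGE-L F-measure computed from the longest common subsequence (LCS) of whitespace-separated tokens. The tokenization, the F1 weighting, and the example query strings are assumptions for illustration; the abstract does not specify which ROUGE-L variant or tooling the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over whitespace tokens (an assumed, simplified variant)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and generated queries, for illustration only.
reference_query = (
    "SELECT sourceip FROM events WHERE username = 'admin' LAST 24 HOURS"
)
generated_query = "SELECT sourceip FROM events WHERE username = 'admin'"

score = rouge_l_f1(reference_query, generated_query)
print(f"ROUGE-L F1: {score:.3f}")
```

Because the score rewards tokens appearing in the same order, it is more forgiving than exact match of a query that is structurally right but drops or adds a clause. It does not check that the query actually executes.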

