Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations
TLDR
An end-to-end LLM framework automates SOC threat detection, SIEM query generation, and incident resolution, cutting average incident triage time from hours to under 10 minutes in production SOC environments.
Key contributions
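- Built a detection module that ensembles the three best-performing LLMs, reaching 82.8% accuracy at a 0.120 false positive rate (FPR) on SIEM logs.
- Introduced the SQM (Syntax Query Metadata) architecture for automated, syntax-constrained SIEM query generation on IBM QRadar and Google SecOps, scoring 0.384 BLEU and 0.731 ROUGE-L, more than twice the baseline LLM performance.
- Raised resolution code prediction accuracy from 78.3% to 90.0% by integrating SQM-derived evidence.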
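To make the detection numbers concrete, here is a minimal sketch of how three classifier outputs could be combined and scored. The abstract does not say how the ensemble merges its members, so majority voting is an assumption here, and the labels, function names, and toy data are all hypothetical.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label chosen by most models (assumed combination rule)."""
    return Counter(votes).most_common(1)[0][0]

def accuracy_and_fpr(y_true, y_pred, positive="malicious"):
    """Compute accuracy and false positive rate, FPR = FP / (FP + TN)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr

# Hypothetical per-model predictions for four SIEM log entries.
model_outputs = [
    ["malicious", "benign", "benign", "malicious"],     # model A
    ["malicious", "benign", "malicious", "benign"],     # model B
    ["benign", "benign", "malicious", "malicious"],     # model C
]
y_true = ["malicious", "benign", "benign", "malicious"]

# Transpose so each tuple holds the three votes for one log entry.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs)]
accuracy, fpr = accuracy_and_fpr(y_true, ensemble)
print(ensemble, accuracy, fpr)
```

With three voters and two labels a tie cannot occur, which is one practical reason an odd-sized ensemble is convenient under this assumed rule.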
Why it matters
SOC teams face growing threat volumes, heterogeneous SIEM platforms, and slow manual triage. This paper shows that domain-constrained LLMs with retrieval augmentation can automate threat detection, evidence collection, and incident resolution, and it reports that average incident triage time in production SOC environments falls from hours to under 10 minutes.
Original Abstract
Security Operations Centers (SOCs) face mounting operational challenges. These challenges come from increasing threat volumes, heterogeneous SIEM platforms, and time-consuming manual triage workflows. We present an end-to-end threat management framework that integrates ensemble-based detection, syntax-constrained query generation, and retrieval-augmented resolution support to automate critical security workflows. Our detection module evaluates both traditional machine learning classifiers and large language models (LLMs), then combines the three best-performing LLMs to create an ensemble model, achieving 82.8% accuracy while maintaining 0.120 false positive rate on SIEM logs. We introduce the SQM (Syntax Query Metadata) architecture for automated evidence collection. It uses platform-specific syntax constraints, metadata-based retrieval, and documentation-grounded prompting to generate executable queries for IBM QRadar and Google SecOps. SQM achieves a BLEU score of 0.384 and a ROUGE-L score of 0.731. These results are more than twice as good as the baseline LLM performance. For incident resolution and recommendation generation, we demonstrate that integrating SQM-derived evidence improves resolution code prediction accuracy from 78.3% to 90.0%, with an overall recommendation quality score of 8.70. In production SOC environments, our framework reduces average incident triage time from hours to under 10 minutes. This work demonstrates that domain-constrained LLM architectures with retrieval augmentation can meet the strict reliability and efficiency requirements of operational security environments at scale.
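The ROUGE-L figure reported for SQM measures overlap between a generated query and a reference query through their longest common subsequence (LCS). The sketch below shows one common formulation, the LCS-based F1 score over whitespace tokens. The tokenization, the equal weighting of precision and recall, and the two query strings are assumptions for illustration only; the abstract does not state the exact evaluation setup, and the strings are not verified platform queries.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over whitespace tokens (assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Illustrative strings only: they differ in a single token.
reference = "SELECT sourceip FROM events WHERE username = 'admin' LAST 1 HOURS"
candidate = "SELECT sourceip FROM events WHERE username = 'admin' LAST 24 HOURS"

score = rouge_l_f1(reference, candidate)
print(round(score, 3))
```

Because ROUGE-L rewards tokens that appear in the same order, a query that keeps the right clause structure but gets one value wrong still scores high, which is why such overlap scores are usually read alongside whether the query actually executes.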