Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection
TLDR
This paper introduces seven novel cross-domain techniques for prompt injection detection, outperforming existing pattern matching and fine-tuned classifiers.
Key contributions
- Introduces seven cross-domain prompt injection detection techniques from diverse fields.
- Addresses limitations of regex and fine-tuned classifiers against paraphrased and adaptive attacks.
- Implements three techniques, showing local-alignment boosts F1 from 0.033 to 0.378 on deepset.
- Stylometric detector adds 11.1% F1 on indirect-injection benchmarks; all code is open-sourced.
Why it matters
Existing prompt injection detectors are vulnerable to adaptive attacks and paraphrasing. This paper introduces a novel cross-domain approach, porting techniques from diverse fields to create more robust defenses. This significantly advances LLM security by offering more effective and resilient detection methods.
Original Abstract
Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete. Regular expressions miss paraphrased attacks. Fine-tuned classifiers are vulnerable to adaptive adversaries: a 2025 NAACL Findings study reported that eight published indirect-injection defenses were bypassed with greater than fifty percent attack success rates under adaptive attacks. This work proposes seven detection techniques that each port a specific mechanism from a discipline outside large-language-model security: forensic linguistics, materials-science fatigue analysis, deception technology from network security, local-sequence alignment from bioinformatics, mechanism design from economics, spectral signal analysis from epidemiology, and taint tracking from compiler theory. Three of the seven techniques are implemented in the prompt-shield v0.4.1 release (Apache 2.0) and evaluated in a four-configuration ablation across six datasets including deepset/prompt-injections, NotInject, LLMail-Inject, AgentHarm, and AgentDojo. The local-alignment detector lifts F1 on deepset from 0.033 to 0.378 with zero additional false positives. The stylometric detector adds 11.1 percentage points of F1 on an indirect-injection benchmark. The fatigue tracker is validated via a probing-campaign integration test. All code, data, and reproduction scripts are released under Apache 2.0.
📬 Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.