ArXiv TLDR

Cryptography & Security

Research on AI security, adversarial attacks, privacy, and cryptographic methods.

cs.CR · 505 papers

Security Incentivization: An Empirical Study of how Micropayments Impact Code Security

This study shows that team-level incentives tied to automated security metrics significantly improve code security in development teams.

2605.13100 · May 13, 2026 · Stefan Rass, Martin Pinzger, Rainer W. Alexandrowicz +5

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

TextSeal is a new LLM watermark using dual-key generation and multi-region localization for robust, distortion-free detection and distillation protection.

2605.12456 · May 12, 2026 · Tom Sander, Hongyan Chang, Tomáš Souček +10

Attacks and Mitigations for Distributed Governance of Agentic AI under Byzantine Adversaries

This paper analyzes attacks on agentic AI governance from compromised centralized providers and proposes Byzantine-resilient, monitoring, and auditing solutions.

2605.12364 · May 12, 2026 · Matthew D. Laws, Alina Oprea, Cristina Nita-Rotaru

Reconstruction of Personally Identifiable Information from Supervised Finetuned Models

This paper reveals that PII can be reconstructed from supervised finetuned LLMs, proposing COVA to enhance reconstruction under prefix attacks.

2605.12264 · May 12, 2026 · Sae Furukawa, Alina Oprea

No More, No Less: Task Alignment in Terminal Agents

A new benchmark, TAB, reveals terminal agents struggle with selectively following relevant instructions while ignoring distractors, highlighting a gap in task alignment.

2605.12233 · May 12, 2026 · Sina Mavali, David Pape, Jonathan Evertz +5

ACTING: A Platform for Cyber Ranges Federation

ACTING is a platform that uses a new language (EDL-FG) for federated cyber ranges, enabling automated, multi-domain cyber defense training and evaluation.

2605.12170 · May 12, 2026 · Kyriakos Christou, Maria Michalopoulou, Stefano Taggi +21

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

PrivacySIM evaluates LLMs' ability to simulate individual privacy decisions, finding persona conditioning improves accuracy but models still struggle.

2605.12147 · May 12, 2026 · James Flemings, Murali Annavaram

The Deepfakes We Missed: We Built Detectors for a Threat That Didn't Arrive

Deepfake detection research is misaligned, focusing on public figure manipulation while real threats are NCII, voice scams, and emotional fraud.

2605.12075 · May 12, 2026 · Shaina Raza

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

SkillSafetyBench evaluates how reusable skills in LLM agents create new attack surfaces, revealing vulnerabilities beyond model-level alignment.

2605.12015 · May 12, 2026 · Chang Jin, An Wang, Zeming Wei +7

A microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alerting

A microservices platform uses predictive NLP to provide real-time security and hate-speech risk alerts from endpoint data, unifying monitoring and analytics.

2605.11997 · May 12, 2026 · Darlan Noetzold, Anubis Graciela De Moraes Rossetto, Juan Francisco De Paz Santana +1

AccLock: Unlocking Identity with Heartbeat Using In-Ear Accelerometers

AccLock passively authenticates users via unique in-ear heartbeat signals captured by accelerometers, overcoming limitations of prior systems.

2605.11901 · May 12, 2026 · Lei Wang, Jiangxuan Shen, Xi Zhang +6

Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems

Proteus is a self-evolving red-team framework that uncovers adaptive leakage in LLM agent skills, showing current vetting underestimates risk.

2605.11891 · May 12, 2026 · Zhaojiacheng Zhou

IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection

IPI-proxy intercepts and rewrites HTTP responses from whitelisted sites, letting red-teamers plant indirect prompt injections into the pages a web-browsing AI agent actually sees.

2605.11868 · May 12, 2026 · Chia-Pei Chen, Kentaroh Toyoda +2
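The rewriting idea behind an interception proxy like this can be sketched in a few lines. The sketch below is illustrative only, not the paper's implementation: the names `ALLOWLIST`, `PAYLOAD`, and `rewrite_response` are assumptions, and a real proxy would hook into an HTTP interception layer rather than operate on strings.

```python
# Minimal sketch: inject an indirect-prompt-injection payload into
# HTTP response bodies, but only for hosts on an explicit allowlist,
# so only designated test sites are ever modified.

ALLOWLIST = {"example.com"}  # hosts the red-teamer has whitelisted
PAYLOAD = "<!-- hidden instruction targeting the browsing agent -->"

def rewrite_response(host: str, body: str) -> str:
    """Return the response body, with the payload embedded if the
    host is allowlisted; all other traffic passes through untouched."""
    if host not in ALLOWLIST:
        return body
    # Place the payload just before </body> so the page still renders
    # normally while the injected text ends up in whatever the agent
    # extracts from the page.
    if "</body>" in body:
        return body.replace("</body>", PAYLOAD + "</body>", 1)
    return body + PAYLOAD

if __name__ == "__main__":
    html = "<html><body>Hello</body></html>"
    print(rewrite_response("example.com", html))  # payload embedded
    print(rewrite_response("other.org", html))    # unchanged
```

The allowlist check is the safety-relevant design choice: it confines the attack simulation to sites the tester controls or has designated, so ordinary browsing traffic is never tampered with.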

Five Attacks on x402 Agentic Payment Protocol

This paper identifies five practical attacks on the x402 agentic payment protocol, revealing critical vulnerabilities in its design and implementation.

2605.11781 · May 12, 2026 · Zelin Li, Qin Wang, Zhipeng Wang

Behavioral Integrity Verification for AI Agent Skills

This paper introduces Behavioral Integrity Verification (BIV) to audit AI agent skills, finding widespread deviations and improving malicious skill detection.

2605.11770 · May 12, 2026 · Yuhao Wu, Tung-Ling Li, Hongliang Liu

Persona-Conditioned Adversarial Prompting: Multi-Identity Red-Teaming for Adversarial Discovery and Mitigation

PCAP uses diverse personas for red-teaming LLMs, significantly boosting attack success and generating robust defense data for improved safety.

2605.11730 · May 12, 2026 · Cristian Morasso, Anisa Halimi, Muhammad Zaid Hameed +1

HySecTwin: A Knowledge-Driven Digital Twin Framework Augmented with Hybrid Reasoning for Cyber-Physical Systems

HySecTwin is a knowledge-driven digital twin framework using hybrid reasoning for real-time, interpretable cybersecurity threat detection in Cyber-Physical Systems.

2605.11682 · May 12, 2026 · David Holmes, Ahmad Moshin, Surya Nepal +2

Cochise: A Reference Harness for Autonomous Penetration Testing

Cochise is a minimal Python reference harness for LLM-driven autonomous penetration testing, providing reusable infrastructure for research and comparison.

2605.11671 · May 12, 2026 · Andreas Happe, Jürgen Cito

Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization

Conleash is a client-side middleware that uses a risk lattice and policy engine to provide consent-driven, boundary-scoped authorization for MCP tool invocations.

2605.11360 · May 12, 2026 · Ying Li, Yanju Chen, Peiran Wang +4

Natural Language based Specification and Verification

This paper explores using LLMs to generate and verify code implementations based on natural language specifications, showing promising preliminary results.

2605.11315 · May 11, 2026 · Zhaorui Li, Chengyu Song
