ArXiv TLDR
โ† All categories

Cryptography & Security

Research on AI security, adversarial attacks, privacy, and cryptographic methods.

cs.CR · 505 papers

OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing

OrchJail is a new fuzzing framework that jailbreaks tool-calling T2I agents by exploiting unsafe tool orchestration patterns, improving attack effectiveness.

2605.07414 · May 8, 2026 · Jianming Chen, Yawen Wang, Junjie Wang +3

From Conceptual Scaffold to Prototype: A Standardized Zonal Architecture for Wi-Fi Security Training

This paper introduces a standardized zonal architecture and open-source prototype for a Wi-Fi-focused Cyber Range to improve security training.

2605.07400 · May 8, 2026 · Vyron Kampourakis, Efstratios Chatzoglou, Vasileios Gkioulos +1

Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

Patch2Vuln uses a language model agent to reconstruct vulnerabilities from Linux binary patches, evaluated on Ubuntu packages.

2605.06601 · May 7, 2026 · Isaac David, Arthur Gervais

FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning

FedAttr enables privacy-preserving client-level attribution in federated LLM fine-tuning to detect data ownership violations without compromising privacy.

2605.06596 · May 7, 2026 · Su Zhang, Junfeng Guo, Heng Huang

CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification

CLAD is a federated learning framework for IoT security, combining clustered FL and a dual-mode architecture for anomaly detection and attack classification.

2605.06571 · May 7, 2026 · Iason Ofeidis, Nikos Papadis, Randeep Bhatia +2

On the Security of Research Artifacts

This paper reveals that many research artifacts contain security vulnerabilities, proposing a framework (SAFE) to assess and mitigate these risks.

2605.06508 · May 7, 2026 · Nanda Rani, Christian Rossow

PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

PACZero introduces a novel PAC-private zeroth-order method for fine-tuning LLMs, achieving strong privacy ($I=0$) with usable utility via sign quantization.

2605.06505 · May 7, 2026 · Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen +2

Privacy by Postprocessing the Discrete Laplace Mechanism

This paper shows the discrete Laplace mechanism can be post-processed for unbiased estimation and distribution matching, making it versatile for discrete data.

2605.06502 · May 7, 2026 · Quentin Hillebrand, Jacob Imola, Rasmus Pagh +1

Autonomous Adversary: Red-Teaming in the age of LLM

This paper explores Language Model Agents (LMAs) for red-teaming, benchmarking their effectiveness in lateral movement scenarios and identifying key limitations.

2605.06486 · May 7, 2026 · Mohammad Mamun, Mohamed Gaber, Scott Buffett +1

Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models

Introduces Pop Quiz, a black-box membership inference attack that turns data into quizzes to reveal whether an LLM memorized specific training examples.

2605.06423 · May 7, 2026 · Zeyuan Chen, Yihan Ma, Xinyue Shen +2

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

This paper proposes TEE-backed isolation to constrain host-level abuse in self-hosted computer-use agents, preventing unsafe operations.

2605.06393 · May 7, 2026 · Di Lu, Bo Zhang, Xiyuan Li +5

Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis

Fine-tuned Small Language Models (SLMs) outperform LLMs for Windows event log analysis, providing actionable solutions with fewer computational resources.

2605.06330 · May 7, 2026 · Siraaj Akhtar, Saad Khan, Simon Parkinson

Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation

This paper shows how online safety metrics can be gamed by platforms using content variants and proposes a robust "semantic-envelope" metric to certify true harm reduction.

2605.06324 · May 7, 2026 · Florian A. D. Burnat, Brittany I. Davidson

Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds

This paper provides a tight, transparent analysis of the privacy-utility trade-off for DP-SGD using random shuffling subsampling.

2605.06259 · May 7, 2026 · Marten van Dijk, Murat Bilgehan Ertan

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

LLM agents can create detailed personal profiles cheaply and quickly, exposing significant privacy risks due to platform failures and lack of awareness.

2605.06232 · May 7, 2026 · Jiahao Chen, Qi Zhang, Ruixiao Lin +7

ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel

ClawGuard uses electromagnetic side channels to detect workflow hijacking in LLM agents out of band, offering a forge-resistant security solution.

2605.06205 · May 7, 2026 · Leo Linqian Gan, Jeffery Wu, Longyuan Ge +6

Stateful Agent Backdoor

This paper introduces a stateful backdoor attack for LLM-based agents that persists across multiple sessions, enabling incremental, autonomous execution.

2605.06158 · May 7, 2026 · Zhengchunmin Dai, Jiaxiong Tang, Liantao Wu +2

Secure Seed-Based Multi-bit Watermarking for Diffusion Models from First Principles

This paper introduces a theoretical framework and a new method (SSB) for secure, robust, and model-independent watermarking of diffusion models.

2605.06153 · May 7, 2026 · Enoal Gesny, Eva Giboulot

Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks

Safety Anchor introduces Safety Bottleneck Regularization (SBR) to defend LLMs against harmful fine-tuning by anchoring hidden states in the unembedding layer.

2605.05995 · May 7, 2026 · Guoxin Lu, Letian Sha, Qing Wang +4

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

PragLocker protects valuable LLM agent prompts from unauthorized reuse by making them non-portable to other LLMs, securing intellectual property.

2605.05974 · May 7, 2026 · Qinfeng Li, Yuntai Bao, Jianghui Hu +5
