Cryptography & Security
Research on AI security, adversarial attacks, privacy, and cryptographic methods.
cs.CR · 505 papers

OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing
OrchJail is a new fuzzing framework that jailbreaks tool-calling T2I agents by exploiting unsafe tool orchestration patterns, improving attack effectiveness.
From Conceptual Scaffold to Prototype: A Standardized Zonal Architecture for Wi-Fi Security Training
This paper introduces a standardized zonal architecture and open-source prototype for a Wi-Fi-focused Cyber Range to improve security training.
Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches
Patch2Vuln uses a language model agent to reconstruct vulnerabilities from Linux binary patches, evaluated on Ubuntu packages.
FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning
FedAttr enables privacy-preserving client-level attribution in federated LLM fine-tuning to detect data ownership violations without compromising privacy.
CLAD: A Clustered Label-Agnostic Federated Learning Framework for Joint Anomaly Detection and Attack Classification
CLAD is a federated learning framework for IoT security, combining clustered FL and a dual-mode architecture for anomaly detection and attack classification.
On the Security of Research Artifacts
This paper reveals that many research artifacts contain security vulnerabilities, proposing a framework (SAFE) to assess and mitigate these risks.
PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
PACZero introduces a novel PAC-private zeroth-order method for fine-tuning LLMs, achieving strong privacy ($I=0$) with usable utility via sign quantization.
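As a rough illustration of the two ingredients the summary names (zeroth-order optimization and sign quantization), here is a minimal SPSA-style sketch; PACZero's actual estimator, noise calibration, and PAC-privacy accounting are not reproduced here, and `loss_fn` is a hypothetical stand-in for a fine-tuning loss.

```python
import numpy as np

def signed_zeroth_order_step(theta, loss_fn, lr=1e-3, eps=1e-3, rng=None):
    """One SPSA-style zeroth-order update with sign quantization
    (illustrative only; not PACZero's published algorithm)."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(theta.shape)  # random probe direction
    # Finite-difference estimate of the gradient projected onto z ...
    proj = (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps)
    # ... quantized to a single sign bit, so each step releases only one
    # bit of loss information about the training data.
    return theta - lr * np.sign(proj) * z
```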
Privacy by Postprocessing the Discrete Laplace Mechanism
This paper shows the discrete Laplace mechanism can be post-processed for unbiased estimation and distribution matching, making it versatile for discrete data.
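For context on the base mechanism (the paper's specific post-processing steps are not reproduced here), a minimal sketch: discrete Laplace noise has P(Z = z) proportional to exp(-|z|/b) and can be sampled as the difference of two i.i.d. geometric variables, giving an epsilon-DP integer release at scale b = 1/epsilon for sensitivity 1.

```python
import numpy as np

def discrete_laplace(scale: float, size: int = 1, rng=None):
    """Sample discrete Laplace noise, P(Z = z) ~ exp(-|z| / scale): the
    difference of two i.i.d. geometric variables with success probability
    p = 1 - exp(-1/scale) has exactly this distribution."""
    rng = rng or np.random.default_rng()
    p = 1.0 - np.exp(-1.0 / scale)
    return rng.geometric(p, size) - rng.geometric(p, size)

# epsilon-DP release of an integer count (sensitivity 1): scale = 1 / epsilon.
epsilon = 1.0
noisy_count = 42 + discrete_laplace(scale=1.0 / epsilon)[0]
```

Because the noise has mean zero, the raw release is already unbiased; the paper's question is which post-processing steps preserve that property, which the sketch does not cover.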
Autonomous Adversary: Red-Teaming in the age of LLM
This paper explores Language Model Agents (LMAs) for red-teaming, benchmarking their effectiveness in lateral movement scenarios and identifying key limitations.
Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models
Introduces PopQuiz, a black-box membership inference attack that turns data into quizzes to reveal if LLMs memorized specific training examples.
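The summary describes the quiz idea only in outline; as a hedged illustration (the paper's actual quiz construction, scoring, and decision thresholds are not known here, and `query_model` is a hypothetical black-box completion API), one could mask words of a candidate training example and treat verbatim recall as membership evidence:

```python
import random

def quiz_membership_score(text: str, query_model, n_quizzes: int = 5) -> float:
    """Hypothetical sketch of a quiz-style membership test: mask one word,
    ask the model to fill it in, and score the fraction recalled verbatim.
    Not PopQuiz's published procedure."""
    words = text.split()
    hits = 0
    for _ in range(n_quizzes):
        i = random.randrange(len(words))
        masked = words[:i] + ["____"] + words[i + 1:]
        prompt = ("Fill in the blank with the exact missing word:\n"
                  + " ".join(masked))
        if words[i].lower() in query_model(prompt).lower():
            hits += 1
    return hits / n_quizzes  # higher -> stronger evidence of memorization
```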
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
This paper proposes TEE-backed isolation to constrain host-level abuse in self-hosted computer-use agents, preventing unsafe operations.
Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis
Fine-tuned Small Language Models (SLMs) outperform LLMs for Windows event log analysis, providing actionable solutions with fewer computational resources.
Gaming the Metric, Not the Harm: Certifying Safety Audits against Strategic Platform Manipulation
This paper shows how online safety metrics can be gamed by platforms using content variants and proposes a robust "semantic-envelope" metric to certify true harm reduction.
Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds
This paper provides a tight, transparent analysis of the privacy-utility trade-off for DP-SGD using random shuffling subsampling.
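As background on the trade-off-function formalism (the paper's shuffling-specific upper and lower bounds are not reproduced here): f-DP describes a mechanism by the lowest type-II error achievable at each type-I error level when testing whether one record was used, and the Gaussian mechanism, a standard f-DP baseline rather than this paper's result, has the closed form below.

```latex
% Trade-off function of the Gaussian mechanism (standard f-DP result,
% not the paper's shuffling-specific bound). \Phi is the standard normal
% CDF, \Delta the sensitivity, \sigma the noise scale.
G_{\mu}(\alpha) = \Phi\!\left( \Phi^{-1}(1 - \alpha) - \mu \right),
\qquad \mu = \frac{\Delta}{\sigma}.
```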
Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
LLM agents can create detailed personal profiles cheaply and quickly, exposing significant privacy risks due to platform failures and lack of awareness.
ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel
ClawGuard uses electromagnetic side channels for out-of-band detection of workflow hijacking in LLM agents, offering a forge-resistant security solution.
Stateful Agent Backdoor
This paper introduces a stateful backdoor attack for LLM-based agents that persists across multiple sessions, enabling incremental, autonomous execution.
Secure Seed-Based Multi-bit Watermarking for Diffusion Models from First Principles
This paper introduces a theoretical framework and a new method (SSB) for secure, robust, and model-independent watermarking of diffusion models.
Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks
Safety Anchor introduces Safety Bottleneck Regularization (SBR) to defend LLMs against harmful fine-tuning by anchoring hidden states in the unembedding layer.
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
PragLocker protects valuable LLM agent prompts from unauthorized reuse by making them non-portable to other LLMs, securing intellectual property.