ArXiv TLDR

Natural Language Processing

Research on language models, text understanding, generation, and computational linguistics.

cs.CL · 805 papers

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

EvoSafety counters adversarial prompts with a lifelong, model-agnostic LLM safety framework in which attacks and defenses co-evolve externally to the protected model.

2605.13411 · May 13, 2026 · Xiaozhe Zhang, Chaozhuo Li, Hui Liu +4

TruncProof: A Guardrail for LLM-based JSON Generation under Token-Length Constraints

TruncProof enables LLMs to generate grammatically valid JSON outputs while strictly adhering to predefined token length constraints.

2605.13076 · May 13, 2026 · Yoshio Kato, Shuhei Tarashima
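The idea stated in this TLDR — keeping JSON both grammatically valid and within a hard token budget — can be sketched as a length-aware decoding check. This is an illustration of the general technique, not TruncProof's actual implementation; the function names and token representation are hypothetical:

```python
# Illustrative sketch (NOT the TruncProof implementation): a candidate token
# is accepted only if the minimal sequence of closing tokens for all open
# JSON containers still fits within the remaining token budget.

def min_closing_tokens(stack):
    # One closing token ('}' or ']') is needed per open container.
    return len(stack)

def accepts(generated, candidate, budget):
    """Return True if `candidate` can be emitted while a valid JSON
    document remains completable within `budget` tokens."""
    stack = []
    for tok in generated + [candidate]:
        if tok in ("{", "["):
            stack.append(tok)
        elif tok in ("}", "]") and stack:
            stack.pop()
    used = len(generated) + 1  # tokens emitted so far, including candidate
    return used + min_closing_tokens(stack) <= budget
```

For example, with `["{", '"a"', ":", "["]` already generated, opening another object is acceptable under a budget of 8 tokens (5 emitted + 3 closers) but not under 7.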

RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

This paper introduces an LLM-based framework for dynamic content expiration prediction in web search, improving freshness and user experience.

2605.13052 · May 13, 2026 · Tingyu Chen, Wenkai Zhang, Li Gao +4

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

LongMemEval-V2 introduces a new benchmark to evaluate long-term agent memory for acquiring environment-specific experience in web environments.

2605.12493 · May 12, 2026 · Di Wu, Zixiang Ji, Asmi Kawatkar +4

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

This paper introduces an LLM-guided query refinement method that adapts embedding models in real-time for challenging zero-shot search and classification tasks.

2605.12487 · May 12, 2026 · Ariel Gera, Shir Ashury-Tahan, Gal Bloch +2

MEME: Multi-entity & Evolving Memory Evaluation

MEME is a new benchmark evaluating LLM agents' multi-entity and evolving memory, revealing severe limitations in dependency reasoning.

2605.12477 · May 12, 2026 · Seokwon Jung, Alexander Rubinstein, Arnas Uselis +2

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

This paper reveals a geometric coupling between SMoE routers and experts, explaining how routers learn effective assignment geometry and proposing a coupling-based router.

2605.12476 · May 12, 2026 · Sagi Ahrac, Noya Hochwald, Mor Geva

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

KV-Fold enables stable, training-free long-context inference by treating the KV-cache as an accumulator, achieving high fidelity and memory efficiency.

2605.12471 · May 12, 2026 · Alireza Nadali, Patrick Cooper, Ashutosh Trivedi +1

Solve the Loop: Attractor Models for Language and Reasoning

Attractor Models introduce a stable, efficient fixed-point refinement method for iterative Transformers, significantly boosting performance in language and reasoning tasks.

2605.12466 · May 12, 2026 · Jacob Fein-Ashley, Paria Rashidinejad

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

Multi-Stream LLMs introduce parallel computation streams to unblock language models, enabling simultaneous reading, thinking, and acting for improved efficiency.

2605.12460 · May 12, 2026 · Guinan Su, Yanwu Yang, Xueyan Li +1

TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection

TextSeal is a new LLM watermark using dual-key generation and multi-region localization for robust, distortion-free detection and distillation protection.

2605.12456 · May 12, 2026 · Tom Sander, Hongyan Chang, Tomáš Souček +10

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

This paper audits LLM-generated political discourse during crises, finding it lacks population realism compared to observed online content.

2605.12452 · May 12, 2026 · Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

ORCE: Order-Aware Alignment of Verbalized Confidence in Large Language Models

ORCE improves LLM verbalized confidence by decoupling its estimation from answer generation and using rank-based optimization for better calibration.

2605.12446 · May 12, 2026 · Chen Li, Xiaoling Hu, Songzhu Zheng +2

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

A Causal Language Modeling detour during encoder continued pretraining boosts downstream performance, outperforming standard MLM, especially in biomedicine.

2605.12438 · May 12, 2026 · Rian Touchent, Eric de la Clergerie

Geometric Factual Recall in Transformers

Transformers memorize facts geometrically, using embeddings that encode relational structure and an MLP as a relation-conditioned selector.

2605.12426 · May 12, 2026 · Shauli Ravfogel, Gilad Yehudai, Joan Bruna +1

Predicting Disagreement with Human Raters in LLM-as-a-Judge Difficulty Assessment without Using Generation-Time Probability Signals

This paper proposes a method to predict disagreement between LLM-as-a-Judge difficulty ratings and human raters, without using generation-time probability signals.

2605.12422 · May 12, 2026 · Yo Ehara

ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging

ORBIT prevents catastrophic forgetting in GenRetrieval LLMs by regulating weight drift, preserving foundational language capabilities.

2605.12419 · May 12, 2026 · Neha Verma, Nikhil Mehta, Shao-Chuan Wang +7

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

LLMs update beliefs in a low-dimensional conceptual space, showing in-context learning as trajectories through this space, grounded in structured representations.

2605.12412 · May 12, 2026 · Eric Bigelow, Raphaël Sarfati, Daniel Wurgaft +5

Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

A new text-tabular model, using an "LLM-as-Observer," accurately predicts unfamiliar AI agent decisions in negotiation games from limited interactions.

2605.12411 · May 12, 2026 · Eilam Shapira, Moshe Tennenholtz, Roi Reichart

Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring

Q-DAPS estimates LLM question difficulty by analyzing the entropy of answer plausibility scores, outperforming baselines and aligning with human judgment.

2605.12398 · May 12, 2026 · Jamshid Mozafari, Bhawna Piryani, Adam Jatowt
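The entropy-based idea this TLDR describes can be sketched in a few lines. This is an illustration of the general approach (entropy of a distribution over candidate-answer plausibility scores), not the paper's exact scoring; the function name is hypothetical:

```python
# Illustrative sketch (not Q-DAPS's exact formulation): normalize the
# plausibility scores of candidate answers into a distribution and use its
# entropy as a difficulty estimate. A flat distribution (no answer clearly
# preferred) yields high entropy, i.e. a harder question.

import math

def difficulty(plausibility_scores):
    total = sum(plausibility_scores)
    probs = [s / total for s in plausibility_scores]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under this sketch, `difficulty([9, 1])` (one dominant answer) is lower than `difficulty([1, 1])` (a coin flip), and a uniform distribution over n candidates attains the maximum value log n.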
Page 2 of 41
