ArXiv TLDR

Natural Language Processing

Research on language models, text understanding, generation, and computational linguistics.

cs.CL · 805 papers

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

TS-DFM improves discrete flow matching by guiding trajectory generation with an energy compass, achieving 128x faster text generation.

2605.07924 · May 8, 2026 · Amin Karimi Monsefi, Dominic Culver, Nikhil Bhendawade +3

CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers

CoCoReviewBench is a new benchmark for AI reviewers that evaluates completeness and correctness, built from 3,900 curated papers with expert annotations.

2605.07905 · May 8, 2026 · Hexuan Deng, Xiaopeng Ke, Yichen Li +6

Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement

LANCE introduces a label enhancement method using variational inference to enable LLMs to provide safe yet flexible and natural responses, avoiding rigid rejections.

2605.07883 · May 8, 2026 · Ying Zhang, Congyu Qiao, Xin Geng +1

KL for a KL: On-Policy Distillation with Control Variate Baseline

vOPD stabilizes on-policy distillation for LLMs by applying an RL-style control variate baseline, efficiently delivering significant gains in reasoning performance.

2605.07865 · May 8, 2026 · Minjae Oh, Sangjun Song, Gyubin Choi +2
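The vOPD summary builds on standard KL-based distillation. As generic background only (a minimal sketch of plain KL matching, not the paper's actual objective; the function name is illustrative), the student is trained to match the teacher's token distribution by minimizing a KL divergence:

```python
import math

# Generic background for on-policy distillation (not vOPD's exact loss):
# the student minimizes KL(student || teacher) on tokens it samples itself;
# a control-variate baseline is a standard RL trick that subtracts a
# mean-preserving term to reduce gradient variance.
def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same support.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

student = [0.7, 0.2, 0.1]
teacher = [0.5, 0.3, 0.2]

# KL is zero iff the distributions match, and positive otherwise.
assert kl_divergence(student, student) == 0.0
assert kl_divergence(student, teacher) > 0.0
```

The distillation gradient of such a KL term is what the summary's control variate baseline would stabilize.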

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

MatryoshkaLoRA introduces a novel framework for LLM fine-tuning, enabling accurate hierarchical low-rank representations and dynamic rank selection.

2605.07850 · May 8, 2026 · Ionut-Vlad Modoranu, Mher Safaryan, Dan Alistarh
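MatryoshkaLoRA extends the standard LoRA technique. As background on plain LoRA only (a minimal sketch, not the paper's hierarchical variant; all dimensions and names here are illustrative), a frozen weight W is augmented with a trainable low-rank update B @ A scaled by alpha / r:

```python
import numpy as np

# Sketch of vanilla LoRA, the base technique behind MatryoshkaLoRA
# (generic background, not the paper's method).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-init

def lora_forward(x):
    # y = W x + (alpha / r) * B A x  -- only A and B are trained
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x), W @ x)
```

The summary's "hierarchical" and "dynamic rank selection" ideas would operate on the rank dimension r of such adapters.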

Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors

This paper introduces a method to measure the distributional gap between real and simulated user behaviors, evaluating 24 LLM-based simulators.

2605.07847 · May 8, 2026 · Shuhaib Mehri, Philippe Laban, Sumuk Shashidhar +4

SCENE: Recognizing Social Norms and Sanctioning in Group Chats

SCENE is a new benchmark for evaluating LLMs' ability to recognize and adapt to implicit social norms and sanctions in group chats.

2605.07823 · May 8, 2026 · Mateusz Jacniacki, Maksymilian Bilski

TRACE: Tourism Recommendation with Accountable Citation Evidence

TRACE introduces a new dataset and benchmark for conversational tourism recommender systems, focusing on verifiable evidence and rejection recovery.

2605.07677 · May 8, 2026 · Zixu Zhao, Sijin Wang, Yu Hou +6

InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search

InterLV-Search is a new benchmark for interleaved language-vision agentic search, revealing that current multimodal agents struggle with complex visual evidence integration.

2605.07510 · May 8, 2026 · Bohan Hou, Jiuning Gu, Jiayan Guo +5

TCMIIES: A Browser-Based LLM-Powered Intelligent Information Extraction System for Academic Literature

TCMIIES is a browser-based, zero-installation system leveraging commercial LLMs for privacy-preserving, schema-guided information extraction from academic literature.

2605.07507 · May 8, 2026 · Hanqing Zhao

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

DiffRetriever uses diffusion language models to generate multiple representative tokens in parallel, significantly improving retrieval performance over sequential autoregressive methods.

2605.07210 · May 8, 2026 · Shuai Wang, Yin Yu, Shengyao Zhuang +2

Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings

Text embeddings fail to capture fine-grained research agendas, leading to 80% off-agenda retrievals in scientific RAG, as shown by a citation-community audit.

2605.07158 · May 8, 2026 · Junseon Yoo

Bridging Textual Profiles and Latent User Embeddings for Personalization

BLUE unifies interpretable textual user profiles with discriminative latent embeddings using reinforcement learning for personalized recommendations.

2605.06981 · May 7, 2026 · Zhaoxuan Tan, Xiang Zhai, Yan Zhu +2

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

A Moodle plugin uses RAG and LLMs for Socratic tutoring and educator content generation, ensuring high-quality, hallucination-free education.

2605.06963 · May 7, 2026 · Anna Ostrowska, Michał Kukla, Gabriela Majstrak +4

EMO: Pretraining Mixture of Experts for Emergent Modularity

EMO is a new Mixture-of-Experts model that achieves emergent modularity, allowing efficient selective expert use for memory-constrained LLM deployment.

2605.06663 · May 7, 2026 · Ryan Wang, Akshita Bhagia, Sewon Min
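The EMO summary relies on standard Mixture-of-Experts routing. As generic MoE background (a minimal top-k routing sketch, not EMO's specific design; all sizes and names are illustrative), a router scores experts per token and only the top-k are evaluated, which is what makes selective, memory-constrained expert loading possible:

```python
import numpy as np

# Generic top-k Mixture-of-Experts routing (standard MoE background,
# not EMO's architecture): only k of n_experts run per input.
rng = np.random.default_rng(0)
d, n_experts, k = 4, 8, 2

router_w = rng.normal(size=(n_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    logits = router_w @ x
    top = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                 # softmax over the selected experts only
    y = sum(g * (experts[i] @ x) for g, i in zip(gates, top))
    return y, sorted(int(i) for i in top)

x = rng.normal(size=d)
y, used = moe_forward(x)
assert len(used) == k  # only k of the 8 experts were evaluated
```

"Emergent modularity" in the summary refers to routing patterns where such expert subsets specialize, so unused experts need not be loaded at deployment.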

Verifier-Backed Hard Problem Generation for Mathematical Reasoning

VHG is a novel verifier-enhanced framework for generating valid and challenging mathematical problems for LLMs, outperforming existing methods.

2605.06660 · May 7, 2026 · Yuhang Lai, Jiazhan Feng, Yee Whye Teh +1

When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels

This paper introduces a method for validating comparative LLM safety scoring without ground-truth labels, using an instrumental-validity chain.

2605.06652 · May 7, 2026 · Sushant Gautam, Finn Schwall, Annika Willoch Olstad +6

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

POPO is a novel RLVR framework for LLMs that learns exclusively from positive rollouts, achieving strong performance by implicitly deriving negative gradients.

2605.06650 · May 7, 2026 · Mingwei Xu, Hao Fang

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

StraTA introduces strategic trajectory abstraction to agentic RL, improving LLM performance in long-horizon tasks by enhancing exploration and credit assignment.

2605.06642 · May 7, 2026 · Xiangyuan Xue, Yifan Zhou, Zidong Wang +5

Recursive Agent Optimization

Recursive Agent Optimization (RAO) trains agents to recursively delegate sub-tasks, enabling them to scale and generalize more effectively.

2605.06639 · May 7, 2026 · Apurva Gandhi, Satyaki Chakraborty, Xiangjun Wang +2
