Natural Language Processing
Research on language models, text understanding, generation, and computational linguistics.
cs.CL · 805 papers
Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding
A Qwen-based RAG system achieves high accuracy in Ukrainian multi-domain document understanding using contextual chunking and question-aware reranking.
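The two ingredients named above can be illustrated with a minimal sketch: "contextual chunking" as prepending document context to each chunk, and "question-aware reranking" as re-scoring retrieved chunks against the question. This is a toy word-overlap version, not the paper's Qwen-based pipeline; the document text and function names are hypothetical.

```python
from collections import Counter

def contextual_chunks(title, text, size=6):
    """Split text into fixed-size word chunks, prepending the document
    title so each chunk carries context on its own (contextual chunking)."""
    words = text.split()
    return [f"{title}: " + " ".join(words[i:i + size])
            for i in range(0, len(words), size)]

def rerank(question, chunks, top_k=1):
    """Question-aware reranking stub: score each chunk by its word
    overlap with the question and keep the top_k highest scorers."""
    q = Counter(question.lower().split())
    def score(chunk):
        c = Counter(chunk.lower().split())
        return sum(min(q[w], c[w]) for w in q)
    return sorted(chunks, key=score, reverse=True)[:top_k]

# Hypothetical toy document and question.
doc = "The capital of Ukraine is Kyiv. Kyiv sits on the Dnipro river."
chunks = contextual_chunks("Geography notes", doc, size=6)
top = rerank("What is the capital of Ukraine?", chunks, top_k=1)
```

A real system would use embedding similarity and a cross-encoder reranker instead of word overlap, but the pipeline shape (chunk with context, retrieve, rerank against the question) is the same.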
To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification
A local LLM approach effectively classifies deliberative process privilege in government documents, outperforming prior methods without sending sensitive documents to external services.
ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
ASTRA-QA is a new benchmark for abstract question answering over documents, providing robust evaluation for coverage, hallucination, and retrieval scope.
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Nautilus Compass detects persona drift in black-box LLM agents from prompt text alone, offering an efficient and accessible monitoring solution.
Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution
QD-LLM uses neuroevolution to evolve prompt embeddings, enabling diverse and high-quality LLM outputs without fine-tuning.
EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent
EvoPref, a multi-objective evolutionary algorithm, discovers diverse LLM alignments, overcoming preference collapse in gradient-based methods.
Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology
Meow-Omni 1 is the first quad-modal MLLM for feline ethology, fusing video, audio, physiology, and text to achieve SOTA intent recognition.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
AutoTTS automatically discovers optimal test-time scaling strategies for LLMs, outperforming hand-crafted methods with efficient, agentic search.
Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration
Conformal Path Reasoning (CPR) improves trustworthy Knowledge Graph Question Answering by providing reliable coverage guarantees with compact answer sets.
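The "reliable coverage guarantees" here come from conformal prediction. As a generic sketch (not the paper's path-level method): calibrate a score threshold on held-out questions so that, for a new question, the set of candidate answers scoring above the threshold contains the true answer with probability roughly 1 − α. The beta-distributed calibration scores below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data: one confidence score per (question, true answer).
# Higher score means the model is more confident in that answer.
cal_scores = rng.beta(5, 2, size=1000)

alpha = 0.1                       # target miscoverage: aim for >= 90% coverage
n = len(cal_scores)
# Split-conformal threshold: roughly the alpha-quantile of true-answer scores,
# so ~(1 - alpha) of future true answers score at or above it.
k = int(np.floor(alpha * (n + 1))) - 1
q_hat = float(np.sort(cal_scores)[max(k, 0)])

def answer_set(candidate_scores, threshold):
    """Keep every candidate whose score clears the conformal threshold."""
    return [i for i, s in enumerate(candidate_scores) if s >= threshold]

# New question with five candidate answers (toy scores).
test_scores = [0.95, 0.40, 0.88, 0.10, 0.70]
kept = answer_set(test_scores, q_hat)
```

The guarantee is marginal over questions: individual answer sets can be large or small, but on average they cover the truth at the calibrated rate, which is what makes the compact sets in the summary meaningful.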
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Expanded recall in LLMs can paradoxically degrade cooperation in multi-agent social dilemmas, a phenomenon termed the "memory curse."
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
CA-SQL improves Text-to-SQL performance on challenging tasks by dynamically scaling solution exploration and using evolutionary prompt seeding and novel voting.
Accurate and Efficient Statistical Testing for Word Semantic Breadth
This paper introduces a Householder-aligned permutation test to accurately compare word semantic breadth, reducing Type-I error and achieving significant speedup.
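A plain permutation test for semantic breadth can be sketched as follows; this is the generic baseline the paper improves on, not its Householder-aligned variant. The breadth proxy (total variance of a word's context embeddings) and the synthetic embeddings are assumptions for illustration.

```python
import numpy as np

def breadth(E):
    """Semantic-breadth proxy: total variance (trace of the covariance)
    of a word's context-embedding matrix E, shape (n_contexts, dim)."""
    return float(np.trace(np.cov(E.T)))

def permutation_test(A, B, n_perm=300, seed=0):
    """Two-sided permutation test for a difference in semantic breadth:
    pool both embedding sets, reshuffle the labels, and see how often a
    random split produces a breadth gap as large as the observed one."""
    rng = np.random.default_rng(seed)
    obs = abs(breadth(A) - breadth(B))
    pooled = np.vstack([A, B])
    n, hits = len(A), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        diff = abs(breadth(pooled[idx[:n]]) - breadth(pooled[idx[n:]]))
        hits += diff >= obs
    return (hits + 1) / (n_perm + 1)   # smoothed p-value

rng = np.random.default_rng(1)
A = rng.normal(scale=0.3, size=(40, 6))   # narrow word (toy embeddings)
B = rng.normal(scale=1.0, size=(40, 6))   # broad word (toy embeddings)
p = permutation_test(A, B)
```

The naive loop recomputes the statistic from scratch for every permutation, which is exactly the cost the paper's alignment trick is presumably designed to cut while controlling Type-I error.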
Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
CMR-EXTR extracts structured data from free-text cardiac MRI reports with high accuracy and provides per-field confidence scores using distilled LLMs.
Fast Byte Latent Transformer
The Fast Byte Latent Transformer (BLT) introduces novel training and generation techniques to significantly speed up byte-level language models.
Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims
Mechanistic interpretability papers often make causal claims without disclosing the identification assumptions those claims require; this position paper proposes a disclosure norm to improve scientific rigor.
Tool Calling is Linearly Readable and Steerable in Language Models
Researchers found that tool selection in LLMs is linearly readable and steerable, allowing for error prediction and correction before execution.
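"Linearly readable and steerable" can be made concrete with a toy probe on synthetic hidden states; this is a generic difference-of-means probe plus activation steering, not the paper's setup, and the dimensions, tool labels, and steering scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # hypothetical hidden-state dimension

# Synthetic residual-stream states for two tool choices, labeled by a
# hidden ground-truth direction (1 = "calculator", 0 = "search").
w_true = rng.normal(size=d)
X = rng.normal(size=(500, d))
y = (X @ w_true > 0).astype(int)

# Linear readout: difference-of-means probe between the two tool classes.
mu1, mu0 = X[y == 1].mean(0), X[y == 0].mean(0)
probe = mu1 - mu0
threshold = (mu1 + mu0) @ probe / 2

def read_tool(h):
    """Linearly read the predicted tool call from a hidden state."""
    return int(h @ probe > threshold)

acc = float(np.mean([read_tool(x) == yi for x, yi in zip(X, y)]))

# Steering: push the hidden state along the probe direction to flip the
# predicted tool call before the model would execute it.
h = X[0].copy()
before = read_tool(h)
sign = 1 if before == 0 else -1
h_steered = h + sign * 10.0 * probe / np.linalg.norm(probe)
after = read_tool(h_steered)
```

In the error-correction framing of the summary, the probe's readout predicts the upcoming tool call, and the steering step is the correction applied when that prediction looks wrong.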
GLiGuard: Schema-Conditioned Classification for LLM Safeguard
GLiGuard is a compact 0.3B-parameter model that uses schema-conditioned classification to efficiently safeguard LLMs, outperforming larger models in speed.
Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?
Research shows optimal clarification timing for long-horizon agents varies by information type, challenging "earlier is better" and highlighting task-intrinsic timing profiles.
How to Train Your Latent Diffusion Language Model Jointly With the Latent Space
LDLM jointly trains latent space and diffusion model for faster, higher-quality non-autoregressive text generation.
How Value Induction Reshapes LLM Behaviour
Value induction in LLMs has unintended effects: it shifts other values, alters safety behaviour, and increases anthropomorphic, sycophantic language.