Natural Language Processing
Research on language models, text understanding, generation, and computational linguistics.
cs.CL · 805 papers
Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding
A Qwen-based RAG system achieves high accuracy in Ukrainian multi-domain document understanding using contextual chunking and question-aware reranking.
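The two ingredients named above can be illustrated with a minimal sketch: "contextual chunking" as prepending document context to each chunk, and "question-aware reranking" as re-scoring retrieved chunks against the question. This is a toy word-overlap version, not the paper's Qwen-based pipeline; the document text and function names are hypothetical.

```python
from collections import Counter

def contextual_chunks(title, text, size=6):
    """Split text into fixed-size word chunks, prepending the document
    title so each chunk carries context on its own (contextual chunking)."""
    words = text.split()
    return [f"{title}: " + " ".join(words[i:i + size])
            for i in range(0, len(words), size)]

def rerank(question, chunks, top_k=1):
    """Question-aware reranking stub: score each chunk by its word
    overlap with the question and keep the top_k highest scorers."""
    q = Counter(question.lower().split())
    def score(chunk):
        c = Counter(chunk.lower().split())
        return sum(min(q[w], c[w]) for w in q)
    return sorted(chunks, key=score, reverse=True)[:top_k]

# Hypothetical toy document and question.
doc = "The capital of Ukraine is Kyiv. Kyiv sits on the Dnipro river."
chunks = contextual_chunks("Geography notes", doc, size=6)
top = rerank("What is the capital of Ukraine?", chunks, top_k=1)
```

A real system would use embedding similarity and a cross-encoder reranker instead of word overlap, but the pipeline shape (chunk with context, retrieve, rerank against the question) is the same.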
To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification
A local LLM approach effectively classifies deliberative process privilege in government documents, outperforming prior methods without sending sensitive documents to external services.
ASTRA-QA: A Benchmark for Abstract Question Answering over Documents
ASTRA-QA is a new benchmark for abstract question answering over documents, providing robust evaluation for coverage, hallucination, and retrieval scope.
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Nautilus Compass detects persona drift in black-box LLM agents from prompt text alone, offering an efficient and accessible monitoring solution.
Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution
QD-LLM uses neuroevolution to evolve prompt embeddings, enabling diverse and high-quality LLM outputs without fine-tuning.
EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent
EvoPref, a multi-objective evolutionary algorithm, discovers diverse LLM alignments, overcoming preference collapse in gradient-based methods.
Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology
Meow-Omni 1 is the first quad-modal MLLM for feline ethology, fusing video, audio, physiology, and text to achieve SOTA intent recognition.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
AutoTTS automatically discovers optimal test-time scaling strategies for LLMs, outperforming hand-crafted methods with efficient, agentic search.
Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration
Conformal Path Reasoning (CPR) improves trustworthy Knowledge Graph Question Answering by providing reliable coverage guarantees with compact answer sets.
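The "reliable coverage guarantees" here come from conformal prediction. As a generic sketch (not the paper's path-level method): calibrate a score threshold on held-out questions so that, for a new question, the set of candidate answers scoring above the threshold contains the true answer with probability roughly 1 − α. The beta-distributed calibration scores below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data: one confidence score per (question, true answer).
# Higher score means the model is more confident in that answer.
cal_scores = rng.beta(5, 2, size=1000)

alpha = 0.1                       # target miscoverage: aim for >= 90% coverage
n = len(cal_scores)
# Split-conformal threshold: roughly the alpha-quantile of true-answer scores,
# so ~(1 - alpha) of future true answers score at or above it.
k = int(np.floor(alpha * (n + 1))) - 1
q_hat = float(np.sort(cal_scores)[max(k, 0)])

def answer_set(candidate_scores, threshold):
    """Keep every candidate whose score clears the conformal threshold."""
    return [i for i, s in enumerate(candidate_scores) if s >= threshold]

# New question with five candidate answers (toy scores).
test_scores = [0.95, 0.40, 0.88, 0.10, 0.70]
kept = answer_set(test_scores, q_hat)
```

The guarantee is marginal over questions: individual answer sets can be large or small, but on average they cover the truth at the calibrated rate, which is what makes the compact sets in the summary meaningful.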
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Expanded recall in LLMs can paradoxically degrade cooperation in multi-agent social dilemmas, a phenomenon termed the "memory curse."
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
CA-SQL improves Text-to-SQL performance on challenging tasks by dynamically scaling solution exploration and using evolutionary prompt seeding and novel voting.
Accurate and Efficient Statistical Testing for Word Semantic Breadth
This paper introduces a Householder-aligned permutation test to accurately compare word semantic breadth, reducing Type-I error and achieving significant speedup.
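A plain permutation test for semantic breadth can be sketched as follows; this is the generic baseline the paper improves on, not its Householder-aligned variant. The breadth proxy (total variance of a word's context embeddings) and the synthetic embeddings are assumptions for illustration.

```python
import numpy as np

def breadth(E):
    """Semantic-breadth proxy: total variance (trace of the covariance)
    of a word's context-embedding matrix E, shape (n_contexts, dim)."""
    return float(np.trace(np.cov(E.T)))

def permutation_test(A, B, n_perm=300, seed=0):
    """Two-sided permutation test for a difference in semantic breadth:
    pool both embedding sets, reshuffle the labels, and see how often a
    random split produces a breadth gap as large as the observed one."""
    rng = np.random.default_rng(seed)
    obs = abs(breadth(A) - breadth(B))
    pooled = np.vstack([A, B])
    n, hits = len(A), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        diff = abs(breadth(pooled[idx[:n]]) - breadth(pooled[idx[n:]]))
        hits += diff >= obs
    return (hits + 1) / (n_perm + 1)   # smoothed p-value

rng = np.random.default_rng(1)
A = rng.normal(scale=0.3, size=(40, 6))   # narrow word (toy embeddings)
B = rng.normal(scale=1.0, size=(40, 6))   # broad word (toy embeddings)
p = permutation_test(A, B)
```

The naive loop recomputes the statistic from scratch for every permutation, which is exactly the cost the paper's alignment trick is presumably designed to cut while controlling Type-I error.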
Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
CMR-EXTR extracts structured data from free-text cardiac MRI reports with high accuracy and provides per-field confidence scores using distilled LLMs.
Fast Byte Latent Transformer
The Fast Byte Latent Transformer (BLT) introduces novel training and generation techniques to significantly speed up byte-level language models.
Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims
Mechanistic interpretability papers often make causal claims without disclosing the identification assumptions those claims require; this position paper proposes a disclosure norm to improve scientific rigor.
Tool Calling is Linearly Readable and Steerable in Language Models
Researchers found that tool selection in LLMs is linearly readable and steerable, allowing for error prediction and correction before execution.
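"Linearly readable and steerable" can be made concrete with a toy probe on synthetic hidden states; this is a generic difference-of-means probe plus activation steering, not the paper's setup, and the dimensions, tool labels, and steering scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # hypothetical hidden-state dimension

# Synthetic residual-stream states for two tool choices, labeled by a
# hidden ground-truth direction (1 = "calculator", 0 = "search").
w_true = rng.normal(size=d)
X = rng.normal(size=(500, d))
y = (X @ w_true > 0).astype(int)

# Linear readout: difference-of-means probe between the two tool classes.
mu1, mu0 = X[y == 1].mean(0), X[y == 0].mean(0)
probe = mu1 - mu0
threshold = (mu1 + mu0) @ probe / 2

def read_tool(h):
    """Linearly read the predicted tool call from a hidden state."""
    return int(h @ probe > threshold)

acc = float(np.mean([read_tool(x) == yi for x, yi in zip(X, y)]))

# Steering: push the hidden state along the probe direction to flip the
# predicted tool call before the model would execute it.
h = X[0].copy()
before = read_tool(h)
sign = 1 if before == 0 else -1
h_steered = h + sign * 10.0 * probe / np.linalg.norm(probe)
after = read_tool(h_steered)
```

In the error-correction framing of the summary, the probe's readout predicts the upcoming tool call, and the steering step is the correction applied when that prediction looks wrong.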
GLiGuard: Schema-Conditioned Classification for LLM Safeguard
GLiGuard is a compact 0.3B-parameter model that uses schema-conditioned classification to efficiently safeguard LLMs, outperforming larger models in speed.
Ask Early, Ask Late, Ask Right: When Does Clarification Timing Matter for Long-Horizon Agents?
Research shows optimal clarification timing for long-horizon agents varies by information type, challenging "earlier is better" and highlighting task-intrinsic timing profiles.
How to Train Your Latent Diffusion Language Model Jointly With the Latent Space
LDLM jointly trains latent space and diffusion model for faster, higher-quality non-autoregressive text generation.
How Value Induction Reshapes LLM Behaviour
Value induction in LLMs has unintended effects: it shifts other values, alters safety behaviour, and increases anthropomorphic, sycophantic language.