Hadas Orgad

2 papers · Latest: April 13, 2026

Hidden Failures in Robustness: Why Supervised Uncertainty Quantification Needs Better Evaluation

A systematic study reveals that supervised uncertainty probes for LLMs lack robustness, especially out of distribution (OOD), and finds that input features and aggregation are the key drivers.

2604.11662 · Apr 13, 2026

Natural Language Processing

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

LLMs generate harmful content via a compact, distinct set of weights, explaining why alignment training is brittle and emergent misalignment occurs.

2604.09544 · Apr 10, 2026
