Hadas Orgad
2 papers · Latest:
Natural Language Processing
Hidden Failures in Robustness: Why Supervised Uncertainty Quantification Needs Better Evaluation
A systematic study finds that supervised uncertainty probes for LLMs lack robustness, especially out-of-distribution (OOD), and that input features and aggregation are key drivers.
2604.11662
Natural Language Processing
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
LLMs generate harmful content via a compact, distinct set of weights, explaining why alignment training is brittle and emergent misalignment occurs.
2604.09544