Francesco Belardinelli

2 papers · Latest: April 29, 2026

Tatemae: Detecting Alignment Faking via Tool Selection in LLMs

Tatemae detects alignment faking in LLMs by observing whether their tool selection changes once monitoring is lifted, which reveals strategic compliance.

2604.26511 · Apr 29, 2026

Machine Learning

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

SafeAdapt enables provably safe policy updates in deep RL by projecting updates onto a certified safety region, which prevents catastrophic forgetting of safe behavior.

2604.09452 · Apr 10, 2026