Federico Pierucci
2 papers · Latest:
Natural Language Processing
Adversarial Humanities Benchmark: Results on Stylistic Robustness in Frontier Model Safety
A new benchmark reveals that frontier models' safety refusals are easily bypassed by stylistic prompt transformations, indicating weak generalization in current safety techniques.
2604.18487
Agentic Microphysics: A Manifesto for Generative AI Safety
This paper proposes "Agentic Microphysics" and "Generative Safety" to analyze and mitigate population-level risks in interacting agentic AI systems.
2604.15236