Michael Backes
2 papers · Latest:
Cryptography & Security
Pop Quiz Attack: Black-box Membership Inference Attacks Against Large Language Models
Introduces PopQuiz, a black-box membership inference attack that turns data into quizzes to reveal whether an LLM memorized specific training examples.
2605.06423
Natural Language Processing
SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts
SafeReview defends LLM-based peer review systems against adversarial hidden prompts using a co-evolving generator-defender framework.
2604.26506