Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

April 28, 2026 · arXiv:2604.25860

cs.CL · cs.AI · cs.CY

TLDR

Luminol-AIDetect detects machine-generated text by analyzing perplexity changes after text shuffling, achieving state-of-the-art, cost-effective, zero-shot performance.

Key contributions

Proposes Luminol-AIDetect, a zero-shot, model-agnostic method for detecting machine-generated text.
Leverages text shuffling to expose structural fragility in MGT via characteristic perplexity shifts.
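Extracts a handful of perplexity-based scalar features from the original and shuffled text, then detects MGT via density estimation and ensemble-based prediction.
Achieves state-of-the-art performance across 8 content domains, 18 languages, and 11 adversarial attack types, with up to 17x lower false positive rate (FPR) than prior methods.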
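Neither this summary nor the abstract specifies three things: which language model scores the text, how the text is shuffled, and which scalar features are kept. The Python sketch below fills those gaps with assumptions. A toy add-one-smoothed bigram model stands in for the scoring LLM, and shuffling is done at the word level. The four feature names (`ppl_orig`, `ppl_shuf`, `log_ratio`, `shuf_dispersion`) are hypothetical. The sketch only illustrates the idea of measuring perplexity before and after shuffling, plus its spread across several shuffles. It is not the authors' implementation.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def train_bigram(corpus_tokens):
    """Count unigrams and bigrams (a toy stand-in for the scoring LLM)."""
    unigrams, bigrams = Counter(), Counter()
    for toks in corpus_tokens:
        padded = ["<s>"] + toks
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def perplexity(tokens, model):
    """Perplexity of a token list under the toy add-one-smoothed bigram model."""
    if not tokens:
        raise ValueError("cannot score an empty text")
    unigrams, bigrams = model
    vocab = len(unigrams) + 1  # one extra slot for unseen tokens
    padded = ["<s>"] + tokens
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        log_prob += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return math.exp(-log_prob / len(tokens))


def shuffle_features(text, model, n_shuffles=8, seed=0):
    """Hypothetical perplexity-under-shuffling features for one text."""
    rng = random.Random(seed)
    tokens = text.split()
    ppl_orig = perplexity(tokens, model)
    shuffled = []
    for _ in range(n_shuffles):
        perm = tokens[:]  # word-level shuffle (an assumption, not from the paper)
        rng.shuffle(perm)
        shuffled.append(perplexity(perm, model))
    ppl_shuf = mean(shuffled)
    return {
        "ppl_orig": ppl_orig,
        "ppl_shuf": ppl_shuf,
        "log_ratio": math.log(ppl_shuf / ppl_orig),      # size of the shift
        "shuf_dispersion": pstdev(shuffled) / ppl_shuf,  # spread across shuffles
    }


if __name__ == "__main__":
    corpus = [s.split() for s in ["the cat sat on the mat", "the dog sat on the rug"]]
    model = train_bigram(corpus)
    print(shuffle_features("the cat sat on the rug", model))
```

In this toy corpus, "the cat sat on the rug" scores lower perplexity than any reordering of its words, so `log_ratio` comes out positive. With a real LLM, the paper's claim is that the size and spread of this shift differ between machine-generated and human-written text.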

Why it matters

This paper introduces an efficient and effective method for detecting machine-generated text, a capability relevant to combating misinformation. The method is zero-shot and model-agnostic: it relies on a structural signal rather than model-specific fingerprints. As a result, it needs no training on any particular generator and should remain applicable as new LLMs appear. Its reported gains, up to 17x lower FPR at lower cost than prior methods, are substantial.

Original Abstract

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific kind of structural fragility compared to human writing. We propose Luminol-AIDetect, a novel, zero-shot statistical approach that exposes this fragility through coherence disruption. By applying a simple randomized text-shuffling procedure, we demonstrate that the resulting shift in perplexity serves as a principled, model-agnostic discriminant, as MGT displays a characteristic dispersion in perplexity-under-shuffling that differs markedly from the more stable structural variability of human-written text. Luminol-AIDetect leverages this distinction to inform its decision process, where a handful of perplexity-based scalar features are extracted from an input text and its shuffled version, then detection is performed via density estimation and ensemble-based prediction. Evaluated across 8 content domains, 11 adversarial attack types, and 18 languages, Luminol-AIDetect demonstrates state-of-the-art performance, with gains up to 17x lower FPR while being cheaper than prior methods.
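The abstract says detection uses density estimation and ensemble-based prediction over the extracted features, but gives no further detail. The Python sketch below shows one possible reading, purely as an assumption. For each feature, it fits a 1-D Gaussian kernel density estimate per class (human vs. machine). Each feature then votes for the class under which the input is more likely, and the majority vote is the prediction. The class name `ShuffleKDEDetector`, the shared `bandwidth`, and the voting rule are hypothetical. The feature values in the example are made-up toy numbers, not data from the paper.

```python
import math


def kde_logpdf(x, samples, bandwidth):
    """Log-density of x under a 1-D Gaussian KDE, computed with log-sum-exp."""
    logs = [-0.5 * ((x - s) / bandwidth) ** 2 for s in samples]
    m = max(logs)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logs))
    return log_sum - math.log(len(samples) * bandwidth * math.sqrt(2 * math.pi))


class ShuffleKDEDetector:
    """Hypothetical detector: per-feature class densities plus a majority vote."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth  # assumes features are on comparable scales
        self.human_cols = []
        self.machine_cols = []

    def fit(self, human_feats, machine_feats):
        """Each argument is a list of equal-length feature vectors."""
        self.human_cols = [list(col) for col in zip(*human_feats)]
        self.machine_cols = [list(col) for col in zip(*machine_feats)]
        return self

    def predict(self, feats):
        """Return 1 (machine-generated) if most features vote machine, else 0."""
        votes = 0
        for j, x in enumerate(feats):
            lp_machine = kde_logpdf(x, self.machine_cols[j], self.bandwidth)
            lp_human = kde_logpdf(x, self.human_cols[j], self.bandwidth)
            votes += int(lp_machine > lp_human)
        return int(votes > len(feats) / 2)


if __name__ == "__main__":
    # Toy, made-up vectors of three hypothetical scalar features per text.
    human = [[0.20, 0.05, 0.10], [0.30, 0.07, 0.12], [0.25, 0.06, 0.08]]
    machine = [[1.10, 0.20, 0.40], [1.30, 0.25, 0.45], [1.20, 0.22, 0.50]]
    det = ShuffleKDEDetector(bandwidth=0.1).fit(human, machine)
    print(det.predict([1.15, 0.21, 0.44]), det.predict([0.22, 0.06, 0.11]))  # prints: 1 0
```

A per-feature vote is only one way to form an ensemble. A joint multivariate density, or averaging log-likelihood ratios across features, would be equally plausible readings of the abstract. Because one bandwidth is shared across features, in practice the features should be standardized first.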

