Provably Secure Steganography Based on List Decoding
TLDR
This paper introduces a provably secure steganography scheme using list decoding, significantly boosting embedding capacity in LLMs while maintaining security.
Key contributions
- Proposes a provably secure steganography (PSS) scheme with theoretically high embedding capacity.
- Utilizes list decoding to maintain a list of candidate secret messages rather than committing to a single decoding, making fuller use of the generated text's information content.
- Introduces a suffix-matching mechanism to correctly extract the secret message from candidates.
- Achieves significant capacity improvement on LLMs with comparable efficiency to prior PSS methods.
Why it matters
Existing Provably Secure Steganography (PSS) schemes suffer from low embedding capacity, a problem that is especially acute with LLMs, whose low-entropy output tendencies sharply limit how many secret bits can be hidden per token. This work introduces a list-decoding approach that substantially increases capacity while preserving provable security, making covert communication more practical.
Original Abstract
Steganography embeds secret messages in seemingly innocuous carriers for covert communication under surveillance. Current Provably Secure Steganography (PSS) schemes based on language models can guarantee computational indistinguishability between the covertext and stegotext. However, achieving high embedding capacity remains a challenge for existing PSS. The inefficient entropy utilization renders them not well-suited for Large Language Models (LLMs), whose inherent low-entropy tendencies severely constrain feasible embedding capacity. To address this, we propose a provably secure steganography scheme with a theoretically proved high capacity. Our scheme is based on the concept of list decoding: it maintains a set of candidates that contain the correct secret message, instead of directly finding the correct message with more effort. This strategy fully utilizes the information content of the generated text, yielding higher capacity. To ensure the correctness of our scheme, we further introduce a suffix-matching mechanism to distinguish the correct secret message from the candidates. We provide theoretical proofs for both the security and correctness of our scheme, alongside a derivation of its theoretical capacity lower bound. Our approach is plug-and-play, requiring only a direct replacement of the model's standard random sampling module. Experiments on three LLMs and seven PSS baselines demonstrate that our method achieves computational efficiency comparable to prior PSS schemes while delivering a substantial improvement in embedding capacity.
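To make the list-decoding idea concrete, here is a toy sketch, not the paper's actual construction: a deliberately ambiguous channel (each "token" reveals only the XOR of two payload bits, standing in for low-entropy sampling), a receiver that enumerates every payload consistent with the tokens (the candidate list), and a suffix check that singles out the true message. The parity channel and the SHA-256-derived check suffix are illustrative assumptions, not elements of the paper's scheme.

```python
import hashlib
from itertools import product

def check_bits(msg_bits: str, k: int) -> str:
    """k check bits derived from the message (hypothetical suffix rule)."""
    digest = hashlib.sha256(msg_bits.encode()).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:k]

def embed(secret: str, k: int = 12) -> list[int]:
    """Toy channel: each 'token' carries only the XOR of two payload bits,
    so per-token decoding is ambiguous by design."""
    payload = secret + check_bits(secret, k)
    assert len(payload) % 2 == 0
    return [int(payload[i]) ^ int(payload[i + 1])
            for i in range(0, len(payload), 2)]

def list_decode(tokens: list[int], msg_len: int, k: int = 12) -> list[str]:
    """Enumerate all payloads consistent with the tokens (the candidate
    list), then keep only those whose suffix matches the check bits of
    their own prefix -- the suffix-matching filter."""
    survivors = []
    for choices in product("01", repeat=len(tokens)):
        pairs = []
        for first, parity in zip(choices, tokens):
            second = int(first) ^ parity  # second bit forced by the parity
            pairs.append(first + str(second))
        payload = "".join(pairs)
        msg, suffix = payload[:msg_len], payload[msg_len:]
        if check_bits(msg, k) == suffix:
            survivors.append(msg)
    return survivors

secret = "101101"
tokens = embed(secret)
recovered = list_decode(tokens, len(secret))
# The true secret is always among the surviving candidates.
```

The key property mirrored here is that the receiver never tries to pin down the message token by token; it carries the full candidate set forward and lets the suffix check do the disambiguation at the end, which is what lets the sender spend the channel's entropy on payload rather than on making each step uniquely decodable.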