Low-Stack HAETAE for Memory-Constrained Microcontrollers

arXiv:2604.15868

Gustavo Banegas, Youngbeom Kim, Seog Chung Seo, Christine van Vredendaal

cs.CR

TLDR

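Presents a low-stack, pure C implementation of the HAETAE signature scheme for microcontrollers with 8 kB-16 kB of SRAM. Signing stack drops from 71 kB-141 kB in the reference implementation to 5.8 kB-6.0 kB, with key generation at 4.7 kB-5.7 kB and verification at 4.7 kB-4.8 kB.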

Key contributions

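  • Introduces rejection-aware pass decomposition, which confines signature encoding to the post-acceptance path.
  • Employs component-level early rejection, which short-circuits the response computation once a partial norm already exceeds the bound.
  • Uses reverse-order streaming entropy coding based on range Asymmetric Numeral Systems (rANS) to eliminate full hint and high-bits staging buffers.
  • Reduces stack by 92-95% for signing and 85-91% for verification on HAETAE-2/3 relative to the pqm4 reference implementation, and by 24% and 22% on HAETAE-5 relative to a recently published memory-optimized implementation.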
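
The early-rejection idea in the second bullet can be illustrated with a small C sketch. This is not the authors' code: the callback type `component_fn`, the function `norm_check_early_reject`, and the toy generator `demo_component` are hypothetical names introduced here, and the sketch assumes the bound is checked on a squared L2 norm accumulated one component at a time into a single scratch buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: writes the n coefficients of component `index` into `out`. */
typedef void (*component_fn)(size_t index, int32_t *out, size_t n, void *ctx);

/*
 * Accumulates the squared L2 norm one component at a time, reusing a single
 * scratch buffer of n coefficients. Returns 1 if the total stays <= bound_sq,
 * or 0 as soon as the partial sum exceeds it. *processed receives the number
 * of components that were actually generated.
 */
static int norm_check_early_reject(component_fn gen, void *ctx,
                                   size_t components, size_t n,
                                   int32_t *scratch, uint64_t bound_sq,
                                   size_t *processed)
{
    uint64_t acc = 0;
    *processed = 0;
    for (size_t k = 0; k < components; k++) {
        gen(k, scratch, n, ctx);
        (*processed)++;
        for (size_t i = 0; i < n; i++) {
            int64_t c = scratch[i];
            acc += (uint64_t)(c * c);
        }
        if (acc > bound_sq) {
            return 0; /* reject early: remaining components are never computed */
        }
    }
    return 1;
}

/* Toy generator for illustration only: every coefficient of component k is k + 1. */
static void demo_component(size_t index, int32_t *out, size_t n, void *ctx)
{
    (void)ctx;
    for (size_t i = 0; i < n; i++) {
        out[i] = (int32_t)(index + 1);
    }
}
```

With three components of four coefficients each, the toy generator yields cumulative squared norms of 4, 20, and 56. A bound of 56 accepts after all three components, while a bound of 19 rejects after the second, so the third is never generated. Whether such a data-dependent exit is acceptable in a given signing flow depends on the scheme's side-channel analysis, which this sketch does not address.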

Why it matters

This work makes the HAETAE post-quantum signature scheme practical on microcontrollers with 8 kB-16 kB of SRAM, a common profile for IoT devices, where peak stack usage is often the binding constraint. Verification at all security levels fits within 8 kB of RAM (signature buffer plus stack). The cost is a slowdown of at most 1.8x for key generation and 3.4x for signing, while verification becomes up to 18% faster.

Original Abstract

We present a low-stack implementation of the module-lattice signature scheme HAETAE, targeting microcontrollers with 8 kB-16 kB of available SRAM. On such devices, peak stack usage is often the binding constraint, and HAETAE's hyperball-based sampler, large transient polynomial vectors, and variable-length signature payloads (hint and high-bits arrays) pose a particular challenge. To address this we introduce (i) Rejection-aware pass decomposition, which isolates encoding to the post-acceptance path; (ii) Component-level early rejection, which short-circuits the response computation when a partial norm already exceeds the bound; and (iii) Reverse-order streaming entropy coding using range Asymmetric Numeral Systems (rANS), which eliminates full hint and high-bits staging buffers. Combined with streamed matrix generation, a two-pass hyperball sampler with streaming Gaussian backend, and row-streamed verification, these techniques bring signing stack from 71 kB-141 kB in the reference implementation down to 5.8 kB-6.0 kB, key generation to 4.7 kB-5.7 kB, and verification to 4.7 kB-4.8 kB across all three security levels. Our pure C implementation covers all three security levels (HAETAE-2/3/5), whose optimization paths differ due to the public-key domain (d>0 vs. d=0) and rejection structure. We implement our optimization on a Nucleo-L4R5ZI and compare to the reference pqm4 (for HAETAE-2 and -3) and a recently published memory-optimized implementation (targeting HAETAE-5 only). We reduce HAETAE-2, -3, and -5 stack by respectively 75, 86 and 8 % for key generation, 92, 95 and 24 % for signature generation, and 85, 91 and 22 % for verification. Depending on the parameter set, this impacts performance by at most a factor 1.8 and 3.4 for key and signature generation respectively, while even offering a performance improvement up to 18 % for verification. Verification at all security levels fits within 8 kB of RAM (signature buffer + stack) and is 2.34-3.34x faster than ML-DSA m4fstack at each comparable security level. We additionally validate portability under RIOT-OS on ARM Cortex-M4 and RISC-V targets.
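
To illustrate why item (iii) speaks of reverse order: rANS is a last-in, first-out coder, so an encoder that consumes symbols from last to first and writes bytes backward produces a stream the decoder can read forward. The C sketch below is a generic byte-wise rANS round trip, not the authors' encoder. The 4-symbol frequency table, the callback `symbol_fn`, and all function names are assumptions made for illustration; an actual hint or high-bits model would use different tables.

```c
#include <stddef.h>
#include <stdint.h>

#define SCALE_BITS 4u          /* frequencies sum to 1 << SCALE_BITS = 16 */
#define RANS_L     (1u << 23)  /* lower bound of the normalized state interval */
#define NSYM       4u

/* Hypothetical 4-symbol model, chosen only for this demo. */
static const uint32_t FREQ[NSYM] = {8, 4, 3, 1};
static const uint32_t CUM[NSYM + 1] = {0, 8, 12, 15, 16};

/* Hypothetical callback: returns symbol `index` on demand, so no symbol array is required. */
typedef uint8_t (*symbol_fn)(size_t index, const void *ctx);

/* Encodes symbols n-1 .. 0, writing bytes backward from buf + cap.
 * Returns the start of the stream, or NULL if buf is too small. */
static const uint8_t *rans_encode_reverse(symbol_fn next, const void *ctx, size_t n,
                                          uint8_t *buf, size_t cap)
{
    uint8_t *p = buf + cap;
    uint32_t x = RANS_L;
    for (size_t i = n; i-- > 0;) {
        uint8_t s = next(i, ctx);
        uint32_t f = FREQ[s], c = CUM[s];
        uint32_t x_max = ((RANS_L >> SCALE_BITS) << 8) * f;
        while (x >= x_max) {               /* renormalize: push low byte out */
            if (p == buf) return NULL;
            *--p = (uint8_t)(x & 0xffu);
            x >>= 8;
        }
        x = ((x / f) << SCALE_BITS) + (x % f) + c;
    }
    if ((size_t)(p - buf) < 4) return NULL;
    p -= 4;                                /* flush final state, little endian */
    p[0] = (uint8_t)x;         p[1] = (uint8_t)(x >> 8);
    p[2] = (uint8_t)(x >> 16); p[3] = (uint8_t)(x >> 24);
    return p;
}

/* Decodes n symbols in forward order. Returns 0 on success, -1 on truncated input. */
static int rans_decode(const uint8_t *p, const uint8_t *end, uint8_t *out, size_t n)
{
    if ((size_t)(end - p) < 4) return -1;
    uint32_t x = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    p += 4;
    for (size_t i = 0; i < n; i++) {
        uint32_t slot = x & ((1u << SCALE_BITS) - 1u);
        uint8_t s = 0;
        while (CUM[s + 1] <= slot) s++;    /* linear search in the cumulative table */
        out[i] = s;
        x = FREQ[s] * (x >> SCALE_BITS) + slot - CUM[s];
        while (x < RANS_L) {               /* renormalize: pull next byte in */
            if (p == end) return -1;
            x = (x << 8) | *p++;
        }
    }
    return 0;
}

/* Demo symbol source backed by an array; a streaming source could compute symbols instead. */
static uint8_t array_symbol(size_t index, const void *ctx)
{
    return ((const uint8_t *)ctx)[index];
}
```

Because the encoder pulls symbols through a callback in descending index order, a caller able to recompute symbols on demand would not need to stage them in an array first. The array-backed `array_symbol` is used here only to keep the round trip easy to check.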
