ArXiv TLDR

Mycelium-Index: A Streaming Approximate Nearest Neighbor Index with Myelial Edge Decay, Traffic-Driven Reinforcement, and Adaptive Living Hierarchy

arXiv:2604.11274

Anton Pakhunov

cs.LG, cs.IR

TLDR

Mycelium-Index is a streaming Approximate Nearest Neighbor (ANN) index that, on streaming SIFT-1M, reaches recall comparable to FreshDiskANN while using far less memory and serving queries considerably faster.

Key contributions

  • Mycelium-Index adapts its topology using myelial edge decay, reinforcement, and a traffic-driven hierarchy.
  • Achieves 0.927 recall@5 under FreshDiskANN's 100%-turnover streaming protocol, within the confidence interval of FreshDiskANN's ~0.95, while using 5.7x less RAM and delivering 4.7x higher QPS.
  • On a static index (ef=192), matches HNSW M=16 recall (0.962 vs. 0.965) with 5.2x less RAM (163 MB vs. 854 MB).
  • Identifies "topological repair invariance": across ten streaming repair mechanisms, geometric heuristics fail in high dimensions while topological ones succeed.
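
To make the first contribution more concrete, here is a minimal Python sketch of edge decay with traffic-driven reinforcement. It assumes a simple rule that the summary does not spell out: every edge weight shrinks multiplicatively on each maintenance tick, an edge gains a fixed boost whenever a query traverses it, and edges that fall below a threshold are pruned unless that would leave the node with too few neighbors. The class `DecayingGraph` and all of its constants are hypothetical; this is an illustration of the general idea, not the paper's algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class DecayingGraph:
    """Toy proximity graph whose edges fade unless query traffic reinforces them.

    All constants are illustrative assumptions, not values from the paper.
    """
    decay: float = 0.9        # multiplicative decay applied to every edge per tick
    boost: float = 0.5        # additive reinforcement when a query walks an edge
    prune_below: float = 0.1  # edges weaker than this may be pruned
    min_degree: int = 2       # pruning never drops a node below this out-degree
    weights: dict = field(default_factory=dict)  # node -> {neighbor: weight}

    def add_edge(self, u, v, w=1.0):
        self.weights.setdefault(u, {})[v] = w
        self.weights.setdefault(v, {})

    def reinforce_path(self, path):
        """Strengthen each edge a query actually traversed, capped at 1.0."""
        for u, v in zip(path, path[1:]):
            nbrs = self.weights.get(u, {})
            if v in nbrs:
                nbrs[v] = min(1.0, nbrs[v] + self.boost)

    def tick(self):
        """Decay all edges, then prune the weakest while respecting min_degree."""
        for nbrs in self.weights.values():
            for v in nbrs:
                nbrs[v] *= self.decay
            # Weakest edges first; ties broken by neighbor id.
            weak = sorted((w, v) for v, w in nbrs.items() if w < self.prune_below)
            for _, v in weak:
                if len(nbrs) <= self.min_degree:
                    break
                del nbrs[v]


# Usage: three equally weak edges, but only 0 -> 1 carries query traffic.
g = DecayingGraph()
for v in (1, 2, 3):
    g.add_edge(0, v, 0.2)
g.reinforce_path([0, 1])  # a query walked 0 -> 1
for _ in range(7):
    g.tick()
print(sorted(g.weights[0]))  # [1, 3]
```

The reinforced edge survives because traffic keeps it above the pruning threshold, while one of the two idle edges is dropped; the `min_degree` guard stops pruning before the node becomes poorly connected.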

Why it matters

This paper introduces an adaptive ANN index aimed at large-scale, dynamic datasets. Its lower memory footprint and higher query throughput make it well suited to real-time applications, and its finding on topological repair clarifies how ANN graphs should be maintained under insertions and deletions.

Original Abstract

We present mycelium-index, a streaming approximate nearest neighbor (ANN) index for high-dimensional vector spaces, inspired by the adaptive growth patterns of biological mycelium. The system continuously adapts its topology through myelial edge decay and reinforcement, a traffic-driven living hierarchy, and hybrid deletion combining O(1) bypass for cold nodes with O(k) beam-search repair for hub nodes. Experimental evaluation on SIFT-1M demonstrates that mycelium achieves 0.927 +/- 0.028 recall@5 under FreshDiskANN's 100%-turnover benchmark protocol -- within the measurement confidence interval of FreshDiskANN's ~0.95 -- while using 5.7x less RAM (88 MB vs. >500 MB) and achieving 4.7x higher QPS (2,795 vs. ~600). On the static index, at ef=192, mycelium matches HNSW M=16 recall (0.962 vs. 0.965) at 5.2x less RAM (163 MB vs. 854 MB). Performance optimizations including NEON SIMD distance computation, Vec-backed node storage, and bitset visited tracking yield a cumulative 2.7x QPS improvement. A systematic study of ten streaming repair mechanisms finds that geometric heuristics universally fail in high dimensions, while topological mechanisms succeed -- a principle we term the topological repair invariance of high-dimensional ANN graphs.
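
The hybrid deletion strategy described in the abstract can be illustrated with the sketch below, which also uses a bytearray bitset as the visited set during beam search. It rests on several assumptions that the abstract does not state: the graph is undirected; a node counts as a "hub" when its degree reaches a hypothetical `hub_degree` threshold (the paper may classify nodes by traffic instead); the O(1) "bypass" for cold nodes is modelled as a tombstone that searches still route through but never return; and hub repair reconnects each of the k former neighbors with one local beam search that can still pass through the hub before it is detached. Names such as `hybrid_delete`, `hub_degree`, and `max_degree` are illustrative, and a plain-Python distance stands in for the NEON SIMD kernel.

```python
import heapq


def l2(a, b):
    """Squared Euclidean distance (plain-Python stand-in for a SIMD kernel)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Bitset:
    """Visited set backed by a bytearray: one bit per node id."""

    def __init__(self, n):
        self.bits = bytearray((n + 7) // 8)

    def test_and_set(self, i):
        byte, mask = i >> 3, 1 << (i & 7)
        seen = self.bits[byte] & mask
        self.bits[byte] |= mask
        return bool(seen)


def beam_search(vecs, adj, deleted, entry, query, ef):
    """Best-first beam search; tombstoned nodes are traversed but never returned."""
    visited = Bitset(len(vecs))
    visited.test_and_set(entry)
    d0 = l2(vecs[entry], query)
    frontier = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]     # max-heap (negated distances) of the ef closest seen
    while frontier:
        d, u = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing left in the frontier can improve the beam
        for v in adj[u]:
            if visited.test_and_set(v):
                continue
            dv = l2(vecs[v], query)
            if len(best) < ef or dv < -best[0][0]:
                heapq.heappush(frontier, (dv, v))
                heapq.heappush(best, (-dv, v))
                if len(best) > ef:
                    heapq.heappop(best)
    ranked = sorted((-nd, v) for nd, v in best)
    return [v for _, v in ranked if v not in deleted]


def hybrid_delete(vecs, adj, deleted, node, hub_degree=4, ef=8, max_degree=4):
    """Hypothetical hybrid deletion: O(1) tombstone for cold nodes, k local searches for hubs."""
    if len(adj[node]) < hub_degree:
        deleted.add(node)  # cold: stays routable, just excluded from results
        return "bypass"
    neighbors = list(adj[node])
    deleted.add(node)  # the hub can still route the repair searches below
    for u in neighbors:
        for v in beam_search(vecs, adj, deleted, u, vecs[u], ef):
            if v != u and len(adj[u]) < max_degree and len(adj[v]) < max_degree:
                adj[u].add(v)
                adj[v].add(u)
    for u in neighbors:  # finally detach the hub from the graph
        adj[u].discard(node)
    adj[node].clear()
    return "repair"


# Usage: a star around hub 0, plus a cold leaf 5 hanging off node 1.
vecs = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (5, 5)]
adj = [set() for _ in vecs]
for a, b in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]:
    adj[a].add(b)
    adj[b].add(a)
deleted = set()
print(hybrid_delete(vecs, adj, deleted, 5))  # bypass (degree 1)
print(hybrid_delete(vecs, adj, deleted, 0))  # repair (degree 4)
```

Deleting the hub without repair would leave nodes 1 to 4 disconnected from one another; the repair pass links them directly before the hub is detached. Spending the expensive work only on hubs, where a removal is most likely to fragment the graph, keeps cold-node deletions to a single set insertion.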
