Agentic Discovery with Active Hypothesis Exploration for Visual Recognition

arXiv:2604.12999

Jaywon Koo, Jefferson Hernandez, Ruozhen He, Hanjie Chen, Chen Wei + 1 more

cs.CV

TLDR

HypoExplore is an agentic framework that discovers neural architectures for visual recognition by actively exploring hypotheses and learning design principles.

Key contributions

  • Formulates neural architecture discovery as hypothesis-driven scientific inquiry using an LLM.
  • Maintains a Trajectory Tree and Hypothesis Memory Bank to track lineage and confidence scores.
  • Uses multiple feedback agents to analyze experimental results and update hypothesis confidence.
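  • Reaches 94.11% accuracy on CIFAR-10 with its best lightweight architecture (evolved from a root baseline at 18.91%), generalizes to CIFAR-100 and Tiny-ImageNet, and reports state-of-the-art performance on MedMNIST.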
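
To make the two bookkeeping structures named above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class layouts, the neutral prior of 0.5, the vote range of [-1, 1], the averaging rule, and the `step` size are all assumptions chosen for illustration, since the abstract does not specify how confidence scores are computed.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Hypothesis:
    text: str
    confidence: float = 0.5  # assumed neutral prior on a [0, 1] scale
    n_evidence: int = 0      # number of experiments that have updated it


@dataclass
class HypothesisMemoryBank:
    hypotheses: Dict[str, Hypothesis] = field(default_factory=dict)

    def update(self, hyp_id: str, agent_votes: List[float], step: float = 0.2) -> float:
        """Consolidate per-agent votes in [-1, 1] into one confidence update."""
        hyp = self.hypotheses[hyp_id]
        consolidated = mean(agent_votes)  # assumed consolidation rule: plain average
        # Move confidence toward the consolidated vote and clamp to [0, 1].
        hyp.confidence = min(1.0, max(0.0, hyp.confidence + step * consolidated))
        hyp.n_evidence += 1
        return hyp.confidence


@dataclass
class TrajectoryTree:
    # node_id -> (parent_id, hyp_id, accuracy); the root has parent_id None
    nodes: Dict[int, tuple] = field(default_factory=dict)

    def add(self, node_id: int, parent_id: Optional[int], hyp_id: str, accuracy: float) -> None:
        self.nodes[node_id] = (parent_id, hyp_id, accuracy)

    def lineage(self, node_id: int) -> List[int]:
        """Return node ids from the root down to node_id."""
        path = []
        current: Optional[int] = node_id
        while current is not None:
            path.append(current)
            current = self.nodes[current][0]
        return path[::-1]
```

In this sketch, each experiment would add one node to the tree and trigger one `update` call per hypothesis it tested, so that lineage and confidence evolve together. A real system would likely store richer per-node records (code, training logs, agent analyses) than the tuple used here.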

Why it matters

HypoExplore not only discovers high-performing neural architectures but also accumulates evidence about which design principles hold. Its hypothesis confidence scores become more predictive as evidence accumulates, and the learned principles transfer across independent evolutionary lineages, suggesting a path toward more interpretable architecture search.

Original Abstract

We introduce HypoExplore, an agentic framework that formulates neural architecture discovery for visual recognition as a hypothesis-driven scientific inquiry. Given a human-specified high-level research direction, HypoExplore ideates, implements, evaluates, and improves neural architectures through evolutionary branching. New hypotheses are created using a large language model by selecting a parent hypothesis to build upon, guided by a dual strategy that balances exploiting validated principles with resolving uncertain ones. Our proposed framework maintains a Trajectory Tree that records the lineage of all proposed architectures, and a Hypothesis Memory Bank that actively tracks confidence scores acquired through experimental evidence. After each experiment, multiple feedback agents analyze the results from different perspectives and consolidate their findings into hypothesis confidence updates. Our framework is tested on discovering lightweight vision architectures on CIFAR-10, with the best achieving 94.11% accuracy evolved from a root node baseline that starts at 18.91%, and generalizes to CIFAR-100 and Tiny-ImageNet. We further demonstrate applicability to a specialized domain by conducting independent architecture discovery runs on MedMNIST, which yield a state-of-the-art performance. We show that hypothesis confidence scores grow increasingly predictive as evidence accumulates, and that the learned principles transfer across independent evolutionary lineages, suggesting that HypoExplore not only discovers stronger architectures, but can help build a genuine understanding of the design space.
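
The dual parent-selection strategy described in the abstract (exploiting validated principles versus resolving uncertain ones) could be sketched roughly as follows. This is an illustrative guess, not the paper's rule: the function name `select_parent`, the `explore_prob` parameter, and the use of "confidence closest to 0.5" as the measure of uncertainty are all assumptions.

```python
import random
from typing import Dict, Optional


def select_parent(
    confidences: Dict[str, float],
    explore_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a parent hypothesis id from a map of id -> confidence in [0, 1].

    With probability explore_prob, choose the most uncertain hypothesis
    (confidence nearest 0.5, an assumed proxy for uncertainty); otherwise
    choose the most validated one (highest confidence).
    """
    rng = rng or random.Random()
    if rng.random() < explore_prob:
        # Resolve uncertainty: the hypothesis whose status is least settled.
        return min(confidences, key=lambda h: abs(confidences[h] - 0.5))
    # Exploit: build on the hypothesis with the strongest support so far.
    return max(confidences, key=lambda h: confidences[h])


# Illustrative confidence values, not taken from the paper.
scores = {"h_validated": 0.9, "h_uncertain": 0.55, "h_refuted": 0.1}
exploit_choice = select_parent(scores, explore_prob=0.0)
explore_choice = select_parent(scores, explore_prob=1.0)
```

Setting `explore_prob` to 0.0 or 1.0 makes the choice deterministic, which is convenient for checking each branch in isolation. Intermediate values trade off the two goals.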