ArXiv TLDR

Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology

arXiv: 2605.09152

Jucheng Hu, Zhangquan Chen, Yulin Chen, Chengjie Hong, Liang Zhou + 7 more

cs.CL · q-bio.NC

TLDR

Meow-Omni 1 is the first quad-modal MLLM for feline ethology, fusing video, audio, physiological time-series, and text to achieve SOTA intent recognition.

Key contributions

  • Introduces Meow-Omni 1, the first open-source quad-modal MLLM for computational ethology.
  • Fuses video, audio, physiological time-series, and text for deeper feline intent understanding.
  • Achieves state-of-the-art 71.16% intent-recognition accuracy on the new MeowBench benchmark.
  • Releases the model weights, training framework, and Meow-10K dataset as open source for broader research.

Why it matters

This paper addresses a key challenge in animal intent understanding by integrating physiological data into MLLMs. It offers a scalable, open-source solution that can advance veterinary diagnostics and wildlife conservation efforts.

Original Abstract

Deciphering animal intent is a fundamental challenge in computational ethology, largely because of semantic aliasing, the phenomenon whereby identical external signals (e.g., a cat's purr) correspond to radically different internal states depending on physiological context. Existing Multimodal Large Language Models (MLLMs) are blind to high-frequency biological time-series data, restricting them to superficial behavioural pattern matching rather than genuine latent-state reasoning. To bridge this gap, we introduce Meow-Omni 1, the first open-source, quad-modal MLLM purpose-built for computational ethology. It natively fuses video, audio, and physiological time-series streams with textual reasoning. Through targeted architectural adaptation, we integrate specialized scientific encoders into a unified backbone and formalize intent inference via physiologically grounded cross-modal alignment. Evaluated on MeowBench, a novel, expert-verified quad-modal benchmark, Meow-Omni 1 achieves state-of-the-art intent-recognition accuracy (71.16%), substantially outperforming leading vision-language and omni-modal baselines. We release the complete open-source pipeline, including model weights, the training framework, and the Meow-10K dataset, to establish a scalable paradigm for inter-species intent understanding and to advance foundation models toward real-world veterinary diagnostics and wildlife conservation.
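The abstract describes the recipe only at a high level: modality-specific encoders projected into a unified backbone, plus a physiologically grounded cross-modal alignment objective. The PyTorch sketch below is a rough illustration of that pattern, not the paper's published implementation: the module names (`QuadModalFusion`, `ModalityProjector`), the dimensions, and the InfoNCE-style `alignment_loss` are all illustrative assumptions.

```python
# Hypothetical sketch of quad-modal fusion with a physiologically grounded
# alignment term. All names, dimensions, and the loss form are assumptions;
# the paper's abstract does not specify these details.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityProjector(nn.Module):
    """Projects one modality encoder's features into the shared token
    space consumed by the language-model backbone."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, in_dim) -> (batch, seq_len, hidden_dim)
        return self.proj(feats)


class QuadModalFusion(nn.Module):
    """Concatenates projected video, audio, physiological, and text tokens
    into one sequence for a unified transformer backbone."""

    def __init__(self, dims: dict, hidden_dim: int = 1024):
        super().__init__()
        self.projectors = nn.ModuleDict(
            {name: ModalityProjector(d, hidden_dim) for name, d in dims.items()}
        )

    def forward(self, inputs: dict) -> torch.Tensor:
        tokens = [self.projectors[name](x) for name, x in inputs.items()]
        return torch.cat(tokens, dim=1)  # (batch, total_seq, hidden_dim)


def alignment_loss(behaviour: torch.Tensor, physio: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss pulling a clip's pooled behavioural tokens
    (video + audio) toward the same clip's physiological embedding."""
    b = F.normalize(behaviour.mean(dim=1), dim=-1)  # (batch, hidden)
    p = F.normalize(physio.mean(dim=1), dim=-1)
    logits = b @ p.t() / temperature                # (batch, batch)
    targets = torch.arange(b.size(0), device=b.device)
    return F.cross_entropy(logits, targets)


# Example: fuse dummy features from the four modalities (dims are made up).
dims = {"video": 768, "audio": 512, "physio": 64, "text": 1024}
fusion = QuadModalFusion(dims)
batch = {name: torch.randn(2, 8, d) for name, d in dims.items()}
tokens = fusion(batch)  # (2, 32, 1024)
```

The alignment term is one plausible reading of "physiologically grounded" fusion: by tying behavioural tokens to the matching physiological trace, two identical purrs with different heart-rate profiles land in different regions of the shared space, which is exactly the semantic-aliasing failure mode the abstract describes.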
