Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology
Jucheng Hu, Zhangquan Chen, Yulin Chen, Chengjie Hong, Liang Zhou + 7 more
TLDR
Meow-Omni 1 is the first open-source quad-modal MLLM for feline ethology, fusing video, audio, physiological time-series, and text to reach state-of-the-art intent-recognition accuracy (71.16%) on MeowBench.
Key contributions
- Introduces Meow-Omni 1, the first open-source quad-modal MLLM for computational ethology.
- Fuses video, audio, physiological time-series, and text for deeper feline intent understanding.
- Achieves state-of-the-art 71.16% intent-recognition accuracy on the new MeowBench benchmark.
- Releases the model weights, training framework, and Meow-10K dataset as open source for broader research.
Why it matters
This paper targets semantic aliasing, where the same external signal (such as a purr) can reflect very different internal states, by integrating physiological time-series data into an MLLM. The open-source release is intended as a scalable starting point for work on veterinary diagnostics and wildlife conservation.
Original Abstract
Deciphering animal intent is a fundamental challenge in computational ethology, largely because of semantic aliasing, the phenomenon where identical external signals (e.g., a cat's purr) correspond to radically different internal states depending on physiological context. Existing Multimodal Large Language Models (MLLMs) are blind to high-frequency biological time-series data, restricting them to superficial behavioural pattern matching rather than genuine latent-state reasoning. To bridge this gap, we introduce Meow-Omni 1, the first open-source, quad-modal MLLM purpose-built for computational ethology. It natively fuses video, audio, and physiological time-series streams with textual reasoning. Through targeted architectural adaptation, we integrate specialized scientific encoders into a unified backbone and formalize intent inference via physiologically grounded cross-modal alignment. Evaluated on MeowBench, a novel, expert-verified quad-modal benchmark, Meow-Omni 1 achieves state-of-the-art intent-recognition accuracy (71.16%), substantially outperforming leading vision-language and omni-modal baselines. We release the complete open-source pipeline including model weights, training framework, and the Meow-10K dataset, to establish a scalable paradigm for inter-species intent understanding and to advance foundation models toward real-world veterinary diagnostics and wildlife conservation.
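The semantic aliasing described in the abstract can be illustrated with a toy sketch. This is not the Meow-Omni 1 architecture: the `Observation` fields, the two rule-based functions, the labels, and the 200 bpm threshold are all hypothetical and chosen only to show why an audio-only view cannot separate two states that a physiological channel can.

```python
# Toy illustration only. Feature names, labels, and the threshold are
# hypothetical; they are not taken from the paper or from MeowBench.
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    audio_event: str       # external signal, e.g. "purr"
    heart_rate_bpm: float  # assumed physiological channel


def audio_only_intent(obs: Observation) -> str:
    """Sees only the external signal, so every purr gets the same label."""
    return "content" if obs.audio_event == "purr" else "unknown"


def fused_intent(obs: Observation, elevated_bpm: float = 200.0) -> str:
    """Uses the physiological channel to disambiguate an identical purr.

    elevated_bpm is an arbitrary illustrative cutoff, not a clinical value.
    """
    if obs.audio_event != "purr":
        return "unknown"
    return "distress" if obs.heart_rate_bpm >= elevated_bpm else "content"


relaxed = Observation(audio_event="purr", heart_rate_bpm=150.0)
stressed = Observation(audio_event="purr", heart_rate_bpm=230.0)

# Identical audio: the audio-only view cannot tell the two cases apart.
aliased = audio_only_intent(relaxed) == audio_only_intent(stressed)

# Adding physiological context separates them.
labels = (fused_intent(relaxed), fused_intent(stressed))
```

In this sketch `aliased` is true while `labels` holds two different values, which is the gap the abstract attributes to models that ignore physiological time-series data. A learned model would replace the hand-written rule with cross-modal alignment rather than a fixed threshold.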