MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su + 2 more
TLDR
This paper surveys recent advances in MultiModal Large Language Models (MM-LLMs), detailing their architectures, training methods, and performance across diverse tasks.
Key contributions
- Provides a comprehensive taxonomy of 126 MM-LLMs based on design and training formulations.
- Reviews the performance of selected MM-LLMs on mainstream benchmarks and summarizes key training recipes that improve MM-LLM capabilities.
- Identifies future research directions and maintains a real-time tracking website for ongoing developments.
Why it matters
By combining multimodal inputs and outputs with large language models, MM-LLMs expand AI's ability to understand and generate content across diverse data types. This survey consolidates a rapidly evolving landscape, giving researchers a structured overview of architectures, training pipelines, and benchmark results to build on.
Original Abstract
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Initially, we outline general design formulations for model architecture and training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.