Zhe Chen

2 papers · Latest: December 21, 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

InternVL is a vision-language foundation model that scales a vision foundation model up to 6 billion parameters and progressively aligns it with LLMs, achieving state-of-the-art results across diverse visual-linguistic tasks.

2312.14238 · Dec 21, 2023

Natural Language Processing

Gemini: A Family of Highly Capable Multimodal Models

Gemini is a new family of multimodal AI models that excel at image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks, including human-expert performance on MMLU.

2312.11805 · Dec 19, 2023
