Toward Multimodal Conversational AI for Age-Related Macular Degeneration
Ran Gu, Benjamin Hou, Mélanie Hébert, Asmita Indurkar, Yifan Yang + 3 more
TLDR
OcularChat, a new multimodal LLM, accurately diagnoses age-related macular degeneration (AMD) from fundus photos with clinical reasoning and interactive dialogue.
Key contributions
- Introduces OcularChat, an MLLM fine-tuned on 700k+ simulated dialogues for AMD diagnosis from fundus photos.
- Achieves superior classification accuracy (e.g., 0.954 for advanced AMD) on AREDS/AREDS2 datasets, outperforming existing MLLMs.
- Provides diagnostic reasoning and interactive explanations, scoring higher in ophthalmologist evaluations than baseline models.
Why it matters
This paper matters because it moves beyond static predictions in retinal disease detection toward interactive, interpretable AI. OcularChat's ability to provide clinical reasoning and engage in dialogue could strengthen diagnostic support and patient counseling for AMD, making such models more clinically useful.
Original Abstract
Despite strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) integrate diagnostic predictions with clinically meaningful dialogue to support clinical decision-making and patient counseling. In this study, OcularChat, an MLLM, was fine-tuned from Qwen2.5-VL using simulated patient-physician dialogues to diagnose age-related macular degeneration (AMD) through visual question answering on color fundus photographs (CFPs). A total of 705,850 simulated dialogues paired with 46,167 CFPs were generated to train OcularChat to identify key AMD features and produce reasoned predictions. OcularChat demonstrated strong classification performance in AREDS, achieving accuracies of 0.954, 0.849, and 0.678 for the three diagnostic tasks: advanced AMD, pigmentary abnormalities, and drusen size, significantly outperforming existing MLLMs. On AREDS2, OcularChat remained the top-performing method on all tasks. Across three independent ophthalmologist graders, OcularChat achieved higher mean scores than a strong baseline model for advanced AMD (3.503 vs. 2.833), pigmentary abnormalities (3.272 vs. 2.828), drusen size (3.064 vs. 2.433), and overall impression (2.978 vs. 2.464) on a 5-point clinical grading rubric. Beyond strong objective performance in AMD severity classification, OcularChat demonstrated the ability to provide diagnostic reasoning, clinically relevant explanations, and interactive dialogue, with high performance in subjective ophthalmologist evaluation. These findings suggest that MLLMs may enable accurate, interpretable, and clinically useful image-based diagnosis and classification of AMD.
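The abstract describes pairing simulated patient-physician dialogues with fundus photographs and evaluating per-task classification accuracy. A minimal sketch of what such a training record and evaluation might look like is below; the field names, labels, and helper functions are illustrative assumptions, not the paper's actual data schema.

```python
# Hypothetical sketch: a visual-question-answering dialogue record paired with
# a color fundus photograph (CFP), plus per-task accuracy computation.
# All field names and labels are assumptions for illustration only.

def make_dialogue_record(image_path, task, question, answer):
    """One simulated patient-physician dialogue turn paired with a CFP."""
    return {
        "image": image_path,   # path to the color fundus photograph
        "task": task,          # one of the three AMD diagnostic tasks
        "conversation": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ],
    }

def per_task_accuracy(examples, predictions):
    """Accuracy of predicted answers against reference answers, grouped by task."""
    correct, total = {}, {}
    for ex, pred in zip(examples, predictions):
        t = ex["task"]
        total[t] = total.get(t, 0) + 1
        gold = ex["conversation"][-1]["content"]
        correct[t] = correct.get(t, 0) + (pred == gold)
    return {t: correct[t] / total[t] for t in total}

# Toy data: three records across two of the three diagnostic tasks.
records = [
    make_dialogue_record("cfp_001.jpg", "advanced_amd", "Is advanced AMD present?", "yes"),
    make_dialogue_record("cfp_002.jpg", "advanced_amd", "Is advanced AMD present?", "no"),
    make_dialogue_record("cfp_003.jpg", "drusen_size", "What is the drusen size?", "large"),
]
preds = ["yes", "no", "small"]
accuracy = per_task_accuracy(records, preds)
print(accuracy)  # → {'advanced_amd': 1.0, 'drusen_size': 0.0}
```

In practice, a system like the one described would feed such records to an MLLM fine-tuning pipeline rather than a simple exact-match scorer, but the grouping-by-task pattern mirrors how the reported per-task accuracies (advanced AMD, pigmentary abnormalities, drusen size) are tallied.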