ArXiv TLDR

Seeing the imagined: a latent functional alignment in visual imagery decoding from fMRI data

arXiv: 2604.15374

Fabrizio Spera, Tommaso Boccato, Michal Olak, Sara Cammarota, Matteo Ciferri + 3 more

q-bio.NC, cs.AI, eess.IV

TLDR

This paper adapts a state-of-the-art (SOTA) fMRI decoder built for visual perception so that it can reconstruct imagined content instead, using latent functional alignment and retrieval-based data augmentation.

Key contributions

  • Adapts SOTA fMRI perception decoder (DynaDiff) for visual imagery reconstruction.
  • Proposes latent functional alignment to map imagery-evoked activity into the pretrained model's conditioning space.
  • Introduces retrieval-based augmentation to address limited imagery-perception supervision.
  • Achieves improved semantic reconstruction and above-chance decoding from cortical regions.
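The alignment and augmentation steps above can be sketched in a few lines. This is a hedged illustration, not the paper's actual implementation: the DynaDiff conditioning space, the retrieval criterion, and all shapes and names here (`X_img`, `Z_per`, `fit_ridge`, `retrieval_augment`, `k`, `lam`) are assumptions. The paper contrasts latent alignment with a voxel-space ridge baseline; this sketch simply uses a closed-form linear map into the latent space to show the overall data flow, with the generative decoder treated as frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical shapes): voxel patterns and the
# pretrained decoder's conditioning embeddings (e.g. CLIP-like vectors).
n_imagery, n_perception, n_vox, d_cond = 40, 200, 500, 64
X_img = rng.standard_normal((n_imagery, n_vox))     # imagery-evoked activity
Z_img = rng.standard_normal((n_imagery, d_cond))    # target conditioning latents
X_per = rng.standard_normal((n_perception, n_vox))  # NSD perception trials
Z_per = rng.standard_normal((n_perception, d_cond))

def retrieval_augment(Z_query, Z_bank, X_bank, k=5):
    """For each imagery target, retrieve the k perception trials whose
    conditioning latents are most cosine-similar, and return them as
    extra (voxel, latent) training pairs."""
    q = Z_query / np.linalg.norm(Z_query, axis=1, keepdims=True)
    b = Z_bank / np.linalg.norm(Z_bank, axis=1, keepdims=True)
    idx = np.argsort(-(q @ b.T), axis=1)[:, :k]     # top-k per query
    return X_bank[idx.ravel()], Z_bank[idx.ravel()]

def fit_ridge(X, Z, lam=1.0):
    """Closed-form ridge map W: voxel space -> conditioning space."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Z)

# Augment the scarce imagery pairs with retrieved perception pairs,
# then fit the alignment; the diffusion decoder itself stays frozen.
X_aug, Z_aug = retrieval_augment(Z_img, Z_per, X_per, k=5)
X_train = np.vstack([X_img, X_aug])
Z_train = np.vstack([Z_img, Z_aug])
W = fit_ridge(X_train, Z_train, lam=10.0)
Z_pred = X_img @ W   # aligned latents, fed to the frozen decoder
```

The key design point the sketch reflects: only the map from brain activity into the conditioning space is learned from imagery data, so the semantic structure the pretrained decoder learned from perception is reused rather than retrained.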

Why it matters

This research advances fMRI-based visual imagery decoding by bridging the gap between perception and imagination. By leveraging semantic structure learned from perception, it offers a practical route to reconstructing imagined content. This could lead to a better understanding of mental processes and to new applications in brain-computer interfaces (BCIs).

Original Abstract

Recent progress in visual brain decoding from fMRI has been enabled by large-scale datasets such as the Natural Scenes Dataset (NSD) and powerful diffusion-based generative models. While current pipelines are primarily optimized for perception, their performance under mental-imagery remains less well understood. In this work, we study how a state-of-the-art (SOTA) perception decoder (DynaDiff) can be adapted to reconstruct imagined content from the Imagery-NSD benchmark. We propose a latent functional alignment approach that maps imagery-evoked activity into the pretrained model's conditioning space, while keeping the remaining components frozen. To mitigate the limited amount of matched imagery-perception supervision, we further introduce a retrieval-based augmentation strategy that selects semantically related NSD perception trials. Across four subjects, latent functional alignment consistently improves high-level semantic reconstruction metrics relative to the frozen pretrained baseline and a voxel-space ridge alignment baseline, and enables above-chance decoding from multiple cortical regions. These results suggest that semantic structure learned from perception can be leveraged to stabilize and improve visual imagery decoding under out-of-distribution conditions.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.