Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping
Maryam Maghsoudi, Shihab Shamma
TLDR
This paper introduces a zero-shot method for decoding imagined speech from MEG by mapping imagined neural activity to listened speech responses.
Key contributions
- Developed a three-stage pipeline mapping imagined MEG responses to listened responses.
- Trained a word decoder solely on listened MEG, then applied it to mapped imagined data from held-out subjects.
- Achieved significant above-chance decoding of imagined words using rank-based analysis.
- Demonstrated that decoding performance improves with training data size, suggesting scalability to realistic brain-computer interface scenarios.
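The mapping step in the pipeline above can be illustrated with a minimal sketch. The paper compares six linear and neural mappers from imagined to listened MEG responses; the ridge regression below is a hypothetical stand-in, with made-up trial counts and sensor dimensions, not the authors' actual model.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical shapes: 100 trials, 20 MEG sensors x 50 time points, flattened.
n_trials, n_features = 100, 20 * 50

# Synthetic paired data standing in for real imagined/listened MEG recordings:
# the listened response is a noisy linear transform of the imagined response.
imagined = rng.standard_normal((n_trials, n_features))
listened = 0.1 * (imagined @ rng.standard_normal((n_features, n_features))) \
           + 0.05 * rng.standard_normal((n_trials, n_features))

# Stage 1: learn a map from imagined responses to listened responses.
mapper = Ridge(alpha=1.0).fit(imagined, listened)

# Stage 3 (on held-out imagined data): predict a "listening-like" response,
# which would then be passed to a decoder trained only on listened MEG.
pred_listened = mapper.predict(imagined)
print(pred_listened.shape)
```

In the actual pipeline, the mapper is validated against a null baseline on unseen subjects before its outputs are handed to the listened-speech word decoder.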
Why it matters
Decoding imagined speech from brain signals is difficult because imagined-speech datasets are scarce. This paper presents a scalable zero-shot approach that instead leverages the more abundant and reliably labeled listened-speech data, offering a practical pathway toward brain-computer interface applications.
Original Abstract
Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally across subjects and sessions. In this work, we propose a new approach to the decoding of imagined speech that leverages the richer and more reliably labeled recordings made during listening to speech. We collected paired listened and imagined MEG recordings to rhythmic, melodic, and spoken stimuli from trained musicians. Using trained musicians helped improve temporal alignment across conditions. We then developed a three-stage decoding pipeline that revealed consistent and meaningful relationships between neural activity evoked by imagining and listening to the same stimuli. First, we trained six linear and neural models to map imagined MEG responses to listened responses. We evaluated these models against a null baseline from unseen subjects to validate that the predicted listening responses preserve stimulus-specific information. In the second stage, we trained a contrastive word decoder exclusively on the listened MEG responses and evaluated it using four embedding strategies, including semantic, acoustic, and phonetic representations. In the third stage, we processed the imagined MEG responses from held-out subjects through the mapping pipeline to compute the corresponding listening responses, which were then decoded by the listened decoder. Using rank-based analysis, we show that the imagined words are decodable significantly above chance. We report the results of a proof-of-concept implementation for decoding imagined speech, where all evaluations are performed on held-out subjects. We also demonstrate that performance improves with training data size, suggesting that this approach is scalable and directly applicable to realistic brain-computer interface scenarios.
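The rank-based evaluation mentioned in the abstract can be sketched as follows. This is a generic illustration of rank scoring against a candidate vocabulary, with invented vocabulary size, embedding dimension, and noise level; the paper's actual decoder, embeddings, and statistics are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a decoder emits one embedding per imagined trial,
# and we rank the true word's embedding among all vocabulary candidates.
vocab_size, dim, n_trials = 50, 64, 200
vocab_emb = rng.standard_normal((vocab_size, dim))
vocab_emb /= np.linalg.norm(vocab_emb, axis=1, keepdims=True)

true_ids = rng.integers(0, vocab_size, n_trials)
# Simulated decoder output: a noisy copy of the true word's embedding.
decoded = vocab_emb[true_ids] + 0.1 * rng.standard_normal((n_trials, dim))
decoded /= np.linalg.norm(decoded, axis=1, keepdims=True)

# Cosine similarity to every candidate; rank 1 = true word scored highest.
sims = decoded @ vocab_emb.T
true_sims = sims[np.arange(n_trials), true_ids]
ranks = (sims > true_sims[:, None]).sum(axis=1) + 1

mean_rank = ranks.mean()
chance_rank = (vocab_size + 1) / 2  # expected rank under random guessing
print(mean_rank, chance_rank)
```

A mean rank significantly below the chance rank, tested across held-out subjects, is the kind of evidence the paper uses to claim above-chance decoding of imagined words.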