A Linear-Transformer Hybrid for SNP-Based Genotype-to-Phenotype Prediction in Grapevine
Yibin Wang, Murukarthick Jayakodi, Silvas Kirubakaran, Ambika Chandra, Azlan Zahid
TLDR
LiT-G2P, a linear-Transformer hybrid, improves SNP-based genotype-to-phenotype prediction of leaf hair density and trichome density in grapevine over baseline models, in both single-year and cross-year tests.
Key contributions
- Proposes LiT-G2P, a linear-Transformer hybrid for SNP-based genotype-to-phenotype prediction.
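- Integrates additive genetic effects with Transformer-based nonlinear interactions to improve cross-year robustness.
- Achieves the lowest error for leaf hair density (RMSE of 0.469 single-year and 0.454 cross-year) and the best overall performance for trichome density compared with baseline models.
- Extracts model-prioritized SNPs from attention weights and applies genotype-stratified analysis to provide interpretable candidate markers for downstream validation.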
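To make the hybrid idea in the contributions above concrete, here is a minimal NumPy sketch of a forward pass that sums an additive linear term and a single-head self-attention term over SNP tokens, then ranks SNPs by the attention they receive. It is an illustration under stated assumptions, not the authors' implementation: the 0/1/2 allele-dosage encoding, the per-SNP embedding, the single attention head, the mean pooling, and every name (`init_params`, `forward`, `rank_snps`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_snps, d_model, seed=0):
    # Random, untrained weights; all shapes and names are assumptions.
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "beta": rng.normal(0, scale, n_snps),              # additive per-SNP effects
        "bias": 0.0,
        "embed": rng.normal(0, scale, (n_snps, d_model)),  # one embedding per SNP position
        "Wq": rng.normal(0, scale, (d_model, d_model)),
        "Wk": rng.normal(0, scale, (d_model, d_model)),
        "Wv": rng.normal(0, scale, (d_model, d_model)),
        "w_out": rng.normal(0, scale, d_model),
    }


def forward(genotypes, p):
    """genotypes: (n_samples, n_snps) allele dosages coded 0/1/2 (assumed encoding)."""
    x = np.asarray(genotypes, dtype=float)

    # Additive branch: a plain linear model over dosages.
    linear = x @ p["beta"] + p["bias"]                      # (n_samples,)

    # Interaction branch: each SNP becomes a token scaled by its dosage.
    tokens = x[:, :, None] * p["embed"][None, :, :]         # (n_samples, n_snps, d_model)
    q, k, v = tokens @ p["Wq"], tokens @ p["Wk"], tokens @ p["Wv"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)                         # (n_samples, n_snps, n_snps)
    context = attn @ v
    nonlinear = context.mean(axis=1) @ p["w_out"]           # (n_samples,)

    return linear + nonlinear, attn


def rank_snps(attn, top_k=5):
    # Average attention each SNP receives, over samples and query positions.
    received = attn.mean(axis=(0, 1))                       # (n_snps,)
    return np.argsort(received)[::-1][:top_k]


# Toy usage with synthetic genotypes (not data from the paper).
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(4, 10))
params = init_params(n_snps=10, d_model=8)
pred, attn = forward(G, params)
top = rank_snps(attn, top_k=3)
```

Summing the two branches keeps the additive term readable as a conventional marker-effect model, while the attention matrix gives one simple, assumed way to prioritize SNPs for follow-up.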
Why it matters
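Genomic selection depends on trait predictions that hold up under variable field conditions and across years. By combining stable additive effects with learned SNP interaction patterns, LiT-G2P improves on baseline models in both single-year and cross-year tests, and its attention-derived candidate markers give breeders a starting point for downstream validation.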
Original Abstract
Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose a linear-Transformer approach, LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic variance effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide polymorphism (SNP) data. We evaluated LiT-G2P on a panel of diverse grape accessions, genotyped with SNP markers and measured for phenotypes across two consecutive years. Target phenotypic traits include leaf hair density and trichome density of grapevines. Across both single-year and cross-year testing scenarios, LiT-G2P consistently improves prediction performance compared with baseline models. For hair density, LiT-G2P achieves the lowest error in both single-year and cross-year evaluations, with RMSEs of 0.469 and 0.454, respectively, while maintaining strong tolerance accuracies of 79.2% and 74.6%, respectively. For trichome density, LiT-G2P also presents the best overall G2P performance. In addition, we extract model-prioritized SNPs from attention weights and apply genotype-stratified analysis to provide interpretable candidate markers for downstream validation. These results demonstrate that integrating stable additive effects with learned interaction patterns can enhance cross-year robustness and support practical SNP-based predictive modeling for genomic selection.
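The two metrics reported in the abstract can be sketched as follows. RMSE is standard; the abstract does not define tolerance accuracy, so this sketch assumes it means the fraction of predictions whose absolute error falls within a fixed tolerance. The function names and the `tol=0.5` default are hypothetical, and the numbers are toy values, not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    # Root mean squared error between observed and predicted trait values.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def tolerance_accuracy(y_true, y_pred, tol=0.5):
    # Assumed definition: share of predictions with |error| <= tol.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy phenotype scores (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 3.0, 6.0]

error = rmse(observed, predicted)
within_tol = tolerance_accuracy(observed, predicted, tol=0.5)
```

Reporting both is useful because RMSE is dominated by large misses, while a tolerance-based accuracy shows how often a prediction lands close enough to be usable.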