ArXiv TLDR

Graph Neural Network based Hierarchy-Aware Embeddings of Knowledge Graphs: Applications to Yeast Phenotype Prediction

2605.03690

Filip Kronström, Alexander H. Gower, Daniel Brunnsåker, Ievgeniia A. Tiukova, Ross D. King

cs.LG cs.AI q-bio.QM

TLDR

This paper introduces a GNN-based method for hierarchy-aware knowledge graph embeddings, improving yeast phenotype prediction and enabling biological discovery.

Key contributions

  • Developed a GNN-based method for hierarchy-aware knowledge graph embeddings, incorporating semantic loss from ontologies.
  • Achieved improved yeast phenotype prediction for double gene knockouts (R^2 = 0.377), outperforming baselines and generalising to triple knockouts unseen during training.
  • Demonstrated the use of box embeddings for evaluating knowledge graph revisions and guiding biological discovery.
  • Validated a novel biological association between inositol utilisation and osmotic stress resistance in yeast.

Why it matters

Hierarchy-aware KG embeddings allow qualitative domain knowledge encoded in ontologies to improve quantitative predictions about biological systems. The method's ability to uncover and experimentally validate new biological insights makes it a powerful tool for accelerating scientific discovery and guiding hypothesis generation.

Original Abstract

We present a method for finding hierarchy-aware embeddings of knowledge graphs (KGs) using graph neural networks (GNNs) enriched with a semantic loss derived from underlying ontologies. This method yields embeddings that better reflect domain knowledge. To demonstrate their utility, we predict and interpret the effects of gene deletions in the yeast Saccharomyces cerevisiae and learn box embeddings for KGs in the absence of a prediction task. We further show how box embeddings can serve as the basis for evaluating KG revisions. Our yeast KG is constructed from community databases and ontology terms. Low-dimensional box embeddings combined with GNNs are used to predict cell growth for double gene knockouts. Over 10-fold cross validation, these predictions have a mean $R^2$ score of 0.360, significantly higher than baseline comparisons, demonstrating that high-level qualitative knowledge is informative about experimental outcomes. Incorporating semantic loss terms in the training of the models improves their predictive performance ($R^2$=0.377) by aligning embeddings with ontology structure. This shows that class hierarchies from ontologies can be exploited for quantitative prediction. We also test the trained models on triple gene knockouts, showing they generalise to data beyond those seen in training. Additionally, by identifying co-occurring relations in the yeast KG important for the cell-growth predictions, we construct hypotheses about interacting traits in yeast. A biological experiment validates one such finding, revealing an association between inositol utilisation and osmotic stress resistance, highlighting the model's potential to guide biological discovery.
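The abstract's "semantic loss" aligns box embeddings with the ontology's class hierarchy: a subclass's box should be geometrically contained in its superclass's box. Below is a minimal illustrative sketch of such a containment penalty; the function names, box representation (min/max corners), and hinge form are our assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def containment_loss(child, parent):
    """Penalty for a child box not lying inside its parent box.

    Each box is a pair (min_corner, max_corner) of arrays of shape (d,).
    The penalty is zero when the child is fully contained and grows
    linearly with the size of the violation (a ReLU/hinge penalty).
    """
    c_min, c_max = child
    p_min, p_max = parent
    below = np.maximum(p_min - c_min, 0.0)  # child pokes out below the parent
    above = np.maximum(c_max - p_max, 0.0)  # child pokes out above the parent
    return float(below.sum() + above.sum())

def semantic_loss(boxes, subclass_pairs):
    """Sum containment penalties over (child, parent) ontology edges."""
    return sum(containment_loss(boxes[c], boxes[p]) for c, p in subclass_pairs)
```

In training, a term like this would be added to the task loss so that gradient updates pull each subclass box inside its superclass box, giving embeddings that respect the ontology even where the prediction task alone would not enforce it.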
