ArXiv TLDR

CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation

arXiv:2604.25774

Wei-Chun Chen, Yu-Xuan Chen, I-Fang Chung, Ying-Jia Lin

cs.CL cs.AI

TLDR

This paper compares traditional and LLM-based methods for recipe nutrient estimation, finding that LLM-based approaches achieve the highest accuracy but at substantially higher inference latency.

Key contributions

  • Evaluated TF-IDF, DeBERTa-v3, and LLMs (Gemini 2.5 Flash) for recipe nutrient estimation.
  • Few-shot LLM inference and a hybrid refinement pipeline (TF-IDF combined with the LLM) achieved the highest nutrient estimation accuracy.
  • LLMs leverage pre-trained world knowledge to resolve ambiguous terminology and normalize units.
  • Identified a trade-off: LLM accuracy gains come with substantially higher inference latency.
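The few-shot setup above can be sketched as assembling exemplar recipe–nutrient pairs into a single prompt for the LLM. This is only an illustrative reconstruction: the exemplars, field names, and prompt wording below are placeholders, not the paper's actual prompts or data.

```python
# Hedged sketch of few-shot prompt assembly for nutrient estimation.
# All recipes and nutrient values here are illustrative placeholders.
EXEMPLARS = [
    ("1 cup cooked rice, 100 g tofu", "calories=320, protein_g=14, fat_g=6"),
    ("2 eggs, 1 tbsp butter", "calories=250, protein_g=13, fat_g=21"),
]

def build_few_shot_prompt(recipe: str) -> str:
    """Format exemplar pairs plus the query recipe into one prompt string."""
    lines = ["Estimate the nutrients per serving from the recipe text.", ""]
    for text, answer in EXEMPLARS:
        lines.append(f"Recipe: {text}")
        lines.append(f"Nutrients: {answer}")
        lines.append("")  # blank line between exemplars
    lines.append(f"Recipe: {recipe}")
    lines.append("Nutrients:")  # the model completes this line
    return "\n".join(lines)

prompt = build_few_shot_prompt("200 g chicken breast, 1 tbsp olive oil")
print(prompt)
```

The completed prompt would then be sent to the LLM (e.g., Gemini 2.5 Flash) and the generated nutrient line parsed back into numeric values.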

Why it matters

Accurate nutrient estimation is crucial for dietary monitoring. This research highlights the potential of LLMs to improve precision in this challenging task, while also identifying critical performance trade-offs for real-world deployment.

Original Abstract

Accurate nutrient estimation from unstructured recipe text is an important yet challenging problem in dietary monitoring, due to ambiguous ingredient terminology and highly variable quantity expressions. We systematically evaluate models spanning a wide range of representational capacity, from lexical matching methods (TF-IDF with Ridge Regression), to deep semantic encoders (DeBERTa-v3), to generative reasoning with large language models (LLMs). Under the strict tolerance criteria defined by EU Regulation 1169/2011, our empirical results reveal a clear trade-off between predictive accuracy and computational efficiency. The TF-IDF baseline achieves moderate nutrient estimation performance with near-instantaneous inference, whereas the DeBERTa-v3 encoder performs poorly under task-specific data scarcity. In contrast, few-shot LLM inference (e.g., Gemini 2.5 Flash) and a hybrid LLM refinement pipeline (TF-IDF combined with Gemini 2.5 Flash) deliver the highest validation accuracy across all nutrient categories. These improvements likely arise from the ability of LLMs to leverage pre-trained world knowledge to resolve ambiguous terminology and normalize non-standard units, which remain difficult for purely lexical approaches. However, these gains come at the cost of substantially higher inference latency, highlighting a practical deployment trade-off between real-time efficiency and nutritional precision in dietary monitoring systems.
