Ensembles at Any Cost? Accuracy-Energy Trade-offs in Recommender Systems

April 9, 2026 | arXiv:2604.07869

Jannik Nitschke, Lukas Wegmeth, Joeran Beel

cs.IR, cs.LG

TLDR

Ensembles boost recommender accuracy by 0.3-5.7% but increase energy consumption by 19-2,549%, highlighting a significant accuracy-energy trade-off.

Key contributions

Measures accuracy-energy trade-offs of ensemble methods in recommender systems.
Finds ensembles improve accuracy by 0.3-5.7% but increase energy by 19-2,549%.
Evaluates four ensemble strategies across 93 experiments on four datasets of different sizes.
Suggests selective ensembles are more energy-efficient than exhaustive averaging.
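
To make the difference between exhaustive and selective ensembling concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy predictions, and the validation errors are all hypothetical, and the paper's exact definitions of Average, Weighted, and Top Performers may differ from the simple versions assumed here.

```python
from statistics import fmean

def average_ensemble(preds):
    """Unweighted mean over all models.

    preds is a list of per-model prediction lists, all of equal length.
    """
    return [fmean(col) for col in zip(*preds)]

def weighted_ensemble(preds, weights):
    """Weighted mean over all models (weights need not sum to 1)."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, col)) / total
            for col in zip(*preds)]

def top_performers_ensemble(preds, val_errors, k=2):
    """Average only the k models with the lowest validation error.

    Assumption: 'selective' means keeping the best-k models; the paper
    may select members differently.
    """
    ranked = sorted(range(len(preds)), key=lambda i: val_errors[i])
    chosen = ranked[:k]
    return average_ensemble([preds[i] for i in chosen]), chosen

# Hypothetical rating predictions from three models for three user-item pairs.
preds = [
    [4.0, 3.0, 5.0],  # model 0
    [3.0, 3.0, 4.0],  # model 1
    [1.0, 5.0, 2.0],  # model 2 (weak)
]
val_errors = [0.90, 0.95, 1.40]  # hypothetical validation RMSE per model

avg_all = average_ensemble(preds)
top2, chosen = top_performers_ensemble(preds, val_errors, k=2)
print("average of all models:", avg_all)
print("top-2 models", chosen, "->", top2)
```

In this sketch, the selective variant trains and queries fewer models at inference time, which is one plausible reason a selective ensemble could cost less energy than averaging every candidate.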

Why it matters

This paper quantifies the energy cost of ensemble methods in recommender systems, a cost that accuracy-only evaluation ignores. Its measurements give practitioners empirical grounds for weighing small accuracy gains against large energy overheads.

Original Abstract

Ensemble methods are frequently used in recommender systems to improve accuracy by combining multiple models. Recent work reports sizable performance gains, but most studies still optimize primarily for accuracy and robustness rather than for energy efficiency. This paper measures accuracy-energy trade-offs of ensemble techniques relative to strong single models. We run 93 controlled experiments in two pipelines: (1) explicit rating prediction with Surprise (RMSE) and (2) implicit feedback ranking with LensKit (NDCG@10). We evaluate four datasets ranging from 100,000 to 7.8 million interactions (MovieLens 100K, MovieLens 1M, ModCloth, Anime). We compare four ensemble strategies (Average, Weighted, Stacking or Rank Fusion, Top Performers) against baselines and optimized single models. Whole-system energy is measured with EMERS using a smart plug and converted to CO2 equivalents. Across settings, ensembles improve accuracy by 0.3% to 5.7% while increasing energy by 19% to 2,549%. On MovieLens 1M, a Top Performers ensemble improves RMSE by 0.96% at an 18.8% energy overhead over SVD++. On MovieLens 100K, an averaging ensemble improves NDCG@10 by 5.7% with 103% additional energy. On Anime, a Surprise Top Performers ensemble improves RMSE by 1.2% but consumes 2,005% more energy (0.21 vs. 0.01 Wh), increasing emissions from 2.6 to 53.8 mg CO2 equivalents, and LensKit ensembles fail due to memory limits. Overall, selective ensembles are more energy-efficient than exhaustive averaging,
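
The overhead percentages quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the "gain per unit of overhead" ratio is an assumed convenience metric, not one the paper is known to report. The inputs are the rounded figures from the abstract, so results will differ slightly from the reported percentages.

```python
def pct_increase(baseline, new):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

def gain_per_overhead(acc_gain_pct, energy_overhead_pct):
    """Accuracy gain (percent) bought per percent of extra energy.

    Hypothetical metric for illustration; RMSE and NDCG@10 gains are
    not directly comparable, so compare rows with care.
    """
    return acc_gain_pct / energy_overhead_pct

# Rounded values from the abstract (Anime, Surprise Top Performers).
energy_overhead = pct_increase(0.01, 0.21)    # Wh, single model vs. ensemble
emission_overhead = pct_increase(2.6, 53.8)   # mg CO2 equivalents

# (accuracy gain %, energy overhead %) as quoted in the abstract.
cases = {
    "MovieLens 1M, Top Performers (RMSE)": (0.96, 18.8),
    "MovieLens 100K, Average (NDCG@10)": (5.7, 103.0),
    "Anime, Top Performers (RMSE)": (1.2, 2005.0),
}

print(f"Anime energy overhead from rounded Wh: {energy_overhead:.0f}%")
print(f"Anime emission overhead from rounded mg: {emission_overhead:.0f}%")
for name, (gain, overhead) in cases.items():
    print(f"{name}: {gain_per_overhead(gain, overhead):.4f} gain% per overhead%")
```

The rounded Wh values give an overhead of roughly 2,000%, close to the reported 2,005%; the gap is presumably a rounding effect in the published figures.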
