PeReGrINE: Evaluating Personalized Review Fidelity with User-Item Graph Context
TLDR
PeReGrINE is a new benchmark and framework for personalized review generation using graph-structured user-item evidence and a User Style Parameter.
Key contributions
- Introduces PeReGrINE, a benchmark for personalized review generation using graph-structured user-item evidence.
- Develops a User Style Parameter to summarize user linguistic and affective tendencies from past reviews.
- Proposes Dissonance Analysis, a macro-level evaluation framework that measures deviation from expected user style and product-level consensus.
- Enables controlled comparison of four graph-derived retrieval settings: product-only, user-only, neighbor-only, and combined evidence.
Why it matters
PeReGrINE gives researchers a reproducible benchmark for personalized review generation that enforces explicit temporal cutoffs and avoids conditioning directly on sparse raw user histories. Its User Style Parameter and Dissonance Analysis make it possible to compare how product, user, and neighbor evidence each affect review fidelity, personalization, and grounding in retrieval-conditioned language models.
Original Abstract
We introduce PeReGrINE, a benchmark and evaluation framework for personalized review generation grounded in graph-structured user-item evidence. PeReGrINE restructures Amazon Reviews 2023 into a temporally consistent bipartite graph, where each target review is conditioned on bounded evidence from user history, item context, and neighborhood interactions under explicit temporal cutoffs. To represent persistent user preferences without conditioning directly on sparse raw histories, we compute a User Style Parameter that summarizes each user's linguistic and affective tendencies over prior reviews. This setup supports controlled comparison of four graph-derived retrieval settings: product-only, user-only, neighbor-only, and combined evidence. Beyond standard generation metrics, we introduce Dissonance Analysis, a macro-level evaluation framework that measures deviation from expected user style and product-level consensus. We also study visual evidence as an auxiliary context source and find that it can improve textual quality in some settings, while graph-derived evidence remains the main driver of personalization and consistency. Across product categories, PeReGrINE offers a reproducible way to study how evidence composition affects review fidelity, personalization, and grounding in retrieval-conditioned language models.
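To make the four retrieval settings concrete, here is a minimal Python sketch of evidence selection under a temporal cutoff. It is an illustration only, not the paper's implementation: the `Review` record, the `gather_evidence` function, the per-source bound `k`, and in particular the definition of a "neighbor" (another user who reviewed an item the target user reviewed earlier) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """Hypothetical review record; field names are assumptions."""
    user: str
    item: str
    timestamp: int
    text: str


def most_recent(pool, k):
    """Keep at most k reviews, newest first (the 'bounded evidence' idea)."""
    return sorted(pool, key=lambda r: r.timestamp, reverse=True)[:k]


def gather_evidence(reviews, target, setting, k=3):
    """Select evidence for `target` using only reviews older than it."""
    # Temporal cutoff: nothing written at or after the target is visible.
    past = [r for r in reviews if r.timestamp < target.timestamp]

    user_hist = [r for r in past if r.user == target.user]
    item_ctx = [r for r in past if r.item == target.item]

    # Assumed neighbor definition: users who share an earlier item
    # with the target user in the bipartite user-item graph.
    seen_items = {r.item for r in user_hist}
    neighbors = {r.user for r in past
                 if r.item in seen_items and r.user != target.user}
    neighbor_revs = [r for r in past
                     if r.user in neighbors and r.item != target.item]

    pools = {
        "product-only": item_ctx,
        "user-only": user_hist,
        "neighbor-only": neighbor_revs,
    }
    if setting == "combined":
        merged = []
        for pool in pools.values():
            merged.extend(most_recent(pool, k))
        # dict.fromkeys removes duplicates while preserving order.
        return list(dict.fromkeys(merged))
    if setting not in pools:
        raise ValueError(f"unknown setting: {setting}")
    return most_recent(pools[setting], k)


# Toy data: the target is u1's review of item C at time 5.
reviews = [
    Review("u1", "A", 1, "a"),
    Review("u2", "A", 2, "b"),
    Review("u2", "B", 3, "c"),
    Review("u3", "C", 4, "d"),
    Review("u1", "C", 5, "target"),
    Review("u3", "C", 6, "future"),
]
target = reviews[4]
```

Calling `gather_evidence(reviews, target, "combined")` on this toy data merges the three pools and never returns the review written at time 6, since it falls after the cutoff. Selecting the newest `k` reviews per source is one simple bounding policy; the benchmark may bound or rank evidence differently.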