CARD: Non-Uniform Quantization of Visual Semantic Unit for Generative Recommendation
Yibiao Wei, Jie Zou, Pengfei Zhang, Xiao Ao, Weikang Guo + 2 more
TLDR
CARD enhances generative recommendation by integrating multimodal signals and employing non-uniform quantization for better semantic unit learning.
Key contributions
- Introduces a visual semantic unit unifying textual, visual, and collaborative signals for holistic semantic modeling.
- Develops NU-RQ-VAE, a non-uniform quantization framework, to map skewed semantic distributions to a balanced latent space.
- Significantly improves codebook utilization and quantization accuracy, outperforming baselines in generative recommendation.
Why it matters
Generative recommendation faces two obstacles: the two-stage paradigm provides insufficient supervision for fusing heterogeneous multimodal signals, and non-uniform item embeddings cause codeword imbalance and generation bias. CARD addresses both by constructing a unified visual semantic unit before encoding and by introducing a non-uniform quantization method, yielding more accurate and balanced item representations and better recommendation quality.
Original Abstract
Generative recommendation frameworks typically represent items as discrete Semantic IDs (SIDs). While existing studies have sought to enhance SID construction by incorporating multimodal content, collaborative signals, or more advanced quantization techniques, learning high-quality SIDs still faces two key challenges: (1) The two-stage generative recommendation paradigm (SID construction and autoregressive generation) provides insufficient supervision for heterogeneous fusion, which hinders learning high-quality SIDs, and (2) non-uniform embeddings lead to codeword imbalance and generation bias. To address these challenges, we propose a novel generative recommendation framework, called CARD. CARD introduces a visual semantic unit that unifies textual, visual, and collaborative signals into a structured visual representation prior to encoding, enabling holistic semantic modeling and effectively alleviating the semantic gap, thereby reducing the reliance on supervision signals during SID learning. Furthermore, to deal with the highly non-uniform distribution of item semantic embeddings in recommendation scenarios, we develop a non-uniform quantization framework (NU-RQ-VAE), which incorporates a learnable and invertible non-uniform transformation into the quantization process to map skewed semantic distributions into a more balanced latent space, thereby significantly improving codebook utilization and quantization accuracy. Experiments on multiple datasets show that CARD consistently outperforms baseline methods under various settings; meanwhile, the proposed non-uniform transformation module is plug-and-play and remains robust across different quantization schemes. Code is available at https://github.com/HAI-UESTC/CARD.
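To make the NU-RQ-VAE idea concrete, here is a minimal sketch of the two ingredients the abstract describes: an invertible non-uniform transformation that maps a skewed embedding distribution into a more balanced space, followed by residual quantization that assigns each item a sequence of discrete codeword indices (its SID). This is not the paper's implementation: the signed-power transform, the fixed `alpha`, the random codebooks, and all function names are illustrative assumptions standing in for the learnable transform and trained codebooks.

```python
import numpy as np

def nu_transform(x, alpha=0.5):
    # Invertible elementwise signed-power map: compresses heavy tails so
    # skewed embeddings spread more evenly across the latent space.
    return np.sign(x) * np.abs(x) ** alpha

def nu_inverse(z, alpha=0.5):
    # Exact inverse of nu_transform, so quantized codes can be mapped back.
    return np.sign(z) * np.abs(z) ** (1.0 / alpha)

def residual_quantize(x, codebooks):
    """Greedy residual quantization: level l quantizes the residual left by
    levels 0..l-1. Returns per-level codeword indices (the SID) and the
    reconstruction (sum of selected codewords)."""
    ids, residual = [], x.copy()
    for cb in codebooks:
        # nearest codeword per item at this level
        dists = np.linalg.norm(residual[:, None, :] - cb[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)
        ids.append(idx)
        residual = residual - cb[idx]
    return np.stack(ids, axis=1), x - residual

# Toy usage with heavy-tailed (Laplace) item embeddings and random codebooks.
rng = np.random.default_rng(0)
items = rng.laplace(size=(64, 4))
codebooks = [rng.normal(size=(16, 4)), rng.normal(scale=0.5, size=(16, 4))]
sids, recon = residual_quantize(nu_transform(items), codebooks)
items_hat = nu_inverse(recon)  # back to the original embedding space
```

In the paper the transform is learned jointly with the quantizer rather than fixed; the point of the sketch is only that quantizing in the transformed space, then inverting, leaves reconstruction error in the original space while using codewords more uniformly.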