ArXiv TLDR

ReLIC-SGG: Relation Lattice Completion for Open-Vocabulary Scene Graph Generation

arXiv:2604.22546

Amir Hosseini, Sara Farahani, Xinyi Li, Suiyang Guang

cs.CV

TLDR

ReLIC-SGG tackles annotation incompleteness in open-vocabulary scene graph generation by inferring missing relations with a semantic relation lattice.

Key contributions

  • Proposes ReLIC-SGG, treating unannotated relations as latent variables to handle annotation incompleteness.
  • Builds a semantic relation lattice to model predicate similarity, entailment, and contradiction.
  • Infers missing positive relations using visual-language compatibility, graph context, and semantic consistency.
  • Employs a positive-unlabeled graph learning objective and lattice-guided decoding for better scene graphs.
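The semantic relation lattice and lattice-guided decoding described above can be sketched in a toy form. This is an illustrative reconstruction, not the paper's code: the class name, the example predicates, and the greedy decoding rule are assumptions; the paper only states that the lattice models similarity, entailment, and contradiction among predicates and guides decoding toward compact, consistent outputs.

```python
# Toy sketch (hypothetical, not the authors' implementation) of a semantic
# relation lattice with entailment and contradiction edges, plus greedy
# lattice-guided decoding that keeps the most specific consistent predicates.

class RelationLattice:
    def __init__(self):
        self.entails = {}         # predicate -> set of more general predicates
        self.contradicts = set()  # frozensets of mutually exclusive predicates

    def add_entailment(self, specific, general):
        # e.g. "standing on" entails "on"
        self.entails.setdefault(specific, set()).add(general)

    def add_contradiction(self, a, b):
        self.contradicts.add(frozenset((a, b)))

    def ancestors(self, p):
        # all predicates transitively entailed by p
        seen, stack = set(), list(self.entails.get(p, ()))
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(self.entails.get(q, ()))
        return seen

    def consistent(self, a, b):
        return frozenset((a, b)) not in self.contradicts

    def decode(self, scored):
        """Greedy decoding: visit predicates by score; drop any predicate
        that contradicts a kept one (or its ancestors), or that is strictly
        more general than a kept one."""
        kept = []
        for p, _ in sorted(scored, key=lambda x: -x[1]):
            closure = {q for k in kept for q in ({k} | self.ancestors(k))}
            if any(not self.consistent(p, q) for q in closure):
                continue
            if any(p in self.ancestors(k) for k in kept):
                continue  # p is a coarser duplicate of a kept predicate
            kept.append(p)
        return kept

lattice = RelationLattice()
lattice.add_entailment("standing on", "on")
lattice.add_entailment("resting on", "on")
lattice.add_contradiction("on", "under")

scores = [("standing on", 0.9), ("on", 0.8), ("under", 0.5)]
print(lattice.decode(scores))  # ['standing on']
```

Here "on" is suppressed because the more specific "standing on" entails it, and "under" is suppressed because it contradicts an ancestor of a kept predicate; this mirrors the "compact and semantically consistent" decoding the summary describes.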

Why it matters

Open-vocabulary SGG is crucial for flexible scene understanding, but current methods suffer from incomplete and ambiguous annotations. By inferring missing relations through a semantic lattice, ReLIC-SGG improves recognition of rare and unseen predicates and better recovers unannotated relations, enabling more accurate and comprehensive scene descriptions.

Original Abstract

Open-vocabulary scene graph generation (SGG) aims to describe visual scenes with flexible relation phrases beyond a fixed predicate set. Existing methods usually treat annotated triplets as positives and all unannotated object-pair relations as negatives. However, scene graph annotations are inherently incomplete: many valid relations are missing, and the same interaction can be described at different granularities, e.g., *on*, *standing on*, *resting on*, and *supported by*. This issue becomes more severe in open-vocabulary SGG due to the much larger relation space. We propose **ReLIC-SGG**, a relation-incompleteness-aware framework that treats unannotated relations as latent variables rather than definite negatives. ReLIC-SGG builds a semantic relation lattice to model similarity, entailment, and contradiction among open-vocabulary predicates, and uses it to infer missing positive relations from visual-language compatibility, graph context, and semantic consistency. A positive-unlabeled graph learning objective further reduces false-negative supervision, while lattice-guided decoding produces compact and semantically consistent scene graphs. Experiments on conventional, open-vocabulary, and panoptic SGG benchmarks show that ReLIC-SGG improves rare and unseen predicate recognition and better recovers missing relations.
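The positive-unlabeled objective mentioned in the abstract can be illustrated with a generic non-negative PU risk estimator. This is a hedged sketch in the spirit of standard PU learning, not the paper's actual loss: the function names, the use of binary cross-entropy, and the class prior are assumptions; the paper only states that a PU objective reduces false-negative supervision from unannotated pairs.

```python
# Hypothetical sketch of a non-negative positive-unlabeled risk for
# relation scores: annotated pairs are positives, unannotated pairs are
# unlabeled (a mixture of positives and negatives), and a class prior
# estimates the fraction of true positives among the unlabeled set.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    # binary cross-entropy with clamping for numerical safety
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def nn_pu_risk(pos_logits, unl_logits, prior):
    """prior: assumed fraction of true positives among unlabeled pairs."""
    pos = [sigmoid(z) for z in pos_logits]
    unl = [sigmoid(z) for z in unl_logits]
    r_pos = sum(bce(p, 1) for p in pos) / len(pos)      # positives as positives
    r_pos_neg = sum(bce(p, 0) for p in pos) / len(pos)  # positives as negatives
    r_unl_neg = sum(bce(p, 0) for p in unl) / len(unl)  # unlabeled as negatives
    # non-negative correction: the estimated negative-class risk
    # (unlabeled risk minus the positive contamination) is clipped at 0,
    # so unannotated true relations are not fully penalized as negatives.
    return prior * r_pos + max(0.0, r_unl_neg - prior * r_pos_neg)

loss = nn_pu_risk(pos_logits=[3.0, 2.5], unl_logits=[-2.0, 0.5, -3.0], prior=0.1)
print(round(loss, 4))
```

Compared with naively labeling every unannotated pair as negative, the prior-weighted correction discounts the expected positives hiding in the unlabeled set, which is the false-negative effect the summary says ReLIC-SGG mitigates.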
