ArXiv TLDR

LoViF 2026: The First Challenge on Holistic Quality Assessment for 4D World Models (PhyScore)

arXiv:2605.05187

Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan + 30 more

cs.CV

TLDR

The LoViF 2026 PhyScore challenge evaluates 4D world models on holistic quality, including physical realism and temporal consistency.

Key contributions

  • Introduces LoViF 2026 PhyScore challenge for holistic 4D world model quality assessment.
  • Focuses on metrics for Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency.
  • Provides a benchmark of 1,554 videos covering 26 physics-relevant categories and 3 generation tracks.
  • Evaluates solutions on both overall score prediction and fine-grained physical anomaly localization.

Why it matters

Current world model evaluation often overlooks physical plausibility and temporal consistency. This challenge addresses that critical gap by providing a robust benchmark and evaluation framework for holistic quality assessment. It pushes the field towards developing more realistic and consistent 4D world models.

Original Abstract

This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with input conditions. Participants are required to build a metric that jointly predicts four dimensions, i.e., Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency. Beyond score prediction, participants also need to localize physical anomaly timestamps for fine-grained diagnosis. The benchmark dataset contains 1,554 videos generated by seven representative generative world models, organized into three tracks (text-to-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-relevant scenarios, including dynamics, optics, and thermodynamics, together with diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced by trained human annotators with an additional automated quality-control pass. Evaluation is based on both score prediction and anomaly localization, with a composite protocol that combines TimeStamp_IOU and SRCC/PLCC. This report summarizes the challenge design and provides method-level insights from submitted solutions.
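The abstract names three evaluation components without defining them: SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) for score prediction, and TimeStamp_IOU for anomaly localization. The exact composite weighting used by the challenge is not stated here; the following is a minimal sketch of the three standard components, assuming TimeStamp_IOU is ordinary temporal interval IoU between predicted and ground-truth anomaly spans.

```python
# Hedged sketch of the three named metric components. The challenge's actual
# composite protocol (weights, tie handling, interval matching) is not
# specified in this summary; these are the textbook definitions.
import math


def pearson(x, y):
    """PLCC: Pearson linear correlation between predicted and human scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """SRCC: Pearson correlation computed on ranks.

    Tie-averaging is omitted for brevity; it matters only with duplicate scores.
    """
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))


def timestamp_iou(pred, gt):
    """Temporal IoU between a predicted and ground-truth (start, end) interval."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, `timestamp_iou((0.0, 2.0), (1.0, 3.0))` gives 1/3: one second of overlap over a three-second union.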
