Beyond VMAF: Towards Application-Specific Metrics for Teleoperation Video

May 13, 2026 | arXiv:2605.13525

Ines Trautmannsheimer, Richard Grauberger, Frank Diermeyer

cs.HC, cs.RO

TLDR

This paper retrains VMAF with teleoperation-specific data, showing improved alignment with human perception for remote driving video quality.

Key contributions

Conducted an online study to gather subjective quality ratings for teleoperation video.
Retrained the VMAF model using domain-specific human ratings, creating an adapted variant.
The adapted VMAF aligned more closely with human ratings than the original 4K VMAF (RMSE 10.36 to 8.83, a 15% reduction; MAD 8.71 to 6.38, a 27% reduction).
Identified outlier cases where high objective scores missed critical degradations in driving tasks.

Why it matters

This research underscores the importance of domain-specific data for improving video quality metrics in safety-critical applications such as teleoperation. The adapted VMAF offers a more reliable tool for assessing the video feed that remote operators depend on. However, the identified outliers, in which videos scored high despite noticeable degradations in regions critical for the driving task, suggest that further work is needed before the metric captures all safety-relevant degradations.

Original Abstract

Automated driving has made remarkable progress, yet situations still arise where human intervention is necessary. Teleoperation provides a scalable solution to address such cases, enabling remote operators to support vehicles without being physically present. In this context, video transmission forms the operator's primary source of situational awareness, making video quality a decisive factor for both safety and task performance. In an online study, participants rated compressed video sequences from the Zenseact Dataset and provided subjective quality ratings. These ratings were then used to retrain the Video Multi-Method Assessment Fusion (VMAF) model, yielding an adapted variant tailored to teleoperation. The retrained model demonstrated improved alignment with human ratings compared to the original 4K VMAF. In particular, RMSE decreased from 10.36 to 8.83, and MAD from 8.71 to 6.38, corresponding to improvements of 15% and 27%, respectively. These results highlight that incorporating domain-specific data can enhance the predictive power of established quality metrics in safety-critical applications. At the same time, outlier cases emerged in which videos received high objective scores despite noticeable degradations in regions critical for the driving task.
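
The reported percentages follow directly from the before/after error values in the abstract. The sketch below reproduces that arithmetic and shows one plausible way the two error measures could be computed. It assumes that MAD means the mean absolute difference between predicted and human scores, which the abstract does not spell out. The function names and the three-video toy scores are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, human):
    """Root-mean-square error between predicted and human quality scores."""
    squared = [(p - h) ** 2 for p, h in zip(predicted, human)]
    return math.sqrt(sum(squared) / len(squared))


def mad(predicted, human):
    """Mean absolute difference (assumed reading of 'MAD' in the abstract)."""
    absolute = [abs(p - h) for p, h in zip(predicted, human)]
    return sum(absolute) / len(absolute)


def relative_reduction(before, after):
    """Fraction by which an error measure shrank, relative to its old value."""
    return (before - after) / before


# Values reported in the abstract (original 4K VMAF vs. retrained model).
rmse_gain = relative_reduction(10.36, 8.83)
mad_gain = relative_reduction(8.71, 6.38)
print(f"RMSE reduction: {rmse_gain:.1%}")  # 14.8%, reported as 15%
print(f"MAD reduction:  {mad_gain:.1%}")   # 26.8%, reported as 27%

# Toy scores, made up purely to exercise the two error functions.
toy_predicted = [80.0, 60.0, 40.0]
toy_human = [70.0, 65.0, 40.0]
toy_rmse = rmse(toy_predicted, toy_human)
toy_mad = mad(toy_predicted, toy_human)
```

Because RMSE squares each error before averaging, it weights large individual misses more heavily than MAD does. That may be one reason the two measures improve by different amounts, although the abstract does not discuss this.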
