Rethinking Dense Optical Flow without Test-Time Scaling
Praroop Chanda, Suryansh Kumar
TLDR
This paper proposes a single-pass optical flow method leveraging foundation models to achieve strong performance without computationally expensive test-time scaling.
Key contributions
- Estimates dense optical flow in a single forward pass, avoiding iterative refinement.
- Leverages visual semantic features from DINO-v2 and geometric cues from a depth model.
- Fuses these complementary priors into a unified representation for global matching (see the sketch after this list).
- Achieves strong cross-dataset generalization, reaching 2.81 EPE on Sintel Final without refinement and surpassing SEA-RAFT under comparable training conditions.
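To make the fusion idea concrete, here is a minimal PyTorch sketch. The module name `PriorFusion`, the channel widths, and the concat-then-project design are illustrative assumptions, not the paper's published architecture.

```python
# Hedged sketch: fusing semantic (DINO-v2) and geometric (depth) priors.
# All names and dimensions below are assumptions for illustration only.
import torch
import torch.nn as nn

class PriorFusion(nn.Module):
    """Project both prior feature maps to a shared width and mix them."""
    def __init__(self, sem_dim=768, geo_dim=256, out_dim=256):
        super().__init__()
        # 1x1 convolutions align the two priors' channel dimensions.
        self.sem_proj = nn.Conv2d(sem_dim, out_dim, kernel_size=1)
        self.geo_proj = nn.Conv2d(geo_dim, out_dim, kernel_size=1)
        self.mix = nn.Conv2d(2 * out_dim, out_dim, kernel_size=1)

    def forward(self, sem_feat, geo_feat):
        # sem_feat: (B, sem_dim, H, W) from a frozen DINO-v2 backbone
        # geo_feat: (B, geo_dim, H, W) from a monocular depth model
        s = self.sem_proj(sem_feat)
        g = self.geo_proj(geo_feat)
        # Concatenate and project back: one unified (B, out_dim, H, W) map.
        return self.mix(torch.cat([s, g], dim=1))
```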
Why it matters
Existing dense optical flow methods scale test-time computation through costly iterative refinement. This work shows that strong foundation-model priors can substitute for that refinement, offering a markedly more efficient alternative that is also more accurate on benchmarks such as Sintel Final. It opens new avenues for high-performance, single-pass optical flow.
Original Abstract
Recent progress in dense optical flow has been driven by increasingly complex architectures and multi-step refinement for test-time scaling. While these approaches achieve strong benchmark performance, they also require substantial computation during inference. This raises a fundamental question: Is scaling test-time computation the only way to improve dense optical flow accuracy? We argue that it is not. Instead, powerful visual semantic and geometric priors encoded in modern foundation models can reduce, if not overcome, the need for computationally expensive iterative refinement at test-time. In this paper, we present a framework that estimates dense optical flow in a single forward pass, leveraging pretrained foundation representations, while avoiding iterative refinement and additional inference-time computation, thus offering an alternative to test-time scaling. Our method extracts visual semantic features from a frozen DINO-v2 backbone and combines them with geometric cues from a monocular depth foundation model. We fuse these complementary priors into a unified representation and apply a global matching formulation to estimate dense correspondences without recurrent updates or test-time optimization. Despite avoiding iterative refinement, our approach achieves strong cross-dataset generalization across challenging benchmarks. On Sintel Final, we obtain 2.81 EPE without refinement, significantly improving over state-of-the-art (SOTA) SEA-RAFT under comparable training conditions and outperforming RAFT, GMFlow (without refinement), and recent FlowSeek in the same setting. These results suggest that strong foundation priors can substitute for test-time scaling, offering a computationally efficient alternative to refinement-heavy pipelines.
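To illustrate the global matching formulation the abstract describes, here is a GMFlow-style sketch: a softmax over an all-pairs correlation volume gives each source pixel a matching distribution over the target frame, and flow is read out in one pass as the expected displacement. The function name and soft-argmax readout are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of single-pass global matching (GMFlow-style readout).
import torch

def global_matching_flow(feat1, feat2):
    """feat1, feat2: (B, C, H, W) fused feature maps for frames 1 and 2."""
    B, C, H, W = feat1.shape
    f1 = feat1.flatten(2).transpose(1, 2)      # (B, H*W, C)
    f2 = feat2.flatten(2)                      # (B, C, H*W)
    corr = torch.bmm(f1, f2) / C ** 0.5        # all-pairs similarity (B, H*W, H*W)
    prob = corr.softmax(dim=-1)                # matching distribution per source pixel

    # Target pixel coordinates as an (H*W, 2) grid in (x, y) order.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().reshape(-1, 2).to(feat1.device)

    match = prob @ grid                        # expected target coordinate (B, H*W, 2)
    flow = match - grid                        # displacement from source to match
    return flow.transpose(1, 2).reshape(B, 2, H, W)
```

Because the flow is read out as a single matrix product over the correlation volume, no recurrent updates or test-time optimization are involved, which is what makes the single-forward-pass claim possible.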