MegaFlow: Zero-Shot Large Displacement Optical Flow

Dingxi Zhang¹, Fangjinhua Wang¹, Marc Pollefeys¹,², Haofei Xu¹

¹ETH Zurich ²Microsoft

MegaFlow excels at large displacement optical flow and point tracking. (a) On the Sintel (Final) benchmark, MegaFlow consistently achieves the lowest End-Point Error (EPE), with its advantage widening significantly on large displacements. (b) MegaFlow also demonstrates superior zero-shot point tracking results on TAP-Vid. (c) Visuals and inset error maps further illustrate our state-of-the-art results.
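For readers unfamiliar with the metric, the following is a minimal sketch of how End-Point Error can be computed and broken down by ground-truth displacement magnitude. The function names and the `(H, W, 2)` array layout are assumptions made for illustration. The default bucket edges of 10 and 40 pixels follow the usual Sintel convention and are not necessarily the exact bins plotted in panel (a).

```python
import numpy as np

def end_point_error(flow_pred, flow_gt):
    """Per-pixel Euclidean distance between two flow fields of shape (H, W, 2)."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1)

def epe_by_displacement(flow_pred, flow_gt, edges=(0.0, 10.0, 40.0, np.inf)):
    """Mean EPE over all pixels and per ground-truth motion-magnitude bucket.

    `edges` are bucket boundaries in pixels (an assumed default, see lead-in).
    """
    epe = end_point_error(flow_pred, flow_gt)
    magnitude = np.linalg.norm(flow_gt, axis=-1)  # ground-truth displacement size
    result = {"all": float(epe.mean())}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (magnitude >= lo) & (magnitude < hi)
        # Empty buckets are reported as NaN rather than silently as zero.
        result[f"s{lo:g}-{hi:g}"] = float(epe[mask].mean()) if mask.any() else float("nan")
    return result
```

Reporting the large-displacement bucket separately is what makes a widening gap between methods visible, since the overall mean is dominated by the many small-motion pixels.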

Overview Video

Pipeline Overview

Given an input sequence, a frozen DINO and a trainable CNN extract dense patch tokens and local structural features. Alternating frame and global attention layers, followed by feature fusion, process these tokens into a globally consistent representation. Pair-wise global matching then computes initial flows. Finally, a recurrent module iteratively refines the initial flows using spatial convolutions and temporal attention for sub-pixel accuracy. Crucially, our design processes variable-length inputs and extends to point tracking without architectural modifications.
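To make the pair-wise global matching step concrete, here is a minimal sketch of one common formulation: a softmax over the all-pairs feature correlation, followed by an expected-coordinate readout. This is an illustrative assumption, not necessarily the exact matching layer used in MegaFlow. The function name, the `temperature` parameter and the `(H, W, C)` feature layout are hypothetical.

```python
import numpy as np

def global_matching_flow(feat1, feat2, temperature=1.0):
    """Initial flow from frame 1 to frame 2 via softmax over all-pairs correlation.

    feat1, feat2: feature maps of shape (H, W, C). Returns flow of shape (H, W, 2)
    in (x, y) order. Illustrative sketch only (see the lead-in for assumptions).
    """
    h, w, c = feat1.shape
    f1 = feat1.reshape(h * w, c)
    f2 = feat2.reshape(h * w, c)

    # Similarity of every source pixel to every target pixel, scaled dot product.
    corr = f1 @ f2.T / (np.sqrt(c) * temperature)

    # Numerically stable softmax over all target positions.
    corr = corr - corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel coordinate grid, flattened in the same row-major order as the features.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matching position, then displacement relative to the source pixel.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)
```

Because every source pixel is compared with every target pixel, the search range is not limited to a local window, which is why this style of matching suits large displacements. The recurrent refinement described above would then correct the residual sub-pixel error.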

Optical Flow Results

Point Tracking Results

* Displaying 1/16 of the points. Note: Because tracking is a zero-shot application of our flow model, point visibility is not explicitly predicted, so points continue to be tracked through occlusions.
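As a rough illustration of how a flow model can be used for point tracking, the sketch below reads tracks out of dense flow fields. It assumes the model supplies, for each frame `t`, the flow from the query frame to frame `t`. That input layout and the helper names `bilinear_sample` and `tracks_from_flows` are hypothetical and may differ from the actual MegaFlow tracking procedure.

```python
import numpy as np

def bilinear_sample(flow, x, y):
    """Bilinearly interpolate a flow field of shape (H, W, 2) at a sub-pixel (x, y)."""
    h, w, _ = flow.shape
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * flow[y0, x0] + ax * flow[y0, x1]
    bottom = (1 - ax) * flow[y1, x0] + ax * flow[y1, x1]
    return (1 - ay) * top + ay * bottom

def tracks_from_flows(flows, queries):
    """Turn query-to-frame flows into point tracks.

    flows:   (T, H, W, 2), flows[t] maps query-frame pixels to frame t (assumed layout).
    queries: (N, 2) query points in (x, y) pixel coordinates in the query frame.
    Returns tracks of shape (T, N, 2). No visibility flag is produced.
    """
    queries = np.asarray(queries, dtype=np.float64)
    tracks = np.empty((len(flows), len(queries), 2))
    for t, flow in enumerate(flows):
        for n, (x, y) in enumerate(queries):
            tracks[t, n] = queries[n] + bilinear_sample(flow, x, y)
    return tracks
```

A position is returned for every query point in every frame, regardless of whether the point is actually visible there, which matches the behavior described in the note above.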

BibTeX

@inproceedings{zhang2026megaflow,
  title     = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author    = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  booktitle = {arXiv preprint arXiv:},
  year      = {2026}
}