

Rethinking RNN-Based Video Object Segmentation

Fatemeh Azimi; Federico Raue; Joern Hees; Andreas Dengel
In: Computer Vision, Imaging and Computer Graphics Theory and Applications: 16th International Joint Conference, VISIGRAPP 2021. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP-2021), February 8-10, Online, Pages 348-365, Springer, 2023.


Video Object Segmentation is a fundamental task in computer vision that aims at pixel-wise tracking of one or more foreground objects within a video sequence. The task is challenging because real-world settings require handling unconstrained object and camera motion, occlusion, fast motion, and motion blur. Recently, methods based on recurrent neural networks (RNNs) have segmented target objects accurately and efficiently, since RNNs can effectively memorize the object of interest and compute spatiotemporal features that are useful for processing sequential visual data. However, these methods have limitations, such as lower segmentation accuracy on longer sequences. In this paper, we extend our previous work with a hybrid architecture that overcomes some of these challenges by employing additional correspondence matching information, and we extensively explore the impact of various architectural design choices. Our experimental results on the YouTubeVOS dataset confirm the efficacy of the proposed architecture, which improves on RNN-based baselines by about 12 percentage points (pp) without a considerable increase in computational cost.

