Directional Deep Embedding and Appearance Learning for Fast Video Object Segmentation

Abstract: Most recent semi-supervised video object segmentation (VOS) methods rely on fine-tuning deep convolutional neural networks online using the given mask of the first frame or the predicted masks of subsequent frames. However, the online fine-tuning process is usually time-consuming, limiting the practical use of such methods. We propose a directional deep embedding and appearance learning (DDEAL) method, which is free of the online fine-tuning process, for fast VOS. First, a global directional matching module, which can be efficiently implemented by parallel convolutional operations, is proposed to learn a semantic pixel-wise embedding as an internal guidance. Second, an effective directional appearance model based on statistics is proposed to represent the target and background on a spherical embedding space for VOS. Equipped with the global directional matching module and the directional appearance model learning module, DDEAL learns static cues from the labeled first frame and dynamically updates cues of the subsequent frames for object segmentation. Our method exhibits state-of-the-art VOS performance without online fine-tuning. Specifically, it achieves a J&F mean score of 74.8% on the DAVIS 2017 dataset and an overall score of 71.3% on the large-scale YouTube-VOS dataset, while running at 25 fps on a single NVIDIA TITAN Xp GPU. Furthermore, our faster version runs at 31 fps with only a small loss in accuracy.
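The abstract describes the matching module as cosine-style ("directional") similarity computed with parallel convolutions. Below is a minimal PyTorch sketch of that idea under stated assumptions: each labeled first-frame embedding is treated as a 1x1 convolution kernel, so one forward pass scores every current-frame pixel against all reference pixels in parallel. The function name, tensor shapes, and the max-pooling over reference pixels are our assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch of global directional matching as a parallel convolution.
import torch
import torch.nn.functional as F

def directional_matching(curr_feat, ref_feat, ref_mask):
    """curr_feat, ref_feat: (1, C, H, W) embeddings; ref_mask: (H, W) binary."""
    # L2-normalize so similarity is purely directional (cosine).
    curr = F.normalize(curr_feat, dim=1)
    ref0 = F.normalize(ref_feat, dim=1)[0]          # (C, H, W)
    C = ref0.shape[0]

    m = ref_mask.bool()
    # Each labeled reference pixel becomes a 1x1 conv kernel; one conv pass
    # computes cosine similarity to all reference pixels at every location.
    fg = ref0[:, m].t().reshape(-1, C, 1, 1)        # (Nfg, C, 1, 1)
    bg = ref0[:, ~m].t().reshape(-1, C, 1, 1)       # (Nbg, C, 1, 1)

    fg_sim = F.conv2d(curr, fg).max(dim=1, keepdim=True).values  # (1,1,H,W)
    bg_sim = F.conv2d(curr, bg).max(dim=1, keepdim=True).values  # (1,1,H,W)

    # Higher foreground similarity -> target; used as internal guidance.
    return torch.cat([bg_sim, fg_sim], dim=1)       # (1, 2, H, W)
```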
EXISTING SYSTEM:
• We develop a method for video object segmentation with the following design goals. A VOS method should be:
• Simple: only a single neural network and no simulated data are used.
• Fast: the whole system is fast enough for deployment; in particular, the model does not rely on first-frame fine-tuning.
• End-to-end: the multi-object segmentation problem, where each video contains a different number of objects, is tackled in an end-to-end way.
• In this setup, each frame is segmented individually, and only information obtained by matching globally to the first frame is used (see the sketch after this list).
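A minimal sketch of this global-matching idea, assuming a FEELVOS-style formulation in which each query pixel is scored by the distance to its nearest labeled first-frame pixel of each class; the function and variable names are hypothetical:

```python
# Hedged sketch: per-class nearest-neighbor matching to the first frame.
import torch

def global_match(curr_feat, ref_feat, ref_labels, num_objects):
    """curr_feat: (N, C), ref_feat: (M, C), ref_labels: (M,) in [0, K]."""
    # Full pairwise Euclidean distances between query and reference pixels.
    dists = torch.cdist(curr_feat, ref_feat)          # (N, M)
    scores = []
    for k in range(num_objects + 1):                  # class 0 is background
        d_k = dists[:, ref_labels == k].min(dim=1).values
        scores.append(-d_k)                           # nearer -> higher score
    # Per-pixel class scores; each frame is segmented independently,
    # using only this first-frame information.
    return torch.stack(scores, dim=1)                 # (N, K+1)
```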
DISADVANTAGE:
• To tackle the aforementioned problem of online fine-tuning-dependent VOS methods, recent studies have focused on designing fine-tuning-free network architectures that completely avoid online optimization.
• A-GAME achieves fine-tuning-free VOS by learning the target appearance with high-dimensional deep features in Euclidean space.
• However, learning a probabilistic generative model in a high-dimensional Euclidean space suffers from the "curse of dimensionality", which weakens the representation of the target and background and degrades segmentation (a directional alternative is sketched after this list).
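To make the contrast concrete, here is a hedged sketch of a directional appearance model kept on the unit sphere: each class is summarized by a single mean direction (in the spirit of a von Mises-Fisher mean) rather than a high-dimensional Gaussian, which keeps the statistics compact. The class interface and the momentum update rule are assumptions for illustration, not the paper's exact statistics.

```python
# Hedged sketch: directional appearance statistics on the unit sphere.
import torch
import torch.nn.functional as F

class DirectionalAppearance:
    def __init__(self, momentum=0.9):
        self.mu = None            # (K, C) unit mean direction per class
        self.momentum = momentum  # update rate: an assumption, not the paper's

    def fit(self, feats, labels, num_classes):
        """feats: (N, C) L2-normalized embeddings; labels: (N,) class ids."""
        mu = torch.stack([feats[labels == k].sum(dim=0)
                          for k in range(num_classes)])
        self.mu = F.normalize(mu, dim=1)  # mean direction (vMF-style)

    def update(self, feats, soft_labels):
        """Dynamically refresh the model from a new frame's soft prediction."""
        new_mu = F.normalize(soft_labels.t() @ feats, dim=1)   # (K, C)
        self.mu = F.normalize(self.momentum * self.mu
                              + (1 - self.momentum) * new_mu, dim=1)

    def score(self, feats):
        # Cosine similarity of each pixel embedding to each class direction.
        return feats @ self.mu.t()  # (N, K)
```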
PROPOSED SYSTEM:
• Many online VOS methods have been proposed; however, the computationally expensive online fine-tuning that most of them require limits their real-world applicability.
• The proposed network is trained offline and does not require fine-tuning. Our algorithm achieved an overall J&F score of 64.9 on the DAVIS 2020 test-challenge data and 60.9 on the DAVIS 2020 test-dev dataset.
• Many fine-tuning-free VOS algorithms have been proposed that learn pixel-wise embeddings in Euclidean space. These approaches are computationally expensive because similarity matching is performed in Euclidean space (see the sketch after this list).
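One reason directional matching is cheaper: for L2-normalized embeddings, squared Euclidean distance is an affine function of the dot product, ||a - b||^2 = 2 - 2 a.b, so a single matrix multiply on normalized features yields the same ranking that an explicit pairwise-distance computation would. A small NumPy check of this identity (shapes are illustrative):

```python
# Verify ||a - b||^2 = 2 - 2 a.b for unit vectors a, b.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((200, 64))
a /= np.linalg.norm(a, axis=1, keepdims=True)   # unit-norm query embeddings
b = rng.standard_normal((100, 64))
b /= np.linalg.norm(b, axis=1, keepdims=True)   # unit-norm reference embeddings

cos = a @ b.T                                   # one GEMM gives all similarities
sq_dist = 2.0 - 2.0 * cos                       # Euclidean distances for free
explicit = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
assert np.allclose(sq_dist, explicit)
```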
ADVANTAGE:
• We use two standard measures, the mean Jaccard index (i.e., intersection-over-union, IoU) and the mean contour accuracy, to evaluate segmentation performance (a minimal sketch of both follows this list).
• All evaluation results are computed on the DAVIS-2016 validation set, which includes 20 video sequences.
• The average of all four measures is calculated as the overall performance.
• We compare the proposed DDEAL with state-of-the-art fine-tuning-dependent and fine-tuning-free VOS methods, including S2S, OnAVOS, OSVOS, MSK, A-GAME, RGMP, and OSMN.
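A minimal sketch of the two measures, assuming binary masks: region similarity J is plain IoU, and contour accuracy F is approximated here as a boundary F-score with a small pixel tolerance (the official DAVIS toolkit computes F somewhat differently):

```python
# Hedged sketch of the DAVIS measures: J (region IoU) and a simplified F.
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def jaccard(pred, gt):
    """pred, gt: boolean (H, W) masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 1.0

def boundary(mask):
    # One-pixel-wide boundary: the mask minus its erosion.
    return mask & ~binary_erosion(mask)

def contour_f(pred, gt, tol=2):
    """Boundary F-score with a `tol`-pixel matching tolerance."""
    pb, gb = boundary(pred), boundary(gt)
    gb_dil = binary_dilation(gb, iterations=tol)
    pb_dil = binary_dilation(pb, iterations=tol)
    precision = (pb & gb_dil).sum() / max(pb.sum(), 1)
    recall = (gb & pb_dil).sum() / max(gb.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)
```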
