ON THE SYNERGIES BETWEEN MACHINE LEARNING AND BINOCULAR STEREO FOR DEPTH ESTIMATION FROM IMAGES: A SURVEY
Abstract
Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of studies and research. Throughout the years, the paradigm has shifted from local, pixel-level decisions to various forms of discrete and continuous optimization and then to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning have enhanced stereo matching with exciting new trends and applications that were unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine learning, and especially deep learning, has advanced the state of the art in stereo matching, stereo itself has enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far, and the open challenges the community is going to face in the immediate future.
Index Terms: Stereo matching, machine learning, deep learning, monocular depth estimation
Existing System
• While machine learning, and especially deep learning, has advanced the state of the art in stereo matching, stereo itself has enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks.
• In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far, and the open challenges the community is going to face in the immediate future.
Disadvantages
• Then, we consider two aspects concerning, respectively, the conventional pipelines and the end-to-end models: confidence estimation and the domain-shift problem, together with techniques aimed at mitigating the latter.
• In most computer vision problems, the availability of large and diverse datasets is of paramount importance for successfully developing new algorithms and for measuring their effectiveness.
Proposed System
• A few years later, an improved dataset and benchmark for scene flow estimation was proposed. In this case, the dataset consists of color stereo pairs, evenly split into training and test sets.
• Three main versions have been proposed since 2002 [1], with varying resolution and image content. We will focus on the most recent version, namely Middlebury 2014, since it provides an online benchmark for evaluation and still represents one of the most challenging datasets for stereo matching.
• As evidence, we underline that most of the proposed end-to-end networks for stereo matching are trained from scratch on this large dataset before being fine-tuned on real data.
Advantages
• The same procedure used for KITTI 2012 is followed here to obtain ground truth labels, except for moving objects, whose 3D points cannot be properly accumulated over time.
• Some of the data are used multiple times under different exposure and illumination conditions.
• This dataset has rarely been used in the evaluation of stereo approaches; it is much more popular in optical flow.
• Although it is rarely used to train stereo networks [19], the aforementioned modelling power makes it a promising tool for future research.
