On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey
Abstract: Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of study and research. Throughout the years, the paradigm has shifted from local, pixel-level decisions, to various forms of discrete and continuous optimization, to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning have enhanced stereo matching with new, exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine learning, and especially deep learning, advanced the state of the art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far, and the open challenges the community will face in the immediate future.
- We analyze the problem domain and the characteristics that make images from this domain difficult to work with.
- We also examine existing datasets, their advantages, and their shortcomings.
- We then state our hypothesis and the rationale for attempting simulation, and describe the simulation characteristics and parameters in the same section.
- Afterwards, we explain our methodology for solving the stereo correspondence problem using deep learning and report experimental results on both synthesized and real data.
- Finally, we move our focus to a new and exciting research trend: depth estimation from a single image, for which the synergy between stereo and deep learning has recently enabled results unimaginable just a few years ago.
In monocular depth estimation, the goal is to learn a non-linear mapping between a single RGB image and its corresponding depth map. Even though this task comes naturally to humans, it is an ill-posed problem: a single 2D image may originate from infinitely many different 3D scenes.
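Such a non-linear mapping can be illustrated, at toy scale, by fitting a small two-layer network from a flattened "image" to a per-pixel depth map. Everything below (the sizes, the synthetic image/depth pair, the one-hidden-layer architecture) is an illustrative assumption, not any specific network from the literature:

```python
import numpy as np

rng = np.random.default_rng(0)

H, W_IMG, HIDDEN = 8, 8, 32
n_pixels = H * W_IMG

# Synthetic training pair: random image, depth loosely tied to intensity.
image = rng.random((H, W_IMG))
depth_gt = 1.0 / (0.5 + image)          # stand-in "ground-truth" depth

# Two-layer network: flattened image -> hidden -> per-pixel depth.
W1 = rng.normal(0, 0.1, (HIDDEN, n_pixels))
W2 = rng.normal(0, 0.1, (n_pixels, HIDDEN))

def forward(x):
    h = np.tanh(W1 @ x)                 # the non-linearity
    return W2 @ h, h

x = image.ravel()
y = depth_gt.ravel()

lr = 0.5
losses = []
for _ in range(300):
    pred, h = forward(x)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Manual backprop through the two layers (MSE loss).
    gW2 = np.outer(err, h) / n_pixels
    gh = W2.T @ err / n_pixels
    gW1 = np.outer(gh * (1 - h ** 2), x)
    W2 -= lr * gW2
    W1 -= lr * gW1

print(losses[0], losses[-1])            # training loss should shrink
```

Real monocular networks are of course deep convolutional encoder-decoders trained on large datasets; the sketch only shows the "learned non-linear mapping" framing in its simplest form.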
In addition to optimizing the cost function along each scan-line, constraints between neighboring scan-lines can be exploited to reduce the ambiguity. Ohta and Kanade optimize over a two-dimensional area around the scan-line, integrating the inter-scan-line optimization into the original optimization process. Belhumeur approached the issue in two stages: first optimizing the cost function for each scan-line, then smoothing the disparities across scan-lines. Cox et al. proposed to reduce inconsistencies between scan-lines by penalizing discontinuities.
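The common core of these methods, optimizing a matching cost along a scan-line with a penalty for disparity discontinuities, can be sketched as a 1D dynamic program. The cost (absolute intensity difference) and the single penalty value are illustrative choices, not the exact formulation of any of the papers above:

```python
import numpy as np

def scanline_disparity(left, right, max_disp=4, penalty=0.4):
    """Pick, for each pixel of one scan-line, the disparity minimizing
    matching cost plus a discontinuity penalty between neighbors."""
    w = left.shape[0]
    # Matching cost: absolute intensity difference; invalid pairs get a
    # large cost so they are never selected.
    cost = np.full((w, max_disp + 1), 1e6)
    for d in range(max_disp + 1):
        cost[d:, d] = np.abs(left[d:] - right[: w - d])

    # Forward pass: aggregate cost along the scan-line, charging a
    # penalty whenever the disparity jumps between neighboring pixels.
    agg = cost.copy()
    for x in range(1, w):
        prev = agg[x - 1]
        for d in range(max_disp + 1):
            agg[x, d] = cost[x, d] + min(prev[d], prev.min() + penalty)

    # Winner per pixel (a full DP would also backtrack the path).
    return agg.argmin(axis=1)

# Right scan-line is the left one shifted by one pixel (disparity 1).
left = np.array([0., 0., 1., 1., 0., 0., 2., 2.])
right = np.array([0., 1., 1., 0., 0., 2., 2., 0.])
print(scanline_disparity(left, right))
```

The same aggregation, run along several directions and summed, is essentially the 1D path cost used by semi-global matching.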
We observe that the faster models aiming for real-time performance (DispNet-C, MADNet, and StereoNet) achieve the worst D1-all scores among all methods, including the MCCNN-acrt pipeline. We also point out that unsupervised models such as OASM-Net already outperform conventional non-data-driven algorithms such as SGM.
When adapting online to new environments, efficiency is desirable: each adaptation step should yield the largest possible accuracy improvement. To obtain a starting parameter configuration well suited to adaptation, Tonioni et al. propose the Learning to Adapt (L2A) training protocol.
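The basic online-adaptation loop these methods build on can be sketched on a toy linear model: one gradient step per incoming frame against a self-supervised proxy loss, so error in the new environment falls as the stream progresses. This is a generic fine-tuning loop on synthetic data, not the actual L2A meta-learning protocol:

```python
import numpy as np

rng = np.random.default_rng(1)

w = np.zeros(3)                         # model "pre-trained" elsewhere
w_target = np.array([1.0, -2.0, 0.5])   # the new environment's mapping

lr = 0.05
errors = []
for step in range(300):
    x = rng.normal(size=3)              # features of the incoming frame
    y = w_target @ x                    # self-supervised proxy label
    err = w @ x - y
    errors.append(err ** 2)
    w -= lr * err * x                   # one adaptation step per frame

# Error early in the stream vs. after adapting to the new environment.
print(np.mean(errors[:20]), np.mean(errors[-20:]))
```

What L2A adds on top of such a loop is a meta-learning objective: the initial parameters are trained so that each of these per-frame steps is maximally effective.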