On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

Abstract: Stereo matching is one of the longest-standing problems in computer vision, with close to 40 years of study and research. Throughout the years the paradigm has shifted from local, pixel-level decisions, to various forms of discrete and continuous optimization, and finally to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state of the art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images, highlighting the synergies, the successes achieved so far, and the open challenges the community will face in the immediate future.
- We analyze the problem domain and the characteristics that make images from this domain difficult to work with.
- We also examine existing datasets, their advantages, and their shortcomings.
- We then explain our hypothesis and the rationale for attempting simulation; the simulation characteristics and parameters are also explained in this section.
- Finally, we describe our deep-learning methodology for solving the stereo correspondence problem and report the results of experiments on both synthesized and real data.
- We move our focus to a new and exciting research trend: depth estimation from a single image, for which the synergy between stereo and deep learning has recently enabled results unimaginable just a few years ago.
- In monocular depth estimation, the goal is to learn a non-linear mapping between a single RGB image and its corresponding depth map.
- Even though this task comes naturally to humans, it is an ill-posed problem, since a single 2D image might originate from an infinite number of different 3D scenes.
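The stereo-to-monocular synergy mentioned above rests on a simple idea: instead of ground-truth depth, a stereo pair can supervise a monocular network through view reconstruction. The sketch below, a hypothetical minimal illustration (nearest-neighbour sampling in place of the differentiable bilinear sampler used in actual training pipelines), warps the right image toward the left using a disparity map and measures the photometric error that would serve as the self-supervision signal.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y)
    for every left-image pixel (nearest-neighbour sampling; illustrative
    stand-in for a differentiable bilinear sampler)."""
    h, w = right.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    src = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None].repeat(w, axis=1)
    return right[rows, src]

def photometric_loss(left, reconstructed):
    """Mean absolute error between the observed left image and its
    reconstruction: the self-supervision signal, no depth labels needed."""
    return np.abs(left - reconstructed).mean()

# Toy usage: a constant disparity of 2 px shifts every column by two.
right = np.arange(20, dtype=float).reshape(4, 5)
disp = np.full((4, 5), 2.0)
warped = warp_right_to_left(right, disp)
```

At training time the network predicts `disparity` from the left image alone; minimizing the photometric loss over many stereo pairs teaches it depth without any labeled supervision.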
- In addition to optimizing the cost function for each scan-line, constraints between neighboring scan-lines can be used to reduce the ambiguity.
- Ohta and Kanade optimize over a two-dimensional area around the scan-line, integrating the between-scan-line optimization into the original optimization process.
- Belhumeur approached the issue in two stages: first optimizing the cost function for each scan-line, then smoothing disparities between the scan-lines.
- Cox et al. proposed to reduce the inconsistencies between scan-lines by penalizing discontinuities.
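Scan-line optimization itself can be made concrete with a small sketch. The code below (a hypothetical minimal example, not any of the cited authors' exact formulations) runs dynamic programming along one scan-line: each pixel's matching cost is the absolute intensity difference at a candidate disparity, and a penalty proportional to the disparity jump between neighboring pixels discourages discontinuities, in the spirit of the penalization described above.

```python
import numpy as np

def scanline_dp(left_row, right_row, max_disp, penalty=0.5):
    """Dynamic programming over one scan-line: for each pixel, choose the
    disparity minimising matching cost plus a discontinuity penalty
    between neighbouring pixels. Returns one disparity per pixel."""
    w = len(left_row)
    num_d = max_disp + 1
    # Matching cost: absolute difference; a large value marks invalid
    # (out-of-image) correspondences.
    cost = np.full((w, num_d), 1e6)
    for d in range(num_d):
        cost[d:, d] = np.abs(left_row[d:] - right_row[:w - d])
    # jump[d, d_prev] = |d - d_prev|: the discontinuity penalty table.
    jump = np.abs(np.subtract.outer(np.arange(num_d), np.arange(num_d)))
    agg = cost.copy()
    back = np.zeros((w, num_d), dtype=int)
    for x in range(1, w):
        # trans[d, d_prev]: cost of arriving at disparity d from d_prev.
        trans = agg[x - 1][None, :] + penalty * jump
        back[x] = trans.argmin(axis=1)
        agg[x] += trans.min(axis=1)
    # Backtrack the minimum-cost disparity path.
    disp = np.zeros(w, dtype=int)
    disp[-1] = agg[-1].argmin()
    for x in range(w - 2, -1, -1):
        disp[x] = back[x + 1, disp[x + 1]]
    return disp
```

With a right row containing two distinctive intensities and a left row shifted by two pixels, the recovered path settles on disparity 2 at the textured pixels, while the penalty keeps the textureless pixels in between from drifting.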
- We observe that the faster models (DispNet-C, MADNet, and StereoNet) aiming for real-time performance achieve the worst D1-all scores among all methods, including the MC-CNN-acrt pipeline.
- We also point out that unsupervised models such as OASM-Net already outperform conventional non-data-driven algorithms such as SGM.
- When adapting online to new environments, efficiency is desirable: each adaptation step should yield the largest possible accuracy improvement.
- To achieve a starting parameter configuration that is suitable for adaptation, Tonioni et al. propose the Learning to Adapt (L2A) training protocol.
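For readers unfamiliar with the D1-all score used in the comparison above: it is the KITTI 2015 error rate, the percentage of valid pixels whose disparity error exceeds both 3 px and 5% of the ground-truth disparity. A minimal sketch of the metric:

```python
import numpy as np

def d1_all(est, gt, valid=None):
    """KITTI D1-all: percentage of valid pixels whose disparity error
    exceeds both 3 px and 5% of the ground-truth disparity."""
    if valid is None:
        valid = gt > 0  # pixels without ground truth are excluded
    err = np.abs(est - gt)
    bad = (err > 3.0) & (err > 0.05 * gt) & valid
    return 100.0 * bad.sum() / valid.sum()
```

The relative 5% term means the same 4 px error counts as an outlier at a ground-truth disparity of 10 px but not at 100 px, so nearby (large-disparity) regions are judged more leniently in absolute terms.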