Object Detection System with Voice Output using Python

Abstract : As object recognition technology has developed in recent years, it has been applied to autonomous vehicles, robots, and industrial facilities. However, the benefits of these technologies have not reached the visually impaired, who need them most. In this paper, we propose an object detection system for the blind based on deep learning. Voice recognition is used to learn which object a blind person wants to find, and object recognition then locates that object. Furthermore, voice guidance informs the person of the object's location. The object recognition model uses the Single Shot Multibox Detector (SSD) neural network architecture, voice recognition is implemented through speech-to-text (STT), and announcements are synthesized using text-to-speech (TTS) so that the blind can easily receive information about objects. The system is built in Python using the OpenCV library. As a result, we implement an efficient object detection system that helps the blind find objects in a specific space without help from others, and its performance is verified through experiments.
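 The following is a minimal sketch of the STT -> detection -> TTS pipeline described in the abstract. It assumes the speech_recognition package for speech-to-text and gTTS for speech output (gTTS is named later in this document; speech_recognition is an assumption), and detect_object is a hypothetical placeholder for the SSD detection step sketched in the sections below.

    import speech_recognition as sr   # assumed STT package
    from gtts import gTTS             # TTS library used by the proposed system

    def detect_object(name):
        # Hypothetical placeholder for the SSD detector (see the sketches below).
        return "center of the frame"

    # 1) Ask which object the user wants to find (speech-to-text).
    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        recognizer.adjust_for_ambient_noise(mic)
        audio = recognizer.listen(mic)
    target = recognizer.recognize_google(audio)   # e.g. "bottle"

    # 2) Locate the requested object in the camera frame.
    location = detect_object(target)

    # 3) Announce the result (text-to-speech).
    message = f"{target} found at the {location}" if location else f"{target} not found"
    gTTS(text=message, lang="en").save("announcement.mp3")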
 EXISTING SYSTEM :
 • Generic object detection aims at locating and classifying the objects present in an image and labeling them with rectangular bounding boxes that show the confidence of their existence (a labeling sketch is given below).
 • Improvements over existing CNN methods can be obtained by carefully designing the framework and classifiers, extracting multi-scale and part-based semantic information, and searching for complementary information from other related tasks, such as segmentation.
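 As a concrete illustration of the bounding-box labeling described above, the sketch below draws rectangles and class/confidence labels on an image with OpenCV. The detections list is an assumed example format; in practice it would come from the detector.

    import cv2

    # Assumed per-detection format: (class_name, confidence, x1, y1, x2, y2).
    detections = [("person", 0.91, 50, 40, 180, 300), ("chair", 0.78, 220, 120, 340, 280)]

    image = cv2.imread("frame.jpg")
    for name, conf, x1, y1, x2, y2 in detections:
        # Draw the bounding box and write the class name with its confidence.
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(image, f"{name} {conf:.2f}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imwrite("labeled.jpg", image)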
 DISADVANTAGE :
 • Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output.
 • Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.
 • The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly.
 • An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on a home computer.
 PROPOSED SYSTEM :
 • The identified object is later converted to an audio segment using gTTS, a Python library (a minimal sketch follows below).
 • The audio segment is the output of our system; it gives the name and spatial location of the object to the person, who can then build a mental picture of the objects around them.
 • The proposed system also protects the person from colliding with surrounding objects, helping to prevent injuries. The system identifies the object in front of the camera and then converts the result to an mp3 file using gTTS.
 • The proposed system is very low cost: FIG 3 shows the whole system, which consists of a Raspberry Pi 3B+, Bluetooth headphones, and a power bank to power the Raspberry Pi.
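 Below is a minimal sketch of the gTTS step mentioned above. The object name and spatial phrase are example values standing in for the detector's output; playback of the resulting mp3 through the Bluetooth headphones is left to whatever player is available on the Raspberry Pi.

    from gtts import gTTS

    # Example detection result; in the real system these come from the SSD model.
    object_name = "bottle"
    position = "center left"

    # Convert the announcement text to speech and save it as an mp3 file.
    announcement = gTTS(text=f"A {object_name} is at the {position}", lang="en")
    announcement.save("object.mp3")

    # On a Raspberry Pi the file can then be played with a command-line player
    # (e.g. os.system("mpg321 object.mp3")), routed to the Bluetooth headphones.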
 ADVANTAGE :
 • The OpenCV library is used for image processing since it supports real-time applications. The Python programming language is used to build the machine learning model, and the TensorFlow library is used to write the machine learning application (a loading sketch is given below).
 • TensorFlow provides high-performance numerical computation, and its flexible architecture makes it easy to deploy computation across a variety of platforms.
 • The predicted region proposals are then reshaped using a region of interest (ROI) pooling layer, which is used to classify the image within the proposed region and to predict the offset values for the bounding boxes.
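 To make the OpenCV/TensorFlow combination concrete, the sketch below loads a TensorFlow SSD model through OpenCV's DNN module and reads out its detections. The model and config file names are placeholders for whatever frozen graph the project uses, and the 0.5 confidence threshold is an assumed value.

    import cv2

    # Placeholder file names; substitute the project's frozen SSD graph and config.
    net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "ssd_config.pbtxt")

    image = cv2.imread("frame.jpg")
    h, w = image.shape[:2]

    # SSD models exported from the TensorFlow detection API typically expect 300x300 input.
    blob = cv2.dnn.blobFromImage(image, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()   # shape: (1, 1, N, 7)

    for det in detections[0, 0]:
        class_id, confidence = int(det[1]), float(det[2])
        if confidence > 0.5:
            # Box coordinates are normalized; scale them back to the image size.
            x1, y1, x2, y2 = det[3] * w, det[4] * h, det[5] * w, det[6] * h
            print(class_id, round(confidence, 2), int(x1), int(y1), int(x2), int(y2))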
