Optical Character Recognition (OCR) System Using Machine Learning

      

ABSTRACT :

OCR is used to identify characters in handwritten text. Segmenting the text into individual characters is an important stage of recognition, so this document surveys different techniques for recognizing characters. It also compares character and numeral recognition across different languages, together with the accuracy achieved by different writers. Segmentation problems differ from language to language, and handwriting varies from user to user, so OCR systems need more effective and accurate segmentation. The comparative study concludes that deep learning techniques give good segmentation and, on large datasets, better results than the other techniques.
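To make the segmentation stage concrete, here is a minimal sketch of one classic approach, a vertical projection profile, which splits a line of text wherever a column contains no ink. The abstract does not name a specific segmentation method, so the technique, the function names, and the tiny `#`/`.` bitmap below are illustrative assumptions only.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def parse_bitmap(text: str) -> Bitmap:
    """Turn a '#'/'.' drawing into a binary bitmap (hypothetical helper)."""
    return [[1 if ch == "#" else 0 for ch in line]
            for line in text.strip().splitlines()]


def column_profile(image: Bitmap) -> List[int]:
    """Count the ink pixels in every column (vertical projection profile)."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def segment_columns(image: Bitmap, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, one per candidate."""
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(column_profile(image)):
        if ink >= min_ink and start is None:
            start = x                    # first inked column of a character
        elif ink < min_ink and start is not None:
            spans.append((start, x))     # blank column closes the character
            start = None
    if start is not None:                # character touching the right edge
        spans.append((start, len(image[0])))
    return spans


def crop(image: Bitmap, span: Tuple[int, int]) -> Bitmap:
    """Cut one character candidate out of the line image."""
    start, end = span
    return [row[start:end] for row in image]


# A toy line containing the letters "H" and "I".
LINE = """
#.#..###.
#.#...#..
###...#..
#.#...#..
#.#..###.
"""

print(segment_columns(parse_bitmap(LINE)))  # [(0, 3), (5, 8)]
```

This sketch only works when characters are separated by at least one blank column. Touching or cursive characters, which differ by script and by writer, defeat it, and that is the segmentation difficulty the abstract describes.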

EXISTING SYSTEM :

A machine learning-based optical character recognition (OCR) scanner converts images of typed, handwritten, or printed text into machine-encoded text. It has long been a human dream to develop machines that replicate human functions. One such function is reading documents that contain different forms of text. Over the last few decades, machine reading has grown from dream to reality through the development of sophisticated and robust OCR systems [1].

DISADVANTAGE :

1. Data Dependency

Large Dataset Requirement: Machine learning models need substantial amounts of labeled training data to perform well. Collecting and annotating this data can be labor-intensive and costly.

Data Quality: The performance of machine learning OCR models heavily depends on the quality and diversity of the training data. Poor or insufficient data can lead to suboptimal model performance.

2. Computational Demands

Resource Intensive: Training machine learning models, especially deep learning ones, requires significant computational resources (e.g., powerful GPUs). This can be expensive and may not be feasible for all organizations.

Processing Power: Even during inference (i.e., when the model is used to make predictions), the computational requirements can be high, potentially affecting real-time performance.

PROPOSED SYSTEM :

The focus of this application is to help educators, lecturers, and students turn their handwritten notes into text documents. Character recognition can be divided into two parts: printed and handwritten character recognition. Printed documents can be further divided into good-quality printed documents and degraded printed documents. Handwritten character recognition is divided into offline and online character recognition [2].

ADVANTAGE :

1. Improved Accuracy

Advanced Algorithms: Machine learning models, especially deep learning approaches like Convolutional Neural Networks (CNNs), can better recognize characters by learning complex patterns and features from large datasets.

Handling Variability: They can handle a wide range of fonts, handwriting styles, and distortions more effectively than traditional methods.

2. Flexibility and Adaptability

Learning from Data: Machine learning models can be trained on diverse datasets to adapt to different languages, scripts, and formats.

Continuous Improvement: Models can be updated and retrained with new data to improve performance and adapt to new challenges.

3. Robustness to Noise and Distortions

Noise Handling: Machine learning models are better at handling noisy or degraded images, thanks to their ability to learn features that are invariant to such distortions.

Image Quality Variability: They can work well even with low-resolution images or those with varying lighting conditions.

4. Contextual Understanding

Context-Aware Recognition: Modern OCR systems can leverage context (e.g., using language models) to improve recognition accuracy, especially in cases where individual characters are ambiguous.

Semantic Understanding: They can use context to correct errors based on the meaning of the text, such as correcting common OCR mistakes based on the surrounding words.
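The context-aware correction described in the last point can be sketched with a toy example. The confusion table, vocabulary, and bigram counts below are invented for illustration and stand in for the trained language model a real OCR system would use. The sketch is not the method of any particular system.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed glyph confusions: characters an OCR engine might mix up.
CONFUSABLE: Dict[str, Tuple[str, ...]] = {
    "1": ("l", "i"),
    "0": ("o",),
    "5": ("s",),
}

# Toy vocabulary and bigram counts (made-up numbers, not real statistics).
VOCABULARY = {"a", "its", "tall", "tail", "tree", "dog", "the", "wagged"}
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("its", "tail"): 3,
    ("a", "tall"): 2,
}


def variants(token: str) -> List[str]:
    """Every spelling obtained by keeping or swapping each confusable glyph."""
    options = [(ch,) + CONFUSABLE.get(ch, ()) for ch in token]
    return ["".join(combo) for combo in product(*options)]


def correct_token(token: str, previous: str) -> str:
    """Pick the in-vocabulary variant that most often follows `previous`."""
    if token in VOCABULARY:
        return token
    known = [v for v in variants(token) if v in VOCABULARY]
    if not known:
        return token  # nothing plausible: keep the raw OCR output
    return max(known, key=lambda v: BIGRAM_COUNTS.get((previous, v), 0))


def correct_text(text: str) -> str:
    """Correct a line left to right, using each corrected word as context."""
    corrected: List[str] = []
    for token in text.lower().split():
        previous = corrected[-1] if corrected else ""
        corrected.append(correct_token(token, previous))
    return " ".join(corrected)


print(correct_text("the d0g wagged its ta1l"))  # the dog wagged its tail
print(correct_text("a ta1l tree"))              # a tall tree
```

The same misread token, `ta1l`, is resolved to "tail" or "tall" depending on the preceding word, which is the kind of ambiguity that character-level recognition alone cannot settle. Enumerating every variant grows exponentially with the number of confusable glyphs, so a practical system would prune candidates or score them with a proper language model.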
