Advancements in OCR: A Deep Learning Algorithm for Enhanced Text Recognition


Parikshit Sharma

Abstract

Optical Character Recognition (OCR) has evolved significantly with the rise of deep learning. In this paper, we present a novel OCR algorithm that uses deep learning to improve text recognition accuracy. Traditional OCR methods struggle with complex layouts, noisy images, and diverse fonts, which limits their overall performance. The proposed algorithm addresses these challenges by integrating deep neural network components, specifically convolutional and recurrent layers. The model is trained on large-scale datasets, which enables it to learn intricate patterns and features and yields robust recognition. We further introduce an attention mechanism that helps the model focus on the most informative text regions, improving both accuracy and efficiency. Through extensive experiments on benchmark datasets, we demonstrate that our deep learning-based OCR algorithm outperforms conventional approaches and achieves state-of-the-art performance on several OCR tasks, including multilingual text recognition and document digitization. We also examine the algorithm's behaviour in difficult scenarios such as low-resolution inputs and challenging environmental conditions. These findings contribute to the field of OCR and open avenues for real-world applications in document analysis, text extraction, and content digitization. Overall, the work shows the potential of deep learning to transform text recognition and to push the boundaries of accuracy and efficiency in this domain.
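The abstract does not specify how the attention mechanism is formulated. As a minimal sketch, assuming scaled dot-product attention over the sequence of feature vectors that a convolutional-plus-recurrent encoder might emit (one vector per horizontal position in the text line), the weighting step can be written in plain Python. The function names `softmax` and `attend` and the toy feature values are hypothetical illustrations, not the author's implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention of one query over a feature sequence.

    `features` is a hypothetical stand-in for encoder outputs (one vector
    per column of the text image). Returns (weights, context): the weights
    sum to 1, and the context is the weighted average of the feature vectors.
    """
    d = len(query)
    if any(len(f) != d for f in features):
        raise ValueError("every feature vector must match the query dimension")
    # Similarity of the query to each position, scaled by sqrt(d).
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
              for feat in features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum over positions, per dimension.
    context = [sum(w * feat[i] for w, feat in zip(weights, features))
               for i in range(d)]
    return weights, context


# Toy example: three encoder columns; the middle one aligns best with the query.
features = [[0.1, 0.0], [2.0, 0.0], [0.1, 0.0]]
weights, context = attend([1.0, 0.0], features)
print([round(w, 3) for w in weights])
```

In this toy input the middle column receives the largest weight, which is the sense in which attention lets a recognizer "focus" on the most relevant region; a trained model would learn the query and features rather than use hand-set values.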



How to Cite
Parikshit Sharma, “Advancements in OCR: A Deep Learning Algorithm for Enhanced Text Recognition”, IJIES, vol. 10, no. 8, pp. 1–7, Sep. 2023, doi: 10.35940/ijies.F4263.0810823.
Author Biography

Parikshit Sharma, Department of Mathematics, Birla Institute of Technology and Science, Pilani (Rajasthan), India.

Parikshit Sharma, an aspiring mathematician, is pursuing a master's degree in Mathematics at the Birla Institute of Technology and Science, Pilani. His research interests span several areas. He studies Fuzzy Logic and the complexities of reasoning under uncertainty, and he explores Partial Differential Equations and their applications across scientific and mathematical domains. His interests also extend to Machine Learning and Computer Vision, where he works to uncover patterns and build innovative solutions. He is driven by a passion for learning and a determination to make significant contributions to these fields.

