A Holistic Approach to Urdu Language Word Recognition using Deep Neural Networks

Authors

  • H. R. Khan, Department of Electronic Engineering, NED University of Engineering and Technology, Pakistan
  • M. A. Hasan, Department of Bio-Medical Engineering, NED University of Engineering and Technology, Pakistan
  • M. Kazmi, Faculty of Electrical and Computer Engineering, NED University of Engineering and Technology, Pakistan
  • N. Fayyaz, Department of Electrical Engineering, NED University of Engineering and Technology, Pakistan
  • H. Khalid, Department of Computer and Information Systems Engineering, NED University of Engineering and Technology, Pakistan
  • S. A. Qazi, Department of Computer and Information Systems Engineering, NED University of Engineering and Technology, Pakistan

Abstract

Urdu is one of the most popular languages in the world. It is a Persianized standard register of the Hindustani language with a considerable and valuable body of literature. While digital libraries are steadily replacing conventional libraries, a vast amount of Urdu literature is still handwritten. Digitizing this handwritten literature is essential to preserve it and make it more accessible. However, the scarcity of research on Urdu Optical Character Recognition (OCR) limits a digital library to manual document search. The limited research in this area is mainly due to the complexity of the Urdu script. Unlike English, Urdu writing is cursive and bidirectional, and character shapes and sizes vary considerably depending on their position. Holistic word recognition has been found to be a better solution than many text segmentation techniques, as it takes the complete word into account instead of segmenting it explicitly or implicitly. For this project, data for five different Urdu words were collected to train and test a convolutional neural network, and a recognition accuracy of 96% was achieved.
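
To make the holistic idea concrete, the following is a minimal sketch of a classifier that maps a whole word image directly to one of five word classes, with no character segmentation step. It is not the network described in the paper: the layer sizes, the function names (`conv2d_valid`, `relu_global_max`, `classify_word`), the 8x16 input size, and the random untrained weights are all assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import math
import random

NUM_CLASSES = 5  # the abstract mentions five Urdu words; class indices are placeholders


def conv2d_valid(image, kernel):
    """Slide one kernel over the whole word image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu_global_max(feature_map):
    """ReLU followed by global max pooling: one number per feature map."""
    return max(max(0.0, v) for row in feature_map for v in row)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_word(image, kernels, dense_weights, dense_bias):
    """Map a whole word image to class probabilities, with no segmentation step."""
    features = [relu_global_max(conv2d_valid(image, k)) for k in kernels]
    scores = [
        sum(w * f for w, f in zip(weights, features)) + b
        for weights, b in zip(dense_weights, dense_bias)
    ]
    return softmax(scores)


# Untrained, randomly initialised parameters (assumption: illustration only).
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
dense_weights = [[rng.uniform(-1, 1) for _ in range(4)]
                 for _ in range(NUM_CLASSES)]
dense_bias = [0.0] * NUM_CLASSES

# A hypothetical 8x16 grayscale word image with pixel values in [0, 1].
word_image = [[rng.random() for _ in range(16)] for _ in range(8)]

probs = classify_word(word_image, kernels, dense_weights, dense_bias)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

The point of the sketch is the interface rather than the accuracy: the input is the complete word image and the output is a single word label, so the model's vocabulary is fixed to the words it was trained on.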

Keywords:

word recognition, Urdu, deep learning, cursive writing

How to Cite

[1]
Khan, H.R., Hasan, M.A., Kazmi, M., Fayyaz, N., Khalid, H. and Qazi, S.A. 2021. A Holistic Approach to Urdu Language Word Recognition using Deep Neural Networks. Engineering, Technology & Applied Science Research. 11, 3 (Jun. 2021), 7140–7145. DOI:https://doi.org/10.48084/etasr.4143.
