PKIT: Printed Kashmiri Image Text Recognition Using Deep Learning

Maajid Bashir ¹, Vishal Goyal ², Kaiser J. Giri ³

Affiliations
1 Department of Computer Science, Punjabi University, Patiala, 147002, Punjab, India
2 Department of Computer Science, Punjabi University, Patiala, 147002, Punjab, India
3 Department of Computer Science, Islamic University of Science and Technology, Awantipora, 192122, Jammu and Kashmir, India

Optical Character Recognition (OCR) converts scanned documents, images of text, and PDFs into text documents that can be edited and searched on a computer. OCR software analyzes a scanned image of text, identifies the characters in it, and transforms them into machine-encoded text. Given the practical relevance of OCR, a multitude of approaches have been developed for both Western and Asian languages. Kashmiri is spoken mostly in the Kashmir Valley of Jammu and Kashmir, India. Although significant effort has gone into recognizing Indian scripts such as Devanagari, Bengali, Urdu, and Punjabi, no comparable effort has been made for the Kashmiri script. In addition, several benchmark corpora have been developed for other Perso-Arabic scripts, such as Urdu, Arabic, and Pashto, for training and evaluating OCR systems. Notably, there is currently no OCR corpus for the Kashmiri script that can be used to train and evaluate deep neural networks for Kashmiri OCR. To that end, we propose a Kashmiri corpus, Printed Kashmiri Image Text (PKIT), consisting of 120,000 line-level and 523,000 word-level printed text images, well suited for use with deep learning techniques. Additionally, we used the proposed dataset to train different state-of-the-art deep learning approaches, obtaining a Word Error Rate (WER) and Character Error Rate (CER) of 5.62% on average.
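
The abstract reports WER and CER without defining them. As a minimal sketch, assuming the conventional definition (edit distance between the recognized text and the ground truth, divided by the length of the ground truth), the two metrics could be computed as follows. The function names `edit_distance`, `cer`, and `wer` are illustrative and are not taken from the paper; characters are counted as Unicode code points, which is also an assumption and may differ from how the authors segment Perso-Arabic text.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Under this definition both rates can exceed 100% when the recognized text is much longer than the ground truth, so they are error ratios and not accuracies.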

Keywords

Dataset Generation, Deep Learning, Kashmiri OCR, Optical Character Recognition, Printed Kashmiri Text Recognition.

Research Cell: An International Journal of Engineering Sciences
