Analysis of Image Preprocessing Techniques to Improve OCR of Garhwali Text Obtained Using the Hindi Tesseract Model

Sukhbindra Singh Rawat; Ashutosh Sharma; Rachana Gusain

Affiliations
1 Department of Computer Science, Doon University, India

A large amount of information exists in textbooks, paper documents, newspapers, and other physical forms, and it needs to be digitized for effective access and long-term availability. Optical Character Recognition (OCR) is an effective way to digitize text. In this study, we used Google’s Tesseract as the OCR tool. Our focus is to improve Tesseract’s accuracy on machine-printed Garhwali documents using image pre-processing techniques, including Super-Resolution (SR), binarization (Otsu and adaptive thresholding), skew correction, morphological operations, and ImageMagick methods. We propose three approaches: two differ in the binarization method (Otsu or adaptive thresholding), and the third uses ImageMagick methods for pre-processing. For evaluation, we created a dataset by capturing images from a sample of five Garhwali textbooks using two mobile cameras with different resolutions; two books were captured with a high-resolution camera and the other three with a low-resolution camera. Our experiments showed good results in specific cases: for high-resolution images, Otsu thresholding without Super-Resolution achieved 88.13% accuracy, and for low-resolution images, ImageMagick with Super-Resolution achieved 87.44% accuracy.
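To illustrate one of the binarization methods named above, the following is a minimal sketch of Otsu thresholding in pure Python. It is not the authors' implementation, and the abstract does not say which library they used; the function names `otsu_threshold` and `binarize` and the tiny sample "page" are assumptions made for illustration only. The idea is to pick the gray level that maximizes the between-class variance of the dark and light pixel groups.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit gray values (0-255).

    Pixels with value <= threshold form one class, the rest the other.
    Hypothetical helper for illustration, not taken from the paper.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0      # number of pixels at or below the candidate threshold
    sum_low = 0         # sum of gray values at or below the candidate threshold
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue            # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break               # every pixel is already in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance (up to a constant factor of 1 / total**2).
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map a 2-D list of gray values to 0 (<= threshold) or 255 (> threshold)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


# Tiny made-up sample: dark "ink" values near 10-15, light "paper" near 200.
page = [[12, 15, 200, 210],
        [10, 205, 198, 14]]
t = otsu_threshold([p for row in page for p in row])
binary_page = binarize(page, t)
print(t, binary_page)
```

Adaptive thresholding, the alternative compared in the study, would instead compute a separate threshold per neighborhood, which tends to help when lighting varies across a camera-captured page.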

Keywords

Optical Character Recognition, Garhwali Language, Devanagari Script, Image Preprocessing, ImageMagick

ICTACT Journal on Image and Video Processing
