NoolOCR for Printed Tamil Text


Affiliations
1 MCA Department of Panimalar Engineering College, Chennai, India
2 Panimalar Engineering College, Chennai, India
     



Optical Character Recognition (OCR) is the process of converting printed material into text or word-processing files that can be easily edited and stored. Stored this way, the material takes up far less space than the physical originals. OCR has had a major impact on how information is stored, shared, and edited: before OCR, turning a book into a word-processing file meant typing every page word for word. Many OCR systems are now available on the market for different languages, but there is no centralized framework covering all of them. The intention of this paper is to create a framework capable of handling all available languages. This can be achieved through the Eclipse plug-in architecture, with a separate plug-in for each language.
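
The one-plug-in-per-language idea can be illustrated with a minimal sketch. This is not the paper's implementation and does not use the Eclipse extension-point API; it is a plain Python registry that only shows the shape of the design. The names `OcrFramework`, `register`, `recognize`, and `binarize` are hypothetical, and the global-threshold binarization is an assumed stand-in for whatever preprocessing the framework actually shares across languages.

```python
from typing import Callable, Dict, List

# Hypothetical types: a binarized image is rows of 0/1 pixels,
# and a language plug-in maps such an image to recognized text.
Image = List[List[int]]
Recognizer = Callable[[Image], str]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Assumed shared preprocessing: 1 for dark (ink) pixels, 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


class OcrFramework:
    """Central framework; each language is supplied by a separate plug-in."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Recognizer] = {}

    def register(self, language: str, recognizer: Recognizer) -> None:
        # Adding a language means registering a plug-in, not changing the core.
        self._plugins[language] = recognizer

    def languages(self) -> List[str]:
        return sorted(self._plugins)

    def recognize(self, language: str, image: Image) -> str:
        if language not in self._plugins:
            raise KeyError(f"no OCR plug-in registered for {language!r}")
        return self._plugins[language](image)
```

In this sketch a Tamil plug-in would be registered with `framework.register("tamil", tamil_recognizer)`, where `tamil_recognizer` is a placeholder for a real recognition engine; the core framework stays unchanged when further languages are added.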

Keywords

Binarization, Bounding Box, GOCR, OCR, Tesseract.

Authors

L. Jaba Sheela
MCA Department of Panimalar Engineering College, Chennai, India
Syed Mohammed Yasmine
Panimalar Engineering College, Chennai, India
