
Identification of Telugu, Devanagari and English Scripts Using Discriminating Features


Affiliations
1 Department of Computer Science Engineering, PES College of Engineering, Mandya, India
2 Department of Electronics and Communication Engineering, Malnad College of Engineering, Hassan, India
 

Abstract

In a multi-script, multi-lingual environment, a document may contain text lines in more than one script or language. The script regions of such a document must be identified so that each region can be passed to the OCR system for its language. In this context, this paper proposes a model that identifies and separates text lines of Telugu, Devanagari and English scripts in a printed trilingual document. The proposed method uses distinctive features extracted from the top and bottom profiles of the printed text lines. The experiments used 1500 text lines for training and 900 text lines for testing, and the method achieved an identification accuracy of 99.67%.

Keywords

Multi-Script Multi-Lingual Document, Script Identification, Feature Extraction.
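The abstract does not state exactly which profile features are used, so the following is only a minimal sketch under stated assumptions. It assumes a text line has already been binarized into a grid of 0/1 pixels, with 1 marking ink. It takes the top profile as the row of the first ink pixel in each column (scanning downward) and the bottom profile as the row of the last ink pixel (scanning upward). The `headline_ratio` function is a hypothetical cue, not the authors' feature: it measures the share of ink columns whose topmost pixel lies on the most common top row. One would expect this share to be high for a Devanagari line with a continuous headline.

```python
from collections import Counter
from typing import List, Optional

# Assumption: a binarized text line as rows of 0/1 pixels, 1 = ink.
Image = List[List[int]]


def top_profile(img: Image) -> List[Optional[int]]:
    """Per column, the row index of the topmost ink pixel (None for blank columns)."""
    height = len(img)
    width = len(img[0]) if img else 0
    return [
        next((r for r in range(height) if img[r][c]), None)
        for c in range(width)
    ]


def bottom_profile(img: Image) -> List[Optional[int]]:
    """Per column, the row index of the bottommost ink pixel (None for blank columns)."""
    height = len(img)
    width = len(img[0]) if img else 0
    return [
        next((r for r in range(height - 1, -1, -1) if img[r][c]), None)
        for c in range(width)
    ]


def headline_ratio(img: Image) -> float:
    """Hypothetical feature (not from the paper): fraction of ink columns whose
    topmost pixel lies on the single most frequent top row."""
    tops = [t for t in top_profile(img) if t is not None]
    if not tops:
        return 0.0
    _, count = Counter(tops).most_common(1)[0]
    return count / len(tops)


if __name__ == "__main__":
    # Toy line with a continuous stroke along row 1, loosely mimicking a headline.
    toy = [
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(top_profile(toy))     # [1, 1, 1, 1, 1, None]
    print(bottom_profile(toy))  # [1, 3, 1, 3, 3, None]
    print(headline_ratio(toy))  # 1.0
```

In a full system, cues like this would be computed on the training lines and passed to a classifier. The sketch covers only profile extraction and one illustrative feature.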

Authors

M. C. Padma
Department of Computer Science Engineering, PES College of Engineering, Mandya, India
P. A. Vijaya
Department of Electronics and Communication Engineering, Malnad College of Engineering, Hassan, India
