
A Rule Based Approach for Root Word Identification in Malayalam Language


Affiliations
1 Department of Computer Science, University of Kerala, Kerala, India
2 Department of Linguistics, University of Kerala, Kerala, India
 

Words are the tools of life and are present in every language. Every word in a language is unique, with its own function and meaning. Syntactic and semantic knowledge about individual words can be encapsulated in a highly structured repository known as a computational lexicon, which is essential for Machine Translation. In designing a computational lexicon, the first task is to identify the head words, or root words, of the language. The Root Word Identifier proposed in this work is a rule-based approach that automatically removes the inflected part of a word and derives the root word using morphophonemic rules. The system is tested with 2400 words from a Malayalam corpus to generate linguistic information such as the root form, its inflected forms, and grammatical category. Performance is evaluated using the statistical measures Precision, Recall, and F-measure. The values obtained for all three measures exceed 90%.
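
As a rough illustration of the kind of processing described above, the following minimal Python sketch strips an inflectional ending, restores the root with a replacement string, and scores the output with Precision, Recall, and F-measure. It is not the authors' system: the `Rule` structure, the four romanized toy rules, the function names, and the scoring convention (a word with no matching rule counts as unanswered) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    suffix: str       # inflected ending to match (romanized, illustrative)
    replacement: str  # text appended to the stem to restore the root
    category: str     # grammatical label reported for the match


# Hypothetical toy rules, not taken from the paper; longer suffixes come first.
RULES = [
    Rule("attil", "am", "noun, locative"),   # marattil -> maram
    Rule("angal", "am", "noun, plural"),     # marangal -> maram
    Rule("kal", "", "noun, plural"),         # kuttikal -> kutti
    Rule("ye", "", "noun, accusative"),      # kuttiye  -> kutti
]


def identify_root(word: str) -> Optional[Tuple[str, str]]:
    """Return (root, category) for the first matching rule, else None."""
    for rule in RULES:
        stem = word[: len(word) - len(rule.suffix)]
        # Require a non-empty stem so a bare suffix is never treated as a word.
        if word.endswith(rule.suffix) and stem:
            return stem + rule.replacement, rule.category
    return None


def evaluate(gold: Dict[str, str]) -> Tuple[float, float, float]:
    """Score identify_root against a gold mapping of word -> root.

    Assumed convention: Precision is computed over words the system
    answered, Recall over all gold words.
    """
    answered = correct = 0
    for word, true_root in gold.items():
        result = identify_root(word)
        if result is None:
            continue
        answered += 1
        if result[0] == true_root:
            correct += 1
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Calling `identify_root("marattil")` under these toy rules yields the root `maram` with the label `noun, locative`. A real system would need a far larger rule set written in Malayalam script, plus handling for words that are already roots.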

Keywords

Corpus, Computational Lexicon, Morphophonemic Rules, Root Word, Root Word Identifier.

Authors

Meera Subhash
Department of Computer Science, University of Kerala, Kerala, India
M. Wilscy
Department of Computer Science, University of Kerala, Kerala, India
S. A. Shanavas
Department of Linguistics, University of Kerala, Kerala, India
