Words are the basic tools of every language. Each word in a language is unique, with its own function and meaning. The syntactic and semantic knowledge about individual words can be encapsulated in a highly structured repository known as a computational lexicon, which is essential for Machine Translation. In designing a computational lexicon, the first task is to identify the head words, or root words, of the language. The Root Word Identifier proposed in this work is a rule-based approach that automatically removes the inflected part of a word and derives the root word using morphophonemic rules. The system is tested with 2400 words from a Malayalam corpus to generate linguistic information such as the root form, its inflected forms, and the grammatical category. Performance is evaluated using the statistical measures Precision, Recall, and F-measure. The values obtained for all three measures exceed 90%.
Keywords
Corpus, Computational Lexicon, Morphophonemic Rules, Root Word, Root Word Identifier.
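
As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how Precision, Recall, and F-measure might be computed for a root word identifier. The abstract does not give the exact definitions used, so this sketch assumes that precision is the share of proposed roots that are correct, that recall is the share of all test words whose root is correctly identified, and that the system may decline to analyze a word (represented here as `None`). The function name `evaluate_roots` and the sample data are hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence, Tuple


def evaluate_roots(
    predicted: Sequence[Optional[str]],
    gold: Sequence[str],
) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for predicted root words.

    Assumed definitions (not taken from the paper):
      precision = correct roots / roots the system actually proposed
      recall    = correct roots / all words in the gold standard
      f_measure = harmonic mean of precision and recall
    A prediction of None means the system produced no root for that word.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    # Count the words for which the system proposed some root.
    attempted = sum(1 for p in predicted if p is not None)
    # Count the proposed roots that match the gold-standard root.
    correct = sum(
        1 for p, g in zip(predicted, gold) if p is not None and p == g
    )

    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure


# Hypothetical placeholder data: four test words, one left unanalyzed.
example_predicted = ["a", "b", None, "x"]
example_gold = ["a", "b", "c", "d"]
precision, recall, f_measure = evaluate_roots(example_predicted, example_gold)
```

With these placeholder inputs, three roots are proposed and two are correct, so precision is 2/3 and recall is 2/4. Under a different convention, for example one in which the system always proposes a root, precision and recall would coincide.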