
TNT Tagger for Malayalam with Fuzzy Rule Based Learning


Authors

Alen Jacob
Computational Linguistics, Government Engineering College, Sreekrishnapuram, Palakkad, Kerala, India
Amal Babu
Computational Linguistics, Government Engineering College, Sreekrishnapuram, Palakkad, Kerala, India
R. R. Rajeev
VRCLC, IIITM-K, Thiruvananthapuram, Kerala, India
P. C. Reghu Raj
Dept. of Computer Science and Engineering, Government Engineering College, Sreekrishnapuram, Palakkad, Kerala, India

Abstract


TnT is an efficient statistical part-of-speech (POS) tagger based on the Hidden Markov Model. It performs well on sequences of known words, but its performance degrades as the number of unknown words increases. In this paper, we propose a method that uses fuzzy rules to overcome this degradation. The fuzzy rule based model is designed to give TnT sufficient information about the tags of unknown words without degrading TnT's performance. When it processes an unknown word, the standard TnT tagger relies on the tag probability distribution of words in the training corpus that share the same suffix. In Indian languages such as Malayalam, however, the POS tag of an unknown word does not depend on the suffix alone. Because these languages are highly inflectional and have free word order, the dependency is more complex than the one captured by suffix-tag distribution probabilities. When TnT with fuzzy rule based learning encounters an unknown word, it generates a set of possible tags based on the fuzzy rules that the word matches. If the word matches no fuzzy rule, the model falls back on the probability distribution of the suffix. With this fallback, the approach guarantees that the performance of TnT can only improve on its normal performance.
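
The control flow described in the abstract (try the fuzzy rules first, fall back on the suffix distribution otherwise) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy word forms are invented placeholders rather than real Malayalam, the tag names `N` and `V`, the rule predicates, and the membership degrees are hypothetical, and the suffix estimate is a plain relative frequency over the longest matching suffix rather than TnT's smoothed estimate.

```python
from collections import Counter, defaultdict

# Invented placeholder tokens and tags (not real Malayalam data).
TRAIN = [
    ("tarakal", "N"),
    ("mirakal", "N"),
    ("dakal", "V"),
    ("sonunnu", "V"),
    ("palunnu", "V"),
]

# Hypothetical fuzzy rules: (predicate, tag, membership degree in [0, 1]).
FUZZY_RULES = [
    (lambda w: w.endswith("unnu") and len(w) > 5, "V", 0.9),
    (lambda w: w.endswith("kal"), "N", 0.6),
]


def build_suffix_table(tagged_words, max_len=3):
    """Count how often each tag occurs with each word-final suffix."""
    table = defaultdict(Counter)
    for word, tag in tagged_words:
        for k in range(1, min(max_len, len(word)) + 1):
            table[word[-k:]][tag] += 1
    return table


def suffix_distribution(word, table, max_len=3):
    """Relative tag frequencies for the longest suffix seen in training."""
    for k in range(min(max_len, len(word)), 0, -1):
        counts = table.get(word[-k:])
        if counts:
            total = sum(counts.values())
            return {tag: n / total for tag, n in counts.items()}
    return {}  # no suffix of this word was seen in training


def fuzzy_candidates(word, rules):
    """Collect tags proposed by matching rules, keeping the highest degree."""
    scores = {}
    for matches, tag, degree in rules:
        if matches(word):
            scores[tag] = max(scores.get(tag, 0.0), degree)
    return scores


def candidate_tags(word, rules, table):
    """Use fuzzy rules when any match; otherwise use the suffix fallback."""
    scores = fuzzy_candidates(word, rules)
    if scores:
        return scores, "fuzzy"
    return suffix_distribution(word, table), "suffix"


if __name__ == "__main__":
    table = build_suffix_table(TRAIN)
    for unknown in ("velakal", "xyzal"):
        print(unknown, candidate_tags(unknown, FUZZY_RULES, table))
```

In this sketch a word that matches no rule is handled exactly as it would be by the suffix-only estimate, which is the property the abstract relies on when it says the rules cannot make the tagger worse than its normal behaviour.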