TNT Tagger for Malayalam with Fuzzy Rule Based Learning
TnT is an efficient statistical part-of-speech (POS) tagger based on a hidden Markov model. It performs well on sequences of known words, but its performance degrades as the number of unknown words increases. In this paper, we propose a method that uses fuzzy rules to overcome this degradation. The fuzzy rule based model is designed to give TnT sufficient information about the tags of unknown words without degrading TnT's performance. When it processes an unknown word from the input, the standard TnT tagger relies on the tag probability distribution of words in the training corpus that share the same suffix. In Indian languages such as Malayalam, the POS tag of an unknown word depends on more than its suffix. Because these languages are highly inflectional and have free word order, the dependency is more complex than the one captured by suffix tag distribution probabilities. When TnT with fuzzy rule based learning encounters an unknown word, it generates a set of possible tags for the word based on the fuzzy rules that the word matches. If the word matches no fuzzy rule, the model falls back on the probability distribution of the suffix. This design ensures that the tagger performs at least as well as standard TnT.
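The unknown-word handling described above (try the fuzzy rules first, fall back to the suffix distribution otherwise) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the rule functions, the transliterated suffixes, the membership degrees, the `threshold` parameter, and the tag names `NN`, `VB`, and `JJ` are all assumptions made up for the example, and the fallback is a simplified longest-suffix relative frequency rather than TnT's actual smoothed suffix model.

```python
from collections import Counter, defaultdict

# Hypothetical fuzzy rules. Each rule looks at the word and the previous tag
# and returns {tag: membership degree}. Suffixes and degrees are invented
# for illustration and are not taken from the paper.
def rule_plural_noun(word, prev_tag):
    if word.endswith("kal"):
        return {"NN": 0.9 if prev_tag == "JJ" else 0.6}
    return {}

def rule_present_verb(word, prev_tag):
    if word.endswith("unnu"):
        return {"VB": 0.8}
    return {}

FUZZY_RULES = [rule_plural_noun, rule_present_verb]

def build_suffix_table(tagged_words, max_len=3):
    """Count tags for every suffix (up to max_len characters) in the training data."""
    table = defaultdict(Counter)
    for word, tag in tagged_words:
        for n in range(1, min(max_len, len(word)) + 1):
            table[word[-n:]][tag] += 1
    return table

def suffix_distribution(word, table, max_len=3):
    """Simplified fallback: relative tag frequencies of the longest known suffix."""
    for n in range(min(max_len, len(word)), 0, -1):
        counts = table.get(word[-n:])
        if counts:
            total = sum(counts.values())
            return {tag: count / total for tag, count in counts.items()}
    return {}

def candidate_tags(word, table, prev_tag=None, threshold=0.5):
    """Return candidate tags for an unknown word: fuzzy rules first, suffixes second."""
    matched = {}
    for rule in FUZZY_RULES:
        for tag, degree in rule(word, prev_tag).items():
            if degree >= threshold:
                # Keep the strongest membership degree seen for each tag.
                matched[tag] = max(matched.get(tag, 0.0), degree)
    if matched:
        return matched
    # No fuzzy rule fired, so behave like the plain suffix-based tagger.
    return suffix_distribution(word, table)

train = [("pustakam", "NN"), ("varunnu", "VB"), ("maram", "NN")]
table = build_suffix_table(train)
print(candidate_tags("kuttikal", table, prev_tag="JJ"))
print(candidate_tags("padam", table))
```

Because the suffix distribution is consulted only when no rule fires, words that match no rule are handled exactly as in the unmodified suffix-based path, which is the property the abstract relies on.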