A New Approach to Parts of Speech Tagging in Malayalam
Parts-of-speech tagging is the process of assigning a label (tag) to each word in a sentence. A tag describes how the word is used in the sentence. Usually, these tags indicate a syntactic class such as noun or verb, and sometimes carry additional information such as case markers (number, gender, etc.) and tense markers. Many current language processing systems use a parts-of-speech tagger as a pre-processing step.
Two approaches are commonly followed in parts-of-speech tagging: the rule-based approach and the stochastic approach. The rule-based approach uses predefined handwritten rules. It is the oldest approach and uses a lexicon or dictionary for reference. The stochastic approach uses probabilistic and statistical information to assign tags to words. It relies on a large corpus, so its time and space complexity are high, whereas the rule-based approach has lower time and space complexity. The stochastic approach is the more widely used one nowadays because of its accuracy.
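As a rough illustration of this contrast, the sketch below tags words once with a handwritten lexicon plus a single suffix rule, and once with most-frequent-tag counts learned from a tiny tagged corpus. The lexicon, the rule, the corpus, and all function names are hypothetical English toy data chosen for readability; they are not taken from the work described here.

```python
from collections import Counter, defaultdict

# Hypothetical handwritten lexicon (rule-based approach).
LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}

def rule_based_tag(words):
    """Tag by lexicon lookup, then by one handwritten suffix rule."""
    tags = []
    for word in words:
        if word in LEXICON:
            tags.append(LEXICON[word])
        elif word.endswith("ly"):   # illustrative rule: "-ly" marks an adverb
            tags.append("ADV")
        else:
            tags.append("NOUN")     # assumed default for unknown words
    return tags

# Hypothetical tagged corpus (stochastic approach).
CORPUS = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("barks", "NOUN"), ("barks", "VERB"), ("loudly", "ADV"),
]

def train_unigram(corpus):
    """Learn the most frequent tag of each word from corpus counts."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def stochastic_tag(words, model, default="NOUN"):
    """Tag each word with its most frequent corpus tag."""
    return [model.get(word, default) for word in words]
```

The rule-based tagger needs only the small lexicon in memory, while the stochastic tagger must first scan the whole corpus and store its counts, which is where the extra time and space cost comes from.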
Malayalam belongs to the Dravidian family of languages and is inflectional, with suffixes attached to the root word forms. The algorithms currently in use are efficient machine learning algorithms, but they were not built for Malayalam, which reduces the accuracy of Malayalam POS tagging.
My proposed approach uses dictionary entries along with adjacent tag information. The algorithm is multithreaded. Tagging is done using the probability of occurrence of the sentence structure together with the dictionary entry.
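The abstract does not give the algorithm's details, so the following is only a minimal sketch of one way dictionary entries and adjacent tag information could be combined. It assumes a dictionary that lists candidate tags per word, a table of tag-to-tag probabilities, a greedy left-to-right choice, and one sentence per thread task. All names and values (`DICTIONARY`, `TRANSITION`, `tag_sentence`, `tag_corpus`) are hypothetical English toy data, not the author's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dictionary: each word maps to its candidate tags.
DICTIONARY = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "dogs": ["NOUN"],
    "bark": ["NOUN", "VERB"],
}

# Hypothetical probabilities of a tag given the previous tag.
TRANSITION = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3,
    ("DET", "ADJ"): 0.5, ("DET", "NOUN"): 0.4,
    ("ADJ", "NOUN"): 0.7,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2,
}

def tag_sentence(words):
    """Pick, for each word, the dictionary tag most likely after the previous tag."""
    previous = "<s>"  # assumed sentence-start marker
    tags = []
    for word in words:
        candidates = DICTIONARY.get(word, ["NOUN"])  # assumed default tag
        best = max(candidates, key=lambda t: TRANSITION.get((previous, t), 0.0))
        tags.append(best)
        previous = best
    return tags

def tag_corpus(sentences, workers=4):
    """Tag sentences concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tag_sentence, sentences))
```

Calling `tag_corpus([["the", "old", "man"], ["dogs", "bark"]])` tags each sentence in its own task. Sentences are independent of each other, which is what makes them a natural unit for multithreading; in CPython, threads mainly help when tagging waits on I/O such as dictionary lookups from disk.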