A New Approach to Parts of Speech Tagging in Malayalam

D. Muhammad Noorul Mubarak ¹, Sareesh Madhu ², S. A. Shanavas ²

Affiliations
1 Department of Computer Science, University of Kerala, India
2 Department of Linguistics, University of Kerala, India

Abstract

Parts-of-speech tagging is the process of labeling each word in a sentence with a tag that describes how the word is used in that sentence. Usually, these tags indicate a syntactic category such as noun or verb, and sometimes carry additional information such as case markers (number, gender, etc.) and tense markers. Many current language processing systems use a parts-of-speech tagger as a pre-processing step.

There are two main approaches to parts-of-speech tagging: the rule-based approach and the stochastic approach. The rule-based approach uses predefined handwritten rules. It is the older of the two and uses a lexicon or dictionary for reference. The stochastic approach uses probabilistic and statistical information to assign tags to words. Because it relies on a large corpus, its time and space complexity are high, whereas the rule-based approach has lower time and space complexity. The stochastic approach is the more widely used today because of its accuracy.
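The contrast between the two approaches can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy English lexicon, the single suffix rule, the tag names, and the three-sentence corpus are hypothetical and are not taken from the paper. The stochastic side is reduced to its simplest form, a most-frequent-tag model learned from counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon for the rule-based tagger (not from the paper).
LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB"}


def rule_based_tag(words):
    """Look each word up in the lexicon, then fall back to handwritten rules."""
    tags = []
    for word in words:
        if word in LEXICON:
            tags.append(LEXICON[word])
        elif word.endswith("ly"):  # illustrative handwritten suffix rule
            tags.append("ADV")
        else:
            tags.append("NOUN")    # default guess for unknown words
    return tags


def train_unigram(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def stochastic_tag(words, model, default="NOUN"):
    """Assign each word the tag it carried most often in the training corpus."""
    return [model.get(word, default) for word in words]


# Hypothetical annotated corpus; a real stochastic tagger needs far more data.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("barks", "NOUN")],
    [("dog", "NOUN"), ("barks", "VERB")],
]

model = train_unigram(TOY_CORPUS)
print(rule_based_tag(["the", "dog", "barks", "loudly"]))
print(stochastic_tag(["the", "dog", "barks"], model))
```

The rule-based tagger needs only the lexicon and the rules, while the stochastic tagger must first scan an annotated corpus and store the counts, which is where its extra time and space cost comes from.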

Malayalam belongs to the Dravidian family of languages and is inflectional, with suffixes attached to the root word forms. The algorithms currently in use are efficient machine learning algorithms, but they were not built for Malayalam, which reduces the accuracy of Malayalam POS tagging.

The proposed approach uses dictionary entries along with adjacent tag information. The algorithm uses multithreading. Tagging is done using the probability of occurrence of the sentence structure together with the dictionary entry.
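The abstract does not give the algorithm's details, so the following is only a rough sketch of how dictionary entries, adjacent tag information, and multithreading might fit together. It assumes that the dictionary lists candidate tags per word, that "adjacent tag information" can be approximated by tag-to-tag transition probabilities, that a greedy left-to-right choice stands in for scoring the whole sentence structure, and that threads are used to tag separate sentences in parallel. The names `DICTIONARY`, `TRANSITIONS`, `tag_sentence`, and `tag_corpus`, along with all words and probabilities, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dictionary: each word maps to its candidate tags.
DICTIONARY = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "man": ["NOUN", "VERB"],
    "boats": ["NOUN", "VERB"],
}

# Hypothetical P(tag | previous tag); "<s>" marks the start of a sentence.
TRANSITIONS = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.3,
    ("DET", "ADJ"): 0.4, ("DET", "NOUN"): 0.5,
    ("ADJ", "NOUN"): 0.8, ("NOUN", "VERB"): 0.6,
    ("NOUN", "NOUN"): 0.2, ("VERB", "NOUN"): 0.4,
    ("VERB", "DET"): 0.5,
}


def tag_sentence(words, unknown_tag="NOUN"):
    """Greedily pick, for each word, the dictionary candidate that is
    most probable after the previously chosen tag."""
    tags, prev = [], "<s>"
    for word in words:
        candidates = DICTIONARY.get(word, [unknown_tag])
        best = max(candidates, key=lambda t: TRANSITIONS.get((prev, t), 0.0))
        tags.append(best)
        prev = best
    return tags


def tag_corpus(sentences, workers=4):
    """Tag sentences concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tag_sentence, sentences))


print(tag_corpus([["the", "man", "boats"], ["the", "old", "man"]]))
```

In this sketch the dictionary narrows each word to a few candidate tags and the neighbouring tag decides among them. The thread pool only illustrates the structure of sentence-level parallelism; in CPython, CPU-bound work of this kind would need processes rather than threads to gain speed.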

Keywords

NLP, POS Tagger, Rule Based Approach, Stochastic Approach, Multithreading, Dictionary Entry, Malayalam.

AIRCC's International Journal of Computer Science and Information Technology