
Named Entity Recognition for Odia Text Using Machine Learning Algorithm


Affiliations
1 Department of Computer Application, Maharaja Sriram Chandra Bhanjadeo University, Baripada, India
2 Linguistic Research Unit, Indian Statistical Institute, Kolkata, India
 

This paper presents a novel approach to recognizing named entities in Odia newspaper text. Developing a Named Entity Recognition (NER) system for Odia newspaper text using a Support Vector Machine is a challenging task in the field of intelligent computing. NER aims to classify each word in a document into predefined target named entity (NE) classes in a linear as well as non-linear fashion. Starting from named-entity-annotated corpora and a set of features, a baseline NER system is developed first. Language-specific rules are then added to recognize particular NE classes. Gazetteers and context patterns are also added to improve performance, since identifying rules and context patterns requires language-specific knowledge. A lexical database is used to prepare the rules and to identify the context patterns for Odia text. A very large corpus of one lakh (100,000) sentences, covering both the training and test sets, is used for the experiments, and the results show that our approach achieves much higher accuracy than previous approaches.
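
The combination of token features, gazetteers, and context patterns described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the gazetteer entries, the context cues, and the functions `token_features` and `rule_based_label` are all assumptions made for illustration, and romanized placeholder tokens stand in for Odia words. The simple rule function only shows how gazetteer and context features might drive language-specific rules; it does not replace the trained Support Vector Machine.

```python
from typing import Dict, List, Set

# Hypothetical, romanized stand-ins for Odia gazetteer entries and context cues.
LOCATION_GAZETTEER: Set[str] = {"bhubaneswar", "cuttack", "baripada"}
PERSON_PREFIXES: Set[str] = {"shri", "shrimati", "dr"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    # Sentence-boundary markers keep the context features defined at the edges.
    prev_word = tokens[i - 1] if i > 0 else "<BOS>"
    next_word = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return {
        "word": word.lower(),
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "is_digit": word.isdigit(),
        "prev_word": prev_word.lower(),
        "next_word": next_word.lower(),
        # Gazetteer lookup: is the word a known place name?
        "in_location_gazetteer": word.lower() in LOCATION_GAZETTEER,
        # Context pattern: does an honorific precede the word?
        "after_person_prefix": prev_word.lower() in PERSON_PREFIXES,
    }


def rule_based_label(features: Dict[str, object]) -> str:
    """Illustrative rules only; a trained classifier would normally decide."""
    if features["in_location_gazetteer"]:
        return "LOCATION"
    if features["after_person_prefix"]:
        return "PERSON"
    if features["is_digit"]:
        return "NUMBER"
    return "O"


sentence = ["Shri", "Das", "visited", "Cuttack", "in", "2019"]
features = [token_features(sentence, i) for i in range(len(sentence))]
labels = [rule_based_label(f) for f in features]
print(list(zip(sentence, labels)))
```

In a full pipeline, feature dictionaries like these could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to an SVM classifier trained on the annotated corpus, with rules of this kind applied alongside it.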

Keywords

Support Vector Machine, Named Entity Recognition, Part of Speech Tagging, Root Word.

Authors

Bishwa Ranjan Das
Department of Computer Application, Maharaja Sriram Chandra Bhanjadeo University, Baripada, India
Hima Bindu Maringanti
Department of Computer Application, Maharaja Sriram Chandra Bhanjadeo University, Baripada, India
Niladri Sekhar Dash
Linguistic Research Unit, Indian Statistical Institute, Kolkata, India


References