
Named Entity Recognition for Odia Text Using Machine Learning Algorithm


Affiliations
1 Department of Computer Application, Maharaja Sriram Chandra Bhanjadeo University, Baripada, India
2 Linguistic Research Unit, Indian Statistical Institute, Kolkata, India
 

Abstract

This paper presents a novel approach to recognizing named entities in Odia newspaper text. Developing a Named Entity Recognition (NER) system for Odia newspaper text using a Support Vector Machine (SVM) is a challenging task in the field of intelligent computing. NER aims to classify each word in a document into one of a set of predefined named entity (NE) classes, in a linear as well as a non-linear fashion. A baseline NER system is first developed from named-entity-annotated corpora and a set of features. Language-specific rules are then added to the system to recognize certain NE classes. Gazetteers and context patterns are also added to improve performance, since it is observed that identifying rules and context patterns requires language-based knowledge to make the system work better. A lexical database is used to prepare the rules and to identify the context patterns for Odia text. A large corpus of one lakh (100,000) sentences, covering both the training and test sets, is used for the experiments, and the results show that the approach achieves much higher accuracy than previous approaches.
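As a rough illustration of the kind of per-word features such a system might feed to an SVM, the sketch below builds a feature dictionary for each token from its surface form, its neighbouring words, gazetteer membership, and simple context cues. Everything here is an assumption made for illustration: the function names, the romanized tokens, the gazetteer entries, and the cue words are hypothetical placeholders and are not taken from the paper.

```python
# Hypothetical sketch: all names, gazetteer entries, and cue words below are
# illustrative placeholders, not the paper's actual resources.

PAD = "<PAD>"  # marks positions outside the sentence boundary

# Toy gazetteers keyed by NE class (romanized placeholder entries).
GAZETTEERS = {
    "LOCATION": {"bhubaneswar", "cuttack"},
    "PERSON": {"ramesh"},
}

# Toy context patterns: a cue word directly before or after a token
# hints at an NE class for that token.
LEFT_CUES = {"sri": "PERSON"}
RIGHT_CUES = {"jilla": "LOCATION"}


def token_features(tokens, i, window=1):
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        "word": lower,
        "prefix2": lower[:2],
        "suffix2": lower[-2:],
        "is_digit": word.isdigit(),
    }
    # Surrounding words within the context window, padded at the edges.
    for offset in range(1, window + 1):
        left = tokens[i - offset].lower() if i - offset >= 0 else PAD
        right = tokens[i + offset].lower() if i + offset < len(tokens) else PAD
        feats[f"word-{offset}"] = left
        feats[f"word+{offset}"] = right
    # One boolean gazetteer-membership flag per NE class.
    for ne_class, entries in GAZETTEERS.items():
        feats[f"in_gaz_{ne_class}"] = lower in entries
    # Context-pattern cues from the immediately adjacent words.
    prev_word = tokens[i - 1].lower() if i > 0 else PAD
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else PAD
    feats["left_cue"] = LEFT_CUES.get(prev_word, "NONE")
    feats["right_cue"] = RIGHT_CUES.get(next_word, "NONE")
    return feats


def sentence_features(tokens, window=1):
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]


# Placeholder romanized sentence, used only to exercise the functions.
sentence = ["sri", "ramesh", "cuttack", "jilla"]
features = sentence_features(sentence)
```

Feature dictionaries of this shape could then be one-hot encoded and passed to an SVM classifier, for example with scikit-learn's `DictVectorizer` and `LinearSVC`. That step is omitted here, and the paper does not state which toolkit was used.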

Keywords

Support Vector Machine, Named Entity Recognition, Part of Speech Tagging, Root Word.



Authors

Bishwa Ranjan Das (1)
Hima Bindu Maringanti (1)
Niladri Sekhar Dash (2)

