
Learning Bangla would be considerably easier if an online Bangla education system were supported by a Digital Bangla Pronunciation Dictionary (DBPD). Learners could consult it in the classroom or at home as a reference guide to the standard, acceptable pronunciation of Bangla words. With this in mind, this paper reports the design architecture of the proposed DBPD. The dictionary is being built on a large lexical database of nearly one hundred thousand words, taken directly from a digital corpus of written Bangla texts and from other digital lexical sources available in the language. This is perhaps the first such attempt for any Indian language, and it aims to give Bangla speakers and Bangla learners across the world better learning resources and tools for the language. The immediate applications of the resource are envisaged in e-governance and online language teaching, where learners can use it to look up the spelling, pronunciation, part of speech, meaning, and usage of words.
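The abstract lists what a learner can retrieve for each word but does not say how entries are stored. Below is a minimal Python sketch of one possible entry record and lookup, assuming an in-memory table keyed by headword. The class, field, and function names (`DictEntry`, `PronunciationDictionary`, `normalize_key`) are hypothetical and are not taken from the paper. Keys are normalized to Unicode NFC on the assumption that corpus-derived spellings may encode the same Bangla word with different but canonically equivalent code-point sequences.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DictEntry:
    """One headword in a hypothetical DBPD lexical database."""
    headword: str           # Bangla spelling as found in the corpus
    transliteration: str    # Roman transliteration (scheme is an assumption)
    ipa: str                # standard pronunciation written in IPA
    pos: str                # part-of-speech label
    meanings: List[str] = field(default_factory=list)
    usage: List[str] = field(default_factory=list)


def normalize_key(word: str) -> str:
    """Trim whitespace and normalize to NFC so that canonically equivalent
    spellings (e.g. U+09DF vs. U+09AF followed by U+09BC) share one key."""
    return unicodedata.normalize("NFC", word.strip())


class PronunciationDictionary:
    """Hypothetical in-memory store mapping normalized headwords to entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, DictEntry] = {}

    def add(self, entry: DictEntry) -> None:
        # A later entry with the same normalized headword replaces the earlier one.
        self._entries[normalize_key(entry.headword)] = entry

    def lookup(self, word: str) -> Optional[DictEntry]:
        # Returns None when the word is not in the dictionary.
        return self._entries.get(normalize_key(word))

    def __len__(self) -> int:
        return len(self._entries)


if __name__ == "__main__":
    dbpd = PronunciationDictionary()
    # Illustrative entry only: "মা" (mother), with a usage example "আমার মা" (my mother).
    dbpd.add(DictEntry("মা", "ma", "/ma/", "noun", ["mother"], ["আমার মা"]))
    entry = dbpd.lookup("মা")
    if entry is not None:
        print(entry.ipa, entry.pos, entry.meanings[0])
```

With about one hundred thousand headwords, a hash table like this should fit comfortably in memory. A deployed system would more likely keep entries in a database and add transliteration-based search, which this sketch omits.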

Keywords

Pronunciation, Part-of-Speech, Transliteration, Orthography, IPA, Meaning.



Named Entity Recognition for Odia Text Using Machine Learning Algorithm


Authors

Bishwa Ranjan Das ¹, Hima Bindu Maringanti ¹, Niladri Sekhar Dash ²

Affiliations
1 Department of Computer Application, Maharaja Sriram Chandra Bhanjadeo University, Baripada, India
2 Linguistic Research Unit, Indian Statistical Institute, Kolkata, India

Source

Research Cell: An International Journal of Engineering Sciences, Vol 35 (2023), Pagination: 01-08

Abstract

This paper presents a novel approach to recognizing named entities in Odia newspaper text. Developing a named entity recognition (NER) system for Odia newspaper text with a Support Vector Machine (SVM) is a challenging task in intelligent computing. NER aims to classify each word in a document into predefined target named entity classes, in a linear as well as a non-linear fashion. A baseline NER system is first built from a named-entity-annotated corpus and a set of features. Language-specific rules are then added to recognize particular NE classes. Gazetteers and context patterns are also added to improve performance, because identifying rules and context patterns requires language-based knowledge. A lexical database is used to prepare the rules and to identify the context patterns for Odia text. A large corpus of one lakh (100,000) sentences, divided into training and test sets, is used for the experiments, and the results show that the approach achieves considerably higher accuracy than previous approaches.
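The abstract names the components of the system (SVM features, language-specific rules, gazetteers, context patterns) without giving the actual feature set. The Python sketch below shows one common way such components fit together, under stated assumptions. Tokens are romanized for readability, the gazetteer and honorific lists are tiny hypothetical placeholders, and the `PER`/`LOC`/`O` labels are illustrative rather than the paper's NE tagset. Only feature extraction and a post-classification rule layer are shown. The SVM itself is left out.

```python
from typing import Dict, List, Sequence, Set, Union

Feature = Union[str, bool]

# Hypothetical placeholder resources, not the paper's lists.
LOCATION_GAZETTEER: Set[str] = {"bhubaneswar", "cuttack", "baripada"}
PERSON_TITLES: Set[str] = {"shri", "smt", "dr"}  # honorific cue words


def token_features(tokens: Sequence[str], i: int) -> Dict[str, Feature]:
    """Sparse feature dict for token i: the word, its neighbours, affixes,
    a digit flag, gazetteer membership and a title-word context cue."""
    word = tokens[i].lower()
    feats: Dict[str, Feature] = {
        "w0": word,
        "w-1": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "w+1": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
        "has_digit": any(ch.isdigit() for ch in word),
        "in_loc_gaz": word in LOCATION_GAZETTEER,
        "prev_is_title": i > 0 and tokens[i - 1].lower() in PERSON_TITLES,
    }
    # Character prefixes/suffixes as a rough stand-in for root-word and
    # inflection features (an assumption; the paper's features are not given).
    for n in range(1, 4):
        feats[f"suf{n}"] = word[-n:]
        feats[f"pre{n}"] = word[:n]
    return feats


def sentence_features(tokens: Sequence[str]) -> List[Dict[str, Feature]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def apply_rules(tokens: Sequence[str], labels: List[str]) -> List[str]:
    """Hypothetical rule layer run after the classifier: gazetteer hits and
    title-word context patterns override only tokens labelled 'O'."""
    fixed = list(labels)
    for i, tok in enumerate(tokens):
        if fixed[i] != "O":
            continue  # keep the classifier's non-O decisions
        if tok.lower() in LOCATION_GAZETTEER:
            fixed[i] = "LOC"
        elif i > 0 and tokens[i - 1].lower() in PERSON_TITLES:
            fixed[i] = "PER"
    return fixed


if __name__ == "__main__":
    sent = ["Shri", "Mohanty", "visited", "Baripada"]
    print(sentence_features(sent)[1]["prev_is_title"])  # True
    print(apply_rules(sent, ["O", "O", "O", "O"]))       # ['O', 'PER', 'O', 'LOC']
```

The per-token feature dictionaries have the shape expected by, for example, scikit-learn's `DictVectorizer`, whose output can then train a `LinearSVC`. Running the rules after the classifier, so that they only relabel tokens predicted as `O`, is one design choice among several and is an assumption here.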

Keywords

Support Vector Machine, Named Entity Recognition, Part-of-Speech Tagging, Root Word.


