
Part of Speech Tagger for Low Resource Indian Language Using Machine Learning Approach


Affiliations
1 Research Scholar, Department of Computer Science and Applications, DAV University, Jalandhar, India
2 Associate Professor, Department of Computer Science and Applications, DAV University, Jalandhar, India
 

In natural language processing, a part-of-speech (POS) tagger is a fundamental component that serves as a preprocessor for many other language processing tools. For any language, a POS tagger is developed at an early stage, before more advanced tools are built. Various approaches are used to develop POS taggers. This article presents a comparative analysis of Punjabi POS taggers developed by various researchers and proposes an architecture that uses an efficient machine learning technique to improve POS tagging accuracy. Because the researchers used their own test data and not all of the developed taggers are available online, it is not feasible to evaluate all of them on a common test set. The reported results indicate that a POS tagger developed using a hybrid approach performs better than rule-based techniques and statistical techniques such as N-gram, bigram, and HMM.

Keywords

Ambiguity, Part of Speech, POS, Punjabi, Rule Based Approach, Statistical Approach, Machine Learning, NLP.


Authors

Vikas Verma
Research Scholar, Department of Computer Science and Applications, DAV University, Jalandhar, India
S.K. Sharma
Associate Professor, Department of Computer Science and Applications, DAV University, Jalandhar, India
