
Part of Speech Tagger for Arabic Text based Support Vector Machines: A Review


Affiliations
1 Department of Computing and Information Technology, Sohar University, Oman
     



Little research has addressed part-of-speech (POS) tagging for the Arabic language. In Arabic, identifying the part of speech of a particular word in a given context is challenging because most modern texts omit diacritical marks, so a single written form can correspond to several different words. Distinguishing among Arabic derived forms is also complicated, so determining the correct POS tag requires several resources and advanced processing. The study of POS tagging can therefore contribute to the literature and to progress in Arabic language processing. POS tagging is employed in many fields of natural language processing, such as machine translation, information extraction, text classification, and speech processing. Assigning a unique POS tag to each Arabic word is a difficult task. This paper reviews the application of support vector machines (SVM) to POS tagging of Arabic text. Its primary objectives are to summarize and organize existing work on automatic and efficient SVM-based tagging of Arabic text, and to motivate and guide researchers toward further research on online applications for the Arabic language.
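The abstract describes SVM-based tagging only at a high level. As a minimal sketch of how such a tagger can work, the Python below trains one binary linear SVM per tag (one-vs-rest) with dual coordinate descent, the optimization approach LIBLINEAR uses for linear SVMs, and assigns each token the highest-scoring tag. Everything here is an illustrative assumption rather than the method of any reviewed work: the four-sentence toy corpus, the three-tag set (VERB, NOUN, PREP), the feature set (word, previous word and two-letter prefix, all after diacritic removal) and the names `SvmPosTagger`, `token_features` and `train_binary_svm`. The corpus uses the undiacritized form ذهب, which is a verb ("went") at the start of a sentence and a noun ("gold") after the preposition من, so only context separates the two readings.

```python
from collections import defaultdict

# Arabic tanween, short vowels, shadda and sukun occupy U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(word):
    """Drop diacritical marks so diacritized and bare spellings share features."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def token_features(words, i):
    """Binary features for token i: bias, word, previous word, 2-letter prefix."""
    word = strip_diacritics(words[i])
    prev = strip_diacritics(words[i - 1]) if i > 0 else "<s>"
    return ["bias", "w=" + word, "p=" + prev, "pre2=" + word[:2]]


def train_binary_svm(samples, labels, C=100.0, epochs=500):
    """Linear SVM (hinge loss, L2 penalty) fitted by dual coordinate descent.

    samples: lists of distinct binary feature names; labels: +1 or -1.
    The "bias" feature acts as an intercept and is regularized like the others.
    """
    w = defaultdict(float)
    alpha = [0.0] * len(samples)
    for _ in range(epochs):
        for i, (feats, y) in enumerate(zip(samples, labels)):
            q_ii = float(len(feats))  # squared norm of a binary feature vector
            grad = y * sum(w[f] for f in feats) - 1.0  # dual gradient for alpha_i
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)  # project onto [0, C]
            step = (new_alpha - alpha[i]) * y
            if step:
                for f in feats:
                    w[f] += step  # keep w = sum_i alpha_i * y_i * x_i
                alpha[i] = new_alpha
    return dict(w)


class SvmPosTagger:
    """Hypothetical toy tagger: one-vs-rest linear SVMs over per-token features."""

    def fit(self, tagged_sentences):
        samples, gold = [], []
        for sent in tagged_sentences:
            words = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                samples.append(token_features(words, i))
                gold.append(tag)
        self.tags = sorted(set(gold))
        # One binary SVM per tag: that tag is +1, every other tag is -1.
        self.weights = {
            t: train_binary_svm(samples, [1 if g == t else -1 for g in gold])
            for t in self.tags
        }
        return self

    def tag(self, words):
        result = []
        for i in range(len(words)):
            feats = token_features(words, i)
            scores = {t: sum(self.weights[t].get(f, 0.0) for f in feats)
                      for t in self.tags}
            result.append(max(scores, key=scores.get))  # highest-scoring tag wins
        return result


# Hypothetical toy corpus, undiacritized as in most modern text.
TRAIN = [
    [("ذهب", "VERB"), ("الولد", "NOUN"), ("إلى", "PREP"), ("المدرسة", "NOUN")],
    [("اشترى", "VERB"), ("خاتما", "NOUN"), ("من", "PREP"), ("ذهب", "NOUN")],
    [("ذهب", "VERB"), ("الرجل", "NOUN"), ("إلى", "PREP"), ("السوق", "NOUN")],
    [("صنع", "VERB"), ("سوارا", "NOUN"), ("من", "PREP"), ("ذهب", "NOUN")],
]

if __name__ == "__main__":
    tagger = SvmPosTagger().fit(TRAIN)
    print(tagger.tag(["ذَهَبَ", "الطالب", "إلى", "البيت"]))
    print(tagger.tag(["باع", "التاجر", "خاتما", "مِن", "ذَهَبٍ"]))
```

Because the previous-word feature separates the sentence start from من, the form ذهب is tagged VERB in the first sentence and NOUN in the second, even when written with diacritics. Tags for unseen words such as الطالب rest on the prefix and context weights, and a corpus this small does not guarantee them.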

Keywords

Part of Speech, Arabic Text Tagging, SVM, NLP, Machine Learning, Corpus.



Authors

Jabar H. Yousif
Department of Computing and Information Technology, Sohar University, Oman
Maryam H. Al-Risi
Department of Computing and Information Technology, Sohar University, Oman

