
Combining Machine Learning And Semantic Analysis For Efficient Misinformation Detection Of Arabic Covid-19 Tweets


Affiliations
1 Department of Information Systems, King Saud University, Riyadh, Saudi Arabia
 

With the spread of social media platforms and the proliferation of misleading news, misinformation detection on microblogging platforms has become a real challenge. During the COVID-19 pandemic, fake news and rumors were broadcast and shared daily on social media. To filter out such content, much work has been done on misinformation detection using machine learning and sentiment analysis for English. However, research on misinformation detection for Arabic social media remains limited. This paper introduces a misinformation verification system for Arabic COVID-19-related news, built on a Twitter dataset of Arabic rumors. We explored the dataset and prepared it through multiple preprocessing phases before applying several machine learning classification algorithms combined with a semantic analysis method. Applied to 3.6k annotated tweets, the model achieved a best overall accuracy of 93% in detecting misinformation. We further built a second dataset of COVID-19-related claims in Arabic to examine how the model performs on new claims. The results show that combining machine learning techniques with linguistic analysis achieves the best scores, reaching a best accuracy of 92% in detecting the veracity of sentences in the new dataset.


Keywords

Misinformation, machine learning, Arabic NLP, contextual exploration, rumor detection.
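
The abstract mentions multiple preprocessing phases applied to the tweets before classification, without listing them. As a rough illustration only, the sketch below shows cleaning and normalization steps that are common in Arabic NLP. The function names (`normalize_tweet`, `tokenize`) and the specific choices (dropping URLs, mentions, diacritics, and tatweel, and unifying alef, ya, and ta marbuta variants) are assumptions made for this example and are not taken from the paper.

```python
import re

# All patterns below are illustrative assumptions, not the authors' pipeline.
URL_RE = re.compile(r"https?://\S+|www\.\S+")      # links
MENTION_RE = re.compile(r"@\w+")                   # user mentions
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")  # Arabic short-vowel marks
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda/hamza
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")  # anything but Arabic letters/space
TATWEEL = "\u0640"                                 # elongation character


def normalize_tweet(text: str) -> str:
    """Return a cleaned, normalized version of an Arabic tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep hashtag words but drop the '#' and split on underscores.
    text = text.replace("#", " ").replace("_", " ")
    text = DIACRITICS_RE.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS_RE.sub("\u0627", text)    # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")        # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")        # ta marbuta -> ha
    text = NON_ARABIC_RE.sub(" ", text)            # drop digits, Latin, punctuation
    return " ".join(text.split())                  # collapse whitespace


def tokenize(text: str) -> list[str]:
    """Split a normalized tweet into whitespace-delimited tokens."""
    return normalize_tweet(text).split()
```

Tokens produced this way could then be fed to a feature extractor and any of the classifier families named in the paper's references (logistic regression, random forests, naive Bayes, support vector machines). How the paper actually combines the classifier with its semantic analysis step is not described in the abstract, so it is not sketched here.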



Authors

Abdulrahim Alhaizaey
Department of Information Systems, King Saud University, Riyadh, Saudi Arabia
Jawad Berri
Department of Information Systems, King Saud University, Riyadh, Saudi Arabia

