
Combining Machine Learning And Semantic Analysis For Efficient Misinformation Detection Of Arabic Covid-19 Tweets


Affiliations
1 Department of Information Systems, King Saud University, Riyadh, Saudi Arabia
 

With the spread of social media platforms and the proliferation of misleading news, misinformation detection on microblogging platforms has become a real challenge. During the Covid-19 pandemic, many fake news stories and rumors were broadcast and shared daily on social media. To filter out such content, much work has been done on misinformation detection using machine learning and sentiment analysis for the English language. However, research on misinformation detection for Arabic-language social media is limited. This paper introduces a misinformation verification system for Arabic Covid-19 related news, built on a dataset of Arabic rumors from Twitter. We explored the dataset and prepared it using multiple phases of preprocessing before applying different machine learning classification algorithms combined with a semantic analysis method. Applied to 3.6k annotated tweets, the model achieved a best overall accuracy of 93% in detecting misinformation. We further built a second dataset of Covid-19 related claims in Arabic to examine how the model performs on a new set of claims. Results show that the combination of machine learning techniques and linguistic analysis achieves the best scores, reaching a best accuracy of 92% in detecting the veracity of sentences in the new dataset.


Keywords

Misinformation, machine learning, Arabic NLP, contextual exploration, rumor detection.




Authors

Abdulrahim Alhaizaey
Department of Information Systems, King Saud University, Riyadh, Saudi Arabia
Jawad Berri
Department of Information Systems, King Saud University, Riyadh, Saudi Arabia

