
Gender and Authorship Categorisation of Arabic Text from Twitter Using PPM


Affiliations
1 Department of Computer Sciences And Engineering, University of Hail, Saudi Arabia
2 Department of Computer Science, University of Bangor, Bangor, United Kingdom
 

In this paper we present gender and authorship categorisation using the Prediction by Partial Matching (PPM) compression scheme for text from Twitter written in Arabic. The PPMD variant of the compression scheme with different orders was used to perform the categorisation. We also applied different machine learning algorithms such as Multinomial Naïve Bayes (MNB), K-Nearest Neighbours (KNN), and an implementation of Support Vector Machine (LIBSVM), applying the same processing steps for all the algorithms. PPMD shows significantly better accuracy in comparison to all the other machine learning algorithms, with order 11 PPMD working best, achieving 90% and 96% accuracy for gender and authorship respectively.
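As an illustration of how compression-based categorisation of this kind can work, the following is a minimal sketch, not the authors' implementation. It trains one character model per class and assigns a text to the class whose model encodes it in the fewest bits per character. The model uses PPM-style back-off with method D escape estimates, but it is simplified: it is static (not updated while scoring) and omits symbol exclusion and update exclusion. The class name `SimplePPMD`, the `classify` helper, the `alphabet_size` value, the low order, and the toy English training strings are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class SimplePPMD:
    """Static character model with PPM-style back-off and method D escapes.

    Simplified sketch: no symbol exclusion, no update exclusion, and the
    model is not updated while a text is being scored.
    """

    def __init__(self, order=3, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed size for the order -1 model
        # counts[k][context] holds next-character counts for contexts of length k
        self.counts = [defaultdict(Counter) for _ in range(order + 1)]

    def train(self, text):
        for i, ch in enumerate(text):
            for k in range(self.order + 1):
                if i - k < 0:
                    break
                self.counts[k][text[i - k:i]][ch] += 1

    def _char_bits(self, history, ch):
        bits = 0.0
        # Try the longest available context first, then back off.
        for k in range(min(self.order, len(history)), -1, -1):
            context = history[len(history) - k:]
            seen = self.counts[k].get(context)
            if not seen:
                continue  # context never observed: back off at no cost
            total = sum(seen.values())
            if ch in seen:
                # Method D symbol estimate: (2c - 1) / (2n)
                return bits - math.log2((2 * seen[ch] - 1) / (2 * total))
            # Method D escape estimate: t / (2n), t = distinct symbols seen
            bits -= math.log2(len(seen) / (2 * total))
        # Order -1: uniform over the assumed alphabet
        return bits + math.log2(self.alphabet_size)

    def cross_entropy(self, text):
        """Average number of bits per character needed to encode text."""
        total = sum(
            self._char_bits(text[max(0, i - self.order):i], ch)
            for i, ch in enumerate(text)
        )
        return total / max(len(text), 1)


def classify(text, models):
    """Return the label whose model gives the lowest cross-entropy."""
    return min(models, key=lambda label: models[label].cross_entropy(text))


# Toy training data (hypothetical; the paper uses Arabic tweets).
training = {
    "a": "the cat sat on the mat " * 5,
    "b": "lorem ipsum dolor sit amet " * 5,
}
models = {}
for label, sample in training.items():
    model = SimplePPMD(order=3)
    model.train(sample)
    models[label] = model

print(classify("the cat on the mat", models))
print(classify("dolor sit amet", models))
```

The same `classify` call applies whether the labels are genders or individual authors; only the training texts per class change. A full PPMD coder would additionally update its counts adaptively and apply exclusions, which generally tightens the estimates.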

Keywords

Arabic Text Categorisation, Data Compression, Machine Learning Algorithms.

Authors

Mohammed Altamimi
Department of Computer Sciences And Engineering, University of Hail, Saudi Arabia
William J. Teahan
Department of Computer Science, University of Bangor, Bangor, United Kingdom
