Gender and Authorship Categorisation of Arabic Text from Twitter Using PPM

Mohammed Altamimi ¹, William J. Teahan ²

Affiliations
1 Department of Computer Sciences and Engineering, University of Hail, Saudi Arabia
2 Department of Computer Science, University of Bangor, Bangor, United Kingdom

In this paper we present gender and authorship categorisation of Arabic text from Twitter using the Prediction by Partial Matching (PPM) compression scheme. The PPMD variant of the scheme was applied with different model orders to perform the categorisation. For comparison, we also applied several machine learning algorithms, namely Multinomial Naïve Bayes (MNB), K-Nearest Neighbours (KNN), and an implementation of Support Vector Machines (LIBSVM), using the same processing steps for all the algorithms. PPMD achieved significantly better accuracy than all the other machine learning algorithms; order 11 PPMD worked best, achieving 90% and 96% accuracy for gender and authorship categorisation respectively.
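The idea of compression-based categorisation summarised above can be illustrated with a minimal sketch: train one character model per class, then assign a text to the class whose model encodes it in the fewest bits. The code below is not the authors' implementation. It is a simplified, static PPMD-style model that uses the method D estimates (symbol probability (2c - 1) / 2n, escape probability t / 2n) but omits exclusions, model updates during testing, and arithmetic coding. The names `PPMDModel`, `classify`, and `alphabet_size`, the default order of 2, and the toy training strings are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class PPMDModel:
    """Static order-k character model with PPMD escape estimates.

    Simplified for illustration: no exclusions, no updates while scoring.
    """

    def __init__(self, order=2, alphabet_size=256):
        self.order = order
        # Assumed size of the uniform order -1 fallback alphabet.
        self.alphabet_size = alphabet_size
        # counts[k][context] holds counts of symbols seen after a context of length k.
        self.counts = [defaultdict(Counter) for _ in range(order + 1)]

    def train(self, text):
        for i, sym in enumerate(text):
            for k in range(min(self.order, i) + 1):
                self.counts[k][text[i - k:i]][sym] += 1

    def code_length(self, text):
        """Return the number of bits needed to encode text under this model."""
        bits = 0.0
        for i, sym in enumerate(text):
            # Try the longest available context first, then back off.
            for k in range(min(self.order, i), -1, -1):
                seen = self.counts[k].get(text[i - k:i])
                if not seen:
                    continue  # unseen context: back off at no cost
                n = sum(seen.values())
                c = seen.get(sym, 0)
                if c:
                    bits -= math.log2((2 * c - 1) / (2 * n))
                    break
                # Symbol not seen in this context: pay for an escape.
                bits -= math.log2(len(seen) / (2 * n))
            else:
                # Never predicted by any context: uniform order -1 fallback.
                bits += math.log2(self.alphabet_size)
        return bits


def classify(text, models):
    """Pick the label whose model compresses the text best."""
    return min(models, key=lambda label: models[label].code_length(text))


# Toy training data (hypothetical, not from the paper's Twitter corpus).
training = {
    "A": "the cat sat on the mat. the cat ate the rat.",
    "B": "zzz qqq xxx zzz qqq xxx zzz",
}
models = {}
for label, sample in training.items():
    models[label] = PPMDModel(order=2)
    models[label].train(sample)

print(classify("the cat sat", models))
```

In this sketch the model order is a constructor argument, so the effect of different orders (the abstract reports order 11 as best) could be explored by changing `order`, given enough training text per class.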

Keywords

Arabic Text Categorisation, Data Compression, Machine Learning Algorithms.

AIRCC's International Journal of Computer Science and Information Technology
