In this paper we present gender and authorship categorisationusing the Prediction by Partial Matching (PPM) compression scheme for text from Twitter written in Arabic. The PPMD variant of the compression scheme with different orders was used to perform the categorisation. We also applied different machine learning algorithms such as Multinational Naïve Bayes (MNB), K-Nearest Neighbours (KNN), and an implementation of Support Vector Machine (LIBSVM), applying the same processing steps for all the algorithms. PPMD shows significantly better accuracy in comparison to all the other machine learning algorithms, with order 11 PPMD working best, achieving 90 % and 96% accuracy for gender and authorship respectively.
Keywords
Arabic Text Categorisation, Data Compression, Machine Learning Algorithms.
User
Font Size
Information