Text Classification for Arabic Words Using REP-Tree


Affiliations
1 Department of Computer Engineering, Islamic University, Gaza, Palestinian Territory, Occupied
 

The amount of text data in the world and in our lives seems to be ever increasing, with no end in sight. Text data mining is defined as the process of deriving high-quality information from text. It has been applied in different fields, including pattern mining, opinion mining, and web mining. In this work, text data mining is built around global stemming of the different forms of Arabic words. Stemming is the method of reducing inflected (or sometimes derived) words to their word stem, base, or root form, generally a written word form. We use REP-Tree to improve text representation. In addition, we test new combinations of weighting schemes applied to Arabic text data for classification purposes. The WEKA workbench is used for processing. Results on a data set from the BBC Arabic website show the efficiency and accuracy of REP-Tree in Arabic text classification.
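As an illustration of the stemming and term-weighting steps named in the abstract, the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not say which stemmer or weighting schemes were used, so the affix lists, the `light_stem` and `tf_idf` helpers, and the choice of a plain tf * log(N / df) weight are all assumptions made for illustration. The resulting weight vectors are the kind of input a tree learner such as REP-Tree would be trained on.

```python
import math
from collections import Counter

# Hypothetical affix lists for illustration only; not taken from the paper.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def tf_idf(docs):
    """Return one {stem: weight} dict per document, using tf * log(N / df)."""
    stemmed = [[light_stem(w) for w in doc.split()] for doc in docs]
    n_docs = len(stemmed)
    # Document frequency: number of documents in which each stem occurs.
    df = Counter(stem for doc in stemmed for stem in set(doc))
    weights = []
    for doc in stemmed:
        tf = Counter(doc)
        weights.append(
            {stem: (count / len(doc)) * math.log(n_docs / df[stem])
             for stem, count in tf.items()}
        )
    return weights
```

With this sketch, two surface forms of the same word collapse to one stem before weighting, so a stem that occurs in every document receives a weight of zero and contributes nothing to the classifier's feature vector.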

Keywords

Data Mining, Text Classification, Text Data Mining, Arabic Text Classification, Pre-Processing.

Authors

Hamza Naji
Department of Computer Engineering, Islamic University, Gaza, Palestinian Territory, Occupied
Wesam Ashour
Department of Computer Engineering, Islamic University, Gaza, Palestinian Territory, Occupied
