Feature Selection Methods for Classifying Email Messages: Analysis, Proposal, and Comparative Study

Sanaa Abou Elhamayed; Samah Osama M. Kamel

Affiliations
1 Department of Informatics Research, Electronics Research Institute, Cairo, Egypt

Spam email messages are a serious problem for both users and Internet service providers. Such messages may contain viruses and harmful content, and they occupy a large amount of mailbox space. Classifying email messages is therefore an important problem to analyze and discuss. This research work aims to classify email messages as either spam or non-spam. An email dataset can be represented in matrix form: the rows represent the instances (messages), and the columns represent the features of those instances. Two classifiers, K-Nearest Neighbor (KNN) and Naïve Bayes (NB), are used to classify the email messages. The proposed approach is based on partitioning the dataset into segments, and it is compared with the adopted approach. Feature selection methods are used to keep the significant features and eliminate the others, which avoids processing overhead. The choice of relevant features plays an important role in classification accuracy. In this work, several existing feature selection methods are adopted, analyzed, and run, and their performance is compared. A new feature selection method is also proposed and discussed, and its performance is compared with that of the adopted methods. The experiments use a dataset taken from the Internet, which contains about four thousand messages with fifty-eight features; the dataset also includes a target feature representing the class labels. The practical experiments show that the proposed method performs better than the adopted ones. The proposed method is also expected to be applicable to datasets from other application domains.
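The pipeline described in the abstract (a message-by-feature matrix, a feature selection step, then a classifier) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the paper's dataset, the function names are hypothetical, and the ranking rule is a simple Fisher-style filter score (squared gap between class means over the summed class variances), not the feature selection method proposed in the paper. Only KNN is shown; NB is omitted for brevity.

```python
import math
import random
from collections import Counter

def make_toy_dataset(n_rows=200, n_informative=3, n_noise=5, seed=0):
    """Synthetic stand-in: each row is a message, each column a feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)  # 1 = spam, 0 = non-spam
        # Informative columns shift with the label; noise columns do not.
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def score_features(X, y):
    """Fisher-style filter score per column (illustrative, not the paper's method)."""
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        m0, m1 = sum(col0) / len(col0), sum(col1) / len(col1)
        v0 = sum((v - m0) ** 2 for v in col0) / len(col0)
        v1 = sum((v - m1) ** 2 for v in col1) / len(col1)
        scores.append((m1 - m0) ** 2 / (v0 + v1 + 1e-12))
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring columns."""
    scores = score_features(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def project(X, cols):
    """Keep only the selected columns of the matrix."""
    return [[row[j] for j in cols] for row in X]

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(X_train, y_train, X_test, y_test, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t
               for x, t in zip(X_test, y_test))
    return hits / len(y_test)

# Hold out the last 50 rows for testing; select features on the training part only.
X, y = make_toy_dataset()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
cols = select_top_k(X_train, y_train, k=3)
acc_all = accuracy(X_train, y_train, X_test, y_test)
acc_sel = accuracy(project(X_train, cols), y_train, project(X_test, cols), y_test)
print("selected columns:", sorted(cols))
print("accuracy with all features:", acc_all)
print("accuracy with selected features:", acc_sel)
```

Scoring the features on the training rows only keeps the held-out rows from influencing which columns are kept. On this toy data the selected columns are the three informative ones; how the two accuracies compare on real email data depends on the dataset and on the selection method used.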

Keywords

Spam Messages, Classification Algorithms, Feature Selection Methods, Text Representation, Performance Evaluation

International Journal of Advanced Networking and Applications
