
Data Mining Approaches for Web Spam Detection


Affiliations
1 Department of Computer Science and Engineering, Misrimal Navajee Munoth Jain Engineering College, Anna University, India
2 Infant Jesus College of Engineering and Technology, Anna University, Chennai, India
     



Web spam is a serious problem for search engines because the presence of spam pages can severely degrade the quality of their results. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. Specifically, we apply the Kullback-Leibler divergence to different combinations of these sources of information in order to characterize the relationship between two linked pages. The system uses a hybrid clustering step that combines K-means and SVM, and the pages are then classified using C4.5 with qualified link-based features and LM-based ones. The result is an accurate system for detecting Web spam using fewer features.

Keywords

Content Analysis, Information Retrieval, Language Models (LMs), Link Integrity, Web Spam Detection.
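
As an illustration of the Kullback-Leibler divergence mentioned in the abstract, the following minimal sketch compares unigram language models built from two pieces of text, for example the text around a link on the source page and the content of the target page. The abstract does not state which text units, smoothing method, or vocabulary the authors use, so the function names (`unigram_lm`, `kl_divergence`, `page_divergence`), the additive smoothing, and the whitespace tokenization are all assumptions made for this example only.

```python
import math
from collections import Counter


def unigram_lm(text, vocab, mu=1.0):
    """Additively smoothed unigram language model over a shared vocabulary.

    The smoothing constant `mu` is an assumption for this sketch; it keeps
    every probability non-zero so the divergence stays finite.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) = sum over w of p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def page_divergence(source_text, target_text):
    """Divergence between the language models of two linked pieces of text."""
    vocab = set(source_text.lower().split()) | set(target_text.lower().split())
    p = unigram_lm(source_text, vocab)
    q = unigram_lm(target_text, vocab)
    return kl_divergence(p, q)


if __name__ == "__main__":
    link_text = "cheap flights to paris"
    related_page = "book cheap flights to paris and rome"
    unrelated_page = "online casino bonus poker"
    # A lower value suggests the link text and the target page are on topic.
    print(page_divergence(link_text, related_page))
    print(page_divergence(link_text, unrelated_page))
```

Under these assumptions, a large divergence between a link's text and its target page indicates a topical mismatch between the two linked pages, which is the kind of signal that an LM-based feature could pass to a classifier.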

Authors

K. M. Annammal
Department of Computer Science and Engineering, Misrimal Navajee Munoth Jain Engineering College, Anna University, India
J. Sugunthan
Infant Jesus College of Engineering and Technology, Anna University, Chennai, India
A. Siva Sundari
Department of Computer Science and Engineering, Misrimal Navajee Munoth Jain Engineering College, Anna University, India
N. Jaisankar
Department of Computer Science and Engineering, Misrimal Navajee Munoth Jain Engineering College, Anna University, India
