Data Mining Approaches for Web Spam Detection

K. M. Annammal¹, J. Sugunthan², A. Siva Sundari¹, N. Jaisankar¹

Affiliations
1 Department of Computer Science and Engineering, Misrimal Navajee Munoth Jain Engineering College, Anna University, India
2 Infant Jesus College of Engineering and Technology, Anna University, Chennai, India

Web spam is a serious problem for search engines because spam pages can severely degrade the quality of search results. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. Specifically, we apply the Kullback-Leibler divergence to different combinations of these sources of information in order to characterize the relationship between two linked pages. The system uses a hybrid clustering step that combines K-means and SVM, and then classifies pages using C4.5 with qualified link-based features and LM-based ones. The result is an accurate system for detecting Web spam using fewer features.
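As an illustration of the Kullback-Leibler step mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes each source of information is plain text, that each language model is a unigram distribution with additive smoothing, and that whitespace tokenization is adequate. The function names (`unigram_lm`, `kl_divergence`, `link_divergence`) and the smoothing parameter `alpha` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def unigram_lm(text, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    alpha is an assumed smoothing constant; it keeps every probability
    strictly positive so the divergence below is always finite.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def link_divergence(source_text, target_text, alpha=1.0):
    """Divergence between the LM of text on the linking side and the LM
    of text on the linked side (both inputs are assumed to be raw text)."""
    vocab = set(source_text.lower().split()) | set(target_text.lower().split())
    p = unigram_lm(source_text, vocab, alpha)
    q = unigram_lm(target_text, vocab, alpha)
    return kl_divergence(p, q)


if __name__ == "__main__":
    # Text describing the link vs. text of a topically related target.
    related = link_divergence("cheap flights", "cheap flights to paris")
    # The same link text vs. an unrelated target.
    unrelated = link_divergence("cheap flights", "casino poker bonus")
    print(related < unrelated)
```

Under these assumptions, a larger divergence suggests that the two linked pages are topically less related, so the value could serve as one feature for a downstream classifier.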

Keywords

Content Analysis, Information Retrieval, Language Models (LMs), Link Integrity, Web Spam Detection.

Data Mining and Knowledge Engineering
