Reptree Classifier for Identifying Link Spam in Web Search Engines

S. K. Jayanthi; S. Sasikala

Affiliations
1 Department of Computer Science, Vellalar College for Women, India
2 Department of Computer Science, KSR College of Arts and Science, India

Abstract

Search engines are used to retrieve information from the web. Most users attend only to the top 10 results, and sometimes only the top 5, because of time constraints and their reliance on search engines; they assume that these results are the most relevant. This is where spamdexing becomes a problem: it is a method of degrading search result quality by deception. Falsified signals, such as an enormous number of keywords or links inserted into a website, can lift that website into the top 10 or top 5 positions. This paper proposes a classifier based on Reptree (Regression tree representative). As an initial step, link-based features such as neighbors, PageRank, truncated PageRank, TrustRank and assortativity-related attributes are derived. A tree is constructed from these features and uses them to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is used as the base; it is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD and FEATE. Only link-based features are used in the experiments, since this paper focuses on link spam alone. Finally, a representative tree is created that classifies web spam entries more precisely. Experimental results are reported and indicate that regression tree classification performs well.
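
The pipeline described in the abstract (link-based features in, a tree that separates spam from legitimate hosts out) can be illustrated with a minimal sketch. The block below is an assumption-laden illustration and not the authors' method: it builds a plain Gini-based decision tree in pure Python rather than the Reptree learner used in the paper, it performs no pruning, and the feature names (`in_degree`, `pagerank`, `trustrank`) and all numeric values are hypothetical toy data, not drawn from WEBSPAM-UK-2007.

```python
from collections import Counter

# Hypothetical feature names and toy rows: (feature vector, label).
# The numbers are invented for illustration only.
FEATURES = ["in_degree", "pagerank", "trustrank"]
ROWS = [
    ((120, 0.80, 0.05), "spam"),
    ((150, 0.75, 0.02), "spam"),
    ((200, 0.90, 0.01), "spam"),
    ((15, 0.30, 0.60), "normal"),
    ((30, 0.45, 0.70), "normal"),
    ((10, 0.20, 0.55), "normal"),
]


def gini(rows):
    """Gini impurity of the labels in rows."""
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(rows):
    """Return the (feature_index, threshold) with the lowest weighted Gini.

    Returns None when no split improves on the impurity of the unsplit node.
    """
    best, best_score = None, gini(rows)
    for f in range(len(FEATURES)):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Recursively build a tree; a leaf is the majority label of its rows."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return Counter(label for _, label in rows).most_common(1)[0][0]
    f, threshold = split
    left = [r for r in rows if r[0][f] <= threshold]
    right = [r for r in rows if r[0][f] > threshold]
    return (
        f,
        threshold,
        build_tree(left, depth + 1, max_depth),
        build_tree(right, depth + 1, max_depth),
    )


def classify(tree, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if x[f] <= threshold else right
    return tree


TREE = build_tree(ROWS)
```

On this toy data a single split on the hypothetical `in_degree` feature separates the two classes, so `classify(TREE, (180, 0.85, 0.03))` returns `"spam"`. Real link-spam data is far less separable, which is where pruning and the richer feature sets named in the abstract matter.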

Keywords

Web Link Spam, Classification, Reptree, Decision Tree, Search Engine.

ICTACT Journal on Soft Computing