

The World Wide Web (WWW) is a huge, dynamic, self-organized, and strongly interlinked source of information. Search engines have become vital information retrieval (IR) systems for finding the required information. Results that appear on the first few pages attract more attention and importance, since users believe they are more relevant because of their top positions. Spamdexing plays a key role in giving an undeserving page a high rank and top visibility. This paper focuses on two aspects: new features and new classifiers. First, 27 new features that are used commercially to boost ranking and reputation are considered for classification. In addition, 17 new features are proposed and computed. In total, 44 features are combined with the existing WEBSPAM-UK 2007 dataset, which serves as the baseline. With all these features, a feature inclusion study is carried out to improve performance. The second aspect is the exploration of a new suite of five different machine learners for the web spam classification problem. The results are discussed. Including the new features improves the classification accuracy of the publicly available WEBSPAM-UK 2007 features by 22%. SVM outperforms the other methods in terms of accuracy.

Keywords

Decision Table, HMM, Search Engine, SVM, Web Spam.