Improved Parallel PageRank Algorithm for Spam Filtering


Affiliations
1 Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal - 462003, Madhya Pradesh, India
2 Adobe Systems, Noida – 201304, India
 

Background/Objectives: PageRank is a well-known link-based algorithm introduced by Google for ranking web pages. It operates on the link structure of the web, that is, on the inbound and outbound links of each page. The existing PageRank algorithm follows an equal-distribution rule: it divides the PageRank of a web page evenly among all of that page's outgoing links. The problem with this uniform distribution is that uninteresting pages sometimes receive high PageRank values. Methods/Statistical Analysis: This paper proposes an improved parallel PageRank algorithm that distributes a page's PageRank non-uniformly among its outgoing links. The proposed algorithm was implemented in CUDA on an NVIDIA Quadro 2000 GPU. Findings: The proposed algorithm mitigates spam and gives better computational time than Parallel PageRank, because it assigns higher priority to important pages and lower priority to less important ones. With values assigned in this way, important pages show an increase in PageRank, and unrelated pages, that is, spam pages, show a decrease. Application: The proposed work performs spam filtering by classifying web pages as important or irrelevant.

Keywords

CUDA, GPU, Non-Uniform Distribution, Parallel Page Rank, Spam Pages.
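
The difference between the uniform and non-uniform distribution described in the abstract can be illustrated with a small sketch. The abstract does not state the weighting rule the authors use, so the in-degree weighting below is a hypothetical stand-in chosen only for illustration, and the function name `pagerank`, the toy graph, and the damping value 0.85 are likewise assumptions. The sketch is sequential Python rather than CUDA; the inner per-page update reads only the previous iteration's ranks, which is the kind of step a GPU version could plausibly assign to one thread per page.

```python
def pagerank(out_links, weight=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [target pages]}.

    weight(src, dst) returns a non-negative weight for the link src -> dst.
    weight=None gives the classic uniform split. Weights are normalised per
    source page, so each page still gives away exactly its own rank.
    Assumes every target is also a key and no target is listed twice.
    """
    pages = list(out_links)
    n = len(pages)

    # Fraction of src's rank that flows along each outgoing link.
    share = {}
    for src in pages:
        targets = out_links[src]
        raw = [1.0 if weight is None else weight(src, dst) for dst in targets]
        total = sum(raw)
        share[src] = {d: w / total for d, w in zip(targets, raw)} if total > 0 else {}

    # Reverse adjacency, so each page can pull rank from its in-links.
    in_links = {p: [] for p in pages}
    for src in pages:
        for dst in share[src]:
            in_links[dst].append(src)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not share[p])
        new_rank = {}
        for p in pages:  # independent per page: reads only the old ranks
            pulled = sum(rank[src] * share[src][p] for src in in_links[p])
            new_rank[p] = (1.0 - damping) / n + damping * (pulled + dangling / n)
        rank = new_rank
    return rank


# Toy graph (hypothetical): "spam" is linked only from "home".
graph = {
    "home": ["news", "docs", "spam"],
    "news": ["home", "docs"],
    "docs": ["home", "news"],
    "spam": ["home"],
}

# Hypothetical importance signal: the number of inbound links of the target.
in_degree = {p: 0 for p in graph}
for targets in graph.values():
    for dst in targets:
        in_degree[dst] += 1

uniform = pagerank(graph)
weighted = pagerank(graph, weight=lambda src, dst: in_degree[dst])

for page in graph:
    print(f"{page:5s} uniform={uniform[page]:.4f} weighted={weighted[page]:.4f}")
```

On this toy graph the weighted run gives "spam" a lower score and "home" a higher score than the uniform run, because "home" passes a smaller fraction of its rank to the poorly linked page. Any other importance signal could be passed as `weight` without changing the rest of the function.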

Authors

Hema Dubey
Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal - 462003, Madhya Pradesh, India
Nilay Khare
Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal - 462003, Madhya Pradesh, India
K. K. Appu Kuttan
Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal - 462003, Madhya Pradesh, India
Shreyas Bhatia
Adobe Systems, Noida – 201304, India

DOI: https://doi.org/10.17485/ijst/2016/v9i38/126657