
PageRank Using MapReduce-An Open-Source Framework for Processing Large Data Sets


Affiliations
1 Department of MCA, Bharathiyar College of Engineering and Technology, Karaikal, Puducherry, India
2 Department of Computer Science, Avvaiyar Government College for Women, Karaikal, Puducherry, India
3 Department of MCA, Bharathiyar College of Engineering and Technology, Karaikal, Puducherry, India
     



MapReduce is a simple data-parallel programming model designed for scalability and fault tolerance when processing and generating large data sets. It was initially created by Google to simplify the development of large-scale web search applications in data centers and has been proposed as the basis of a ‘data center computer’. Many real-world tasks are expressible in this model. In this paper, a PageRank algorithm for a hyperlink graph is introduced using the MapReduce technique and illustrated with a random web surfer. The algorithm computes the PageRank of several web pages distributed in the cloud. In this work, the Hyperlink Graph Page Rank (HGPR) algorithm is developed; with it, the PageRanks can be computed and the most visited web pages can then be identified.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. This allows programmers with no experience of parallel and distributed systems to use the resources of a large distributed system easily.
The implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable. A typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use.
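
The general idea of computing PageRank in map and reduce steps over an adjacency list can be sketched as follows. This is a minimal single-machine illustration and not the HGPR algorithm itself, whose details the abstract does not give. The function names, the damping factor of 0.85, the fixed iteration count, and the in-memory dictionary standing in for the shuffle phase are all assumptions made for the example. It also assumes that every page has at least one out-link and that every link target appears as a key in the graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the abstract does not state a value


def map_page(page, rank, out_links):
    """Map step: forward the link structure and share rank among out-links."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def reduce_page(page, values, n_pages):
    """Reduce step: sum incoming contributions and apply the damping factor."""
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1 - DAMPING) / n_pages + DAMPING * incoming
    return page, new_rank, out_links


def pagerank(graph, iterations=20):
    """Run a fixed number of map/shuffle/reduce rounds over an adjacency list."""
    n_pages = len(graph)
    ranks = {page: 1.0 / n_pages for page in graph}
    for _ in range(iterations):
        # Grouping values by key stands in for the framework's shuffle phase.
        shuffled = defaultdict(list)
        for page, out_links in graph.items():
            for key, value in map_page(page, ranks[page], out_links):
                shuffled[key].append(value)
        ranks = {}
        for page, values in shuffled.items():
            key, new_rank, _ = reduce_page(page, values, n_pages)
            ranks[key] = new_rank
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page hyperlink graph as an adjacency list.
    example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(example).items()):
        print(page, round(rank, 4))
```

In a real MapReduce deployment each iteration would be a separate job, with the framework performing the grouping by key across machines; the sketch only mirrors that data flow.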

Keywords

Adjacency List, Cloud Computing, Damping Factor, HGPR Algorithm, MapReduce, PageRank (PR).

Authors

N. Rehna
Department of MCA, Bharathiyar College of Engineering and Technology, Karaikal, Puducherry, India
N. Minni
Department of Computer Science, Avvaiyar Government College for Women, Karaikal, Puducherry, India
F. Jasmine Natchial
Department of MCA, Bharathiyar College of Engineering and Technology, Karaikal, Puducherry, India
