
Crawler Architecture using Grid Computing


Affiliations
1 Dept. of Computer Science, High Institute for Computers and Information Systems, Al Shorouk Academy, Egypt
2 Dept. of Computer Science, Mansoura University, Egypt
 

A crawler is one of the main components of a search engine. Starting from a seed URL, it fetches web pages to build a repository of pages. Each fetched page is parsed to extract the URLs it contains, and the extracted URLs are stored in a URL queue from which the crawlers fetch them sequentially. Crawling takes a long time as the number of pages to collect grows, so it has become necessary for organizations to use their idle computing resources to save cost and time. This paper deals with a search engine crawler built on grid computing. The grid is implemented with Alchemi, an open source project developed at the University of Melbourne that provides middleware for creating an enterprise grid computing environment. The crawling processes are passed to the Alchemi manager, which distributes them over a number of computers acting as executors. The grid-based crawler is implemented and tested, and the results are analyzed. Compared with a single computer, the grid-based crawler shows higher performance and shorter crawl time.

Keywords

Crawler, URL, Grid Computing, Alchemi, Manager, Executor, Performance, Web Pages
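
The crawl loop described in the abstract (take a URL from the queue, fetch the page, extract its URLs, enqueue the new ones) and the idea of handing batches of fetches to several workers can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the in-memory `FAKE_WEB` pages, the names `fetch_and_parse` and `crawl`, and the use of a local thread pool as a stand-in for the manager and its executors. It does not use Alchemi's API and is not the authors' implementation.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": "no links here",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_and_parse(url):
    """Unit of work for one worker: fetch a page, return its absolute URLs."""
    html = FAKE_WEB.get(url, "")  # assumption: stands in for a real HTTP fetch
    parser = LinkExtractor()
    parser.feed(html)
    return url, [urljoin(url, link) for link in parser.links]


def crawl(seed, workers=1):
    """Crawl from a seed URL.

    workers=1 processes the URL queue sequentially; a larger value hands
    batches of queued URLs to a thread pool (a local stand-in for the
    manager distributing work to executors).
    """
    queue = deque([seed])
    seen = {seed}
    repository = []  # visited URLs; a real crawler would also store page content
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while queue:
            batch = [queue.popleft() for _ in range(min(workers, len(queue)))]
            for url, links in pool.map(fetch_and_parse, batch):
                repository.append(url)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        queue.append(link)
    return repository
```

Calling `crawl("http://example.test/")` walks the four sample pages one at a time, while `crawl("http://example.test/", workers=3)` visits the same pages but fetches each batch of queued URLs concurrently. Only the coordinating loop touches the queue and the `seen` set, so the workers need no locking; the single coordinator plays the manager's role in this sketch.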

Authors

M. E. ElAraby
Dept. of Computer Science, High Institute for Computers and Information Systems, Al Shorouk Academy, Egypt
M. M. Sakre
Dept. of Computer Science, High Institute for Computers and Information Systems, Al Shorouk Academy, Egypt
M. Z. Rashad
Dept. of Computer Science, Mansoura University, Egypt
O. Nomir
Dept. of Computer Science, Mansoura University, Egypt
