Component Based Effective Web Crawler and Indexer Using Web Services

A. Vadivel; S. G. Shaila; R. Devi Mahalakshmi; J. Karthika

Affiliations
Multimedia Information Retrieval Group, Department of Computer Applications, National Institute of Technology, Tamilnadu, India

Designing and developing an effective web crawler is a challenging task for a large search engine. This paper proposes a component-based web crawler together with an indexer. The web crawler consists of crawler services and indexer services, both realized as web services. The services communicate by exchanging messages using XML, SOAP and WSDL. In the crawler service, web pages are fetched and parsed to retrieve all of their hyperlinks. The process is carried out recursively using a breadth-first strategy. The pages at the extracted URLs are downloaded and passed as messages to the indexer service. In the indexer service, the HTML pages are parsed, stop words are removed and keywords are stemmed as pre-processing steps, and the result is stored in the form of an inverted index. We have evaluated the performance of the proposed design specification of the crawler with indexer and found that the number of pages retrieved is notably on the higher side.
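The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `PAGES` dictionary, the stop-word subset, the naive `stem` function and all function names are assumptions made for illustration, and a direct function call stands in for the XML/SOAP message passing between the crawler and indexer services.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
import re
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> crawling the web',
    "http://example.test/a": '<a href="/b">B</a> crawlers fetch pages',
    "http://example.test/b": '<a href="/">home</a> indexing pages',
}

STOP_WORDS = {"the", "a", "and", "of", "is"}  # illustrative subset only


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.text.append(data)


def parse(html):
    """Return (hyperlinks, text) extracted from an HTML string."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text)


def crawl_bfs(seed, fetch, max_pages=100):
    """Crawler side: visit pages breadth-first, returning [(url, html), ...]."""
    queue, seen, fetched = deque([seed]), {seed}, []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()          # FIFO queue gives breadth-first order
        html = fetch(url)
        if html is None:               # page could not be downloaded
            continue
        fetched.append((url, html))
        links, _ = parse(html)
        for href in links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return fetched


def stem(token):
    """Naive suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(pages):
    """Indexer side: tokenize, drop stop words, stem, build an inverted index."""
    index = defaultdict(set)
    for url, html in pages:
        _, text = parse(html)
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                index[stem(token)].add(url)
    return index


pages = crawl_bfs("http://example.test/", PAGES.get)
index = build_index(pages)
```

In this sketch, `index["crawl"]` maps to the two pages that mention "crawling" or "crawlers". A production system would replace `PAGES.get` with an HTTP fetch and `stem` with an established stemmer.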

Keywords

Inverted Index, Tokenization, URL, Web Crawler, Web Service.


Networking and Communication Engineering
