Component Based Effective Web Crawler and Indexer Using Web Services
Designing and developing an effective web crawler is a challenging task for a large search engine. This paper proposes a component-based web crawler along with an indexer. The web crawler consists of crawler services and indexer services, both realized as web services. Messages between the services are exchanged using XML, SOAP and WSDL. In the crawler service, web pages are fetched and parsed to retrieve all hyperlinks. The process is carried out recursively using a breadth-first strategy. The pages at the extracted URLs are downloaded and sent to the indexer service by message passing. In the indexer service, HTML pages are parsed, stop words are removed and keywords are stemmed as pre-processing steps, and the result is stored in the form of an inverted index. We have evaluated the performance of the proposed design specification of the crawler with indexer and found that the number of pages retrieved is notably on the higher side.
Keywords
Inverted Index, Tokenization, URL, Web Crawler, Web Service.
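The pipeline summarized in the abstract (breadth-first crawling, then stop-word removal, stemming and inverted indexing) can be illustrated with a minimal sketch. This is not the paper's implementation: the crawler and indexer run here as plain in-process functions rather than SOAP web services, and the names `crawl_bfs`, `build_inverted_index`, `stem`, `STOP_WORDS` and the in-memory `FAKE_WEB` are hypothetical, chosen only for illustration. The stemmer is a naive suffix stripper, assumed in place of whatever stemming algorithm the authors used.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects absolute hyperlinks and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl_bfs(seed, fetch, max_pages=100):
    """Breadth-first crawl; `fetch` maps a URL to HTML, or None on failure."""
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()  # FIFO order gives breadth-first traversal
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkAndTextParser(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def stem(word):
    """Naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_inverted_index(pages):
    """Maps each stemmed keyword to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        parser = LinkAndTextParser(url)
        parser.feed(html)
        text = " ".join(parser.text_parts)
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                index[stem(token)].add(url)
    return index


# In-memory stand-in for the web, so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> crawling the web',
    "http://example.test/a": '<a href="/">home</a> indexed pages',
    "http://example.test/b": "crawlers and indexes",
}

pages = crawl_bfs("http://example.test/", FAKE_WEB.get)
index = build_inverted_index(pages)
```

In the architecture the abstract describes, the hand-off from `crawl_bfs` to `build_inverted_index` would instead be a message sent from the crawler service to the indexer service; replacing `FAKE_WEB.get` with a real HTTP fetcher is the only change needed to crawl live pages in this sketch.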