A Survey on Data Extraction from Web Pages

Deepa John; G. Naveen Sundar

Affiliations
Karunya University, Coimbatore, India

Abstract

The Internet provides a huge amount of information, and the amount of information on the Web is growing at an astonishing rate. The Web can be considered the largest knowledge base. Extracting data from Web pages is very difficult, however, mainly because their structure is complex and varies from page to page. Because Web information sources lack a uniform structure, access to this huge collection of information has been limited to browsing and searching. Data often needs to be extracted from Web pages to support different applications, and extracting only the relevant data is a tedious task. Robust, flexible extraction methods that transform Web pages into program-friendly structures, such as a relational database, have therefore become a necessity. Although many approaches to data extraction from Web pages have been developed, there has been limited effort to compare such tools. This survey paper describes some of the techniques for Web data extraction.