
A Survey on Data Extraction from Web Pages


Affiliations
1 Karunya University, Coimbatore, India
     



The Internet provides a huge amount of information, and the amount of information on the Web is growing at an astonishing rate. The Web can be considered the largest knowledge base, yet extracting data from Web pages is very difficult, mainly because the pages have complex structures and there is no uniformity in structure from one page to another. Because of this lack of uniform structure among Web information sources, access to this huge collection of information has been limited to browsing and searching. Data often needs to be extracted from Web pages to support different applications, and extracting only the relevant data is a tedious task. Robust, flexible extraction methods that transform Web pages into program-friendly structures such as a relational database have therefore become a great necessity. Although many approaches to data extraction from Web pages have been developed, there has been limited effort to compare such tools. This survey paper presents some of the techniques for Web data extraction.
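To make the idea of transforming a Web page into a "program-friendly structure such as a relational database" concrete, here is a minimal sketch of a hand-written wrapper. It is not a technique taken from the survey itself. It assumes a hypothetical page whose records sit in a simple HTML table, and the names `SAMPLE_PAGE`, `TableRowParser`, `extract_rows`, `load_books` and the `books` table are illustrative only. It uses just the Python standard library (`html.parser` and `sqlite3`).

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page; real pages are rarely this regular.
SAMPLE_PAGE = """
<html><body>
<h1>Catalogue</h1>
<table>
<tr><th>Title</th><th>Price</th></tr>
<tr><td>Web Mining</td><td>42.50</td></tr>
<tr><td>Data Extraction</td><td>19.99</td></tr>
</table>
</body></html>
"""


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (e.g. the <h1>) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row in the page as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def load_books(html, conn):
    """Insert the extracted records into a relational table; return the count."""
    rows = extract_rows(html)
    records = rows[1:]  # assumption: the first row is the header
    conn.execute("CREATE TABLE IF NOT EXISTS books (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [(title, float(price)) for title, price in records],
    )
    conn.commit()
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_books(SAMPLE_PAGE, conn)
    for row in conn.execute("SELECT title, price FROM books ORDER BY price"):
        print(row)
```

A wrapper like this is tied to one page layout, so it breaks as soon as the layout changes. That fragility is the lack of uniformity the abstract describes, and it is why more robust and flexible extraction methods are needed.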

Keywords

Semi-Structured Data, Data Extraction, Web Database, Web Mining, Wrapper Generation.


Authors

Deepa John
Karunya University, Coimbatore, India
G. Naveen Sundar
Karunya University, Coimbatore, India
