
A Survey on Visual Cue Based Data Area Identification in Unsupervised Web Data Extraction


Affiliations
1 Bharathiyar College of Engg. and Tech., Karaikal, India
2 Bharathiyar College of Engg. & Tech., Karaikal, India
     



Abstract

Structured data in Web pages usually contain important information. Such data are often retrieved from underlying databases and displayed in Web pages using fixed templates. In this paper, we call these structured data objects data records. There are two main approaches to data extraction: wrapper induction and automatic extraction. In wrapper induction, a set of data extraction rules is learnt from a set of manually labeled pages. However, manual labeling is labor-intensive and time-consuming, and it must be repeated for different sites, or even for different pages within the same site, because the templates may differ.

Keywords

Data Extraction, Data Record Alignment, Visual Cues.

Authors

M. Priya
Bharathiyar College of Engg. and Tech., Karaikal, India
S. Jamuna Rani
Bharathiyar College of Engg. & Tech., Karaikal, India
