A Survey on Visual Cue Based Data Area Identification in Unsupervised Web Data Extraction
Structured data in Web pages usually contain important information. Such data are often retrieved from underlying databases and displayed in Web pages using fixed templates. In this paper, we call these structured data objects data records. There are two main approaches to data extraction: wrapper induction and automatic extraction. In wrapper induction, a set of data extraction rules is learned from a set of manually labeled pages. However, manual labeling is labor intensive and time consuming. Moreover, labeling must be repeated for different sites, or even for different pages within the same site, because the templates may differ.
Keywords
Data Extraction, Data Record Alignment, Visual Cues.