Linguistic Analysis and Extraction Tool for Online News Articles
Information extraction has become an important technology for helping users locate desired information on the Web. Designing a generalized method for extracting Web information is complicated by the heterogeneity of that information. For this reason, domain-specific characteristics are often considered to make Web information extraction effective. One such domain is online news websites. With thousands of websites providing daily news on today’s Web, a tool that can automatically extract online news information for users is critical. Most previous approaches use manually or automatically constructed wrappers to extract news information. These approaches have several problems. First, they require a training stage to derive the extraction software, and the extraction results may not be satisfactory when the training set is too small. Second, even when these prerequisites are satisfied, the extraction results may still be unstable and domain- or site-dependent. The motivation of our research is to identify and recognize news content, and to provide an effective news extraction algorithm that is stable across presentation designs and news domains.
Keywords
Feature Extraction, Location Named Entity, MINIPAR Parser, Sentence Level Classification, Subject Named Entity.