Linguistic Analysis and Extraction Tool for Online News Articles

Vijayta Patil

Affiliations
Gyanganga Institute of Technology & Management, Bhopal, India

Abstract

Information extraction has become an important technology for helping users locate desired information on the Web. Designing a generalized method for extracting Web information is complicated by the heterogeneity of Web information. For this reason, domain-specific characteristics are often exploited for effective Web information extraction. One such domain is online news websites. With thousands of news websites publishing daily news on today’s Web, it is critical to provide a tool that can automatically extract online news information for users. Most previous approaches use manually or automatically constructed wrappers to extract news information, and they suffer from several problems. First, they require a training stage to derive the extraction software, and extraction results may not be satisfactory when the training set is too small. Second, even with these prerequisites satisfied, the extraction results may still be unstable and domain- or site-dependent. The motivation of our research is to identify and recognize news content and to provide an effective news extraction algorithm that remains stable across different presentation designs and news domains.
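To make the notion of a site-specific wrapper concrete, here is a minimal, hypothetical sketch of a manually constructed wrapper written with Python's standard-library `html.parser`. The class `ManualNewsWrapper`, the helper `extract`, the sample pages, and the `div.article-body` rule are all illustrative assumptions; this is not the extraction algorithm proposed in this work. It only shows that a hand-written rule works on the layout it was written for and silently returns nothing on a differently designed site.

```python
from html.parser import HTMLParser


class ManualNewsWrapper(HTMLParser):
    """Hypothetical hand-written wrapper: collects text inside the element
    whose tag and class attribute match one assumed site layout."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0  # nesting depth of self.tag inside the matched element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside the target element: track nested same-name tags
            # so the matching end tag is recognised correctly.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-whitespace text found inside the target element.
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract(html: str, tag: str = "div", css_class: str = "article-body") -> str:
    """Apply the (assumed) site-specific rule and join the extracted text."""
    wrapper = ManualNewsWrapper(tag, css_class)
    wrapper.feed(html)
    wrapper.close()
    return " ".join(wrapper.chunks)


# Two hypothetical sites presenting the same story with different layouts.
SITE_A = ('<html><body><div class="nav">Home</div>'
          '<div class="article-body"><p>Rain expected.</p><p>Roads closed.</p></div>'
          '</body></html>')
SITE_B = ('<html><body><article><section class="story">'
          '<p>Rain expected.</p></section></article></body></html>')

if __name__ == "__main__":
    print(repr(extract(SITE_A)))  # rule matches this layout
    print(repr(extract(SITE_B)))  # same rule finds nothing on another layout
```

Retargeting the wrapper to the second layout needs a new rule, for example `extract(SITE_B, tag="section", css_class="story")`. That per-site maintenance, repeated across thousands of news sites, is the kind of site dependence the abstract identifies in wrapper-based approaches.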