
Rule-Based Metadata Extraction for Heterogeneous References


Affiliations
1 Department of Computer Science and Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing, 100 871, China
 

Abstract

References form an essential part of electronic scholarly publications. Accurate, automatic generation of reference metadata provides scalability, interoperability and usability for digital libraries and their collections. This paper deals with automatic metadata extraction from the references of general digital documents using a rule-based approach, covering both book and journal references. The system consists of four major components: (1) an input module, through which references are supplied either by uploading a file or by entering them in a window provided in the browser; (2) a text converter, which converts documents into a standard text format; (3) a parser, which uses pre-defined regular expressions to identify the reference style and extract metadata from the converted documents, namely author, title, journal, volume, number (issue), year and pages from journal references, and author, title, publisher, place of publication, year and pages from book references; and (4) the browser, which displays the results. The experimental results show that the proposed framework can effectively extract metadata from book and journal references written in different reference styles.
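The abstract states that the parser identifies the reference style and extracts fields with pre-defined regular expressions, but it does not list those expressions. As a rough illustration of the rule-based idea only, the Python sketch below tries a small, ordered rule table; the first pattern that matches decides the reference style and type, and its named groups become the metadata fields. The patterns (`JOURNAL_IEEE`, `JOURNAL_APA`, `BOOK_APA`), the two simplified IEEE-like and APA-like styles, the `RULES` table and the `extract` function are all hypothetical assumptions, not the author's actual rules.

```python
import re
from typing import Optional

# Hypothetical rules for two simplified styles (IEEE-like and APA-like).
# They only illustrate the rule-based idea; they are NOT the paper's own
# regular expressions, which the abstract does not list.
JOURNAL_IEEE = re.compile(
    r'^(?P<author>.+?),\s"(?P<title>.+?),"\s'
    r"(?P<journal>[^,]+),\svol\.\s(?P<volume>\d+),\s"
    r"(?:no\.\s(?P<number>\d+),\s)?"
    r"pp\.\s(?P<pages>\d+[-\u2013]\d+),\s(?P<year>\d{4})\.?$"
)
JOURNAL_APA = re.compile(
    r"^(?P<author>.+?)\s\((?P<year>\d{4})\)\.\s"
    r"(?P<title>.+?)\.\s"
    r"(?P<journal>[^,]+),\s"
    r"(?P<volume>\d+)(?:\((?P<number>\d+)\))?,\s"
    r"(?P<pages>\d+[-\u2013]\d+)\.?$"
)
BOOK_APA = re.compile(
    r"^(?P<author>.+?)\s\((?P<year>\d{4})\)\.\s"
    r"(?P<title>.+?)\.\s"
    r"(?P<place>[^:]+):\s"
    r"(?P<publisher>[^.,]+)"
    r"(?:,\s(?:pp\.\s)?(?P<pages>\d+[-\u2013]\d+))?\.?$"
)

# Ordered rule table: the first pattern that matches decides style and type.
RULES = [
    ("IEEE", "journal", JOURNAL_IEEE),
    ("APA", "journal", JOURNAL_APA),
    ("APA", "book", BOOK_APA),
]


def extract(reference: str) -> Optional[dict]:
    """Return style, type and metadata fields, or None if no rule matches."""
    # Collapse line breaks and repeated spaces left over from text conversion.
    text = " ".join(reference.split())
    for style, kind, pattern in RULES:
        match = pattern.match(text)
        if match:
            # Drop optional groups (e.g. issue number) that did not participate.
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
            return {"style": style, "type": kind, **fields}
    return None


if __name__ == "__main__":
    samples = [
        'A. Smith and B. Jones, "Learning to parse," J. Doc. Eng., '
        "vol. 5, no. 2, pp. 10-20, 2001.",
        "Smith, J., & Jones, B. (2001). Learning to parse references. "
        "Journal of Documentation, 5(2), 10-20.",
        "Smith, J. (1999). Information extraction. New York: Academic Press.",
    ]
    for ref in samples:
        print(extract(ref))
```

Trying journal rules before the book rule keeps the looser book pattern from claiming a reference that a stricter journal pattern could parse. Real references vary far more (author name formats, missing issue numbers, editions, punctuation inside titles), so a practical rule set would likely need many more patterns than this sketch shows.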

Keywords

Metadata, Implementation, Experiment, References, Digital Libraries, Information Extraction.

Authors

Bolanle Ojokoh