
Automatic Extraction of Keywords from Web Resources


Affiliations
1 Department of Information Science, University of Madras, Chennai 600 005, India
     



This paper describes an experiment in the automatic identification and extraction of keywords from web resources, specifically HTML documents. A natural language parser capable of extracting keywords from HTML texts in a domain would be useful in analyzing document content to support and facilitate information retrieval. A computer program, AutoIndex, was developed to process HTML texts and extract keywords from them. Evaluation shows that AutoIndex works fairly well in terms of identification, recall and accuracy. The processing speed of the software is also at an acceptable level.
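
As a rough illustration of the kind of processing described above, the following Python sketch strips the markup from an HTML text, ranks the remaining words by frequency, and computes recall against a manually assigned keyword list. It is not the AutoIndex program, whose parser and algorithm are not described here. The frequency-based ranking, the tiny stopword list, and the names TextExtractor, extract_keywords and recall are all assumptions made for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately small stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for",
             "is", "are", "from", "on", "with"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_keywords(html, top_n=5):
    """Return the top_n most frequent non-stopword terms in the HTML text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z]+", text)
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def recall(extracted, reference):
    """Fraction of the reference keywords that the extractor found."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(set(extracted) & reference) / len(reference)
```

A call such as extract_keywords(page_html, top_n=10) returns candidate terms, and recall(candidates, manual_keywords) compares them with a human indexer's choices. A parser-based system of the kind the abstract describes would presumably identify noun phrases rather than single words, so this sketch only approximates the idea.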



Authors

G. Velumani
Department of Information Science, University of Madras, Chennai 600 005, India
K. S. Raghavan
Department of Information Science, University of Madras, Chennai 600 005, India

