Automatic Extraction of Significant Terms From the Title and Abstract of Scientific Papers Using the Machine Learning Algorithm: A Multiple Module Approach

Bhaskar Mukherjee; Debasis Majhi

Affiliations
1 Professor, Department of Library & Information Science, Banaras Hindu University, Varanasi, India
2 Junior Research Fellow, Department of Library & Information Science, Banaras Hindu University, Varanasi, India

Keyword extraction is the task of identifying the important terms or phrases that are most representative of a source document. Although automatic extraction of keywords from titles is an old method, it has mainly been applied to single web documents. Our approach differs from previous research on keyword extraction in several respects. For non-experts in a scientific field, understanding research trends is difficult. The purpose of this study is to develop an automatic method that gives non-experts an overview of a scientific field by capturing its research trends. This empirical study explores significant-term extraction using Natural Language Processing (NLP) tools. Our dataset comprised more than 15,000 titles saved in a .csv file, and scripts written in Python were used to compare how far the significant terms of the scientific title corpus are similar to or different from the terms in the abstracts of the same article corpus. Yet Another Keyword Extractor (YAKE), a lightweight unsupervised keyword extractor, was used to extract the terms. Based on our analysis, we conclude that these algorithms can also be used in other fields, by non-experts in those fields, to extract significant words automatically and to understand trends. Our algorithm could help reduce the labour-intensive manual indexing process.
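The title-versus-abstract comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script: the CSV column names (`title`, `abstract`), the tiny stopword list, the frequency-based `extract_terms` stand-in (used here in place of YAKE so the example needs only the standard library), and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
import csv
import io
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on",
             "the", "to", "using", "with"}


def extract_terms(text, top=5):
    """Stand-in extractor: the most frequent non-stopword tokens (NOT YAKE)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return {term for term, _ in counts.most_common(top)}


def jaccard(a, b):
    """Overlap between two term sets: |a & b| / |a | b| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical two-column CSV; the layout of the real dataset is not given.
SAMPLE_CSV = """title,abstract
Keyword extraction from scientific titles,We study keyword extraction from titles and abstracts of scientific papers.
"""


def compare_rows(csv_text, top=5):
    """Return (shared terms, Jaccard score) for each title/abstract pair."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title_terms = extract_terms(row["title"], top)
        abstract_terms = extract_terms(row["abstract"], top)
        shared = title_terms & abstract_terms
        results.append((shared, jaccard(title_terms, abstract_terms)))
    return results


if __name__ == "__main__":
    for shared, score in compare_rows(SAMPLE_CSV, top=10):
        print(sorted(shared), round(score, 3))
```

To approximate the study's setup more closely, `extract_terms` could be replaced by a call to the YAKE package, which ranks candidate phrases with a lower score meaning a more relevant term; the comparison step would stay the same.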

Keywords

Data mining, Title extraction, Natural Language Processing, YAKE, NLTK, Keyword Extraction-NLP.

Annals of Library and Information Studies
