
An Algorithm to Self-Extract Secondary Keywords and Their Combinations Based on Abstracts Collected Using Primary Keywords from Online Digital Libraries


Affiliations
1 Department of Computer Science, Jackson State University, 1400 John Lynch St, Jackson, MS 39217, United States
2 Department of Biology, Jackson State University, 1400 John Lynch St, Jackson, MS 39217, United States
 

The high-level contribution of this paper is the development and implementation of an algorithm to self-extract secondary keywords and their combinations (combo words) from abstracts collected using standard primary keywords for research areas from reputed online digital libraries such as IEEE Xplore and PubMed Central. Given a collection of N abstracts, we arbitrarily select M abstracts (M << N; M/N as low as 0.15) and parse each of the M abstracts, word by word. Upon the first appearance of a word, we query the user to classify the word into an Accept-List or a non-Accept-List. The effectiveness of the training approach is evaluated by measuring the percentage of words for which the user is queried for classification as the algorithm parses the words of each of the M abstracts. We observed that as M grows larger, the percentage of words for which the user is queried for classification drops drastically. After the Accept-List is built by parsing the M abstracts, we parse all N abstracts, word by word, and count the frequency of appearance of each word of the Accept-List in these N abstracts. We also construct a Combo-Accept-List comprising all possible combinations of the single keywords in the Accept-List, parse all N abstracts two successive words (a combo word) at a time, and count the frequency of appearance of each combo word of the Combo-Accept-List in these N abstracts.

Keywords

Self-Extraction, Abstracts, Secondary Keywords, Combo Keywords, Frequency, Training.
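
The two phases described in the abstract (training on M abstracts, then counting over all N) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tokenize`, `train` and `count_keywords`, the tokenization rule, and the `classify` callback that stands in for the interactive user query are all assumptions made for illustration. The sketch also assumes that the per-abstract query percentage is the number of queried words divided by the number of words in that abstract, and that a combo word is an ordered pair of two distinct Accept-List words.

```python
import re
from collections import Counter
from itertools import permutations


def tokenize(text):
    """Split an abstract into lowercase words (assumed tokenization rule)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def train(training_abstracts, classify):
    """Build the Accept-List from the M training abstracts.

    classify(word) is a hypothetical stand-in for the user query; it is
    called only the first time a word is seen and returns True to accept.
    Returns the Accept-List and the per-abstract query percentages.
    """
    accept, reject = set(), set()
    query_rates = []
    for abstract in training_abstracts:
        words = tokenize(abstract)
        queried = 0
        for word in words:
            if word in accept or word in reject:
                continue  # already classified: no query needed
            queried += 1
            (accept if classify(word) else reject).add(word)
        query_rates.append(100.0 * queried / len(words) if words else 0.0)
    return accept, query_rates


def count_keywords(all_abstracts, accept):
    """Count Accept-List words and combo words over all N abstracts."""
    # Assumed reading of "all possible combinations": ordered pairs of
    # distinct Accept-List words.
    combo_accept = set(permutations(accept, 2))
    singles, combos = Counter(), Counter()
    for abstract in all_abstracts:
        words = tokenize(abstract)
        singles.update(w for w in words if w in accept)
        # Two successive words at a time.
        combos.update(p for p in zip(words, words[1:]) if p in combo_accept)
    return singles, combos


if __name__ == "__main__":
    abstracts = [
        "Gene expression in cancer cells",
        "Cancer cells and gene expression profiles",
        "Protein folding in cancer cells",
    ]
    # A fixed stop-word rule replaces the interactive user in this demo.
    accept, rates = train(abstracts[:2], lambda w: w not in {"in", "and"})
    singles, combos = count_keywords(abstracts, accept)
    print(rates)
    print(singles.most_common())
    print(combos.most_common())
```

Because each word is classified only once, later training abstracts trigger fewer queries, which is the effect the abstract reports as M grows. Words that first appear outside the training set (here, "protein" and "folding") are never classified and are therefore not counted.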

Authors

Natarajan Meghanathan
Department of Computer Science, Jackson State University, 1400 John Lynch St, Jackson, MS 39217, United States
Nataliya Kostyuk
Department of Biology, Jackson State University, 1400 John Lynch St, Jackson, MS 39217, United States
Raphael Isokpehi
Department of Biology, Jackson State University, 1400 John Lynch St, Jackson, MS 39217, United States
Hari Cohly
Department of Biology, Jackson State University, 1400 John Lynch St, Jackson, MS 39217, United States
