
Designing of an Efficient Algorithm for Identifying Abbreviation Definitions in Biomedical Text


Affiliations
1 IBM India Private Limited, Bangalore, India
2 School of Biochemical Engineering, IIT (BHU), Varanasi-221005, India
3 NIT, Calicut, Kerala, India
 

The size and growth rate of the biomedical literature create new challenges for researchers who need to keep up to date. The objective of the present study was to design a pattern-matching method for mining acronyms and their definitions from biomedical text. Space-reduction heuristic constraints have been proposed and implemented; they reduce the search space while still extracting most of the true positive cases. The evaluation was done on MEDLINE abstracts. The results show that the proposed algorithm is faster and more efficient than previous approaches in terms of space and time complexity. The algorithm achieves a recall of 92%, a precision of 97% and an F-factor of 94%. It considers only acronym-definition pairs of the form Acronym (Definition) or Definition (Acronym). One possible improvement is to consider all kinds of acronym-definition patterns; this requires additional study and may reduce precision even though it may increase recall. The algorithm is also space efficient: input text of very large size can be mined because the algorithm requires less memory to execute.
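
To make the two supported patterns concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a common backward character-matching heuristic for pairing an acronym with its definition, and the specific constraints (acronym length of 2 to 10 characters, at most two words, a look-back window of min(|A| + 5, 2|A|) words) are illustrative assumptions rather than the constraints used in the paper. The names `is_candidate_acronym`, `match_definition`, `extract_pairs` and `f_measure` are hypothetical. The last helper assumes that the F-factor is the harmonic mean of precision and recall, which is consistent with the reported figures.

```python
import re

# Innermost parenthesised spans, e.g. "(HSP)" or "(heat shock protein)".
PAREN = re.compile(r"\(([^()]+)\)")


def is_candidate_acronym(s):
    """Cheap filters (assumed) that prune the search space before matching."""
    return (2 <= len(s) <= 10
            and len(s.split()) <= 2
            and s[0].isalnum()
            and any(c.isupper() for c in s))


def match_definition(acronym, words):
    """Match the acronym's characters right to left against `words`.

    Returns the shortest trailing span of the window that contains the
    characters in order, with the first character starting a word, or None.
    """
    chars = [c.lower() for c in acronym if c.isalnum()]
    # Assumed search-space constraint: inspect only a bounded number of words.
    window = words[-min(len(chars) + 5, 2 * len(chars)):]
    text = " ".join(window)
    t = len(text) - 1
    for i in range(len(chars) - 1, -1, -1):
        # Move left until the character matches; the acronym's first
        # character must additionally sit at the start of a word.
        while t >= 0 and (text[t].lower() != chars[i] or
                          (i == 0 and t > 0 and text[t - 1].isalnum())):
            t -= 1
        if t < 0:
            return None
        t -= 1
    return text[t + 1:]


def extract_pairs(text):
    """Return (acronym, definition) pairs for both supported patterns."""
    pairs = []
    for m in PAREN.finditer(text):
        inner = m.group(1).strip()
        before = text[:m.start()].split()
        if not before:
            continue
        if is_candidate_acronym(inner):
            # Pattern: Definition (Acronym)
            definition = match_definition(inner, before)
            if definition:
                pairs.append((inner, definition))
        elif is_candidate_acronym(before[-1]):
            # Pattern: Acronym (Definition)
            definition = match_definition(before[-1], inner.split())
            if definition:
                pairs.append((before[-1], definition))
    return pairs


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed meaning of F-factor)."""
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `extract_pairs("We measured the expression of heat shock protein (HSP) in cells.")` yields `[("HSP", "heat shock protein")]`, and `f_measure(0.97, 0.92)` is about 0.944, which rounds to the reported 94%. Because the sketch scans the text once and inspects only a bounded window of words per parenthesis, its working memory does not grow with the size of the input.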

Keywords

Biomedical Text, MEDLINE, Recall, Precision, F-Factor, Acronym.

Authors

Shashank Singh
IBM India Private Limited, Bangalore, India
Gaurav Sharma
School of Biochemical Engineering, IIT (BHU), Varanasi-221005, India
K. A. Abdul Nazeer
NIT, Calicut, Kerala, India
Shalini Singh
School of Biochemical Engineering, IIT (BHU), Varanasi-221005, India
