
Machine Learning and Bibliographic Data Universe: Assessing Efficacy of Backend Algorithms in Annif through Retrieval Metrics


Affiliations
1 Department of Library and Information Science, University of Kalyani, Kalyani − 741235, West Bengal, India
     



This study uses Annif, an open-source AI/ML framework developed by the National Library of Finland, to explore the feasibility of automated subject indexing. The framework loads LCSH in its linked open data format, and its models are trained on a comprehensive dataset of MARC records downloaded from libraries around the world. The study then compares selected machine learning backends of Annif, namely TF-IDF, Omikuji, and Neural Network, using a set of retrieval metrics to assess how well each backend suits the bibliographic data universe. It concludes that Neural Network, the fusion backend in Annif, has the potential to support an automated subject indexing system.

Keywords

Annif, Automated Indexing, Machine Learning, NDCG, Neural Network Model, Retrieval Metrics.
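
NDCG, listed among the keywords, is one of the retrieval metrics by which ranked subject suggestions can be compared against the headings a cataloguer assigned. The following is a minimal sketch of NDCG with binary relevance, not the implementation used by Annif or by the study. The function names, the cutoff parameter `k`, and the sample headings are assumptions made for illustration only.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks starting at 1."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg(suggested, gold, k=None):
    """NDCG with binary relevance (illustrative helper, not part of Annif).

    suggested: subject labels ranked by a backend, best first.
    gold: set of subject labels assigned by a human indexer.
    k: optional cutoff; when omitted, the whole suggestion list is scored.
    """
    cutoff = len(suggested) if k is None else k
    ranked = suggested[:cutoff]
    # A suggestion earns gain 1.0 if the indexer also assigned it, else 0.0.
    gains = [1.0 if label in gold else 0.0 for label in ranked]
    # The ideal ranking places every gold label at the top of the list.
    ideal = dcg([1.0] * min(len(gold), cutoff))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical headings, chosen only to exercise the function.
gold_subjects = {"Machine learning", "Subject cataloging"}
backend_output = ["Machine learning", "Libraries", "Subject cataloging"]
score = ndcg(backend_output, gold_subjects)
```

A score of 1.0 means every human-assigned heading appears at the top of the suggestion list. Correct headings ranked lower reduce the score, which is why the metric suits comparing ranked output across backends.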
About The Author

Parthasarathi Mukhopadhyay
Department of Library and Information Science, University of Kalyani, Kalyani − 741235, West Bengal, India








DOI: https://doi.org/10.17821/srels%2F2023%2Fv60i1%2F170891