
Machine Learning and Bibliographic Data Universe: Assessing Efficacy of Backend Algorithms in Annif through Retrieval Metrics


Affiliations
1 Department of Library and Information Science, University of Kalyani, Kalyani − 741235, West Bengal, India
     



This study uses Annif, an open-source AI/ML framework developed by the National Library of Finland, to explore the feasibility of automated subject indexing. The framework loads LCSH in its linked open data format and is trained on a comprehensive dataset of MARC records downloaded from libraries around the world. The study then compares a selected set of Annif's machine learning backends, namely TF-IDF, Omikuji, and Neural Network, against a set of retrieval metrics to measure how well each suits the bibliographic data universe. It concludes that Neural Network, the fusion backend in Annif, has the potential to support an automated subject indexing system.

Keywords

Annif, Automated Indexing, Machine Learning, NDCG, Neural Network Model, Retrieval Metrics.
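
NDCG, listed among the keywords, is one of the retrieval metrics used to compare backends. As a rough illustration only, the sketch below computes NDCG@k for a single record, assuming binary relevance (a suggested heading counts as relevant if a cataloguer also assigned it). The function names and the sample headings are hypothetical; this is not Annif's own evaluation code and not the study's data.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(suggested, gold, k=10):
    """NDCG@k for one record, assuming binary relevance.

    suggested: ranked list of headings proposed by a backend (best first)
    gold:      set of headings assigned by a human indexer
    """
    gains = [1.0 if heading in gold else 0.0 for heading in suggested[:k]]
    # Ideal ranking: every gold heading placed at the top of the list.
    ideal = [1.0] * min(len(gold), k)
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: two of three gold headings appear in the top four.
gold = {"Machine learning", "Subject cataloging", "Automatic indexing"}
suggested = ["Machine learning", "Libraries", "Automatic indexing", "Metadata"]
score = ndcg(suggested, gold, k=4)
print(f"NDCG@4 = {score:.3f}")
```

Averaging this per-record score over a held-out test set gives one number per backend, which is how a metric of this kind lets backends be ranked against each other.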
About The Author

Parthasarathi Mukhopadhyay
Department of Library and Information Science, University of Kalyani, Kalyani − 741235, West Bengal, India










DOI: https://doi.org/10.17821/srels%2F2023%2Fv60i1%2F170891