Designing Question-Answer Based Search System in Libraries: Application of Open Source Retrieval Augmented Generation (RAG) Pipeline

Jhantu Mazumder; Parthasarathi Mukhopadhyay

doi:10.17821/srels/2024/v61i5/171583

Vol 61, No 5 (2024)
Pages: 255-260
Published: 2024-10-21
https://doi.org/10.17821/srels%2F2024%2Fv61i5%2F171583

Affiliations
Department of Library and Information Science, Kalyani University, Kalyani – 741235, West Bengal, India

This study primarily aims to build a prototype demonstrating that libraries can develop a low-cost conversational search system using open-source software tools and Large Language Models (LLMs) through a Retrieval-Augmented Generation (RAG) framework. LLMs often hallucinate and provide outdated, non-contextualized responses. However, this experiment shows that LLMs can deliver contextualized, relevant responses when augmented with a set of relevant documents. Augmenting LLMs with relevant documents before generating answers is known as retrieval-augmented generation. The methodology involved creating a RAG pipeline using tools like LangChain, vector databases like ChromaDB, and open-source LLMs like Llama3 (a 70-billion-parameter model). For the prototype, a dataset of 250+ relevant documents on the Chandrayaan-3 mission was collected, processed, and ingested into the pipeline. Finally, the study compared responses from standard LLMs and LLMs with RAG augmentation. Key findings revealed that standard LLMs (without RAG) produced confidently incorrect, hallucinated responses to queries related to Chandrayaan-3, while LLMs with RAG consistently provided accurate, informative, and contextualized answers when supplied with a set of relevant documents before generating the response. The study concluded that open-source RAG-based systems offer a cost-effective way for libraries to enhance information retrieval and transform themselves into dynamic information services.
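The retrieve-then-augment idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: a simple bag-of-words cosine similarity stands in for the embedding model and the ChromaDB vector store, the documents are placeholders rather than the study's Chandrayaan-3 corpus, and the function names (`retrieve`, `build_prompt`) are hypothetical. In a real system the resulting prompt would be passed to an LLM such as Llama3.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query.

    Assumption: a toy stand-in for an embedding-based vector search.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query, context):
    """Prepend the retrieved passages to the question (the 'augmentation' step)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Placeholder passages; a real deployment would ingest the library's own documents.
docs = [
    "Placeholder passage about the Chandrayaan-3 lander.",
    "Placeholder passage about library cataloguing rules.",
    "Placeholder passage about the Chandrayaan-3 rover.",
]

query = "What did the Chandrayaan-3 rover do?"
context = retrieve(query, docs, k=2)
prompt = build_prompt(query, context)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

The design point is that the model only sees passages selected for the query, so an unrelated document (here, the cataloguing passage) never reaches the prompt.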

Keywords

Conversational AI, ChatGPT, Gemini, Generative AI, LangChain, Large Language Models (LLMs), Llama3, LlamaIndex, Mistral, NLP, Retrieval Augmented Generation (RAG)


Journal of Information and Knowledge (Formerly SRELS Journal of Information Management)
