
Experiments with Different Indexing Techniques for Text Retrieval Tasks on Gujarati Language using Bag-Of-Words Approach


Affiliations
1 Department of Computer Science, Gujarat University, India
2 LogiCeil Solutions, Ahmedabad, India
3 Canada Technology Partners Ltd., India
     



This paper presents the results of experiments carried out to improve text retrieval for Gujarati text documents. Text retrieval involves searching and ranking text documents for a given set of query terms. We tested several retrieval models that use the bag-of-words approach, a traditional and still widely used approach in which a text document is represented as a collection of words. Measures such as frequency count and inverse document frequency are used to identify and rank relevant documents for user queries. The ranking performance of the different models is quantified using mean average precision (MAP). Because Gujarati is a morphologically rich language, we compared techniques such as stop word removal, stemming, and frequent case generation against a baseline to measure the improvement in information retrieval tasks. Most of these techniques are language dependent and require the development of language-specific tools. Using a plain unprocessed word index as the baseline, we observed significant improvements in MAP values after applying the different indexing techniques.
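
As an illustration only, the sketch below shows one way the quantities named in the abstract could fit together: a bag-of-words index built from whitespace-separated tokens (comparable to a plain unprocessed word index), a simple TF-IDF score standing in for the retrieval models (the abstract does not name the specific models), and mean average precision over a set of queries. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Counter]:
    """Bag-of-words index: each document becomes a count of its raw tokens."""
    return {doc_id: Counter(text.split()) for doc_id, text in docs.items()}


def rank(query: str, index: Dict[str, Counter]) -> List[str]:
    """Rank documents by a plain TF-IDF sum (an assumed stand-in model)."""
    n_docs = len(index)
    # Document frequency: number of documents containing each term.
    df: Counter = Counter()
    for counts in index.values():
        df.update(counts.keys())
    scores = {}
    for doc_id, counts in index.items():
        score = 0.0
        for term in query.split():
            if counts[term] > 0:
                score += counts[term] * math.log(n_docs / df[term])
        scores[doc_id] = score
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Average the per-query average precision over all queries in runs."""
    if not runs:
        return 0.0
    ap_values = [average_precision(ranked, qrels.get(qid, set()))
                 for qid, ranked in runs.items()]
    return sum(ap_values) / len(ap_values)
```

Under this sketch, techniques such as stop word removal or stemming would change only the tokens passed to `build_index` and `rank`, so a baseline index and a processed index can be compared by computing `mean_average_precision` on the rankings each one produces for the same queries and relevance judgments.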


Keywords

Information Retrieval (IR), Frequent Case Generation (FCG), Gujarati Language, Mean Average Precision (MAP), Stemming, Stop Words, Text Mining, Text Retrieval.

Authors

Jyoti Pareek
Department of Computer Science, Gujarat University, India
Hardik Joshi
Department of Computer Science, Gujarat University, India
Krunal Chauhan
LogiCeil Solutions, Ahmedabad, India
Rushikesh Patel
Canada Technology Partners Ltd., India
