
Language Independent Document Retrieval Using Unicode Standard


Affiliations
1 Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
 

In this paper, we present a method for retrieving documents that contain unstructured text written in different languages. Unlike ordinary document retrieval systems, the proposed system can also process queries whose terms are in more than one language. Unicode, the universally accepted encoding standard, is used to represent the data on a common platform while the text is converted into the Vector Space Model. The experiments yielded notable F-measure values irrespective of the languages used in the documents and queries.

Keywords

Language Independent Searching, Information Retrieval, Multilingual Searching, Unicode, QR Factorization, Vector Space Model.
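
The idea of using Unicode as a common platform for a Vector Space Model can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the raw term-frequency weighting, the cosine ranking, the sample documents, and the names `tokenize`, `to_vector`, `cosine`, `rank`, and `docs` are all assumptions made for illustration, and the QR factorization step named in the keywords is not shown. The sketch only demonstrates that terms from any script can be split by Unicode character category and matched in one shared term space, so a query may mix languages.

```python
import math
import unicodedata
from collections import Counter


def tokenize(text):
    """Split text into lowercase terms using Unicode character categories."""
    text = unicodedata.normalize("NFC", text).lower()
    terms, current = [], []
    for ch in text:
        # Letters (L*), combining marks (M*) and digits (N*) are treated as
        # part of a term in any script; everything else is a separator.
        if unicodedata.category(ch)[0] in ("L", "M", "N"):
            current.append(ch)
        elif current:
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


def to_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (document index, score) pairs, best match first."""
    q = to_vector(query)
    scores = [(i, cosine(q, to_vector(d))) for i, d in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample collection: English, Hindi, and Malayalam documents.
docs = [
    "Kerala is a state in India",
    "केरल भारत का एक राज्य है",
    "കേരളം ഇന്ത്യയിലെ ഒരു സംസ്ഥാനമാണ്",
]

if __name__ == "__main__":
    # A query mixing a Hindi term and an English term.
    for index, score in rank("केरल india", docs):
        print(index, round(score, 3))
```

Matching here is exact on surface forms, so a term in one language does not retrieve its translation in another; the sketch shows only the shared Unicode term space, not cross-language equivalence.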

Authors

M. Vidhya
Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
S. Aji
Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
