
Adaptive Content Based Textual Information Source Prioritization


Affiliations
1 Division of Instrumentation and Control Engineering, Netaji Subhas Institute of Technology, India
2 Division of Computer Engineering, Netaji Subhas Institute of Technology, India
     



The World Wide Web offers a wealth of textual information sources that can be used for many applications. Given how rapidly online data evolves, there is a real risk of information overload unless techniques for meaningfully segregating these sources continue to be developed and refined. In particular, there is a dearth of intelligent, content-oriented techniques that can both learn from past search experiences and adapt to a user's specific requirements during her current search. In this paper, we tackle the core issue of prioritizing textual information sources according to the relevance of their content to the central theme that a user is currently exploring. We propose a new Source Prioritization Algorithm that uses an iterative learning approach to assess the affinity of the given information sources for a set of user-defined seed words and prioritizes the sources accordingly. The final priorities obtained serve as the initial priorities for the next search request. This serves a dual purpose. First, the system learns incrementally from the cumulative search experiences of several users and readjusts the source priorities to reflect the acquired knowledge. Second, the refreshed source priorities direct a user's current search towards more relevant sources while also adapting to the new set of keywords acquired from that user. Experimental results show that the proposed algorithm progressively improves the system's ability to discriminate between sources, even in the presence of several random sources. Further, it scales well: when a new, enriched information source is generated by merging existing ones, the algorithm identifies the augmented source.

Keywords

Textual Information Source Prioritization, Search Engines, Domain Specificity, Term-Source Matrix, Text Information Density.
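The abstract gives only a high-level description of the Source Prioritization Algorithm, so the following is a minimal illustrative sketch of the general idea, not the authors' method. It assumes a term-source matrix of relative term frequencies (a simple stand-in for a text density measure), a mutual-reinforcement update between seed-word weights and source priorities, and a blending rate; the function names, the `rate` and `iterations` parameters, and the update rule itself are all hypothetical. What it does reflect from the abstract is that the priorities returned by one search are passed in as the initial priorities of the next.

```python
from collections import Counter


def term_source_matrix(sources, terms):
    """Map each term to a list with one entry per source: the term's
    relative frequency in that source (occurrences / total tokens).
    Tokenization is a plain lowercase whitespace split (an assumption)."""
    matrix = {t: [] for t in terms}
    for text in sources:
        tokens = text.lower().split()
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        for t in terms:
            matrix[t].append(counts[t] / total)
    return matrix


def prioritize(sources, seed_words, initial=None, iterations=10, rate=0.5):
    """Return one priority per source, summing to 1.

    Hypothetical update rule: each iteration blends the current priorities
    with normalized seed-word evidence, then re-weights the seed words by
    how strongly the currently favoured sources use them."""
    n = len(sources)
    priorities = list(initial) if initial is not None else [1.0] * n
    norm = sum(priorities)
    priorities = [p / norm for p in priorities] if norm > 0 else [1.0 / n] * n

    terms = [w.lower() for w in seed_words]
    matrix = term_source_matrix(sources, terms)
    weights = {t: 1.0 / len(terms) for t in terms}

    for _ in range(iterations):
        # Evidence for each source: weighted density of the seed words.
        evidence = [sum(weights[t] * matrix[t][j] for t in terms) for j in range(n)]
        total = sum(evidence)
        if total == 0:
            break  # no seed word occurs anywhere; keep the current priorities
        # Blend the carried-over priorities with the fresh evidence.
        priorities = [
            (1 - rate) * priorities[j] + rate * evidence[j] / total for j in range(n)
        ]
        # Re-weight seed words using the updated priorities.
        raw = {t: sum(priorities[j] * matrix[t][j] for j in range(n)) for t in terms}
        weight_total = sum(raw.values())
        weights = {t: raw[t] / weight_total for t in terms}
    return priorities


if __name__ == "__main__":
    docs = [
        "solar panels convert sunlight into solar energy",
        "the football match ended in a draw",
        "wind and solar energy are renewable energy sources",
    ]
    first = prioritize(docs, ["solar", "energy"])
    # The priorities from the first search seed the next one.
    second = prioritize(docs, ["wind"], initial=first)
    print(first)
    print(second)
```

Blending rather than multiplying keeps a source that scores zero in one search from being locked out of later searches, which is one way to let accumulated experience and the current user's keywords both influence the ranking.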

Authors

Nikhil Mitra
Division of Instrumentation and Control Engineering, Netaji Subhas Institute of Technology, India
Nilanjana Goel
Division of Computer Engineering, Netaji Subhas Institute of Technology, India
S. Chakraverty
Division of Computer Engineering, Netaji Subhas Institute of Technology, India
Gurmeet Singh
Division of Computer Engineering, Netaji Subhas Institute of Technology, India
