
Automated Stopwords Identification in Punjabi Documents


Affiliations
1 Punjab Technical University, Kapurthala Road, Jalandhar, India
2 Department of Computer Science, Punjabi University, Patiala, India
 

Many information retrieval (IR) tasks must classify huge amounts of data before producing final results. Not all of the data processed in an IR task is useful to the researcher. A method is therefore needed to identify the uninformative words (called stop words) and remove them from the data set before the IR task begins. Removing them has two benefits: it reduces the overall vector space, which improves execution speed, and it improves the relevance of the results. The purpose of this paper is to find a suitable, automated method for identifying stop words in Punjabi text.

Keywords

Punjabi Stop Words List, Statistical Modeling, Borda Count, Information Processing, Text Classification.
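
The abstract and keywords name statistical modeling and Borda count but do not say which statistics are combined. As an illustration only, the sketch below ranks words by two assumed signals (raw term frequency and document frequency) and merges the two rankings with a Borda count. The function names, the choice of signals, and the three-sentence toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_by_score(scores):
    """Order words from highest to lowest score; ties are broken by code point order."""
    return sorted(scores, key=lambda word: (-scores[word], word))


def borda_count(rankings):
    """Merge rankings: the word at position i in a ranking of n words earns n - i points."""
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for i, word in enumerate(ranking):
            points[word] += n - i
    return rank_by_score(points)


def stop_word_candidates(documents, top_k):
    """documents is a list of token lists; returns the top_k stop word candidates."""
    term_freq = Counter()  # assumed signal 1: total occurrences in the corpus
    doc_freq = Counter()   # assumed signal 2: number of documents containing the word
    for tokens in documents:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    rankings = [rank_by_score(term_freq), rank_by_score(doc_freq)]
    return borda_count(rankings)[:top_k]


# Toy corpus (hypothetical), tokenized on whitespace.
docs = [
    "ਇਹ ਕਿਤਾਬ ਹੈ".split(),
    "ਉਹ ਘਰ ਵਿੱਚ ਹੈ".split(),
    "ਇਹ ਘਰ ਵੱਡਾ ਹੈ".split(),
]
print(stop_word_candidates(docs, 1))
```

On a corpus this small, content words can score as high as function words, so any real use would need a much larger document collection and a cutoff chosen by inspection.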


Authors

Rajeev Puri
Punjab Technical University, Kapurthala Road, Jalandhar, India
R. P. S. Bedi
Punjab Technical University, Kapurthala Road, Jalandhar, India
Vishal Goyal
Department of Computer Science, Punjabi University, Patiala, India
