
A Survey on Pre-Processing Techniques for Text Mining


Affiliations
1 Marwadi Education Foundation Group of Institutions, Gujarat Technological University, Ahmedabad, Gujarat, India
2 Department of Computer Engineering, Marwadi Education Foundation Group of Institutions, Gujarat Technological University, Rajkot, India
     



Text mining is the process of extracting interesting patterns or knowledge from unstructured text documents. Text is the most common type of data on the World Wide Web. Pre-processing is a critical phase of the text mining process. A text mining framework has two components: text refining and knowledge distillation. This paper surveys pre-processing techniques for text mining in English and Gujarati. Very little work has been done on text mining in Gujarati. The task is challenging because Gujarati is morphologically rich, which gives rise to a very large number of word forms and a correspondingly large feature space. Some pre-processing techniques for Gujarati are introduced in this paper.

Keywords

Pre-Processing, Stop-Words, Stemming, Text Mining.

Authors

Manthan J. Vyas
Marwadi Education Foundation Group of Institutions, Gujarat Technological University, Ahmedabad, Gujarat, India
Sanjay D. Bhanderi
Department of Computer Engineering, Marwadi Education Foundation Group of Institutions, Gujarat Technological University, Rajkot, India
