
A Spam Detection Study of Tweets in Indian Healthcare


Affiliations
1 Institute of Engineering and Management, Kolkata, India
2 University of Calcutta, India
     



Twitter, one of the rapidly growing social networks, has been infiltrated by large amounts of spam. Twitter has many potential applications across diverse areas; however, the signal-to-noise ratio is very low because of spam, which is a major obstacle to meaningful analysis and action. Spam detection is a well-studied problem for email; for tweets, however, it is relatively less researched. In this paper we set up a focused study of nearly 5000 tweets related to Indian healthcare. An extensive study has been conducted in which six classifiers are evaluated and compared for spam detection. A simple term-frequency-based feature selection technique is shown to reduce the model building time significantly. An ensemble method based on the top five classifiers improves the accuracy as well as the stability of the results.

Keywords

Spam Detection, Twitter, Healthcare, Ensemble Learning.
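
The two techniques named in the abstract, term-frequency-based feature selection and an ensemble over several classifiers, can be illustrated with a minimal sketch. The abstract does not state the selection criterion or how the classifiers are combined, so the following assumes a simple corpus-frequency threshold and plain majority voting; the function names, the threshold `min_count`, and the toy tweets are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def select_terms(docs, min_count=2):
    """Keep terms whose total corpus frequency is at least min_count.

    Assumption: the paper's exact term-frequency criterion is not given in the
    abstract; a fixed threshold is used here for illustration only.
    """
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return sorted(term for term, c in counts.items() if c >= min_count)


def vectorize(doc, vocab):
    """Map a document to term counts over the reduced vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    Assumption: majority voting stands in for the paper's unspecified ensemble
    rule. With an odd number of voters and binary labels there are no ties.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Hypothetical toy corpus (not data from the study).
docs = [
    "free pills buy now",
    "hospital opens new ward",
    "buy free pills online",
    "new hospital ward in kolkata",
]
vocab = select_terms(docs, min_count=2)
features = [vectorize(d, vocab) for d in docs]

# Hypothetical outputs of five classifiers on three samples (1 = spam, 0 = not spam).
votes = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
ensemble_labels = majority_vote(votes)
```

Dropping rare terms shrinks the feature vectors that each classifier has to process, which is one plausible reason a frequency-based filter would shorten model building time; voting across an odd number of classifiers is one common way an ensemble can smooth out the errors of any single model.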

Authors

Sramana Mukherjee
Institute of Engineering and Management, Kolkata, India
Arijit Sarkar
Institute of Engineering and Management, Kolkata, India
Saptarsi Goswami
Institute of Engineering and Management, Kolkata, India
Amit Kumar Das
University of Calcutta, India
