A Spam Detection Study of Tweets in Indian Healthcare

Sramana Mukherjee¹, Arijit Sarkar¹, Saptarsi Goswami¹, Amit Kumar Das²

Affiliations
1 Institute of Engineering and Management, Kolkata, India
2 University of Calcutta, India

Abstract

Twitter, one of the most rapidly growing social networks, has been infiltrated by large amounts of spam. Twitter has many potential applications across diverse areas; however, the signal-to-noise ratio is very low because of spam, which is a major obstacle to meaningful analysis and action. Spam detection is a well-studied problem for email, but it is comparatively less researched for tweets. In this paper we set up a focused study of nearly 5000 tweets related to Indian healthcare. Six classifiers are evaluated and compared for spam detection. A simple term-frequency-based feature selection technique is shown to reduce model building time significantly. An ensemble method based on the top five classifiers improves both the accuracy and the stability of the results.
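As a rough illustration of the two techniques named in the abstract, the following minimal Python sketch filters terms by corpus frequency and then combines classifier outputs by majority vote. The abstract does not specify the frequency threshold, the tokenization, or the voting scheme, so `min_count`, whitespace tokenization, simple majority voting, the function names, and the sample tweets are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def select_terms_by_frequency(tweets, min_count=2):
    """Keep only terms that occur at least min_count times across the corpus.

    min_count is a hypothetical threshold; the paper's value is not given.
    """
    counts = Counter(term for tweet in tweets for term in tweet.lower().split())
    return sorted(term for term, n in counts.items() if n >= min_count)


def vectorize(tweet, vocabulary):
    """Represent a tweet as raw term counts over the selected vocabulary."""
    counts = Counter(tweet.lower().split())
    return [counts[term] for term in vocabulary]


def majority_vote(predictions):
    """Combine the labels from several classifiers for one tweet (assumed scheme)."""
    return Counter(predictions).most_common(1)[0][0]


# Made-up sample tweets, not drawn from the study's data set.
tweets = [
    "free health checkup click here",
    "click here to win free pills",
    "new hospital opens in kolkata",
    "health ministry announces new vaccine drive",
]

vocabulary = select_terms_by_frequency(tweets, min_count=2)
features = [vectorize(t, vocabulary) for t in tweets]

# Hypothetical outputs of five classifiers for a single tweet.
label = majority_vote(["spam", "ham", "spam", "spam", "ham"])
```

Dropping rare terms shrinks the feature vectors each classifier has to process, which is one plausible reason such a filter would shorten model building time. Using an odd number of voters, such as five, avoids ties in a two-class vote.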