Sentiment Analysis of Twitter Data in Hadoop Using Naïve Bayes and Fuzzy C-Means Clustering
Social networking platforms are among the main sources of the growing volume of user-generated data. People across different geographical regions share their thoughts and opinions on micro-blogging sites. One of the most popular of these is Twitter, where people share their reviews in the form of tweets. Because tweets are short and length-limited, they are relatively easy to analyze and mine for valuable insights. Tweets also express a wide range of sentiments and opinions about current technologies and affairs. Sentiment analysis is the process of analyzing the opinions and reviews that people express and categorizing them as positive, negative, or neutral.
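A minimal sketch of the three-way polarity classification described above follows. It trains a multinomial Naïve Bayes classifier (the classifier this paper names) with add-one smoothing. The tokenizer, the `NaiveBayesSentiment` class, and the six toy training tweets are hypothetical illustrations, not the authors' implementation or data. Training reduces to counting tokens per class, which is the step a Hadoop MapReduce job would distribute (map: emit class/token pairs; reduce: sum them); here the counting simply runs in memory.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(tweet):
    """Assumed minimal preprocessing: lowercase, strip URLs and @mentions, keep letter runs."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes polarity classifier with add-one smoothing."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)          # documents per class (for priors)
        self.word_counts = defaultdict(Counter)      # class -> token -> count
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(tokenize(tweet))
        self.vocab = {tok for counts in self.word_counts.values() for tok in counts}
        self.class_totals = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tweet, label):
        """Unnormalized log P(label | tweet) = log P(label) + sum of log P(token | label)."""
        score = math.log(self.class_counts[label] / self.n_docs)
        denom = self.class_totals[label] + len(self.vocab)
        for tok in tokenize(tweet):
            if tok in self.vocab:                    # skip tokens unseen in training
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, tweet):
        return max(self.class_counts, key=lambda lbl: self.log_posterior(tweet, lbl))


# Toy training data (invented for illustration, not from the paper).
train = [
    "I love this phone, great battery",
    "great camera and love the screen",
    "Worst update ever, totally broken",
    "broken screen, terrible support",
    "The event starts at 5 pm",
    "New version released today",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayesSentiment().fit(train, labels)
print(model.predict("love the great screen"))      # positive
print(model.predict("terrible broken update"))     # negative
```

Working in log space avoids floating-point underflow when a tweet contains many tokens, and add-one smoothing keeps a single unseen class/token pair from zeroing out a class.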
In this paper, the authors propose a sentiment analysis approach that classifies tweets by sentiment polarity. Streaming data from Twitter is stored in an HDFS (Hadoop Distributed File System) cluster and later mined using the Naïve Bayes and Fuzzy C-Means algorithms, with the aim of improving scalability and accuracy across various performance metrics. The results can help organizations formulate strategies to improve their work processes.
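Unlike hard clustering, Fuzzy C-Means gives each point a degree of membership in every cluster. The pure-Python sketch below implements the standard algorithm, alternating the center update c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m and the membership update u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) until no membership changes by more than a tolerance. The abstract does not say which tweet features are clustered, so the 2-D points (imagine normalized positive and negative lexicon scores per tweet) and the defaults m = 2 and tol = 1e-5 are assumptions for illustration only.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-5, seed=0):
    """Standard Fuzzy C-Means. Returns (centers, u) where u[i][j] is point i's membership in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a center: split membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** exponent for k in range(c))
                              for j in range(c)])

        # Stop once no membership value changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical per-tweet features: (positive score, negative score).
points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05), (0.1, 0.9), (0.2, 0.8), (0.05, 0.95)]
centers, u = fuzzy_c_means(points, c=2)
hard_labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(hard_labels)   # first three points share one label, last three the other
```

The fuzzifier m controls how soft the partition is: values close to 1 approach hard k-means assignments, while larger values spread membership more evenly across clusters.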