
Text Analytics Framework Using Apache Spark and Combination of Lexical and Machine Learning Techniques


Affiliations
1 Computer Science and Engineering, Visvesvaraya Technological University, Karnataka, India
2 Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India
     



Today, we live in a 'data age'. The sudden increase in user-generated data on social media platforms such as Twitter has created new opportunities and challenges for companies that strive to keep track of customer reviews and opinions about their products. Twitter is a large, fast-growing micro-blogging and social networking platform on which users express their views about politics, products, sports, etc. These views are useful to businesses, governments and individuals; hence, tweets are used in this framework for mining public opinion. Sentiment analysis is the process of automatically recognizing whether user-generated content expresses a positive, negative or neutral opinion about an entity (e.g., a product, person, topic or event). Traditional analytics tools are costly and are not built to handle big data. Hadoop, though a popular framework for data-intensive applications, does not perform well on iterative processes (such as data analysis) because data must be reloaded from disk on each iteration. This paper proposes a text analysis framework for Twitter data built on Apache Spark, which makes it more flexible, fast and scalable. The proposed framework is also domain independent, as it uses a hybrid approach that combines supervised machine learning algorithms (Naïve Bayes and decision tree) with a lexicon approach (pattern analyzer) for sentiment classification; it compares the supervised learning models and uses the one with the highest accuracy to predict sentiment.

Keywords

Sentiment Analysis, Machine Learning, Lexical Approach, Apache Spark, Natural Language Processing, Twitter.
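
The hybrid approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Apache Spark: it is plain Python, the `LEXICON` word list is a hypothetical stand-in for the pattern analyzer, a one-level decision stump stands in for the decision tree, and using lexicon output as training labels is only one assumed way of combining the two approaches. All function and class names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; it stands in for the pattern analyzer.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

def tokenize(text):
    return text.lower().split()

def lexicon_label(text):
    """Label text by summing word polarities from the lexicon."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        def log_posterior(label):
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            lp = math.log(self.class_counts[label] / total)
            for tok in tokenize(text):
                lp += math.log((self.word_counts[label][tok] + 1) / denom)
            return lp
        return max(self.class_counts, key=log_posterior)

class DecisionStump:
    """Depth-1 decision tree: splits on the presence of a single word."""
    def fit(self, texts, labels):
        docs = [set(tokenize(t)) for t in texts]
        def majority(ls, default):
            return Counter(ls).most_common(1)[0][0] if ls else default
        overall = majority(labels, None)
        best = (-1, None, overall, overall)
        for word in sorted(set().union(*docs)):
            with_w = [l for d, l in zip(docs, labels) if word in d]
            without = [l for d, l in zip(docs, labels) if word not in d]
            yes, no = majority(with_w, overall), majority(without, overall)
            correct = with_w.count(yes) + without.count(no)
            if correct > best[0]:  # keep the split with most training hits
                best = (correct, word, yes, no)
        _, self.word, self.yes, self.no = best
        return self

    def predict(self, text):
        return self.yes if self.word in tokenize(text) else self.no

def accuracy(model, texts, labels):
    hits = sum(model.predict(t) == l for t, l in zip(texts, labels))
    return hits / len(labels)

def select_best(models, train, test):
    """Train each candidate; return (name, model, accuracy) of the best."""
    scored = []
    for name, model in models.items():
        model.fit(*train)
        scored.append((accuracy(model, *test), name, model))
    best_acc, best_name, best_model = max(scored, key=lambda s: s[0])
    return best_name, best_model, best_acc

# Assumption: the lexicon supplies labels for otherwise unlabeled tweets.
tweets = ["love this phone", "great battery", "awful screen",
          "hate the update", "good camera great price", "bad service"]
labels = [lexicon_label(t) for t in tweets]
held_out = ["love the camera", "awful service"]
held_out_labels = [lexicon_label(t) for t in held_out]

name, model, acc = select_best(
    {"naive_bayes": NaiveBayes(), "decision_stump": DecisionStump()},
    (tweets, labels), (held_out, held_out_labels))
print(name, acc)
```

The selection step mirrors the abstract's description of comparing supervised models and keeping the most accurate one. In a Spark-based setting the same comparison would presumably run over distributed data rather than in-memory lists.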




Authors

Anuja Prakash Jain
Computer Science and Engineering, Visvesvaraya Technological University, Karnataka, India
Padma Dandannavar
Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India

