Text Analytics Framework Using Apache Spark and Combination of Lexical and Machine Learning Techniques

Anuja Prakash Jain ¹, Padma Dandannavar ²

Affiliations
1 Computer Science and Engineering, Visvesvaraya Technological University, Belgaum, Karnataka, India
2 Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India

Abstract

Today, we live in a 'data age'. The sudden increase in user-generated data on social media platforms such as Twitter has created new opportunities and challenges for companies that work hard to track customer reviews and opinions about their products. Twitter is a large, fast-growing micro-blogging and social networking platform on which users express their views about politics, products, sports, etc. These views are useful to businesses, governments and individuals; hence, tweets are used in this framework for mining public opinion. Sentiment analysis is the process of automatically recognising whether user-generated content expresses a positive, negative or neutral opinion about an entity (i.e. a product, person, topic, event, etc.). Traditional analytics tools are costly and are not built to handle Big Data. Hadoop, though a popular framework for data-intensive applications, does not perform well on iterative processes (such as data analysis) because of the cost of reloading data from disk on each iteration. This paper proposes a text analysis framework for Twitter data built on Apache Spark, which makes it more flexible, fast, and scalable. The proposed framework is also domain independent, as it uses a hybrid approach for sentiment classification that combines supervised machine learning algorithms (Naïve Bayes and decision tree) with a lexicon approach (pattern analyser); it compares the supervised learning models and uses the one with the highest accuracy to predict sentiment.
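The selection step described above (compare candidate classifiers and keep the one with the highest accuracy) can be illustrated with a minimal, framework-free sketch. Everything here is an assumption made for illustration: the tiny word lists, the `lexicon_label` and `always_neutral` stand-ins, and the four labelled examples are hypothetical and are not the paper's lexicon, models, or data. In the framework itself the candidates would be the Naïve Bayes and decision tree models trained on Spark.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[str], str]
LabelledText = Tuple[str, str]

# Hypothetical toy lexicon; a real lexicon approach uses a far larger resource.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def lexicon_label(text: str) -> str:
    """Label text by counting positive and negative lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def always_neutral(text: str) -> str:
    """Trivial baseline standing in for a second candidate model."""
    return "neutral"


def accuracy(classifier: Classifier, data: List[LabelledText]) -> float:
    """Fraction of held-out examples the classifier labels correctly."""
    correct = sum(classifier(text) == label for text, label in data)
    return correct / len(data)


def select_best(candidates: Dict[str, Classifier],
                data: List[LabelledText]) -> Tuple[str, float]:
    """Return the name and accuracy of the highest-scoring candidate."""
    scores = {name: accuracy(clf, data) for name, clf in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical held-out examples, invented for this sketch.
HELD_OUT: List[LabelledText] = [
    ("I love this phone", "positive"),
    ("bad battery and poor screen", "negative"),
    ("it arrived today", "neutral"),
    ("great camera", "positive"),
]

CANDIDATES: Dict[str, Classifier] = {
    "lexicon_baseline": lexicon_label,
    "always_neutral": always_neutral,
}

if __name__ == "__main__":
    name, score = select_best(CANDIDATES, HELD_OUT)
    print(f"selected {name} with accuracy {score:.2f}")
```

Keeping each candidate behind a plain text-to-label callable means a trained model can be dropped into `CANDIDATES` without changing the selection logic.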

Keywords

Sentiment Analysis, Machine Learning, Lexical Approach, Apache Spark, Natural Language Processing, Twitter.

Journal of Applied Information Science
