Open Access Open Access  Restricted Access Subscription Access
Open Access Open Access Open Access  Restricted Access Restricted Access Subscription Access

Text Analytics Framework Using Apache Spark and Combination of Lexical and Machine Learning Techniques


Affiliations
1 Computer Science and Engineering, Visvesvaraya Technological University, Belgaum, Karnataka, India
2 Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India
     

   Subscribe/Renew Journal


Today, we live in a 'data age'. The sudden increase in the amount of user-generated data on social media platforms like Twitter, has led to new opportunities and challenges for companies that strive hard to keep an eye on customer reviews and opinions about their products. Twitter is a huge fast emergent micro-blogging social networking platform for users to express their views about politics, products sports etc. These views are useful for businesses, government and individuals. Hence, tweets are used in this framework for mining public's opinion. Sentiment analysis is a process of naturally recognising whether a user-generated content expresses positive, negative or neutral opinion about an entity (i.e. product, people, topic, event etc). The traditional analytics tools are costly and are not built to handle Big data. Hadoop, though being a popular framework for data intensive applications, does not perform well on iterative process (like data analysis) due to the cost paid for data reloading from disk for each iteration. This paper proposes a text analysis framework for twitter data using Apache spark and hence is more flexible, fast, and scalable. The proposed framework is also domain independent as it uses a hybrid approach by combining supervised machine learning algorithms (Naïve Bayes and decision tree machine learning algorithms) and lexicon approach (pattern analyser) for sentiment classification thereby comparing various supervised learning models and using the one with highest accuracy for predicting sentiment.

Keywords

Sentiment Analysis, Machine Learning, Lexical Approach, Apache Spark, Natural Language Processing, Twitter.
Subscription Login to verify subscription
User
Notifications
Font Size


Abstract Views: 308

PDF Views: 0




  • Text Analytics Framework Using Apache Spark and Combination of Lexical and Machine Learning Techniques

Abstract Views: 308  |  PDF Views: 0

Authors

Anuja Prakash Jain
Computer Science and Engineering, Visvesvaraya Technological University, Belgaum, Karnataka, India
Padma Dandannavar
Computer Science and Engineering, Gogte Institute of Technology, Belgaum, Karnataka, India

Abstract


Today, we live in a 'data age'. The sudden increase in the amount of user-generated data on social media platforms like Twitter, has led to new opportunities and challenges for companies that strive hard to keep an eye on customer reviews and opinions about their products. Twitter is a huge fast emergent micro-blogging social networking platform for users to express their views about politics, products sports etc. These views are useful for businesses, government and individuals. Hence, tweets are used in this framework for mining public's opinion. Sentiment analysis is a process of naturally recognising whether a user-generated content expresses positive, negative or neutral opinion about an entity (i.e. product, people, topic, event etc). The traditional analytics tools are costly and are not built to handle Big data. Hadoop, though being a popular framework for data intensive applications, does not perform well on iterative process (like data analysis) due to the cost paid for data reloading from disk for each iteration. This paper proposes a text analysis framework for twitter data using Apache spark and hence is more flexible, fast, and scalable. The proposed framework is also domain independent as it uses a hybrid approach by combining supervised machine learning algorithms (Naïve Bayes and decision tree machine learning algorithms) and lexicon approach (pattern analyser) for sentiment classification thereby comparing various supervised learning models and using the one with highest accuracy for predicting sentiment.

Keywords


Sentiment Analysis, Machine Learning, Lexical Approach, Apache Spark, Natural Language Processing, Twitter.