Text Analytics Framework Using Apache Spark and Combination of Lexical and Machine Learning Techniques
Today, we live in a 'data age'. The sudden increase in the amount of user-generated data on social media platforms like Twitter has created new opportunities and challenges for companies that strive to keep track of customer reviews and opinions about their products. Twitter is a large, fast-growing micro-blogging social networking platform on which users express their views about politics, products, sports, and so on. These views are useful to businesses, governments and individuals. Hence, tweets are used in this framework for mining public opinion. Sentiment analysis is the process of automatically recognising whether user-generated content expresses a positive, negative or neutral opinion about an entity (i.e. a product, person, topic, event, etc.). Traditional analytics tools are costly and are not built to handle big data. Hadoop, though a popular framework for data-intensive applications, does not perform well on iterative processes (like data analysis) because of the cost of reloading data from disk for each iteration. This paper proposes a text analysis framework for Twitter data that uses Apache Spark and is hence more flexible, fast, and scalable. The proposed framework is also domain independent, as it uses a hybrid approach that combines supervised machine learning algorithms (Naïve Bayes and decision tree) with a lexicon approach (pattern analyser) for sentiment classification. It compares the supervised learning models and uses the one with the highest accuracy to predict sentiment.
Keywords
Sentiment Analysis, Machine Learning, Lexical Approach, Apache Spark, Natural Language Processing, Twitter.
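The hybrid idea in the abstract can be illustrated with a minimal, single-machine sketch. It is not the paper's Spark implementation, and it rests on several assumptions: the toy `LEXICON`, the helper names (`lexicon_predict`, `NaiveBayes`, `select_best`), and the use of lexicon output as training labels are all hypothetical, and a majority-class baseline stands in for the decision tree to keep the example short. It only shows one plausible reading of "compare the candidate models and keep the one with the highest accuracy".

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; a real pattern analyser would use a much larger resource.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def lexicon_predict(text):
    """Lexicon step: sum word polarities and map the total to a sentiment label."""
    score = sum(LEXICON.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predict, texts, labels):
    """Fraction of texts for which predict(text) matches the reference label."""
    correct = sum(predict(text) == label for text, label in zip(texts, labels))
    return correct / len(labels)


def select_best(candidates, texts, labels):
    """Score every candidate on held-out data and return the most accurate one."""
    scores = {name: accuracy(fn, texts, labels) for name, fn in candidates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores
```

In use, one would label raw tweets with `lexicon_predict`, fit each supervised candidate on those labels, and pass the fitted `predict` functions to `select_best` together with a held-out set. In a Spark setting the same comparison would run over distributed data rather than Python lists.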