
Prediction of Box Office for Bollywood Movies Using State-of-the-Art SentiDraw Lexicon for Twitter Analysis


Affiliations
1 Research Scholar, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016, India
2 Professor, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016, India
     



Film is a high-risk industry. Accurate prediction of box-office revenues can reduce this market risk and inform investment decisions about promoting a movie close to, or right after, its release. Studies have shown that chatter on social media platforms such as Twitter, together with certain movie-related factors, can be useful in predicting the success of movies. The sentiment of tweets about a movie gives important information about consumers' reactions, and the polarity of these sentiments has been shown to affect the prediction of box-office revenues. This paper presented a novel Bollywood domain-specific sentiment lexicon that delivered state-of-the-art performance for polarity determination of reviews. The SentiDraw lexicon was built on movie reviews scraped from IMDB; the sentiment orientation of each word was calculated from the probability distribution of that word across reviews with different star ratings. The results showed that the SentiDraw lexicon outperformed any other lexicon-based method. This contributed significantly to enhancing the accuracy of box-office prediction for movies using textual data from Twitter. In fact, this study demonstrated an extremely parsimonious regression model that used only budget, hype factor, tweet volume, and polarity of tweets for a robust prediction of box-office revenues even before the release of a movie.

Keywords

Sentiment Lexicon, Box Office Prediction, SentiDraw Method, Movie Reviews, Bollywood, Twitter.

Paper Submission Date : February 17, 2020 ; Paper Sent Back for Revision : October 17, 2020 ; Paper Acceptance Date : November 12, 2020 ; Paper Published Online : June 25, 2021.
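
The abstract says that word polarity is derived from the distribution of each word across reviews with different star ratings, but it does not give the scoring formula. The sketch below is only an illustration of that general idea, not the SentiDraw method itself. It assumes a 1 to 10 star scale, whitespace tokenization, and a score defined as the expected star rating of a word rescaled to [-1, 1]; the function names `build_lexicon` and `score_text` and the `min_count` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(reviews, min_count=2):
    """Build a word -> polarity map from (text, star_rating) pairs.

    Assumption: ratings lie on a 1..10 scale. This is an illustrative
    scoring rule, not the formula used in the paper.
    """
    word_rating_counts = defaultdict(Counter)  # word -> {rating: occurrences}
    rating_totals = Counter()                  # rating -> total tokens seen

    for text, rating in reviews:
        for token in text.lower().split():
            word_rating_counts[token][rating] += 1
            rating_totals[rating] += 1

    lexicon = {}
    for word, counts in word_rating_counts.items():
        if sum(counts.values()) < min_count:
            continue  # skip words that are too rare to score
        # Relative frequency of the word within each rating bucket,
        # which corrects for buckets that contain more text than others.
        rel = {r: c / rating_totals[r] for r, c in counts.items()}
        norm = sum(rel.values())
        # Normalise so the values form a distribution over ratings.
        dist = {r: v / norm for r, v in rel.items()}
        expected_rating = sum(r * p for r, p in dist.items())
        # Rescale the expected rating from [1, 10] to [-1, 1].
        lexicon[word] = (expected_rating - 5.5) / 4.5
    return lexicon


def score_text(text, lexicon):
    """Average polarity of the known words in a text (0.0 if none are known)."""
    scores = [lexicon[t] for t in text.lower().split() if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a tweet would be scored by passing its text to `score_text` together with a lexicon built from rated reviews. The resulting polarity could then serve as one input, alongside budget, hype factor, and tweet volume, to a regression of the kind the abstract describes.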




Authors

Shashank Shekhar Sharma
Research Scholar, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016, India
Gautam Dutta
Professor, Indian Institute of Foreign Trade, IIFT Bhawan, B-21, NRPC Colony, Block B, Qutab Institutional Area, New Delhi - 110 016, India








DOI: https://doi.org/10.17010/ijom/2021/v51/i5-7/161644