
Machine Learning Approach to Improve Data Connectivity in Text-Based Personality Prediction Using Multiple Data Sources Mapping


Affiliations
1 Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam 531 162, India

Abstract

This paper considers the task of personality prediction from social media text. Datasets with conventional personality labels are scarce, and collecting them is difficult because of privacy concerns and the high cost of hiring expert psychologists to label them. Because few labelled samples are available, existing studies usually add sentiment and statistical NLP features to the text data to improve the accuracy of personality detection models. To overcome these limitations, this research proposes a new methodology for generating a large amount of labelled data that deep learning algorithms can use. The model has three components: general data representation, data mapping and classification. It applies personality correlation descriptors to capture correlation information and then uses this information in a dataset mapping algorithm. Experimental results show that the proposed method outperforms strong baselines across a variety of evaluation metrics, with its highest accuracy of 86.24% and an F1 score of 0.915 obtained on the combined MBTI and Essays dataset. Moreover, the new dataset built from the combined dataset contains 384,089 labelled samples and can be used for personality prediction with the Five-Factor Model, alleviating the problem of limited labelled samples for personality detection.
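The abstract describes the data mapping step only at a high level. One plausible reading is that correlations between MBTI dimensions and Five-Factor traits are used to carry labels from one scheme to the other. The block below is a minimal sketch under that assumption, not the authors' algorithm: it assumes binary labels on both sides (MBTI letters, high/low Big Five traits), treats a "personality correlation descriptor" as the Pearson correlation between each MBTI axis and each Big Five trait over users who carry both labels, and assigns a derived trait label only when some axis correlates with that trait above a threshold. The function names, the 0.5 threshold and the synthetic 16-type sample are hypothetical illustrations.

```python
import itertools
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical coding: each MBTI axis is 1 for the first letter of the pair, 0 for the second.
MBTI_AXES: List[Tuple[str, str]] = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]
# Assumed binary (high = 1 / low = 0) Big Five labels, in a fixed order.
BIG_FIVE: List[str] = ["EXT", "NEU", "AGR", "CON", "OPN"]

Pairs = List[Tuple[str, List[int]]]
Mapping = Dict[str, Optional[Tuple[int, bool]]]


def encode_mbti(mbti: str) -> List[int]:
    """Turn a type such as 'ENFJ' into axis codes, e.g. [1, 0, 0, 1]."""
    mbti = mbti.upper()
    return [1 if mbti[i] == pair[0] else 0 for i, pair in enumerate(MBTI_AXES)]


def pearson(xs: List[int], ys: List[int]) -> float:
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_descriptors(pairs: Pairs) -> Dict[str, List[float]]:
    """r between every Big Five trait and every MBTI axis, from users with both labels."""
    codes = [encode_mbti(mbti) for mbti, _ in pairs]
    desc: Dict[str, List[float]] = {}
    for k, trait in enumerate(BIG_FIVE):
        trait_col = [labels[k] for _, labels in pairs]
        desc[trait] = [
            pearson([c[a] for c in codes], trait_col) for a in range(len(MBTI_AXES))
        ]
    return desc


def build_mapping(desc: Dict[str, List[float]], threshold: float = 0.5) -> Mapping:
    """Keep, per trait, the axis with the largest |r| if it reaches the threshold."""
    mapping: Mapping = {}
    for trait, rs in desc.items():
        best = max(range(len(rs)), key=lambda a: abs(rs[a]))
        mapping[trait] = (best, rs[best] > 0) if abs(rs[best]) >= threshold else None
    return mapping


def map_mbti_to_big_five(mbti: str, mapping: Mapping) -> Dict[str, Optional[int]]:
    """Derive Big Five labels for an MBTI-only sample; None means 'not mappable'."""
    codes = encode_mbti(mbti)
    labels: Dict[str, Optional[int]] = {}
    for trait, rule in mapping.items():
        if rule is None:
            labels[trait] = None
        else:
            axis, positive = rule
            # A negative correlation flips the axis code.
            labels[trait] = codes[axis] if positive else 1 - codes[axis]
    return labels


def synthetic_pairs() -> Pairs:
    """Toy sample: all 16 types, labels built so each trait tracks one letter."""
    pairs: Pairs = []
    for letters in itertools.product(*MBTI_AXES):
        mbti = "".join(letters)
        e, s, t, j = encode_mbti(mbti)
        # Order: EXT, NEU, AGR, CON, OPN. NEU = e XOR s is uncorrelated with every axis.
        pairs.append((mbti, [e, e ^ s, 1 - t, j, 1 - s]))
    return pairs


if __name__ == "__main__":
    mapping = build_mapping(correlation_descriptors(synthetic_pairs()))
    print(map_mbti_to_big_five("INFP", mapping))
    # {'EXT': 0, 'NEU': None, 'AGR': 1, 'CON': 0, 'OPN': 1}
```

Because the synthetic sample is constructed so that each trait follows exactly one letter, the sketch recovers those letters and leaves NEU unmapped. With real paired data the coefficients would be weaker, and the threshold would decide which traits receive derived labels.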

Keywords

BERT, Deep Learning, Natural Language Processing, Personality Detection, Social Media.

Authors

Sirasapalli Joshua Johnson
Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam 531 162, India
M Ramakrishna Murty
Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam 531 162, India
