
LD-SMO Algorithm for Determining Trust in Skewed Social Media Data


Affiliations
1 University of Kashmir, India
     



Collaborative Web Applications (CWAs) have become a pervasive part of the internet. Social media (such as Facebook and Twitter), topical forums, and wikis are all examples of CWAs, which enable a community of end users to interact or cooperate towards a common goal. Key characteristics of CWAs, such as a low entry barrier, instant updates, large numbers of friends, an open platform, and anonymity, make them vulnerable to ill-intentioned activity and give nefarious actors a medium in which to operate. One such collaborative system is Twitter, which has grown enormously in a short time and whose users spend a considerable part of each day discussing various topics with their peers. Twitter has evolved from a medium for conversation and opinion sharing among friends into a platform for sharing and disseminating information about current events: events in the real world trigger a corresponding surge of posts (tweets) on Twitter. In other words, it has evolved from a microblogging service into a major news source. Although a large volume of content is posted on Twitter, not all of it is trustworthy or useful in providing information about an event; gossip, fake news, and similar content circulate alongside genuine news. The main aim of this paper is to tackle the accuracy paradox, in which a classifier scores high accuracy simply by favoring the majority class. This is a major problem in social media research, and it arose here because the data we extracted was highly imbalanced. We addressed the imbalance by designing an LD-SMO algorithm, which achieved an accuracy of ~96% with comparable sensitivity and specificity.


Keywords

LD-SMO: Linear Discriminant-Sequential Minimal Optimization, NB: Naive Bayes, R: Reliable, Se: Sensitivity, Sp: Specificity, UR: Unreliable, SVM: Support Vector Machines.
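
The accuracy paradox named in the abstract can be illustrated with a small sketch. It is not the authors' LD-SMO algorithm; it only shows why accuracy alone misleads on skewed data and why sensitivity (Se) and specificity (Sp) are reported alongside it. The class counts (950 Reliable, 50 Unreliable), the choice of UR as the positive class, and the function names are all assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive="UR"):
    """Count true/false positives and negatives for one positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive="UR"):
    """Return accuracy, sensitivity (Se) and specificity (Sp)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Se: share of positive (UR) items that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Sp: share of negative (R) items that were left alone.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical skewed sample: 950 Reliable (R) and 50 Unreliable (UR) posts.
y_true = ["R"] * 950 + ["UR"] * 50

# A trivial baseline that labels every post as Reliable.
majority_pred = ["R"] * len(y_true)

baseline = summarize(y_true, majority_pred)
print(baseline)
```

On this hypothetical sample the baseline reaches 95% accuracy while detecting none of the unreliable posts (sensitivity 0), which is why an accuracy figure is only informative when sensitivity and specificity are comparable to it.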



Authors

Shifaa Basharat Fazili
University of Kashmir, India
Manzoor Ahmad
University of Kashmir, India

