
Detection and Handling of Different Types of Concept Drift in News Recommendation Systems


Affiliations
1 Informatics Department, Electronics Research Institute, Giza, Egypt
2 Computer Engineering Department, Cairo University, Giza, Egypt
 

To cope with the growing volume of data streams that online users interact with, a growing number of tools and models summarize and extract information from them. These tools use prediction models to personalize and extract useful information. However, data streams are highly prone to the phenomenon of concept drift, in which the data distribution changes over time. To maintain their performance, these models should adapt to handle the occurrence of a drift. In this work, we present the Incremental Knowledge Concept Drift (IKCD) algorithm, an adaptive unsupervised learning algorithm for recommendation systems in news data streams. Data modelling in IKCD uses k-means clustering to determine the occurrence of a drift while avoiding dependence on the availability of data labels. Once a drift is detected, new retraining data is composed from the old and new concepts. IKCD is tested using synthetic and real benchmark datasets from various domains, which exhibit different drift types and different rates of change. Experimental results show enhanced performance with respect to (a) reducing model sensitivity to noise, (b) reducing model rebuilding frequency by up to 50% in the case of recurring drift, and (c) increasing the accuracy of the model by about 10% relative to the accuracy of the confidence distribution batch detection algorithm.
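The general idea of label-free drift detection with k-means can be illustrated with a minimal sketch. This is not the IKCD algorithm itself, whose details the abstract does not give. It assumes a simple rule: cluster a reference window, then flag a drift when a new batch lies much farther from the learned centroids than the reference data did. The function names, the `ratio` threshold, the `keep_old` fraction, and the synthetic two-dimensional data are all hypothetical choices made for illustration.

```python
import math
import random


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def mean_distance(points, centroids):
    """Average distance from each point to its nearest centroid."""
    return sum(nearest(p, centroids)[1] for p in points) / len(points)


def detect_drift(reference, batch, k=2, ratio=2.0):
    """Flag drift when the batch fits the reference clusters much worse.

    `ratio` is a hypothetical threshold, not a value from the paper.
    """
    centroids = kmeans(reference, k)
    baseline = mean_distance(reference, centroids)
    return mean_distance(batch, centroids) > ratio * baseline


def retraining_set(reference, batch, keep_old=0.5):
    """Mix the most recent part of the old data with the new batch.

    `keep_old` is an assumed mixing fraction, chosen for illustration.
    """
    n_old = int(len(reference) * keep_old)
    return reference[len(reference) - n_old:] + batch


def blob(center, n, rng):
    """Draw n synthetic 2-D points around center (illustrative data only)."""
    return [(rng.gauss(center[0], 0.3), rng.gauss(center[1], 0.3))
            for _ in range(n)]


rng = random.Random(42)
old = blob((0, 0), 100, rng) + blob((3, 3), 100, rng)
same = blob((0, 0), 50, rng) + blob((3, 3), 50, rng)
shifted = blob((8, 8), 100, rng)

print(detect_drift(old, same))     # batch drawn from the old concept
print(detect_drift(old, shifted))  # batch drawn from a new concept
```

Because the check uses only distances to centroids, it needs no labels. Mixing part of the old window into the retraining set is one simple way to keep knowledge of a concept that may recur.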

Keywords

Concept Drift, Change Detection, Recommendation Systems.



Authors

Nayer Wanas
Informatics Department, Electronics Research Institute, Giza, Egypt
Ahmed Farouk
Informatics Department, Electronics Research Institute, Giza, Egypt
Dina Said
Informatics Department, Electronics Research Institute, Giza, Egypt
Nabila Khodeir
Informatics Department, Electronics Research Institute, Giza, Egypt
Magda Fayek
Computer Engineering Department, Cairo University, Giza, Egypt
