Effective Handling of Recurring Concept Drifts in Data Streams


Affiliations
1 Department of CSE, School of Engineering and Technology, K. R. Mangalam University, Gurgaon – 122103, Haryana, India
2 Division of CoE, Netaji Subhas Institute of Technology, Dwarka, New Delhi - 110078, India
 

Background: Many applications now involve huge amounts of data whose underlying concept changes over time. Such data must be handled with high accuracy, even in a resource-constrained environment. Objectives: To achieve better generalization accuracy on data with drifting concepts, mainly recurrent drifts, we propose an ensemble system called Recurring Dynamic Weighted Majority (RDWM). Methods: Our system maintains a primary online ensemble of experts that represent the present concepts, and a secondary ensemble of experts that represent the old concepts encountered since the beginning of learning. An effective pruning methodology removes redundant and old classifiers from the system. Findings: Experimental analysis on the Stagger dataset shows RDWM to be the best system for handling data containing abrupt as well as recurrent drifts, achieving the best prequential accuracy when an optimal window size is used. RDWM is also highly resource-efficient compared to the EDDM approach. Experimental evaluation on a real-world electricity pricing dataset likewise shows RDWM to be the best system, performing very accurately even in a resource-constrained environment. Improvements: The system can be further enhanced to handle novelty detection in data streams.

Keywords

Concept Drift, Data Streams, Recurring, Recurring Concept
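
The Findings above are reported as prequential accuracy. As a minimal sketch of what that metric measures, the Python below applies the usual test-then-train protocol: each example is first used to test the model and only afterwards to train it. This is not the RDWM implementation. The sliding evaluation window, the function name `prequential_accuracy`, and the stand-in learner `LastLabelModel` are all assumptions made for illustration; the abstract does not say what its window size refers to.

```python
from collections import deque


def prequential_accuracy(stream, model, window_size):
    """Test-then-train evaluation over a sliding window (illustrative).

    For every (x, y) pair the model predicts first and learns second.
    Returns the accuracy over the most recent `window_size` examples
    after each step.
    """
    recent = deque(maxlen=window_size)  # 1 = correct, 0 = wrong
    curve = []
    for x, y in stream:
        recent.append(1 if model.predict(x) == y else 0)  # test
        model.learn(x, y)                                  # then train
        curve.append(sum(recent) / len(recent))
    return curve


class LastLabelModel:
    """Hypothetical stand-in learner: predicts the last label it saw."""

    def __init__(self):
        self.last = None

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


# Toy stream with one abrupt drift: the label flips from 0 to 1 after 5 examples.
stream = [(i, 0) for i in range(5)] + [(i, 1) for i in range(5)]
curve = prequential_accuracy(stream, LastLabelModel(), window_size=4)
print(curve)
```

In this toy run the windowed accuracy dips right after the label flips and recovers once the window holds only post-drift examples. A smaller window reacts to drift faster but gives a noisier estimate.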

Authors

Parneeta Dhaliwal
Department of CSE, School of Engineering and Technology, K. R. Mangalam University, Gurgaon – 122103, Haryana, India
M. P. S. Bhatia
Division of CoE, Netaji Subhas Institute of Technology, Dwarka, New Delhi - 110078, India

DOI: https://doi.org/10.17485/ijst%2F2017%2Fv10i30%2F158439