
Intelligent Methods of Fusing the Knowledge During Incremental Learning via Clustering in a Distributed Environment


Affiliations
1 Bangalore Technological Institute (BTI) and Bangalore Educational Society for Technology Advancement and Research (BESTAR), Bengaluru-560035, India
2 Department of Studies in Computer Science, University of Mysore, Mysore-570006, India
3 Amphisoft Technologies Private Limited, Coimbatore, India

Abstract

One way of learning from data that is physically distributed over multiple locations is to run a common learning mechanism at each source and transmit the knowledge of each learnt concept to a centralized location for assimilation. In this research, clustering is employed as the learning mechanism, and each cluster is viewed as a concept described by a set of variables. The set of variables describing a cluster is referred to as a knowledge packet (KP). Because histograms can characterize data of any type, a histogram-based regression line is used as one of the variables describing a KP. To allow online monitoring of the progress of learning, in addition to computational ease and efficacy, the KPs at the centralized location are fused incrementally to obtain the overall knowledge. If the learning mechanisms employed are sensitive to the sequence in which data arrive, merging the resulting KPs in different combinations may yield altogether different overall knowledge. Further, the distance measure used between KPs to determine the optimal merging sequence may also lead to different overall knowledge. This phenomenon is referred to as the problem of order effect. To minimize or avoid the order effect, the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which is insensitive to the order in which data samples are presented, is used to learn from the data chunks, and a novel methodology is presented for finding the distance between batches of data and thereby a more nearly optimal sequence for merging the KPs. A specially designed distance measure for histogram-based objects (histo-objects) is employed to find the distance between KPs, and the nearest KPs are merged incrementally until certain conditions are satisfied. The proposed methods provide a robust mechanism for avoiding order effects.
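
The approach relies on DBSCAN producing the same clusters whatever order a chunk's samples arrive in. As a minimal sketch, assuming 2-D points, Euclidean distance, and illustrative settings eps = 0.15 and min_pts = 3 (not the paper's parameters), the pure-Python DBSCAN below clusters a toy data chunk and checks that shuffling the chunk leaves the clusters unchanged. The helper names `region_query`, `dbscan` and `partition` are hypothetical. Strictly, only core points and noise are order-independent; a border point within reach of two clusters joins whichever cluster reaches it first, so the toy data is chosen to contain no such points.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster label per point, with -1 marking noise."""
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; border points are never expanded
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(j_neighbours)
    return labels


def partition(points, labels):
    """Clusters as a set of frozensets of points, so cluster numbering is ignored."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}


# Toy data chunk: two dense 5x5 grids plus one isolated outlier.
blob_a = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
blob_b = [(5 + x * 0.1, 5 + y * 0.1) for x in range(5) for y in range(5)]
outlier = [(10.0, 0.0)]
data = blob_a + blob_b + outlier

reference = partition(data, dbscan(data, eps=0.15, min_pts=3))
rng = random.Random(0)
for _ in range(5):
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Same clusters regardless of presentation order.
    assert partition(shuffled, dbscan(shuffled, eps=0.15, min_pts=3)) == reference
print(len(reference))  # 3: two clusters plus the noise group
```

Comparing partitions as sets of point sets, rather than comparing label lists, matters here: the numeric cluster ids do depend on which sample is visited first, even when the clusters themselves do not.
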
Since real distributed datasets are difficult to obtain, the effectiveness of the proposed approaches is demonstrated with a carefully designed synthetic dataset. Some benchmark datasets were also modified to simulate a distributed environment, and experiments on some of them show an accuracy of up to 100%.
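
The abstract does not define the histogram-based regression line, the histo-object distance or the stopping conditions precisely, so the sketch below uses simple illustrative stand-ins to show the overall fusion flow on a simulated two-site setup. Each KP keeps one histogram per feature over bins shared by all sites; its regression line is a least-squares fit to (bin index, relative frequency); the distance between two KPs sums the absolute slope and intercept differences; and the nearest pair is merged while that distance stays within a threshold. All names (`KnowledgePacket`, `kp_distance`, `fuse`, `BINS`, `LOW`, `HIGH`, `threshold`) are hypothetical and are not the authors' definitions.

```python
from itertools import combinations

BINS = 4              # hypothetical: every site uses the same bin layout
LOW, HIGH = 0.0, 1.0  # hypothetical: known range of every feature


def histogram(values):
    """Counts of values falling into BINS equal-width bins over [LOW, HIGH]."""
    width = (HIGH - LOW) / BINS
    counts = [0] * BINS
    for v in values:
        counts[min(int((v - LOW) / width), BINS - 1)] += 1  # v == HIGH goes in the last bin
    return counts


def fit_line(counts):
    """Least-squares line through (bin index, relative frequency) pairs."""
    total = sum(counts)
    xs = list(range(len(counts)))
    ys = [c / total for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


class KnowledgePacket:
    """Hypothetical KP: one histogram per feature plus its regression line."""

    def __init__(self, hists):
        self.hists = hists
        self.lines = [fit_line(h) for h in hists]

    @classmethod
    def from_cluster(cls, rows):
        return cls([histogram(col) for col in zip(*rows)])

    def merge(self, other):
        # Histograms on shared bins add up, so fusion needs no raw data.
        return KnowledgePacket([[a + b for a, b in zip(h1, h2)]
                                for h1, h2 in zip(self.hists, other.hists)])


def kp_distance(p, q):
    """Stand-in distance: summed slope and intercept differences per feature."""
    return sum(abs(s1 - s2) + abs(c1 - c2)
               for (s1, c1), (s2, c2) in zip(p.lines, q.lines))


def fuse(kps, threshold):
    """Merge the nearest pair of KPs until no pair is within threshold."""
    kps = list(kps)
    while len(kps) > 1:
        # Ties on distance are broken by pair index so the result is deterministic.
        (i, j), d = min(
            (((a, b), kp_distance(kps[a], kps[b]))
             for a, b in combinations(range(len(kps)), 2)),
            key=lambda t: (t[1], t[0]),
        )
        if d > threshold:
            break
        merged = kps[i].merge(kps[j])
        kps = [k for n, k in enumerate(kps) if n not in (i, j)] + [merged]
    return kps


# Two simulated sites, each reporting one "low" and one "high" cluster.
site1 = [[(0.1, 0.2), (0.2, 0.1), (0.15, 0.05)], [(0.9, 0.8), (0.85, 0.95), (0.8, 0.9)]]
site2 = [[(0.05, 0.15), (0.2, 0.2)], [(0.95, 0.85), (0.9, 0.9)]]
kps = [KnowledgePacket.from_cluster(c) for c in site1 + site2]
fused = fuse(kps, threshold=0.5)
print(len(fused))  # 2: the two low clusters and the two high clusters are fused
```

Keeping raw histogram counts in each KP makes merging exact: the fused KP's line is refitted from the summed counts rather than averaged from the two input lines. Feeding the same KPs in reverse order yields the same two fused KPs on this toy data.
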

Keywords

Cluster Analysis, Incremental Augmentation of Knowledge, Order Effect, Regression Analysis.

Authors

P. Nagabhushan
Bangalore Technological Institute (BTI) and Bangalore Educational Society for Technology Advancement and Research (BESTAR), Bengaluru-560035, India
Syed Zakir Ali
Department of Studies in Computer Science, University of Mysore, Mysore-570006, India
R. Pradeep Kumar
Amphisoft Technologies Private Limited, Coimbatore, India
