
CIODD: Cluster Identification and Outlier Detection in Distributed Data


Affiliations
1 Department of Computer Science, Sri Ganganagar Engineering College, Sri Ganganagar, India
2 SBCET, Jaipur, India
 

Clustering has become an increasingly important task in modern application domains such as marketing and purchasing assistance, multimedia, and molecular biology. The goal of clustering is to partition a data set into groups such that intra-group similarity and inter-group dissimilarity are both maximized. In many applications, the volume of data to be clustered far exceeds what can be processed at a single site. Moreover, the data may be inherently distributed. The need to scale clustering to massive data sets that are distributed over networks with limited bandwidth and computational resources has motivated methods for parallel and distributed data clustering. In this thesis, we present CIODD, a cohesive framework for cluster identification and outlier detection in distributed data. The core idea is to build independent local models at each site and combine them at a central server to obtain global clusters. A feedback loop from the central site back to the local sites then completes and refines the global clusters. Our experimental results demonstrate the efficiency and accuracy of the CIODD approach.
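The abstract describes only the high-level flow (local models, central merge, feedback loop), so the following is a minimal illustrative sketch under our own assumptions, not the CIODD algorithm itself. It assumes that each site builds its local model with k-means and ships only centroids plus point counts; that the server drops local clusters smaller than a hypothetical `min_support` (treating them as outlier candidates) and merges the rest with weighted k-means; and that, in the feedback step, each site labels a point as an outlier (`-1`) when it lies farther than a hypothetical `radius` from every global center. The function names `local_model`, `merge_at_server` and `refine_locally` are hypothetical. Requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def weighted_kmeans(points: Sequence[Point], k: int,
                    weights: Optional[Sequence[float]] = None,
                    max_iter: int = 50) -> List[Point]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    if weights is None:
        weights = [1.0] * len(points)
    k = min(k, len(points))
    centers = [points[0]]
    while len(centers) < k:  # farthest-first traversal picks the initial centers
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    dim = len(points[0])
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in centers]
        totals = [0.0] * len(centers)
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Weighted mean per cluster; an empty cluster keeps its old center.
        new = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
               for j in range(len(centers))]
        if new == centers:  # assignments have stabilised
            break
        centers = new
    return centers


def local_model(points: Sequence[Point], k: int) -> List[Tuple[Point, int]]:
    """Hypothetical local step: cluster one site's data, return (centroid, count) pairs."""
    centers = weighted_kmeans(points, k)
    counts = [0] * len(centers)
    for p in points:
        counts[nearest(p, centers)] += 1
    return [(c, n) for c, n in zip(centers, counts) if n > 0]


def merge_at_server(models: Sequence[List[Tuple[Point, int]]], k_global: int,
                    min_support: int) -> List[Point]:
    """Hypothetical central step: weighted k-means over all sufficiently large local centroids."""
    centers: List[Point] = []
    weights: List[float] = []
    for model in models:
        for c, n in model:
            if n >= min_support:  # tiny local clusters are treated as outlier candidates
                centers.append(c)
                weights.append(float(n))
    return weighted_kmeans(centers, k_global, weights)


def refine_locally(points: Sequence[Point], global_centers: Sequence[Point],
                   radius: float) -> List[int]:
    """Hypothetical feedback step: label each local point with a global cluster, or -1 for an outlier."""
    labels = []
    for p in points:
        j = nearest(p, global_centers)
        labels.append(j if math.dist(p, global_centers[j]) <= radius else -1)
    return labels


if __name__ == "__main__":
    site_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    site_b = [(0, 0.5), (0.5, 0), (10, 10.5), (10.5, 10), (50, 50)]
    models = [local_model(s, k=3) for s in (site_a, site_b)]
    global_centers = merge_at_server(models, k_global=2, min_support=2)
    print(refine_locally(site_a, global_centers, radius=3.0))  # [0, 0, 0, 1, 1, 1]
    print(refine_locally(site_b, global_centers, radius=3.0))  # [0, 0, 1, 1, -1]
```

Only centroids and counts cross the network in this sketch, which reflects the limited-bandwidth setting the abstract mentions; the isolated point at (50, 50) forms a singleton local cluster, is excluded from the merge, and is then flagged as an outlier during the feedback step.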

Keywords

Cluster, Data Mining, Data Warehousing.

Authors

Eena Gilhotra
Department of Computer Science, Sri Ganganagar Engineering College, Sri Ganganagar, India
Saroj Hiranwal
SBCET, Jaipur, India
