A Survey on Efficient Big Data Clustering Using MapReduce


Affiliations
1 Pune Institute of Computer Technology, Pune, Maharashtra, India
2 Information Technology Department, Pune Institute of Computer Technology, Pune, Maharashtra, India
Authors

Avinash Dhanshetti
Pune Institute of Computer Technology, Pune, Maharashtra, India
Tushar Rane
Information Technology Department, Pune Institute of Computer Technology, Pune, Maharashtra, India

Abstract


Cluster analysis is a key technique used by data processing algorithms in data mining. The primary aim of clustering is to segment the data into smaller subsets called clusters, such that the data belonging to the same cluster are similar under some similarity metric. Clustering is an essential idea in data analysis and data mining applications. Over the years, K-means has been a popular clustering algorithm because of its simplicity and ease of use. Nowadays, as data sizes continue to increase, researchers have started working in distributed environments such as MapReduce to achieve high performance for big data clustering. In this paper, we survey current work on efficient big data clustering algorithms using the MapReduce framework.

Keywords


Clustering, MapReduce, K-Means, Distributed Environment.
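
Illustrative sketch

To make the pairing of K-means with MapReduce concrete, the following is a minimal single-machine sketch of one K-means iteration written in a map/reduce style. It is not taken from the paper or from any surveyed work. It assumes Euclidean distance as the similarity metric and uses hypothetical names (`map_phase`, `reduce_phase`, `kmeans_iteration`) and made-up sample points. A real MapReduce job would run the map step in parallel over data splits and let the framework perform the grouping.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid that received no points is left unchanged (an assumption of this sketch).
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans_iteration(points, centroids):
    """One K-means step: assign points (map), then recompute centroids (reduce)."""
    return reduce_phase(map_phase(points, centroids), centroids)


# Hypothetical sample data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
updated = kmeans_iteration(points, centroids)
print(updated)
```

In practice the iteration would be repeated until the centroids stop moving, with each pass typically scheduled as a separate MapReduce job.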