
Towards Efficient Distributed Algorithm with Minimum Communication Overhead


Affiliations
1 Marwadi Education Foundation Group of Institute, Gujarat Technological University, Ahmedabad, Gujarat, India
2 Department of Computer Engineering, Marwadi Education Foundation Group of Institutions, Rajkot, India
     



Abstract

Many organizations today are geographically distributed, and each site typically stores, and continuously updates, its own day-to-day data locally. Centralized data mining algorithms cannot be used in such organizations to discover useful patterns, because merging the datasets from the different sites is often infeasible and would incur large network communication costs. Distributed data mining has therefore emerged as an active sub-domain of data mining research. In distributed association rule mining, one of the major challenges is reducing communication overhead, since the data sites must exchange a large amount of information during the mining process. This report proposes an association rule mining algorithm that minimizes the communication overhead among the participating data sites: instead of transmitting all itemsets and their counts, the algorithm transmits a binary vector of the frequent (large) itemsets using the Message Passing Interface (MPI). A second challenge is to reduce the number of database scans needed to generate the frequent itemsets. To address it, an algorithm termed "Efficient Distributed Dynamic Itemset Counting" is proposed; it reduces the time spent scanning each partitioned database, which improves the algorithm's performance.
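The abstract does not spell out the vector format or the MPI message pattern, so the following is only a minimal, in-process sketch of the binary-vector idea under stated assumptions: (1) all sites agree on one fixed ordering of candidate itemsets (here, sorted k-combinations over a shared item list); (2) bit i of a site's vector is 1 if and only if candidate i is frequent in that site's partition at the same relative minimum support; (3) the vectors are combined with a bitwise OR. Assumption (3) rests on a standard property: if an itemset reaches support s over the union of all partitions, it must reach support s in at least one partition, so the OR yields a superset of the globally frequent itemsets. All function names, and the second count-confirmation round (`globally_frequent`), are hypothetical and not taken from the report; sites are simulated as Python lists rather than MPI processes. In an actual MPI deployment, the OR step would correspond to an `MPI_Allreduce` with the predefined `MPI_BOR` operation.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence, Set

Itemset = FrozenSet[str]
Transaction = Set[str]


def candidate_itemsets(items: Sequence[str], k: int) -> List[Itemset]:
    """Enumerate all k-itemsets in a fixed (sorted) order shared by every site (assumption)."""
    return [frozenset(c) for c in combinations(sorted(items), k)]


def support_count(transactions: Sequence[Transaction], itemset: Itemset) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def local_bit_vector(transactions: Sequence[Transaction],
                     candidates: Sequence[Itemset], min_support: float) -> int:
    """Bit i is set iff candidates[i] is frequent in this site's partition."""
    n = len(transactions)
    bits = 0
    for i, cand in enumerate(candidates):
        if n and support_count(transactions, cand) >= min_support * n:
            bits |= 1 << i
    return bits


def merge_vectors(vectors: Iterable[int]) -> int:
    """Bitwise OR: a candidate survives if it is locally frequent at any site."""
    merged = 0
    for v in vectors:
        merged |= v
    return merged


def decode(bits: int, candidates: Sequence[Itemset]) -> List[Itemset]:
    """Map set bits back to the candidate itemsets they stand for."""
    return [c for i, c in enumerate(candidates) if (bits >> i) & 1]


def globally_frequent(sites: Sequence[Sequence[Transaction]],
                      survivors: Sequence[Itemset], min_support: float) -> List[Itemset]:
    """Hypothetical second round: exact global counts, but only for surviving candidates."""
    total = sum(len(s) for s in sites)
    return [c for c in survivors
            if sum(support_count(s, c) for s in sites) >= min_support * total]


if __name__ == "__main__":
    sites = [
        [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}],  # site 1 partition
        [{"b", "c"}, {"b", "c"}, {"a", "d"}],       # site 2 partition
    ]
    cands = candidate_itemsets(["a", "b", "c", "d"], 2)
    vectors = [local_bit_vector(s, cands, 0.5) for s in sites]
    merged = merge_vectors(vectors)
    for v in vectors + [merged]:
        print(format(v, f"0{len(cands)}b"))  # bit 0 is the rightmost character
    survivors = decode(merged, cands)
    print([sorted(c) for c in survivors])
    print([sorted(c) for c in globally_frequent(sites, survivors, 0.5)])
```

Run as a script, it prints the per-site vectors `000011` and `001000`, their OR `001011`, the surviving candidates `[['a', 'b'], ['a', 'c'], ['b', 'c']]`, and the single globally frequent pair `[['b', 'c']]`. Each site sends only 6 bits instead of a list of itemsets with counts. Note that {a, b} and {a, c} pass the local threshold at site 1 yet reach only 2 of 6 transactions globally, which is why this sketch adds a confirmation round after the OR.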

Keywords

Association Rules, Distributed Environment, Minimum Communication Cost, Dynamic Itemset Counting, Frequent Pattern Growth, Support and Confidence.

Authors

Anil Pandya
Marwadi Education Foundation Group of Institute, Gujarat Technological University, Ahmedabad, Gujarat, India
Sahista Machchhar
Department of Computer Engineering, Marwadi Education Foundation Group of Institutions, Rajkot, India
Glory Shah
Department of Computer Engineering, Marwadi Education Foundation Group of Institutions, Rajkot, India
