A Survey of Data Cleansing Algorithms for Detecting Duplicate Records


Affiliations
1 Department of Information & Technology, Tamilnadu College of Engineering, Coimbatore, Tamilnadu, India
2 Department of Computer Science & Engineering, Tamilnadu College of Engineering, Coimbatore, Tamilnadu, India
     



In today's competitive environment, more precise information is needed for better decision making. Yet inconsistency in submitted data makes it difficult to aggregate data and analyze results, which may cause delays or data compromises in the reporting of results. The purpose of this article is to study the different algorithms available for cleaning data, to meet the growing demand of industry and the need for more standardised data. Data cleaning algorithms can increase the quality of data while at the same time reducing the overall effort of data collection.

Keywords

Record Matching, Duplicate Detection, Data Cleaning, Data Integration, Data Deduplication, Entity Matching.


Authors

R. Muthunagai
Department of Information & Technology, Tamilnadu College of Engineering, Coimbatore, Tamilnadu, India
A. Benaseer
Department of Computer Science & Engineering, Tamilnadu College of Engineering, Coimbatore, Tamilnadu, India
