
An Analysis of Various Record Matching Approaches and Similarity Computations


Affiliations
1 Karunya University, India
     



Linking or matching databases is becoming increasingly important in many data mining projects, as linked data can contain information that is not available otherwise, or that would be too expensive to collect manually. Record matching refers to the task of finding similar entities across two or more records. Because record matching solves the duplication detection problem, identifying a suitable record matching technique becomes necessary. This paper presents a survey of record matching techniques, highlighting the approaches used, the number of classifiers employed, and the stages of duplication detection performed, and compares the techniques with one another. It also presents the various matching metrics available. Further, we point out potential pitfalls and challenging issues that a record matching technique needs to address. We then present an unsupervised method for performing record matching in a web database scenario. We believe that the results of this evaluation will help analysts devise easier and more feasible methods for record matching, which is a real challenge, particularly in the Web scenario.

Keywords

Duplication Detection, Record Matching, Similarity Calculation, Unsupervised.

Abstract Views: 239

PDF Views: 3





Authors

Cyju Varghese
Karunya University, India
Naveen Sundar
Karunya University, India
