Identification of Duplicate Records Over Query Results from Real Time Web Databases

J. Aruna ¹, J. Jeysree ²

Affiliations
1 B. S. Abdur Rahman University, Chennai, India
2 SRM University, Chennai, India

Abstract

Detecting database records that are approximate duplicates is an important task. When a database is assembled from millions of records drawn from other sources, unintentional duplication can hardly be avoided. Databases may contain duplicate records that represent the same real-world entity because of data entry errors, abbreviations, or differences in the detailed schemas of records from multiple databases. Current duplicate detection techniques are supervised methods, which require training data. These methods are not applicable to the real-time web database scenario, where the records to match are query results generated dynamically on the fly. To address record matching in this scenario, we present Unsupervised Duplication Detection (UDD), an algorithm that, for a given query, can effectively identify duplicates among the query result records of multiple databases. Starting from the non-duplicate set, the proposed algorithm uses two classifiers, a weighted component similarity summing classifier and an OSVM classifier, to iteratively identify duplicates in the query results from multiple databases.

Keywords

Record Matching, Duplication Detection, SVM, UDD.
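
The weighted component similarity summing step named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the field names, the weights, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the per-field similarity measure are all assumptions chosen for illustration, and the OSVM classifier and the iterative refinement loop are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1].

    SequenceMatcher is a stand-in measure (an assumption); the abstract
    does not say which string similarity is used.
    """
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def wcss_similarity(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities.

    `weights` maps field name -> weight and is assumed to sum to 1,
    so the result also lies in [0, 1].
    """
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in weights.items())


def find_duplicates(records_a, records_b, weights, threshold=0.85):
    """Return index pairs (i, j) whose weighted similarity reaches the threshold.

    The threshold value is hypothetical, not taken from the paper.
    """
    pairs = []
    for i, rec_a in enumerate(records_a):
        for j, rec_b in enumerate(records_b):
            if wcss_similarity(rec_a, rec_b, weights) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical query results from two web databases.
weights = {"title": 0.6, "author": 0.4}
db_a = [{"title": "Data Mining", "author": "J. Han"}]
db_b = [
    {"title": "data mining ", "author": "j. han"},
    {"title": "xyz", "author": "qqq"},
]
duplicates = find_duplicates(db_a, db_b, weights)
```

In this sketch the weights are fixed by hand; the abstract describes an iterative procedure, so a fuller version would presumably adjust them as duplicates are identified and pass the results to the second classifier.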

Data Mining and Knowledge Engineering