Detecting and Removing Duplicate Records from Multiple Web Databases

Tapashi Paul; V. Ulagamuthalvi


Affiliations
Sathyabama University, Chennai, India


The basic scenario is that queries submitted by users are matched against multiple web databases, and the results are retrieved from all of them. The difficulty arises from duplicate and redundant records in these combined results. To address this, UDD, an unsupervised online record-matching method, is used to identify duplicates in the query results returned by multiple web databases. Duplicate records are identified and discarded, so only the original records are displayed to the user. For this purpose, two classifiers are applied iteratively to find the duplicates and filter them out: the Weighted Component Similarity Summing (WCSS) classifier and a Support Vector Machine (SVM) classifier. The second concern is preventing duplicate websites. Conventionally, URLs are assigned static weights. This work replaces them with dynamic weights, allocated to the respective URLs so that unauthorized users cannot create duplicate sites. The results show that UDD performs better than the existing methods.
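The WCSS step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each candidate record pair is assumed to be reduced to a vector of per-field similarity scores in [0, 1]. Pairs drawn from the same source are assumed to serve as the initial non-duplicate examples. The weight formula, the mixing factor `alpha`, the threshold `0.85`, and all function names are hypothetical. The SVM stage is omitted: the loop only re-derives the WCSS field weights from the duplicates found so far and stops when no new duplicates appear.

```python
from typing import List, Sequence

# A record pair is represented by its similarity vector: one score in [0, 1] per field.
SimVector = Sequence[float]


def _normalise(values: List[float], n_fields: int) -> List[float]:
    """Scale values to sum to 1; fall back to uniform weights if all are zero."""
    total = sum(values)
    if total <= 0:
        return [1.0 / n_fields] * n_fields
    return [v / total for v in values]


def field_weights(duplicates: List[SimVector], non_duplicates: List[SimVector],
                  n_fields: int, alpha: float = 0.5) -> List[float]:
    """Hypothetical weighting: a field scores high if it is similar among
    duplicates and dissimilar among non-duplicates. Weights sum to 1."""
    dup_mass = [sum(v[i] for v in duplicates) for i in range(n_fields)]
    non_mass = [sum(1.0 - v[i] for v in non_duplicates) for i in range(n_fields)]
    wd = _normalise(dup_mass, n_fields)
    wn = _normalise(non_mass, n_fields)
    return [alpha * d + (1 - alpha) * n for d, n in zip(wd, wn)]


def wcss_score(vector: SimVector, weights: Sequence[float]) -> float:
    """Weighted component similarity sum for one record pair."""
    return sum(w * s for w, s in zip(weights, vector))


def iterative_wcss(non_duplicates: List[SimVector], candidates: List[SimVector],
                   n_fields: int, threshold: float = 0.85,
                   max_rounds: int = 10) -> List[int]:
    """Return indices of candidate pairs classified as duplicates."""
    duplicates: List[SimVector] = []
    found = set()
    for _ in range(max_rounds):
        weights = field_weights(duplicates, non_duplicates, n_fields)
        new = [i for i, v in enumerate(candidates)
               if i not in found and wcss_score(v, weights) >= threshold]
        if not new:  # no new duplicates: weights have stabilised
            break
        found.update(new)
        duplicates.extend(candidates[i] for i in new)
    return sorted(found)


if __name__ == "__main__":
    # Illustrative fields: (title, author, price) similarity scores.
    same_source_pairs = [(0.2, 0.1, 0.5), (0.3, 0.2, 0.9)]
    cross_source_pairs = [(0.95, 0.9, 0.4), (0.1, 0.2, 0.8), (0.9, 0.95, 0.9)]
    print(iterative_wcss(same_source_pairs, cross_source_pairs, n_fields=3))  # → [2]
```

In this toy run, the first pair matches strongly on title and author but not on price, so it stays just below the threshold. Only the third pair is flagged as a duplicate. Adding it to the duplicate set then shifts the weights toward the fields on which duplicates agree.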

Keywords

Duplicate, Record Matching, Weightage, Database and Multiple Result.

Software Engineering