
An Evaluation under Concealment of Duplication Entities in XML Documents


Affiliations
1 Kathir College of Engineering, Coimbatore, Tamilnadu, India
     



Duplicate detection is a significant part of data cleaning: the task is to recognize multiple representations of the same real-world or business entity, which is necessary to improve the value of the data. A number of approaches exist for both relational and XML data. Because XML is widely used for data exchange and data publishing on the Web, algorithms to detect duplicates in XML documents are required. Data published on the Web in XML is prone to errors and noise, so it must be cleaned, which requires solutions for fuzzy duplicate detection in XML. The hierarchical, semi-structured nature of XML differs strongly from the flat, structured relational model, which has received most of the attention in duplicate detection so far. We consider the challenges of detecting duplicates in XML in order to develop valuable, well-organized solutions to the problem. We present a comparison of algorithms that perform duplicate detection effectively for all kinds of XML objects, given the dependencies between different XML elements.


Keywords

Revelation of Duplication, Data Cleaning, XML Data, Similar Objects.

Authors

R. Thiyagarajan
Kathir College of Engineering, Coimbatore, Tamilnadu, India
S. Priyanka
Kathir College of Engineering, Coimbatore, Tamilnadu, India
T. K. P. Rajagopal
Kathir College of Engineering, Coimbatore, Tamilnadu, India
