A Survey on Duplicate Detection Approaches in Hierarchical Data

Kiran Lokhande ¹, Tushar Rane ¹, S. T. Patil ²

Affiliations
1 Pune Institute of Computer Technology, Pune, Maharashtra, India
2 Vishwakarma Institute of Technology, Pune, Maharashtra, India

Duplicate detection is the process of finding duplicate objects in data. It is an important part of the data cleansing step of data mining. A significant amount of work has been done on duplicate detection in relational data, but only recently have researchers shifted their focus towards duplicate detection in hierarchical and semi-structured data, e.g. XML. In this paper, we provide an overview of different methods for duplicate detection in hierarchical and semi-structured data.