
Clustering Approach in Context Free Data Cleaning


     

Abstract


In this era of knowledge, organizations can gain competitive advantage only through proficient data analysis. This paper focuses on the application of clustering to context free data cleaning: attribute values are corrected using various sequence similarity metrics when no reference data set is available, which improves data quality and, in turn, the quality of data analysis. The authors propose an algorithm that examines the suitability of a value for correcting other values of an attribute. Various sequence similarity metrics were used to compute the distance between two attribute values, to test the data, and to generate results. Experimental results show how the approach can effectively clean data without reference data.

Keywords

Clustering, Context Free Data Cleaning, Sequence Similarity Metrics
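
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the most frequent values of an attribute act as cluster centers, that normalized Levenshtein distance stands in for the "sequence similarity metrics", and that a value is corrected to the first center whose similarity reaches a threshold. The function names and the 0.8 threshold are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def clean_values(values, threshold=0.8):
    """Replace each value with the most frequent value it closely resembles.

    Assumption (not from the paper): frequent values are more likely to be
    correct, so they are visited first and become cluster centers.
    """
    counts = Counter(values)
    centers = []   # values chosen as cluster representatives
    mapping = {}   # original value -> corrected value
    for value, _ in counts.most_common():
        match = next(
            (c for c in centers
             if similarity(value.lower(), c.lower()) >= threshold),
            None,
        )
        if match is None:
            centers.append(value)  # no similar center: start a new cluster
            match = value
        mapping[value] = match
    return [mapping[v] for v in values]


if __name__ == "__main__":
    cities = ["Ahmedabad", "Ahmedabad", "Ahmedabad", "Ahmedabd",
              "Mumbai", "Mumbai", "Mumbay"]
    print(clean_values(cities))
```

No reference data set is consulted: the only inputs are the attribute values themselves and the similarity threshold, which would need tuning per attribute and per metric.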


References

  • Hui Xiong, Gaurav Pandey, Michael Steinbach, Vipin Kumar, “Enhancing Data Analysis with Noise Removal”, in IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 3, pp. 304-319, March 2006.
  • Lukasz Ciszak, “Application of Clustering and Association Methods in Data Cleaning”, in Proc. of Int. Multiconference on Computer Science and Information Technology, Vol. 3, pp. 97-103, 2008.
  • Sohil D Pandya, Dr. Paresh V Virparia, “Data Cleaning in Knowledge Discovery in Databases: Various Approaches”, in Proc. of National Seminar on Current Trends in IT (CTICT) – 2009, February 2009.
  • W Cohen, P Ravikumar, S Fienberg, “A Comparison of String Distance Metrics for Name-Matching Tasks”, in Proc. of the IJCAI-2003.
  • http://en.wikipedia.org/
  • http://www.dcs.shef.ac.uk/~sam/simmetric.html
