Clustering Approach in Context Free Data Cleaning
In this era of knowledge, organizations can gain a competitive advantage only through proficient data analysis. This paper emphasizes the application of clustering to context-free data cleaning: attribute values are corrected using various sequence similarity metrics when no reference data set is available, which improves data quality and, in turn, the quality of the resulting data analysis. The authors propose an algorithm that examines the suitability of a value for correcting other values of an attribute. Various sequence similarity metrics were used to measure the distance between two attribute values, to test the data, and to generate results. The experimental results show that the approach can clean data effectively without reference data.
Keywords
Clustering, Context Free Data Cleaning, Sequence Similarity Metrics
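The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: values of one attribute are grouped by a sequence similarity metric, and rarer values are corrected to a more frequent, similar value from the same column, so no reference data set is needed. The choice of normalized Levenshtein similarity, the `threshold` parameter, the frequency-ordered clustering rule, the function names, and the sample city names are all illustrative assumptions, not the authors' method or data.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def clean_values(values, threshold=0.8):
    """Map each value to a more frequent, sufficiently similar value.

    Assumed rule (hypothetical): values are visited from most to least
    frequent; a value joins the first cluster whose representative is at
    least `threshold` similar, otherwise it starts a new cluster.
    """
    counts = Counter(values)
    representatives = []  # one representative value per cluster
    mapping = {}          # original value -> corrected value
    for value, _ in counts.most_common():
        for rep in representatives:
            if similarity(value, rep) >= threshold:
                mapping[value] = rep
                break
        else:
            representatives.append(value)
            mapping[value] = value
    return [mapping[v] for v in values]


# Hypothetical column with two misspelled entries.
cities = ["Ahmedabad", "Ahmedabad", "Ahmedabad", "Ahmadabad",
          "Vadodara", "Vadodara", "Vadodra"]
cleaned = clean_values(cities)
```

In this sketch the most frequent spelling in each cluster acts as the correction target, which is one plausible way to judge whether a value is suitable for correcting others. Any other sequence similarity metric could be swapped in for `similarity` without changing the rest of the code.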