
Data Tuner for Effective Data Pre-Processing


Affiliations
1 Department of Information Technology, Thiagarajar College of Engineering, Madurai, India
     



Real-world datasets contain many redundant and conflicting records. The performance of a classification algorithm in data mining is strongly affected by such noisy information (i.e., redundant and conflicting data). These records not only increase the cost of the mining process but also degrade the detection performance of the classifiers. They have to be removed to increase the efficiency and accuracy of the classifiers. This process is called tuning the dataset. A redundancy check is first performed on the original dataset and the result is preserved. The resulting dataset is then checked for conflicting data; any conflicts found are corrected and the original dataset is updated accordingly. The updated dataset is then classified using a variety of classifiers: Multilayer Perceptron, SVM, Decision Stump, KStar, LWL, REP Tree, Decision Table, ID3, J48 and Naïve Bayes, and the performance of each classifier on the updated dataset is measured. The results show a significant improvement in classification accuracy once redundant and conflicting data are removed. When the corrected conflicts are written back to the original dataset and the classifiers are re-evaluated, a marked improvement is observed.
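The two tuning steps described above (a redundancy check followed by a conflict check) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each record is a pair of attribute values and a class label, that "redundant" means an exact duplicate record, and that "conflicting" means identical attribute values with different class labels. The abstract does not state how conflicts are corrected, so the majority-vote rule used here is purely an assumption, and the function names `remove_redundant`, `find_conflicts` and `tune` are hypothetical.

```python
from collections import Counter, defaultdict


def remove_redundant(records):
    """Drop exact duplicate (features, label) records, keeping the first occurrence."""
    seen = set()
    unique = []
    for features, label in records:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def find_conflicts(records):
    """Return {features: [labels]} for feature vectors that carry more than one label."""
    labels_by_features = defaultdict(list)
    for features, label in records:
        labels_by_features[tuple(features)].append(label)
    return {f: ls for f, ls in labels_by_features.items() if len(set(ls)) > 1}


def tune(records):
    """Redundancy check, then conflict check and correction.

    ASSUMPTION: a conflict is resolved by keeping the label that occurs most
    often for that feature vector in the original records. The source does not
    specify a correction rule.
    """
    counts = Counter((tuple(f), label) for f, label in records)
    unique = remove_redundant(records)
    conflicts = find_conflicts(unique)

    tuned = []
    resolved = set()
    for features, label in unique:
        if features in conflicts:
            if features in resolved:
                continue  # this feature vector has already been corrected
            best = max(conflicts[features], key=lambda l: counts[(features, l)])
            tuned.append((features, best))
            resolved.add(features)
        else:
            tuned.append((features, label))
    return tuned


# Made-up toy records, used only to exercise the functions.
records = [
    (("sunny", "hot"), "no"),
    (("sunny", "hot"), "no"),     # redundant copy
    (("sunny", "hot"), "yes"),    # conflicts with the records above
    (("rainy", "mild"), "yes"),
    (("rainy", "mild"), "yes"),   # redundant copy
]
tuned = tune(records)
```

The tuned records would then be passed to the classifiers listed in the abstract. The majority-vote rule is only one possible correction strategy; dropping every conflicting record is another.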

Keywords

Data Mining, Classification Algorithm, Redundancy, Conflicting Data.

Authors

S. Appavu Alias Balamurugan
Department of Information Technology, Thiagarajar College of Engineering, Madurai, India
A. B. Arockia Christopher
Department of Information Technology, Thiagarajar College of Engineering, Madurai, India
