Data Tuner for Effective Data Pre-Processing
Real-world datasets contain a large amount of redundant and conflicting data. The performance of a classification algorithm in data mining is strongly affected by such noisy information. Noise not only increases the cost of the mining process but also degrades the detection performance of classifiers, so it has to be removed to improve their efficiency and accuracy. This process is called tuning the dataset. A redundancy check is first performed on the original dataset and the result is preserved. The resulting dataset is then checked for conflicting data; any conflicts found are corrected and the corrections are written back to the original dataset. The updated dataset is then classified using a variety of classifiers: Multilayer Perceptron, SVM, Decision Stump, KStar, LWL, REP Tree, Decision Table, ID3, J48 and Naïve Bayes, and the performance of each classifier on the updated dataset is measured. The results are expected to show a significant improvement in classification accuracy once redundant records are removed and conflicting records are corrected.
Keywords
Data Mining, Classification Algorithm, Redundancy, Conflicting Data.
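The tuning steps described in the abstract (redundancy check, then conflict check and correction) can be illustrated with a minimal Python sketch. The abstract does not say how conflicts are corrected, so the majority-vote rule used here is an assumption, as are the function name `tune_dataset`, the `(features, label)` row layout, and the toy data.

```python
from collections import Counter, defaultdict


def tune_dataset(rows):
    """Hypothetical tuner: rows is a list of (features, label) pairs.

    Returns (tuned_rows, n_redundant, n_conflicts).
    """
    # Step 1: redundancy check. Drop exact duplicates (same features and
    # same label), keeping the first occurrence of each.
    seen = set()
    unique_rows = []
    for features, label in rows:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique_rows.append(key)
    n_redundant = len(rows) - len(unique_rows)

    # Step 2: conflict check. Records conflict when identical features
    # carry different labels. Votes are counted over the original rows so
    # that label frequencies are not lost by deduplication.
    votes = defaultdict(Counter)
    for features, label in rows:
        votes[tuple(features)][label] += 1

    # Step 3: correction (ASSUMPTION: majority vote). Each feature vector
    # is kept once, with its most frequent label.
    tuned_rows = []
    emitted = set()
    n_conflicts = 0
    for features, _ in unique_rows:
        if features in emitted:
            continue
        emitted.add(features)
        counts = votes[features]
        if len(counts) > 1:
            n_conflicts += 1
        tuned_rows.append((features, counts.most_common(1)[0][0]))
    return tuned_rows, n_redundant, n_conflicts


# Toy data: one redundant record and one conflicting record.
rows = [
    ((1, 0), "yes"),
    ((1, 0), "yes"),  # redundant: exact duplicate of the first row
    ((1, 0), "no"),   # conflicting: same features, different label
    ((0, 1), "no"),
]
tuned, n_redundant, n_conflicts = tune_dataset(rows)
```

The tuned rows would then be passed to whichever classifiers are being compared. Other correction rules, such as discarding all conflicting records, fit the same structure by changing only the third step.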