
Comparision between Accuracy and MSE, RMSE by using Proposed Method with Imputation Technique


Affiliations
1 P.E.S. College of Engineering, Aurangabad. (M.S.), India
2 Dr. Babasaheb Ambedkar Marathwada University, Aurangabad. (M.S.), India
 

The presence of missing values in a dataset makes data analysis difficult in data mining tasks. In this work, a student dataset containing marks in four different subjects at an engineering college is used. Mean, mode, and median imputation are applied to deal with the challenges of incomplete data. The Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are computed on the dataset for the proposed method and for the mean, mode, and median imputation methods. Accuracy is also computed for the proposed method combined with the imputation technique. The experiments show that, with the proposed method, MSE and RMSE gradually decrease as the size of the database increases, whereas with simple imputation, MSE and RMSE gradually increase as the size of the database increases. Accuracy also increases as the size of the database increases.
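As a rough illustration of the baseline comparison described in the abstract, the following minimal Python sketch fills missing marks with the mean, median, or mode of the observed marks and then scores the estimates with MSE and RMSE. The marks, the function names (`impute`, `mse`, `rmse`, `evaluate`), and the choice to measure error only on the hidden positions are assumptions made for this example; it does not reproduce the proposed method or the accuracy measure, which the abstract does not specify.

```python
import math
import statistics

# Map each simple imputation strategy to the statistic used as the fill value.
FILL_FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}

def impute(values, strategy):
    """Replace every None in `values` with one statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = FILL_FUNCTIONS[strategy](observed)
    return [fill if v is None else v for v in values]

def mse(actual, estimated):
    """Mean Squared Error: average of the squared differences."""
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)

def rmse(actual, estimated):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, estimated))

def evaluate(true_marks, hidden, strategy):
    """Hide the marks at the `hidden` positions, impute them, and score the estimates.

    Assumption: the error is measured only on the hidden positions.
    """
    with_gaps = [None if i in hidden else m for i, m in enumerate(true_marks)]
    filled = impute(with_gaps, strategy)
    actual = [true_marks[i] for i in sorted(hidden)]
    estimated = [filled[i] for i in sorted(hidden)]
    return mse(actual, estimated), rmse(actual, estimated)

# Hypothetical marks for one subject (illustrative values, not data from the study).
marks = [40, 50, 50, 60, 70, 90]
for strategy in ("mean", "median", "mode"):
    error, root_error = evaluate(marks, {3, 5}, strategy)
    print(f"{strategy:>6}: MSE={error:.2f} RMSE={root_error:.2f}")
```

Hiding known marks and imputing them back is one common way to obtain a ground truth for the error; repeating the run on databases of increasing size would give the kind of trend the abstract reports.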

Keywords

Incomplete Data, Missing Values, Imputation, Mean Imputation, Median Imputation, Mode Imputation, MSE (Mean Squared Error), RMSE (Root Mean Squared Error)


Authors

V. B. Kamble
P.E.S. College of Engineering, Aurangabad. (M.S.), India
S. N. Deshmukh
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad. (M.S.), India

DOI: https://doi.org/10.13005/ojcst/10.04.11