A Comparison of Missing Data Handling Techniques
Missing data is a recurring problem that data professionals must handle before analysis can reveal meaningful patterns. In this study, we compare 16 imputation methods: Linear, Index, Values, Nearest, Zero, Slinear, Quadratic, Cubic, Barycentric, Krogh, Polynomial, Spline, Piecewise Polynomial, From Derivatives, Pchip and Akima. The methods are applied to a real-world UCI dataset under the Missing Completely at Random (MCAR) assumption. Our results suggest that the Nearest, Zero, Quadratic and Polynomial methods perform best, each providing above 96% accuracy.
Keywords
Missing Data, Imputation Methods, Missing Completely at Random.
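The comparison procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: it uses only the Python standard library, covers just two of the 16 methods (Linear and Nearest), runs on synthetic values rather than the UCI data, and scores imputations with mean absolute error rather than the accuracy measure reported above. All function names (`mask_mcar`, `impute_linear`, `impute_nearest`, `mean_abs_error`) are hypothetical.

```python
import random


def mask_mcar(values, rate, seed=0):
    """Hide interior values at random, independently of the data (MCAR).

    Endpoints are always kept so that every gap has an observed value
    on both sides (an assumption made to keep the sketch simple).
    """
    rng = random.Random(seed)
    masked = list(values)
    for i in range(1, len(values) - 1):
        if rng.random() < rate:
            masked[i] = None
    return masked


def _observed_neighbors(series, i):
    """Return indices of the closest observed values left and right of i."""
    left = i - 1
    while series[left] is None:
        left -= 1
    right = i + 1
    while series[right] is None:
        right += 1
    return left, right


def impute_linear(series):
    """Fill each gap on the straight line between its observed neighbors."""
    out = list(series)
    for i, value in enumerate(series):
        if value is None:
            left, right = _observed_neighbors(series, i)
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out


def impute_nearest(series):
    """Fill each gap with the closest observed value (ties go left)."""
    out = list(series)
    for i, value in enumerate(series):
        if value is None:
            left, right = _observed_neighbors(series, i)
            out[i] = series[left] if i - left <= right - i else series[right]
    return out


def mean_abs_error(truth, imputed, masked):
    """Average absolute error over the positions that were hidden."""
    errors = [abs(t - p) for t, p, m in zip(truth, imputed, masked) if m is None]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    # Synthetic, illustrative data only (not the dataset used in the study).
    truth = [(x % 7) * 0.5 + 0.1 * x for x in range(40)]
    masked = mask_mcar(truth, rate=0.3, seed=1)
    for name, method in [("linear", impute_linear), ("nearest", impute_nearest)]:
        print(name, mean_abs_error(truth, method(masked), masked))
```

Because the true values are known before masking, each method can be scored on exactly the positions that were hidden, which is what makes a like-for-like comparison across imputation methods possible.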