
Missing Value Imputation and Normalization Techniques in Myocardial Infarction


Affiliations
1 Department of Computer Applications, Sri GVG Visalakshi College for Women, India
2 Department of Computer Science, Kongunadu Arts and Science College, India
     



Missing data imputation is an important research topic in data mining. Real-world data generally contains missing values, and their presence in a data set is a major obstacle to accurate prediction. The objective of this paper is to highlight a possible improvement of existing algorithms for medical data. A KNBP imputation method, based on KNN and BPCA, is proposed and evaluated using MSE and RMSE estimates. For normalization, three algorithms are compared: min-max normalization, Z-score and decimal scaling. The experiments are carried out on standard benchmark data and on real-time collected data. The KNBP imputation method and the decimal scaling algorithm for normalization achieved lower error rates than the other methods compared.
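As an illustration of how imputers of this kind are commonly scored with MSE and RMSE, here is a minimal sketch (Python, standard library only) under an assumed protocol: cells of a complete table are hidden, imputed, and compared with their true values. It covers mean, random hot-deck and plain KNN imputation only. It is not the authors' KNBP method, whose combination of KNN with BPCA is not described in this abstract, and the function names, the distance choice and the toy data are all hypothetical.

```python
import math
import random

# Hypothetical representation: a data set is a list of rows, each row a list of
# floats, with None marking a missing value.


def mean_impute(data):
    """Mean imputation: fill each gap with the mean of the observed values in its column."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def hot_deck_impute(data, seed=0):
    """Random hot-deck imputation: fill each gap with an observed value (a "donor")
    drawn at random from the same column."""
    rng = random.Random(seed)
    donors = [[row[j] for row in data if row[j] is not None] for j in range(len(data[0]))]
    return [[rng.choice(donors[j]) if v is None else v for j, v in enumerate(row)]
            for row in data]


def knn_impute(data, k=3):
    """Plain KNN imputation: fill each gap with the average of that column over the
    k rows closest to the incomplete row. Distance is a root-mean-square difference
    computed only on the columns that both rows observe."""
    result = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue  # a donor must observe the column being filled
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                raise ValueError(f"no donor row available for cell ({i}, {j})")
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


def hide(truth, positions):
    """Copy a complete data set and blank out the given (row, column) cells."""
    masked = [row[:] for row in truth]
    for i, j in positions:
        masked[i][j] = None
    return masked


def mse_rmse(truth, imputed, positions):
    """MSE and RMSE computed over the artificially hidden cells only."""
    errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    mse = sum(errors) / len(errors)
    return mse, math.sqrt(mse)


if __name__ == "__main__":
    truth = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0],
             [4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [6.0, 7.0, 8.0]]
    hidden = [(1, 1), (4, 2)]
    masked = hide(truth, hidden)
    for name, method in [("mean", mean_impute), ("hot deck", hot_deck_impute),
                         ("knn", lambda d: knn_impute(d, k=2))]:
        mse, rmse = mse_rmse(truth, method(masked), hidden)
        print(f"{name:8s} MSE={mse:.3f} RMSE={rmse:.3f}")
```

On this toy table, where every column is a linear function of the first, KNN with k=2 recovers both hidden cells exactly (MSE=0.000, RMSE=0.000), while mean imputation gives MSE=3.240 and RMSE=1.800; the hot-deck line depends on the random seed. Such a perfectly regular table flatters neighbour-based methods, so it shows the mechanics of the evaluation rather than what to expect on clinical data.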

Keywords

Mean, Hot Deck, KNN, BPCA, KNBP, Min-Max Algorithm, Z-Score, Decimal Scaling.
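The three normalization algorithms compared in the paper (min-max, Z-score and decimal scaling) can be sketched one column at a time as follows (Python, standard library only). The function names are hypothetical. Using the population standard deviation for Z-score is an assumption, since the abstract does not say which variant was used. Decimal scaling divides every value by 10^j, with j the smallest integer that brings every absolute value below 1; this sketch restricts j to non-negative values.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map the column's [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant column")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: (v - mean) / standard deviation.
    Assumption: the population standard deviation (pstdev) is used."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("Z-score normalization is undefined for a constant column")
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, where j is the smallest non-negative integer
    such that every scaled absolute value is below 1. Returns (scaled values, j)."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


if __name__ == "__main__":
    readings = [120.0, 140.0, 165.0, 200.0]  # hypothetical values for one feature
    print("min-max:", min_max(readings))
    print("z-score:", [round(z, 3) for z in z_score(readings)])
    scaled, j = decimal_scaling(readings)
    print(f"decimal scaling (j={j}):", scaled)
```

For the hypothetical readings [120, 140, 165, 200], min-max gives [0.0, 0.25, 0.5625, 1.0], and decimal scaling picks j = 3, giving [0.12, 0.14, 0.165, 0.2]. Because decimal scaling only divides by a positive power of ten, it preserves the sign and the ratios between values. Min-max and Z-score both shift the data, so those ratios change.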



Authors

K. Manimekalai
Department of Computer Applications, Sri GVG Visalakshi College for Women, India
A. Kavitha
Department of Computer Science, Kongunadu Arts and Science College, India

