MICE and ADASYN for Missing Data Imputation and Imbalanced Data Handling on Heart Disease Classification
Abstract
Data quality depends on several factors, including completeness and class balance. The heart disease dataset from the University of California, Irvine (UCI) contains missing values and imbalanced classes which, if left unhandled, can reduce the accuracy of the prediction model and lead to errors in interpreting the data. Missing data can be addressed in several ways, one of which is imputation. Attributes with 5% or less missing data are imputed with simple methods: the Mean for numeric attributes and the Mode for categorical attributes. Attributes with more than 5% missing data are imputed using Multiple Imputation by Chained Equations (MICE). Class imbalance is handled by oversampling with the Adaptive Synthetic Sampling Approach (ADASYN). The effect of imputing missing data and addressing class imbalance on heart disease classification performance was tested using the Random Forest, Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) algorithms. After missing values and class imbalance were handled, the classification results improved: accuracy, precision, recall, and F1-score exceeded 90% for several of the classification methods. The results indicate that handling missing and imbalanced data with Mean, Mode, MICE, and ADASYN improves classifier performance on the UCI heart disease dataset.
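The imputation rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `impute_by_missing_rate`, the synthetic `demo` table, and the column names are hypothetical; categorical attributes are assumed to be numerically coded; and scikit-learn's `IterativeImputer` stands in for MICE.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer


def impute_by_missing_rate(df, categorical_cols, threshold=0.05, random_state=0):
    """Mean/Mode for columns with <= threshold missing; iterative imputation above it.

    Hypothetical helper: assumes every column is numeric (categories coded as numbers).
    """
    out = df.copy()
    missing_rate = out.isna().mean()

    # Pass 1: simple imputation for lightly affected columns.
    for col in out.columns:
        rate = missing_rate[col]
        if rate == 0 or rate > threshold:
            continue
        if col in categorical_cols:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
        else:
            out[col] = out[col].fillna(out[col].mean())

    # Pass 2: chained-equations style imputation for heavily affected columns.
    heavy = [c for c in out.columns if missing_rate[c] > threshold]
    if heavy:
        imputer = IterativeImputer(max_iter=10, random_state=random_state)
        filled = pd.DataFrame(
            imputer.fit_transform(out), columns=out.columns, index=out.index
        )
        for col in heavy:
            if col in categorical_cols:
                # The regression output is continuous, so snap each value
                # to the nearest category code observed in the original data.
                observed = np.sort(df[col].dropna().unique())
                out[col] = filled[col].map(
                    lambda v: observed[np.abs(observed - v).argmin()]
                )
            else:
                out[col] = filled[col]
    return out


# Synthetic stand-in data (not the UCI dataset), with known missing rates.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "age": rng.normal(54, 9, n).round(),
    "sex": rng.integers(0, 2, n).astype(float),
    "chol": rng.normal(245, 50, n).round(),
    "ca": rng.integers(0, 4, n).astype(float),
})
demo.loc[rng.choice(n, 4, replace=False), "age"] = np.nan    # 2% missing -> Mean
demo.loc[rng.choice(n, 2, replace=False), "sex"] = np.nan    # 1% missing -> Mode
demo.loc[rng.choice(n, 40, replace=False), "chol"] = np.nan  # 20% missing -> iterative
demo.loc[rng.choice(n, 30, replace=False), "ca"] = np.nan    # 15% missing -> iterative

imputed = impute_by_missing_rate(demo, categorical_cols={"sex", "ca"})
print(imputed.isna().sum())
```

Note that `IterativeImputer` returns a single chained-equations imputation rather than pooling several imputed datasets, so it only approximates a full MICE procedure. Snapping imputed categorical values to the nearest observed code is likewise an assumption made for this sketch.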
This work is licensed under a Creative Commons Attribution 4.0 International License.