MICE and ADASYN for Missing Data Imputation and Imbalanced Data Handling on Heart Disease Classification

Anita Desiani; Deshinta Arrova Dewi; Ali Amran; Ananda Pratiwi; Yuli Andriani; Endro Setyo Cahyono

doi:10.26554/sti.2025.10.4.1020-1030

Issue
Vol. 10 No. 4 (2025): October

Keywords:

Imbalanced Data, Heart Disease, Health Risk, Missing Data, Public Health

Abstract

Data quality depends on several factors, including completeness and class balance. The heart disease dataset from the University of California, Irvine (UCI) contains missing values and imbalanced classes; left unhandled, these can reduce the accuracy of the prediction model and lead to errors in interpreting the data. One way to handle missing data is imputation. Attributes with 5% or less missing data are imputed with the Mean (numeric attributes) or the Mode (categorical attributes), while attributes with more than 5% missing data are imputed using Multivariate Imputation by Chained Equations (MICE). Class imbalance is handled by oversampling with the Adaptive Synthetic Sampling Approach (ADASYN). The effect of imputing missing data and addressing class imbalance on heart disease classification performance was tested using the Random Forest, Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) algorithms. After missing values and class imbalance were handled, the classification results improved: accuracy, precision, recall, and F1-score exceeded 90% for several of the classifiers. The results indicate that handling missing and imbalanced data with Mean, Mode, MICE, and ADASYN improves classifier performance on the UCI heart disease dataset.
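The imputation rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of `None` to mark a missing value, and the toy columns are all assumptions made for illustration. Only the 5% threshold and the Mean/Mode/MICE assignment come from the abstract.

```python
from collections import Counter
from statistics import mean

MISSING_THRESHOLD = 0.05  # the 5% cut-off stated in the abstract


def choose_strategy(values, is_categorical, threshold=MISSING_THRESHOLD):
    """Pick 'none', 'mean', 'mode', or 'mice' for one column (None marks a missing value)."""
    missing_rate = sum(v is None for v in values) / len(values)
    if missing_rate == 0:
        return "none"
    if missing_rate > threshold:
        return "mice"  # more than 5% missing: leave for a chained-equations imputer
    return "mode" if is_categorical else "mean"


def impute_simple(values, strategy):
    """Fill missing entries with the column mean or mode; other strategies are returned unchanged."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        return list(values)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns (not taken from the UCI dataset), 20 rows each.
age = [50] * 19 + [None]                # numeric, 5% missing
sex = ["M"] * 12 + ["F"] * 7 + [None]   # categorical, 5% missing
ca = [0, None, 1, None] * 5             # numeric, 50% missing

age_strategy = choose_strategy(age, is_categorical=False)
sex_strategy = choose_strategy(sex, is_categorical=True)
ca_strategy = choose_strategy(ca, is_categorical=False)

age_filled = impute_simple(age, age_strategy)
sex_filled = impute_simple(sex, sex_strategy)
```

In this sketch, columns routed to `"mice"` are deliberately left untouched. In practice they would be passed to a chained-equations imputer, and the training data would then be oversampled with ADASYN before fitting the classifiers. The abstract does not state which software the authors used for those steps.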
