Diagnosis of Diabetes Mellitus in Women of Reproductive Age using the Prediction Methods of Naive Bayes, Discriminant Analysis, and Logistic Regression

Yulia Resti, Endang Sri Kresnawati, Novi Rustiana Dewi, Des Alwine Zayanti, Ning Eliyati


Diabetes is a chronic disease that can cause serious illness. Women are four times more likely to develop heart problems caused by diabetes. Women are also more prone to complications of diabetes, such as kidney problems, depression, and decreased vision quality. Nearly 200 million women worldwide are affected by diabetes, and two out of five of them are of reproductive age. This paper aims to predict diabetes in women aged at least 21 years from eight diagnostic measurements using three statistical learning methods: Multinomial Naïve Bayes, Fisher Discriminant Analysis, and Logistic Regression. The models are validated with 5-fold cross-validation, which divides the data into training and test sets. The validation performance shows that Multinomial Naïve Bayes is the best method for predicting a diabetes diagnosis. This paper’s contribution is that all performance measures of the Multinomial Naïve Bayes method exceed 93%. These results are useful for predicting diabetes status from the same explanatory variables.
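The validation scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the eight predictors are assumed to be binned into three levels each, and the names `k_fold_indices`, `BinnedNaiveBayes`, `make_synthetic`, and `cross_validate` are hypothetical. It shows only how 5-fold cross-validation pools test-fold predictions into accuracy, sensitivity, and specificity for a naive Bayes classifier over categorical features.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class BinnedNaiveBayes:
    """Naive Bayes over categorical (binned) features with Laplace smoothing."""

    def fit(self, X, y, n_levels):
        self.classes = sorted(set(y))
        self.n_levels = n_levels
        self.n = len(y)
        self.class_count = {c: 0 for c in self.classes}
        # counts[c][j][v] = number of class-c rows whose feature j has level v
        self.counts = {c: [[0] * n_levels for _ in X[0]] for c in self.classes}
        for row, c in zip(X, y):
            self.class_count[c] += 1
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
        return self

    def predict(self, X):
        preds = []
        for row in X:
            best, best_lp = None, -math.inf
            for c in self.classes:
                # log prior plus the sum of smoothed log likelihoods
                lp = math.log(self.class_count[c] / self.n)
                for j, v in enumerate(row):
                    lp += math.log((self.counts[c][j][v] + 1)
                                   / (self.class_count[c] + self.n_levels))
                if lp > best_lp:
                    best, best_lp = c, lp
            preds.append(best)
        return preds


def make_synthetic(n=300, n_features=8, seed=1):
    """Synthetic stand-in data (an assumption): 8 features with levels 0, 1, 2."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.35 else 0
        weights = [1, 2, 4] if label == 1 else [4, 2, 1]
        X.append([rng.choices([0, 1, 2], weights=weights)[0]
                  for _ in range(n_features)])
        y.append(label)
    return X, y


def cross_validate(X, y, k=5, n_levels=3):
    """Pool the confusion counts over the k test folds and report three measures."""
    folds = k_fold_indices(len(y), k)
    tp = tn = fp = fn = 0
    for i, test_idx in enumerate(folds):
        train_idx = [t for j, fold in enumerate(folds) if j != i for t in fold]
        model = BinnedNaiveBayes().fit([X[t] for t in train_idx],
                                       [y[t] for t in train_idx], n_levels)
        preds = model.predict([X[t] for t in test_idx])
        truth = [y[t] for t in test_idx]
        for p, actual in zip(preds, truth):
            if p == 1 and actual == 1:
                tp += 1
            elif p == 0 and actual == 0:
                tn += 1
            elif p == 1 and actual == 0:
                fp += 1
            else:
                fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


X, y = make_synthetic()
scores = cross_validate(X, y)
print(scores)
```

Pooling the confusion counts across folds is one design choice; averaging per-fold measures is another, and the abstract does not say which the paper uses.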




Yulia Resti
yulia_resti@mipa.unsri.ac.id (Primary Contact)
Resti, Y., Kresnawati, E. S., Dewi, N. R., Zayanti, D. A., & Eliyati, N. (2021). Diagnosis of Diabetes Mellitus in Women of Reproductive Age using the Prediction Methods of Naive Bayes, Discriminant Analysis, and Logistic Regression. Science and Technology Indonesia, 6(2), 96–104. https://doi.org/10.26554/sti.2021.6.2.96-104
