Machine learning applied to the prediction of diabetes mellitus, using socioeconomic and environmental information from health system users
DOI:
https://doi.org/10.17533/udea.rfnsp.e351168
Keywords:
machine learning, diabetes mellitus, environmental factors, socioeconomic factors, predictive model
Abstract
Objective: To apply models based on machine learning techniques to support the early diagnosis of diabetes mellitus, using environmental, social, economic, and health variables, without depending on the collection of clinical samples.
Methodology: Data were used from 10,889 users affiliated with the subsidized health system in southwestern Colombia, all diagnosed with hypertension and grouped into users without (74.3%) and with (25.7%) diabetes mellitus. Supervised models were trained using k-nearest neighbors, decision trees, and random forests, as well as ensemble-based models, applied to the database before and after balancing the number of cases in each diagnostic group. Performance was evaluated by splitting the database into training and test data (70% and 30%, respectively), using accuracy, sensitivity, specificity, and area under the curve as metrics.
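The evaluation steps described in the methodology can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the toy data, the use of random oversampling as the balancing method, and the choice to balance only the training portion are all assumptions, since the abstract does not specify them.

```python
import random

def split_70_30(rows, seed=0):
    """Shuffle rows and split them into 70% training and 30% test data."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def oversample_minority(rows, seed=0):
    """Balance the two classes by resampling the smaller one with replacement.
    Assumption: random oversampling; the source does not name its method.
    Each row is a (features, label) pair with label 0 or 1."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    small, large = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(small) for _ in range(len(large) - len(small))]
    return large + small + extra

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive case
    receives a higher score than a random negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data with the class proportions reported above (25.7% vs. 74.3%).
# The single feature is a placeholder, not a variable from the study.
rows = [((i,), 1) for i in range(257)] + [((i,), 0) for i in range(743)]
train, test = split_70_30(rows)
balanced_train = oversample_minority(train)

toy_metrics = confusion_metrics([1, 1, 0, 0], [1, 0, 0, 0])
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Balancing only the training portion keeps the test set at the original class proportions, so sensitivity and specificity are measured on data that resembles the real population. A model trained on `balanced_train` would then be scored on `test` with `confusion_metrics` and `auc`.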
Results: Sensitivity increased significantly with balanced data, rising from a maximum of 17.1% (unbalanced data) to as high as 57.4% (balanced data). The highest area under the curve (0.61) was obtained with the ensemble models, after balancing the amount of data in each group and encoding the categorical variables. The variables with the greatest weight were associated with hereditary aspects (24.65%) and ethnic group (5.59%), in addition to visual difficulty, low water consumption, a diet low in fruits and vegetables, and the consumption of salt and sugar.
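Variable weights such as the percentages above are commonly obtained by scaling a model's raw importance scores so that they sum to 100. The sketch below shows only that normalization step. The function name, the variable names, and the scores are hypothetical and are not the study's values.

```python
def importance_percentages(raw_scores):
    """Scale non-negative importance scores so they sum to 100,
    returning (name, percentage) pairs sorted from highest to lowest."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    ranked = sorted(raw_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 100.0 * score / total) for name, score in ranked]

# Hypothetical raw scores, for illustration only (not taken from the study).
toy_scores = {
    "family_history": 5.0,
    "ethnic_group": 1.0,
    "low_water_intake": 2.0,
    "salt_and_sugar": 2.0,
}
ranking = importance_percentages(toy_scores)
```

With tree-based models, the raw scores would typically come from the fitted model's own importance measure, and the resulting ranking depends on how the categorical variables were encoded.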
Conclusions: Predictive models based on people's socioeconomic and environmental information are emerging as a tool for the early diagnosis of diabetes mellitus, but their predictive capacity still needs to be improved.
License
Copyright (c) 2023 Universidad de Antioquia
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.