An ensemble approach for the prediction of diabetes mellitus using a soft voting classifier with an explainable AI

Kibria, Hafsa Binte, Nahiduzzaman, Md, Goni, Md Omaer Faruq, Ahsan, Mominul and Haider, Julfikar (2022) An ensemble approach for the prediction of diabetes mellitus using a soft voting classifier with an explainable AI. Sensors, 22 (19). p. 7268. ISSN 1424-8220

Published Version
Available under License Creative Commons Attribution.

Official URL: https://www.mdpi.com/1424-8220/22/19/7268

Abstract

Diabetes is a chronic disease that remains a major worldwide health concern. Over the years, many researchers have attempted to develop a reliable diabetes prediction model using machine learning (ML) algorithms. However, these studies have had minimal impact on clinical practice because they focus mainly on improving the performance of complicated ML models while ignoring their explainability in clinical settings. As a result, physicians find these models difficult to understand and rarely trust them for clinical use. In this study, a carefully constructed, efficient, and interpretable diabetes detection method using explainable AI is proposed. The Pima Indian diabetes dataset was used, containing a total of 768 instances, of which 268 are diabetic and 500 are non-diabetic, with several diabetes-related attributes. Six machine learning algorithms (artificial neural network (ANN), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost, and XGBoost) were used along with an ensemble classifier to diagnose diabetes. For each machine learning model, global and local explanations were produced using Shapley additive explanations (SHAP) and presented in different types of graphs to help physicians understand the model predictions. The median values were used to impute the missing values, and SMOTETomek, which combines the synthetic minority oversampling technique (SMOTE) with Tomek-link removal, was used to balance the classes of the dataset. Using five-fold cross-validation (CV), the developed weighted ensemble model achieved a balanced accuracy of 90% with an F1 score of 89%. The proposed approach can improve the clinical understanding of a diabetes diagnosis and help in taking necessary action at the very early stages of the disease.
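
The weighted soft voting named in the title can be illustrated with a minimal sketch. It is not the authors' implementation: the function names soft_vote and predict, the three probability vectors, and the weights are all hypothetical values chosen for illustration, and the abstract does not state which weights the paper used. The idea is that each base model outputs class probabilities for a patient, the ensemble takes a weighted average of those probabilities, and the class with the highest averaged probability is the prediction.

```python
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-model class-probability vectors."""
    if len(probas) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    if any(len(p) != n_classes for p in probas):
        raise ValueError("all models must report the same number of classes")
    return [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]


def predict(probas: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(probas, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical outputs of three base models for one patient,
# each given as [P(non-diabetic), P(diabetic)].
model_probas = [
    [0.60, 0.40],
    [0.30, 0.70],
    [0.45, 0.55],
]
# Hypothetical weights; the second model is trusted twice as much.
model_weights = [1.0, 2.0, 1.0]

averaged = soft_vote(model_probas, model_weights)
label = predict(model_probas, model_weights)
print(averaged, label)
```

Unlike hard (majority) voting, this scheme lets a confident model outweigh several uncertain ones, which is why the choice of weights matters.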

Item Type:	Article
Peer-reviewed:	Yes
Date Deposited:	27 Sep 2022 12:47
Publisher:	MDPI AG
Additional Information:	This is an Open Access article which appears in Sensors, published by MDPI.
Divisions:	Faculties > Science and Engineering
Subject terms:	0301 Analytical Chemistry, 0502 Environmental Science and Management, 0602 Ecology, 0805 Distributed Computing, 0906 Electrical and Electronic Engineering, Analytical Chemistry
Data Access Statement:	The data presented in this study are available in the article.
URI:	https://e-space.mmu.ac.uk/id/eprint/630437
DOI:	https://doi.org/10.3390/s22197268
ISSN:	1424-8220
e-ISSN:	1424-8220
