G, Thippa Reddy, Bhattacharya, Sweta, Maddikunta, Praveen Kumar Reddy, Hakak, Saqib, Khan, Wazir Zada, Bashir, Ali Kashif ORCID: https://orcid.org/0000-0001-7595-2522, Jolfaei, Alireza and Tariq, Usman (2022) Antlion re-sampling based deep neural network model for classification of imbalanced multimodal stroke dataset. Multimedia Tools and Applications, 81 (29). pp. 41429-41453. ISSN 1380-7501
|
Accepted Version
Available under License In Copyright. Download (5MB) | Preview |
Abstract
Stroke is enlisted as one of the leading causes of death and serious disability affecting millions of human lives across the world with high possibilities of becoming an epidemic in the next few decades. Timely detection and prompt decision making pertinent to this disease, plays a major role which can reduce chances of brain death, paralysis and other resultant outcomes. Machine learning algorithms have been a popular choice for the diagnosis, analysis and predication of this disease but there exists issues related to data quality as they are collected cross-institutional resources. The present study focuses on improving the quality of stroke data implementing a rigorous pre-processing technique. The present study uses a multimodal stroke dataset available in the publicly available Kaggle repository. The missing values in this dataset are replaced with attribute means and LabelEncoder technique is applied to achieve homogeneity. However, the dataset considered was observed to be imbalanced which reflect that the results may not represent the actual accuracy and would be biased. In order to overcome this imbalance, resampling technique was used. In case of oversampling, some data points in the minority class are replicated to increase the cardinality value and rebalance the dataset. transformed and oversampled data is further normalized using Standardscalar technique. Antlion optimization (ALO) algorithm is implemented on the deep neural network (DNN) model to select optimal hyperparameters in minimal time consumption. The proposed model consumed only 38.13% of the training time which was also a positive aspect. The experimental results proved the superiority of proposed model.
Impact and Reach
Statistics
Additional statistics for this dataset are available via IRStats2.