Improving AI-Driven Stroke Prediction Models: A Comparative Evaluation of SMOTE and Undersampling Methods

Tourt, Dara; Booker, Queen; Jin, Simon

Volume 19

V19 N2 Pages 50-70	Jun 2026
Dara Tourt, Metropolitan State University Minnesota, St Paul, MN, USA
Queen Booker, Metropolitan State University Minnesota, St Paul, MN, USA
Simon Jin, Metropolitan State University Minnesota, St Paul, MN, USA

Abstract: Artificial intelligence (AI) is improving the field of predictive healthcare by enabling data-driven decision-making through advanced machine learning (ML) algorithms. Stroke prediction is challenging due to highly imbalanced clinical datasets, where positive cases are rare. This study investigates the impact of data-level resampling methods on the performance of AI-driven predictive models. Four widely used classifiers—Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Gradient Boosting (GB)—were applied to a highly imbalanced stroke dataset. Models were evaluated across key AI performance metrics. Paired t-tests assessed the statistical significance of observed differences. This comparative analysis offers critical insights into how data balancing techniques impact the reliability of AI models. The findings support the development of more effective and ethically responsible AI systems for early stroke detection.
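The two families of data-level resampling named in the title can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy two-feature dataset, the choice of k, and the fold scores passed to the paired t statistic are all hypothetical placeholders, and a real study would more likely rely on an established library implementation. The sketch assumes binary labels (1 = stroke) and at least two minority rows.

```python
import math
import random


def random_undersample(X, y, rng):
    """Drop majority-class rows at random until both classes are the same size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(X, y, rng, k=5):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (Euclidean distance)."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if len(pos) <= len(neg):
        minority, majority, min_label = pos, neg, 1
    else:
        minority, majority, min_label = neg, pos, 0
    synthetic = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(minority)
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [min_label] * len(synthetic)


def paired_t(a, b):
    """Paired t statistic for matched score lists (degrees of freedom = n - 1)."""
    diffs = [x - z for x, z in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


# Toy imbalanced data: 95 negatives, 5 positives (hypothetical, not the study's dataset).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_us, y_us = random_undersample(X, y, rng)  # shrinks the majority class
X_sm, y_sm = smote(X, y, rng)               # grows the minority class

# Made-up per-fold scores, used only to exercise paired_t; not results from the study.
scores_a = [0.70, 0.72, 0.68, 0.71, 0.69]
scores_b = [0.60, 0.65, 0.61, 0.66, 0.58]
t_stat = paired_t(scores_a, scores_b)
```

Undersampling discards majority-class rows and so shrinks the training set, while SMOTE keeps every original row and adds interpolated minority rows. In practice either step would be applied to the training folds only, so that synthetic or duplicated information does not leak into the evaluation data.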


Recommended Citation: Tourt, D., Booker, Q., Jin, S.S. (2026). Improving AI-Driven Stroke Prediction Models: A Comparative Evaluation of SMOTE and Undersampling Methods. Journal of Information Systems Applied Research and Analytics, 19(2), pp. 50-70. https://doi.org/10.62273/MWHS5422
