JISARA

Journal of Information Systems Applied Research and Analytics

Volume 19

V19 N2 Pages 50-70

Jun 2026


Improving AI-Driven Stroke Prediction Models: A Comparative Evaluation of SMOTE and Undersampling Methods


Dara Tourt
Metropolitan State University Minnesota
St Paul, MN USA

Queen Booker
Metropolitan State University Minnesota
St Paul, MN USA

Simon Jin
Metropolitan State University Minnesota
St Paul, MN USA

Abstract: Artificial intelligence (AI) is improving predictive healthcare by enabling data-driven decision-making through advanced machine learning (ML) algorithms. Stroke prediction is challenging because clinical datasets are highly imbalanced: positive (stroke) cases are rare. This study investigates how two data-level resampling methods, the Synthetic Minority Over-sampling Technique (SMOTE) and undersampling, affect the performance of AI-driven predictive models. Four widely used classifiers were applied to a highly imbalanced stroke dataset: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Gradient Boosting (GB). The models were evaluated across key performance metrics, and paired t-tests assessed whether the observed differences were statistically significant. This comparative analysis shows how data balancing techniques affect the reliability of AI models, and the findings support the development of more effective and ethically responsible AI systems for early stroke detection.
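The abstract names three techniques without showing their mechanics: SMOTE, undersampling, and the paired t-test. The sketch below illustrates them in generic form using only the Python standard library. It is not the authors' pipeline. Their SMOTE settings, undersampling ratio, metrics, and fold structure are not stated here. The helper names `random_undersample`, `smote`, and `paired_t` are hypothetical, and random undersampling to a 1:1 ratio is an assumed variant. Production work would more typically use `imblearn.over_sampling.SMOTE`, `imblearn.under_sampling.RandomUnderSampler`, and `scipy.stats.ttest_rel`.

```python
import math
import random
import statistics


def random_undersample(majority, minority, rng):
    """Hypothetical helper: randomly keep only as many majority rows as there are
    minority rows (assumed 1:1 target ratio), discarding the rest."""
    kept = rng.sample(majority, k=len(minority))
    return kept, list(minority)


def smote(minority, n_new, k, rng):
    """Hypothetical helper: basic SMOTE-style oversampling. Each synthetic row lies on
    the line segment between a random minority row and one of its k nearest
    minority-class neighbours (Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to the chosen base row.
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


def paired_t(a, b):
    """Hypothetical helper: paired t statistic and degrees of freedom for matched
    scores, e.g. one metric measured per cross-validation fold under two methods."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

In common practice, resampling is applied only to the training portion of each split. The test data keeps its original class imbalance, so the reported metrics reflect the real-world prevalence of stroke cases. The paired test then compares the two methods' scores across the same folds.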



Recommended Citation: Tourt, D., Booker, Q., & Jin, S. S. (2026). Improving AI-Driven Stroke Prediction Models: A Comparative Evaluation of SMOTE and Undersampling Methods. Journal of Information Systems Applied Research and Analytics, 19(2), 50-70. https://doi.org/10.62273/MWHS5422