Volume 19
Abstract: Artificial intelligence (AI) is improving predictive healthcare by enabling data-driven decision-making through advanced machine learning (ML) algorithms. Stroke prediction is challenging because clinical datasets are highly imbalanced: positive (stroke) cases are rare. This study investigates how data-level resampling methods, specifically SMOTE (Synthetic Minority Over-sampling Technique) and undersampling, affect the performance of AI-driven predictive models. Four widely used classifiers were applied to a highly imbalanced stroke dataset: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Gradient Boosting (GB). The models were evaluated on key performance metrics, and paired t-tests were used to assess whether the observed differences were statistically significant. This comparative analysis offers insight into how data-balancing techniques affect the reliability of AI models, and the findings support the development of more effective and ethically responsible AI systems for early stroke detection.

Recommended Citation: Tourt, D., Booker, Q., & Jin, S. S. (2026). Improving AI-Driven Stroke Prediction Models: A Comparative Evaluation of SMOTE and Undersampling Methods. Journal of Information Systems Applied Research and Analytics, 19(2), 50-70. https://doi.org/10.62273/MWHS5422
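The two resampling strategies compared in this study, and the paired t-test used to compare models, can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code: the function names random_undersample, smote and paired_t_statistic, the "balance both classes to equal size" target, and the toy data are all assumptions. In practice a library such as imbalanced-learn (its SMOTE and RandomUnderSampler classes) would typically be used, and resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.

```python
import math
import random
import statistics


def random_undersample(X, y, minority_label=1, seed=0):
    """Hypothetical helper: randomly drop majority-class rows until both classes match.

    Assumes a binary problem in which the minority class is the smaller one.
    """
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    # Keep every minority row plus an equal-sized random subset of majority rows.
    rows = minority + rng.sample(majority, len(minority))
    rng.shuffle(rows)
    return [x for x, _ in rows], [t for _, t in rows]


def smote(X, y, minority_label=1, k=5, seed=0):
    """Hypothetical helper: add SMOTE-style synthetic minority rows until classes balance."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_new = max(0, sum(1 for t in y if t != minority_label) - len(minority))
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample and one of its k nearest minority neighbours.
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        # The synthetic point lies on the segment between base and that neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [minority_label] * len(synthetic)


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples, e.g. per-fold scores of two models."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


if __name__ == "__main__":
    # Hypothetical toy data: 10 majority rows (label 0) and 3 minority rows (label 1).
    X = [[float(i), float(i % 3)] for i in range(10)] + [[20.0, 1.0], [21.0, 2.0], [22.0, 1.5]]
    y = [0] * 10 + [1] * 3
    _, y_under = random_undersample(X, y)
    _, y_smote = smote(X, y, k=2)
    print(y_under.count(0), y_under.count(1))  # 3 3
    print(y_smote.count(0), y_smote.count(1))  # 10 10
```

Undersampling discards majority rows and so loses information, while SMOTE keeps every row but adds interpolated points that only approximate real patients; that trade-off is what a comparison of the two methods probes. The sketch returns only the t statistic; a p-value would come from the t distribution with n - 1 degrees of freedom, for example via scipy.stats.ttest_rel.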