PredictStr: a balanced benchmark dataset for improving stroke prediction
Abstract
Predicting strokes is essential for improving healthcare outcomes and saving lives. This paper introduces PredictStr, a benchmark dataset developed specifically to improve stroke prediction. It improves upon an existing dataset that was previously the only one of its kind identified in the literature. Our methodology comprises two main steps. First, we outline a series of preprocessing and cleaning measures that improve data quality. Second, we present a novel algorithm, the Dynamic Hybrid Balancing Algorithm, which builds upon the ADASYN algorithm by integrating consistency constraints to address class imbalance. Our contribution also includes a set of analyses of the resulting dataset: histograms and boxplots, feature distribution assessments, statistical explorations, correlation evaluations, feature importance rankings, and Individual Conditional Expectation (ICE) plots. These analyses provide insight into feature significance and help researchers identify the most critical attributes for effective stroke detection.
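To make the balancing step more concrete, the following is a minimal sketch of plain ADASYN-style oversampling, the baseline that the abstract says the Dynamic Hybrid Balancing Algorithm builds upon. It is not the authors' algorithm: the consistency constraints are not described in the abstract and are not modelled here. The function name `adasyn_sketch`, its parameters, and the use of Euclidean distance on numeric features are assumptions made for illustration.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=5, seed=0):
    """Return synthetic minority rows generated in the style of ADASYN.

    Hypothetical helper for illustration only; it does not implement the
    consistency constraints mentioned in the abstract.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    n_needed = n_maj - n_min  # rows required for a fully balanced set
    if n_needed <= 0 or n_min < 2:
        return []

    def nearest(i, candidates, count):
        # Indices of the `count` candidates closest to row i (excluding i).
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # Difficulty score: share of majority rows among each minority row's
    # k nearest neighbours in the full dataset.
    ratios = []
    for i in min_idx:
        nbrs = nearest(i, range(len(y)), k)
        ratios.append(sum(y[j] != minority for j in nbrs) / len(nbrs))
    total = sum(ratios)
    # If no minority row has majority neighbours, fall back to equal weights.
    weights = [r / total for r in ratios] if total > 0 else [1 / n_min] * n_min

    synthetic = []
    for i, w in zip(min_idx, weights):
        min_nbrs = nearest(i, min_idx, k)
        # Harder rows receive more synthetic neighbours; rounding means the
        # total may differ slightly from n_needed.
        for _ in range(round(w * n_needed)):
            j = rng.choice(min_nbrs)
            gap = rng.random()
            # Interpolate between the row and a random minority neighbour.
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

Called as `adasyn_sketch(X, y, minority=1, k=5)` on a list of numeric feature rows `X` and labels `y`, the function returns new minority rows to append to the data. Categorical attributes would need separate handling, since linear interpolation can produce values that no real record could have.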
Origin: Files produced by the author(s)