Researchers at the University of Gondar and partner institutions apply seven supervised machine learning algorithms to Demographic and Health Surveys (DHS) data from eight sub-Saharan African countries. They use Recursive Feature Elimination to select the top predictors, address class imbalance with SMOTE+Tomek balancing, and identify the Decision Tree as the best performer, reaching 82% accuracy and a ROC-AUC of 0.87.
Key points
- Preprocessed 133,425 weighted DHS samples from eight sub-Saharan African countries using Stata 17 and Python 3.10, with Min-Max and standard scaling.
- Applied Recursive Feature Elimination with K-fold cross-validation to identify the top demographic predictors, including age, smartphone access, and healthcare interactions.
- Balanced classes with SMOTE+Tomek and compared seven ML models; the Decision Tree achieved the highest performance (82% accuracy, ROC-AUC 0.87).
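The balancing step in the last key point can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the study's code: the function names (`nearest`, `smote`, `remove_tomek_links`), the two-feature toy data, `k=3`, and the choice to drop both members of each Tomek link are all assumptions made for the example. SMOTE creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbours; Tomek-link cleaning then removes pairs of opposite-class points that are each other's nearest neighbour.

```python
import math
import random

def nearest(idx, points, candidates, k=1):
    """Return up to k indices from candidates closest to points[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority points until both classes have the same size.

    Assumes two classes and at least two minority points.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)                   # a random minority point
        j = rng.choice(nearest(i, X, min_idx, k)) # one of its minority neighbours
        gap = rng.random()                        # position along the segment i -> j
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new

def remove_tomek_links(X, y):
    """Drop both points of every Tomek link (mutual nearest neighbours of opposite class)."""
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx)[0] for i in all_idx]
    drop = {i for i in all_idx if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: six majority (0) and three minority (1) samples.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
     (1.0, 1.0), (1.1, 0.9), (0.35, 0.25)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = smote(X, y, minority=1)              # oversample first
X_clean, y_clean = remove_tomek_links(X_bal, y_bal) # then clean the class boundary
print(len(X), len(X_bal), len(X_clean))
```

In practice a library implementation would normally be used for this step; the sketch only shows the order of operations, oversampling first and boundary cleaning second.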
Why it matters: Applying accessible machine learning methods to large survey datasets pinpoints demographic drivers of health awareness and can guide targeted interventions to improve early breast cancer detection in underserved regions.
Q&A
- What is Recursive Feature Elimination (RFE)?
- How does SMOTE+Tomek balancing work?
- Why did the Decision Tree outperform other models?
- What do accuracy and ROC-AUC indicate here?
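On the last question, a small worked example may help. Accuracy is the share of predictions that match the true label at a fixed decision threshold, while ROC-AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, independent of any threshold. The labels, scores, and 0.5 threshold below are invented for illustration and are unrelated to the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)  # 4 of 6 predictions are correct
auc = roc_auc(y_true, scores)   # 8 of 9 positive/negative pairs are ranked correctly
print(round(acc, 3), round(auc, 3))
```

The two numbers answer different questions, which is why a model can have a higher ROC-AUC than accuracy, as with the reported 0.87 and 82%.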