A research team at NOVA University Lisbon conducted a comprehensive scoping review of supervised ML frameworks, including XGBoost, Random Forest, and LASSO, that use electronic health record datasets to predict 30- and 90-day heart failure hospitalisation and readmission risk, emphasizing ensemble methods and the current lack of economic impact assessments.
Key points
Ensemble algorithms (XGBoost, CatBoost) achieved the strongest predictive performance, with mean AUC up to 0.88 for heart failure risk over unspecified prediction windows.
EHR-derived datasets across 13 countries provided clinical, demographic, and utilization variables for 30- and 90-day risk modelling.
No reviewed studies included economic evaluations, indicating a critical gap for assessing cost-effectiveness before clinical deployment.
Why it matters:
This synthesis underscores ensemble ML's potential to refine heart failure risk stratification and highlights gaps in cost-effectiveness evaluations crucial for clinical adoption.
Q&A
What is a scoping review?
How does AUC measure predictive performance?
What are ensemble learning methods?
Why are economic analyses important in ML healthcare studies?
Academy
Machine Learning in Healthcare
Machine learning (ML) refers to computational techniques that enable computers to learn from data without explicit programming. In healthcare, ML systems analyze large volumes of patient information—such as medical histories, laboratory results, imaging data, and demographic factors—to uncover patterns and make predictions about disease risk, treatment efficacy, or patient outcomes.
The typical ML workflow in healthcare includes the following steps (a minimal code sketch follows the list):
- Data Collection: Clinicians and hospitals collect diverse data, including electronic health records (EHRs), wearable sensor readings, and patient-reported surveys.
- Data Preprocessing: Raw data often contain errors, missing values, or inconsistencies. Preprocessing cleans and formats the data through steps like normalization, imputation of missing entries, and encoding categorical variables into numerical form.
- Feature Selection: Among dozens or hundreds of candidate variables—such as age, blood pressure, or prior hospital visits—feature selection techniques identify the most informative predictors to improve model performance and interpretability.
- Model Training: Supervised learning algorithms, such as logistic regression, decision trees, or neural networks, learn associations between input features and known outcomes (e.g., hospital readmission). Training involves optimizing model parameters to minimize prediction error on historical data.
- Validation and Testing: Models are evaluated on separate datasets not used during training. Common metrics include accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC-ROC), which measures the model's discriminatory power.
- Deployment and Monitoring: Once validated, ML models can be integrated into clinical decision support systems. Continuous monitoring ensures the model remains accurate over time and is adapted to new data or shifts in patient populations.
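The sketch below walks through the preprocessing, training, and validation steps above using scikit-learn on synthetic EHR-style data. All column names, feature choices, and the simulated outcome are illustrative assumptions, not the setup of any reviewed study.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Synthetic EHR-style table: clinical, demographic, and utilization features
df = pd.DataFrame({
    "age": rng.normal(70, 10, n),
    "systolic_bp": rng.normal(130, 20, n),
    "prior_admissions": rng.poisson(1.5, n),
    "sex": rng.choice(["F", "M"], n),
})
df.loc[rng.random(n) < 0.1, "systolic_bp"] = np.nan  # simulate missing values

# Simulated 30-day readmission label, loosely tied to age and prior admissions
logit = 0.03 * (df["age"] - 70) + 0.4 * df["prior_admissions"] - 1.5
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Preprocessing: impute missing entries, normalize numeric features,
# and encode the categorical variable into numerical form
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
     ["age", "systolic_bp", "prior_admissions"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])

# Model training: a supervised learner wired after the preprocessing steps
model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(max_iter=1000))])

# Validation: evaluate on a held-out split never seen during training
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0)
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out AUC-ROC: {auc:.2f}")
```

The logistic regression step could be swapped for any supervised learner; the AUC-ROC computed on the held-out split is the same discrimination metric reported in the reviewed studies.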
ML applications in healthcare span diagnostics (e.g., interpreting imaging scans for early cancer detection), prognostics (e.g., predicting length of hospital stays), personalized treatment planning, and operational optimization (e.g., forecasting bed occupancy and staffing needs).
Ensemble Learning Methods
Ensemble learning combines multiple individual models, often called base learners, into a single, more robust predictive system. By aggregating diverse decision-making strategies, ensembles typically outperform any single model on complex tasks.
Key ensemble approaches include the following (a comparative sketch follows the list):
- Bagging (Bootstrap Aggregating): Constructs multiple versions of a base model (e.g., decision trees) on different random subsets of the training data. Predictions are averaged (for regression) or voted on (for classification), reducing variance and overfitting. Random Forest is a popular bagging method that additionally randomizes the features considered at each tree split.
- Boosting: Trains models sequentially, where each new model focuses on correcting errors made by its predecessors. Classical boosting reweights misclassified examples, while gradient boosting libraries such as XGBoost and CatBoost fit each new tree to the residual errors (loss gradients) of the current ensemble; both sharpen accuracy on difficult cases. Boosting is often effective on the imbalanced datasets common in healthcare.
- Stacking (Stacked Generalization): Trains multiple diverse base learners and then uses their outputs as inputs to a higher-level model, called a meta-learner, which makes the final prediction. Stacking leverages complementary strengths of different algorithms.
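For concreteness, the sketch below contrasts the three ensemble styles on a synthetic imbalanced task. scikit-learn's GradientBoostingClassifier stands in for XGBoost and CatBoost, which are separate libraries with similar fit/predict interfaces; all parameters here are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced binary task, loosely mimicking a rare-readmission setting
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    # Bagging: many trees on bootstrap samples, predictions aggregated by vote
    "bagging (Random Forest)": RandomForestClassifier(n_estimators=200,
                                                      random_state=0),
    # Boosting: trees added sequentially, each fit to the current errors
    "boosting (gradient boosting)": GradientBoostingClassifier(random_state=0),
    # Stacking: a logistic-regression meta-learner combines base-model outputs
    "stacking": StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=100,
                                                  random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC-ROC = {auc:.2f}")
```

Relative rankings on real EHR data will differ; the point of the sketch is the shared interface, which makes it cheap to benchmark several ensemble strategies on the same cohort.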
Ensemble methods are particularly valuable in medical risk prediction because they:
- Capture complex, nonlinear relationships among clinical, demographic, and utilization features.
- Mitigate overfitting, improving generalization to new patient cohorts.
- Handle high-dimensional data and variable interactions, essential for accurate hospitalisation risk forecasting.
By integrating ensemble ML into clinical workflows, healthcare providers can achieve more reliable forecasts of patient outcomes, prioritize high-risk individuals for early intervention, and ultimately improve patient care.