A team at Beijing Chaoyang Hospital builds and compares five supervised machine learning models that use clinical, echocardiographic, and hemodynamic features to predict how patients with chronic thromboembolic pulmonary hypertension (CTEPH) respond to balloon pulmonary angioplasty (BPA). They select six key predictors via LASSO, train logistic regression, SVM, random forest, XGBoost, and decision tree models, and use SHAP to interpret the best model’s predictions.

Key points

  • Six predictors selected by LASSO: occlusive lesion proportion, TAPSE/PASP, 6MWD, RVESA, TR severity, PVR.
  • Logistic regression with L2 regularisation outperforms the other four models, achieving a test-set AUC of 0.865, accuracy of 0.848, and sensitivity of 0.950.
  • SHAP analysis identifies occlusive lesion proportion as the most influential feature driving BPA response predictions.

Why it matters: A reliable ML tool for preoperative BPA response prediction can enhance patient selection, reduce procedural risks, and improve outcomes in CTEPH management.

Q&A

  • What is CTEPH?
  • How does balloon pulmonary angioplasty (BPA) work?
  • What role does LASSO feature selection play?
  • What are SHAP values?

Machine Learning for Clinical Prediction in Pulmonary Hypertension

Introduction
Machine learning (ML) applies statistical algorithms to identify patterns in large clinical datasets, offering objective support for diagnosis, prognosis, and treatment decisions. In pulmonary hypertension (high blood pressure in the arteries that carry blood from the heart to the lungs), ML models can help predict which patients are most likely to respond to interventions such as balloon pulmonary angioplasty (BPA).

Key Concepts in Machine Learning

  • Supervised Learning: The most common type in medicine, where models learn to map input features (e.g., clinical measurements) to known outcomes (e.g., treatment success).
  • Feature Selection: Techniques like LASSO (L1-penalized regression) shrink coefficients and set those of uninformative features to exactly zero, reducing noise and improving model interpretability.
  • Model Types: Algorithms vary from simple linear models (logistic regression) to complex ensemble methods (random forest, XGBoost) or kernel-based classifiers (support vector machines).
  • Model Evaluation: Metrics such as area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration curves assess discrimination and reliability.
  • Interpretability: Methods like SHapley Additive exPlanations (SHAP) assign each feature a contribution value for every prediction, making black-box models more transparent for clinical use.
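
To make the feature-selection idea concrete, here is a minimal sketch of how an L1 penalty removes a weak feature. It is an illustration only, not the study's code: it uses squared-error LASSO solved by coordinate descent rather than a penalized logistic model, and the data, the penalty value, and the function names (`soft_threshold`, `lasso_coordinate_descent`) are all made up for the example.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1 / 2n) * sum((y - Xw)^2) + lam * sum(|w|) by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Toy, made-up data (already centred): column 0 carries most of the signal,
# column 1 contributes very little.
X = [[-1, -1], [-1, 1], [-1, -1], [-1, 1], [1, -1], [1, 1], [1, -1], [1, 1]]
y = [2.0 * a + 0.1 * b for a, b in X]

weights = lasso_coordinate_descent(X, y, lam=0.5)
print(weights)  # the weight for column 1 is exactly 0.0
```

With this toy data the weak feature is dropped entirely, while the strong feature's coefficient is shrunk from 2.0 to about 1.5. That shrinkage of retained coefficients is one reason a separately tuned model is often refitted on the selected features.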

Workflow for ML-Based BPA Prediction

  1. Data Collection: Gather demographic, functional, echocardiographic, catheterization, and angiographic parameters from patient records.
  2. Preprocessing and Imputation: Address missing data via multiple imputation and standardize measurement descriptors.
  3. Feature Selection: Perform univariate statistical tests followed by LASSO regression to identify the most predictive variables.
  4. Model Training: Split data temporally (earlier patients for training, later patients for testing); tune hyperparameters via cross-validation.
  5. Model Evaluation: Compare models on AUC, accuracy, sensitivity, specificity, F1 score, and Brier score.
  6. Interpretation: Use SHAP values to explain individual and global feature contributions.
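
Steps 4 to 6 can be sketched in a few dozen lines of plain Python. This is a hedged illustration under stated assumptions, not the study's pipeline: the records are synthetic, only two invented features are used, imputation and LASSO selection are skipped, hyperparameters are fixed rather than tuned, and a hand-rolled gradient descent stands in for a library implementation of L2-regularised logistic regression. The SHAP step uses the closed form for linear models with features treated as independent.

```python
import math
import random

random.seed(0)  # reproducible synthetic data

# Synthetic stand-in for patient records: (procedure_year, features, responder).
# The two features and the generating rule are invented for illustration only.
records = []
for _ in range(200):
    year = random.randint(2015, 2022)
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    p_true = 1 / (1 + math.exp(-(1.5 * x[0] - 1.0 * x[1])))
    records.append((year, x, 1 if random.random() < p_true else 0))

# Step 4: temporal split, so earlier procedures train and later ones test.
records.sort(key=lambda r: r[0])
cut = len(records) * 7 // 10
train, test = records[:cut], records[cut:]

# Standardise with training-set statistics only, to avoid leaking test data.
n_features = 2
means = [sum(x[j] for _, x, _ in train) / len(train) for j in range(n_features)]
sds = [math.sqrt(sum((x[j] - means[j]) ** 2 for _, x, _ in train) / len(train))
       for j in range(n_features)]


def scale(x):
    return [(v - m) / s for v, m, s in zip(x, means, sds)]


def logit(w, b, z):
    return b + sum(wj * zj for wj, zj in zip(w, z))


def sigmoid(t):
    return 1 / (1 + math.exp(-t))


def fit_logistic_l2(rows, labels, lam=0.1, lr=0.1, epochs=300):
    """Batch gradient descent on mean log loss + (lam / 2) * ||w||^2."""
    n, w, b = len(rows), [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        errs = [sigmoid(logit(w, b, z)) - y for z, y in zip(rows, labels)]
        grad_w = [sum(e * z[j] for e, z in zip(errs, rows)) / n for j in range(len(w))]
        w = [wj - lr * (g + lam * wj) for wj, g in zip(w, grad_w)]
        b -= lr * sum(errs) / n
    return w, b


w, b = fit_logistic_l2([scale(x) for _, x, _ in train], [y for _, _, y in train])

# Step 5: evaluate on the held-out, later patients.
test_labels = [y for _, _, y in test]
test_probs = [sigmoid(logit(w, b, scale(x))) for _, x, _ in test]


def auc(scores, labels):
    """Probability that a random responder outranks a random non-responder."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum((a > c) + 0.5 * (a == c) for a in pos for c in neg) / (len(pos) * len(neg))


test_auc = auc(test_probs, test_labels)
brier = sum((q - y) ** 2 for q, y in zip(test_probs, test_labels)) / len(test)

# Step 6: for a linear model with features treated as independent, the SHAP
# value of feature j on the log-odds scale is w[j] * (z[j] - mean of z[j]);
# the training mean of each scaled feature is 0, so it reduces to w[j] * z[j].
patient = scale(test[0][1])
shap_values = [wj * zj for wj, zj in zip(w, patient)]

print(f"AUC={test_auc:.3f}  Brier={brier:.3f}  SHAP={shap_values}")
```

The per-patient SHAP values plus the intercept add up to that patient's predicted log-odds, which is the additivity property that makes the explanation readable feature by feature. In practice a maintained library would normally replace the hand-written training and explanation code.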

Important Clinical Features

Occlusive Lesion Proportion: Fraction of pulmonary artery branches with subtotal or total occlusion; higher proportions indicate more complex vascular pathology.
TAPSE/PASP Ratio: A noninvasive surrogate for right ventricular–pulmonary arterial coupling, calculated by dividing tricuspid annular plane systolic excursion (TAPSE) by pulmonary artery systolic pressure (PASP); lower ratios suggest poorer coupling and RV function.
Six-Minute Walk Distance (6MWD): Functional assessment of exercise capacity reflecting overall cardiopulmonary health.
Right Ventricular End-Systolic Area (RVESA): Echocardiographic measure of RV dilation; larger area implies worse RV remodeling.
Tricuspid Regurgitation (TR) Severity: Graded by Doppler echocardiography; severe regurgitation signals advanced RV stress.
Pulmonary Vascular Resistance (PVR): Hemodynamic parameter derived from right heart catheterization measurements; higher resistance indicates more extensive vascular obstruction.
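
Two of these predictors are derived quantities rather than direct readings, so a short worked example may help. The sketch below uses the standard definitions: TAPSE/PASP in mm/mmHg, and PVR in Wood units as (mean pulmonary artery pressure minus pulmonary artery wedge pressure) divided by cardiac output. The function names and the patient values are hypothetical and chosen only for easy arithmetic.

```python
def tapse_pasp_ratio(tapse_mm, pasp_mmhg):
    """TAPSE/PASP in mm/mmHg."""
    return tapse_mm / pasp_mmhg


def pvr_wood_units(mpap_mmhg, pawp_mmhg, cardiac_output_l_min):
    """PVR in Wood units: (mPAP - PAWP) / cardiac output."""
    return (mpap_mmhg - pawp_mmhg) / cardiac_output_l_min


def wood_units_to_dyn(pvr_wu):
    """Convert Wood units to dyn·s·cm^-5 (1 Wood unit = 80 dyn·s·cm^-5)."""
    return pvr_wu * 80


# Invented example values, not taken from the study.
ratio = tapse_pasp_ratio(tapse_mm=16, pasp_mmhg=80)
pvr = pvr_wood_units(mpap_mmhg=45, pawp_mmhg=10, cardiac_output_l_min=4.0)

print(ratio)                   # 0.2 mm/mmHg
print(pvr)                     # 8.75 Wood units
print(wood_units_to_dyn(pvr))  # 700.0 dyn·s·cm^-5
```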

Implications for Patient Care

By integrating readily available echocardiographic and hemodynamic data with ML algorithms, clinicians can personalize treatment planning. Patients predicted to respond poorly to BPA may be directed toward alternative therapies, while those with high predicted success can undergo the procedure with greater confidence. This data-driven approach aims to optimize outcomes and resource allocation in complex pulmonary hypertension care.

Machine learning in CTEPH: predicting the efficacy of BPA based on clinical and echocardiographic features