Researchers at the Second Affiliated Hospital of Army Medical University develop a CatBoost model, enhanced by active learning, that predicts Philadelphia chromosome-positive acute lymphoblastic leukemia (Ph+ALL) from routine clinical and laboratory parameters. Features are selected with BorutaShap, and SHAP is used for interpretability.
Key points
- BorutaShap selects ten routine clinical and laboratory features, including age, neutrophil and monocyte counts, and liver enzymes.
- A CatBoost model integrated with an active learning algorithm achieves a validation AUC of 0.797 and an external AUC of 0.794 for Ph+ALL prediction.
- SHAP analysis identifies age, monocyte count, γ-glutamyl transferase, neutrophil count, and alanine aminotransferase (ALT) as the main drivers of model output.
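To make the SHAP point above concrete, here is a minimal sketch of the Shapley value idea that SHAP builds on: each feature's contribution is its average marginal effect on the prediction over all subsets of the other features. This is not the study's code and not the SHAP library's tree algorithm. The function `shapley_values`, the `toy_score` model, its coefficients, and the sample values are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for toy examples.
    """
    n = len(x)
    features = list(range(n))

    def value(subset):
        # Model output when only the features in `subset` take the sample's values
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    """Hypothetical risk score over three made-up inputs (not the paper's model)."""
    age, monocytes, ggt = z
    return 0.02 * age + 0.5 * monocytes + 0.01 * ggt + 0.01 * age * monocytes


x = [50.0, 1.2, 80.0]         # hypothetical patient
baseline = [40.0, 0.5, 30.0]  # hypothetical reference values
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which is the property that lets SHAP attribute a single prediction to individual features.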
Why it matters: Because it relies only on routine parameters, this interpretable ML approach could enable early, low-cost detection of Ph+ALL in settings that lack genetic testing, improving diagnostic access and helping guide timely treatment choices.
Q&A
- What is BorutaShap feature selection?
- How does active learning improve the model?
- Why use the CatBoost algorithm?
- What role do SHAP values play in interpretability?
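As a rough illustration of the active learning question above, one common strategy is uncertainty sampling: in each round, the cases the current model is least sure about are selected for labeling and added to the training set. The summary does not say which active learning algorithm the study uses, so this is only an assumed example. The function name `uncertainty_sampling` and the probabilities are hypothetical.

```python
def uncertainty_sampling(probabilities, k):
    """Return the indices of the k samples whose predicted probability
    of the positive class is closest to 0.5 (the most uncertain ones)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical predicted probabilities for five unlabeled cases
pool_probs = [0.05, 0.48, 0.91, 0.55, 0.30]
picked = uncertainty_sampling(pool_probs, 2)
print(picked)  # indices of the two most uncertain cases
```

In a full loop, the selected cases would be labeled, moved into the training set, and the classifier refit before the next round.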