Researchers at the Second Affiliated Hospital of Army Medical University develop a CatBoost model enhanced by active learning to predict Philadelphia chromosome-positive acute lymphoblastic leukemia (Ph+ALL) from routine clinical and laboratory parameters, with feature selection via BorutaShap and interpretability via SHAP.

Key points

  • Ten routine clinical and laboratory features, including age, neutrophil and monocyte counts, and liver enzymes, are selected via BorutaShap.
  • A CatBoost model integrated with an active learning algorithm achieves a validation AUC of 0.797 and an external AUC of 0.794 for Ph+ALL prediction.
  • SHAP analysis identifies age, monocyte count, γ-glutamyl transferase, neutrophil count, and alanine aminotransferase (ALT) as critical drivers of model output.
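
To make the AUC figures above concrete, here is a minimal sketch of how an AUC can be computed from predicted scores: it is the fraction of (positive, negative) patient pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = Ph+ALL, 0 = Ph-negative ALL (not from the study).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.8 indicate that the model ranks a Ph+ALL case above a Ph-negative case in roughly four of five such pairs.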

Why it matters: This interpretable ML approach enables early, low-cost detection of Ph+ALL in settings lacking genetic testing, improving diagnostic access and guiding timely treatment choices.

Q&A

  • What is BorutaShap feature selection?
  • How does active learning improve the model?
  • Why use the CatBoost algorithm?
  • What role do SHAP values play in interpretability?
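
As a rough illustration of the active learning question above: the summary does not state which query strategy the authors used, so the sketch below assumes uncertainty sampling, one common choice, in which the cases whose predicted probability lies closest to 0.5 are selected for labeling next. The function name `select_most_uncertain` and the probabilities are hypothetical.

```python
def select_most_uncertain(probabilities, k):
    """Return indices of the k predictions closest to 0.5 (assumed uncertainty sampling)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical predicted Ph+ALL probabilities for five unlabeled cases.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
query = select_most_uncertain(pool_probs, 2)
print(query)
```

In a full loop, the selected cases would be labeled, added to the training set, and the classifier refit before the next round of selection.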