Researchers at Inonu University apply four machine learning algorithms (Random Forest, SVM, XGBoost and KNN) to complete blood count (CBC) parameters to predict polycythaemia vera (PV). After balancing the dataset with SMOTE and training on hemoglobin, hematocrit, white blood cell and platelet values, the XGBoost model attains an area under the curve (AUC) of 0.99 and 94% accuracy, demonstrating AI’s potential to reduce reliance on expensive diagnostics such as JAK2 mutation assays and bone marrow biopsy.
Key points
- XGBoost model classifies PV with 0.99 AUC and 94% accuracy based on CBC features.
- SMOTE oversampling addresses the 82:1402 class imbalance before the 80:20 train-test split.
- Platelet count (PLT) contributed 42.4% to model predictions, highlighting its diagnostic value.
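The SMOTE step mentioned in the key points can be illustrated with a minimal sketch. This is not the study's code: it is a plain-Python version of the core SMOTE idea (pick a minority sample, pick one of its k nearest minority neighbours, and interpolate between them). The function name `smote`, its parameters, and the four CBC rows (hemoglobin, hematocrit, white cell count, platelet count) are made up for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows: (hemoglobin, hematocrit, WBC, PLT).
# These values are invented for the example, not taken from the study.
minority = [
    (18.5, 56.0, 11.2, 520.0),
    (19.1, 58.3, 12.5, 610.0),
    (17.9, 54.2, 10.1, 480.0),
    (20.2, 60.1, 13.8, 700.0),
]

new_rows = smote(minority, n_synthetic=6, k=3)
for row in new_rows:
    print([round(v, 2) for v in row])
```

Because every synthetic row lies on a line segment between two real minority rows, the new samples stay inside the range of the observed minority values rather than being exact duplicates. In practice a library implementation (for example the `SMOTE` class in imbalanced-learn) would normally be used instead of a hand-written one.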
Why it matters: This study shows that machine learning on routine CBC data can screen for polycythaemia vera accurately, reducing diagnostic cost and invasiveness.
Q&A
- What is the Synthetic Minority Oversampling Technique (SMOTE)?
- How does XGBoost differ from other machine learning models?
- Why use complete blood count (CBC) parameters for disease prediction?
- What are the standard diagnostic tests for polycythaemia vera?