Researchers at Khalifa University and ASPIREPMRIAD applied nested cross-validation to de-identified SEHA EHR data, training nine ML models with both automated and expert-driven feature selection. A Naive Bayes classifier achieved an AUC of 0.96, and the selected features highlighted dental and respiratory codes as signals for cost-effective early detection of mucopolysaccharidosis (MPS).
Key points
- Domain-expert feature selection identifies dental and respiratory codes (e.g., acute gingivitis, bronchitis) critical for MPS prediction.
- Naive Bayes classifier achieves 0.96 AUC, 0.93 accuracy, and 0.91 F1-score using EHR-derived features.
- Nested cross-validation with SMOTE class balancing is used to evaluate nine ML models across five feature selection strategies on 1,186 EHR covariates.
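The nested cross-validation setup described in the key points can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the study's actual pipeline: the data are synthetic (`make_classification`), univariate `SelectKBest` stands in for the five feature selection strategies, the grid of `k` values and the fold counts are arbitrary assumptions, and SMOTE is left out to keep the example dependency-free. In practice, resampling would have to happen inside each training fold only, for instance through the pipeline class in the imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for EHR covariates (NOT the study's data).
X, y = make_classification(
    n_samples=300,
    n_features=50,
    n_informative=8,
    weights=[0.8, 0.2],
    random_state=0,
)

# Feature selection lives inside the pipeline so it is refit on each
# training fold and never sees the held-out samples.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", GaussianNB()),
])

# Hypothetical search space: how many features to keep.
param_grid = {"select__k": [5, 10, 20]}

# Inner loop tunes hyperparameters; outer loop estimates performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=inner_cv)

# Each outer fold runs a full inner grid search on its training split,
# then scores the tuned model on its untouched test split.
outer_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)

print("AUC per outer fold:", outer_scores.round(3))
print("Mean AUC:", outer_scores.mean().round(3))
```

Because tuning happens only in the inner loop, the outer-fold scores are not used to pick any setting, which makes their mean a less optimistic performance estimate than a single cross-validated grid search would give.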
Why it matters: This non-invasive, AI-driven screening approach could speed up rare disease diagnosis by flagging mucopolysaccharidosis risk from routine EHR data, enabling earlier intervention and better outcomes.