Researchers from Southeast University and the Jiangsu Provincial Center for Disease Prevention and Control compare logistic regression with seven machine learning methods, including GA-RF, GRNN, and PNN, on single nucleotide polymorphism (SNP) data from 1,338 noise-exposed workers. They apply cross-validation and hyperparameter tuning, then evaluate each model's accuracy, AUC, and F-score for predicting noise-induced hearing loss.
Key points
- Dataset of 1,338 noise-exposed workers genotyped at 88 SNP loci.
- GA-RF achieved top accuracy (84.4%), F-score (0.773), R² (0.757), and AUC (0.752).
- GRNN and PNN are neural networks tuned by hyperparameter optimization; GRNN reached 97.5% accuracy on selected SNP combinations.
- The classical ML methods (decision tree, GBDT, KNN, XGBoost) improved on logistic regression to varying degrees.
- Logistic regression’s AUC peaked at 0.704, while the ML methods captured nonlinear interactions among SNPs.
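To give a feel for the PNN mentioned in the key points, here is a minimal sketch of a Parzen-window classifier, which is the idea a probabilistic neural network is built on. It is not the authors' implementation: the function name `pnn_predict`, the toy genotypes, the 0/1/2 allele-count encoding, and the value of `sigma` are all assumptions made for illustration. The smoothing width `sigma` is the kind of hyperparameter that would be tuned.

```python
import math

def pnn_predict(train_x, train_y, query, sigma=1.0):
    """Classify `query` by the class with the highest average Gaussian kernel.

    Hypothetical helper for illustration; sigma is the smoothing width.
    """
    kernel_sums = {}
    class_counts = {}
    for x, y in zip(train_x, train_y):
        # Squared Euclidean distance between a training sample and the query
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        kernel_sums[y] = kernel_sums.get(y, 0.0) + kernel
        class_counts[y] = class_counts.get(y, 0) + 1
    # Average kernel activation per class (a Parzen density estimate up to a constant)
    densities = {c: kernel_sums[c] / class_counts[c] for c in kernel_sums}
    best = max(densities, key=densities.get)
    return best, densities

# Toy data (made up): minor-allele counts at three hypothetical SNP loci,
# with label 1 for hearing loss and 0 for normal hearing.
train_x = [(0, 0, 1), (0, 1, 0), (2, 2, 1), (2, 1, 2)]
train_y = [0, 0, 1, 1]

label, densities = pnn_predict(train_x, train_y, (2, 2, 2), sigma=1.0)
print(label, densities)
```

A GRNN uses the same Gaussian kernel but returns a kernel-weighted average of the training outcomes rather than a class vote, so both models depend heavily on the choice of `sigma`.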
Why it matters: Applying machine learning to high-dimensional SNP datasets can reveal genetic risk factors for occupational hearing loss that traditional statistical models miss. Earlier, more precise identification of susceptible workers could in turn support personalized prevention strategies in occupational health.
Q&A
- What is noise-induced hearing loss?
- What role do SNP loci play here?
- How does GA-RF work?
- Why use GRNN and PNN?
- What metrics evaluate model performance?
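On the last question, the three headline metrics in this summary (accuracy, F-score, and AUC) can be computed directly from true labels and predicted scores. The sketch below uses only the standard library. The labels, scores, and the 0.5 decision threshold are made-up values for illustration, not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = hearing loss) and model scores
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F-score depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason to report them together.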