A team from Kırıkkale University systematically evaluated ScholarGPT, ChatGPT-4o, and Google Gemini on 30 endodontic apical surgery questions sourced from Cohen’s Pathways of the Pulp. Analyzing 5,400 responses, they found ScholarGPT achieved 97.7% accuracy, markedly higher than ChatGPT-4o’s 90.1% and Gemini’s 59.5%.
Key points
- 5,400 responses to 30 endodontic apical surgery questions (12 dichotomous, 18 open-ended) drawn from Cohen’s Pathways of the Pulp.
- ScholarGPT (an academic-tuned LLM) achieved 97.7% accuracy versus ChatGPT-4o’s 90.1% and Gemini’s 59.5%, a statistically significant difference (χ² = 22.61, p < 0.05); a sketch of this test appears after this list.
- Weighted Cohen’s kappa (κ = 0.85) confirmed high inter-rater reliability in coding response correctness; a kappa sketch also follows the list.
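For readers who want the mechanics, here is a minimal sketch of a chi-square test of independence of the kind used to compare accuracy across the three models. The contingency counts below are hypothetical placeholders, not the paper’s raw tallies; only the reported statistic (χ² = 22.61, p < 0.05) comes from the study.

```python
# Chi-square test of independence: are correct/incorrect counts
# independent of which model produced the response?
from scipy.stats import chi2_contingency

# Rows: ScholarGPT, ChatGPT-4o, Gemini; columns: correct, incorrect.
# These per-question tallies are illustrative assumptions only.
contingency = [
    [29, 1],
    [27, 3],
    [18, 12],
]

chi2, p, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```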
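And a minimal sketch of weighted Cohen’s kappa for two raters coding response correctness. The ordinal 0/1/2 coding, the quadratic weighting, and the rating vectors are all illustrative assumptions; the paper reports only the resulting κ = 0.85.

```python
# Weighted Cohen's kappa: chance-corrected agreement between two raters,
# with disagreements penalized by their distance on an ordinal scale.
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: 0 = incorrect, 1 = partially correct, 2 = correct.
rater_a = [2, 2, 1, 0, 2, 1, 2, 0, 1, 2]
rater_b = [2, 2, 1, 1, 2, 1, 2, 0, 0, 2]

kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted kappa = {kappa:.2f}")
```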
Why it matters: Demonstrating an academic-tuned GPT’s superior accuracy underscores the value of specialized LLMs for reliable clinical decision support in dentistry.
Q&A
- What makes ScholarGPT different?
- How was model performance evaluated?
- What are limitations of this study?
- Why use both dichotomous and open-ended questions?
- What is endodontic apical surgery?