A team from Kırıkkale University systematically evaluated ScholarGPT, ChatGPT-4o, and Google Gemini on 30 endodontic apical surgery questions sourced from Cohen’s Pathways of the Pulp. Analyzing 5,400 responses, they found ScholarGPT achieved 97.7% accuracy, markedly higher than ChatGPT-4o’s 90.1% and Gemini’s 59.5%.

Key points

  • 5,400 responses to 30 endodontic apical surgery questions (12 dichotomous, 18 open-ended) drawn from Cohen’s Pathways of the Pulp.
  • ScholarGPT (an academic-tuned LLM) attains 97.7% accuracy versus ChatGPT-4o’s 90.1% and Gemini’s 59.5% (χ² = 22.61, p < 0.05).
  • High inter-rater reliability in coding answer correctness, confirmed by a weighted Cohen’s kappa of κ = 0.85 (a rough sketch of both computations follows this list).
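
As a rough illustration of the statistics reported above, the Python sketch below runs a chi-square test on per-model correct/incorrect counts and computes a weighted Cohen’s kappa for two raters. The counts are back-calculated from the reported accuracy percentages and the rater scores are invented; this is a minimal sketch, not the study’s analysis code or raw data.

```python
# Hedged sketch of the study's statistical comparison. All inputs are
# illustrative: counts are derived from the reported percentages, and the
# rater scores are invented.
from scipy.stats import chi2_contingency
from sklearn.metrics import cohen_kappa_score

# Correct/incorrect counts per model (1,800 responses each, back-calculated)
counts = {
    "ScholarGPT": (1759, 41),    # ~97.7% correct
    "ChatGPT-4o": (1622, 178),   # ~90.1% correct
    "Gemini":     (1071, 729),   # ~59.5% correct
}
chi2, p, dof, _ = chi2_contingency([list(v) for v in counts.values()])
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")

# Weighted Cohen's kappa on two raters' ordinal correctness codes
# (0 = wrong, 1 = partially correct, 2 = correct); scores are hypothetical
rater_a = [2, 2, 1, 0, 2, 1, 2, 2, 0, 1]
rater_b = [2, 2, 1, 1, 2, 1, 2, 1, 0, 1]
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted kappa = {kappa:.2f}")
```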

Why it matters: Demonstrating an academic-tuned GPT’s superior accuracy underscores the value of specialized LLMs for reliable clinical decision support in dentistry.

Q&A

  • What makes ScholarGPT different?
  • How was model performance evaluated?
  • What are limitations of this study?
  • Why use both dichotomous and open-ended questions?
  • What is endodontic apical surgery?
Assessment of various artificial intelligence applications in responding to technical questions in endodontic surgery

Researchers from Karabuk University and Antalya Oral and Dental Health Hospital assess how well ChatGPT 3.5 and Google Gemini address parents’ queries about pediatric dental trauma. They use the DISCERN instrument and the PEMAT-P tool to evaluate response quality, understandability, and actionability. Both chatbots deliver comparable guidance, with Gemini showing marginally higher reliability and ChatGPT superior clarity, yet neither system substitutes for professional dental consultation.

Key points

  • ChatGPT 3.5 and Google Gemini are evaluated using the DISCERN instrument, with Gemini achieving marginally higher mean reliability scores.
  • PEMAT-P analysis shows ChatGPT delivers superior understandability and both chatbots provide similar actionability for pediatric dental trauma guidance.
  • The study uses 17 IADT-based case scenarios, with inter-rater Cohen’s kappa of 0.72–0.78 and parametric statistical tests to compare chatbot performance (a minimal sketch of this comparison follows the list).
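
As a minimal, hypothetical sketch of the comparison described in this list, the Python snippet below scores PEMAT-P as the percentage of applicable items rated “agree” and compares mean DISCERN ratings with an independent-samples t-test (one plausible parametric test; the study’s exact tests and all numbers here are assumptions, not its data).

```python
# Hedged sketch of a DISCERN / PEMAT-P comparison; all ratings are invented.
from scipy.stats import ttest_ind

def pemat_score(item_ratings):
    """PEMAT-P convention: 1 = agree, 0 = disagree, None = not applicable."""
    applicable = [r for r in item_ratings if r is not None]
    return 100.0 * sum(applicable) / len(applicable)

# Illustrative per-scenario DISCERN totals for the 17 IADT-based scenarios
discern_chatgpt = [52, 48, 55, 50, 47, 53, 49, 51, 46, 54, 50, 48, 52, 49, 51, 47, 53]
discern_gemini  = [54, 51, 56, 53, 50, 55, 52, 54, 49, 56, 53, 51, 54, 52, 53, 50, 55]
t, p = ttest_ind(discern_chatgpt, discern_gemini)
print(f"DISCERN t-test: t = {t:.2f}, p = {p:.3f}")

# Example PEMAT-P understandability ratings for one chatbot response
ratings = [1, 1, 0, 1, None, 1, 1, 0, 1, 1]
print(f"PEMAT-P understandability = {pemat_score(ratings):.1f}%")
```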

Why it matters: The study suggests AI chatbots can serve as accessible, consistent sources of pediatric dental trauma guidance, pointing to scalable support that complements, rather than replaces, clinical expertise.

Q&A

  • What is the DISCERN instrument?
  • How does PEMAT-P measure actionability?
  • Why can’t AI chatbots replace dentists?
  • What factors influence chatbot reliability?
  • How were the case scenarios designed?
Artificial intelligence in pediatric dental trauma: do artificial intelligence chatbots address parental concerns effectively?

A 2025 study by Wu et al. shows how machine learning models can predict treatment outcomes of early childhood caries from factors such as lesion location and brushing habits. The research suggests that this kind of digital analysis can refine preventive care strategies, and clinicians might use the resulting predictions to tailor treatment plans.
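
As a hypothetical sketch of this kind of predictive model (not Wu et al.’s actual model, feature set, or data), the snippet below trains a random-forest classifier on synthetic records with features such as lesion location and brushing frequency, then evaluates it with AUC.

```python
# Hedged sketch: a generic ML classifier for caries treatment outcome.
# Features, labels, and data are synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(0, 2, n),    # lesion location: 0 = anterior, 1 = posterior
    rng.integers(0, 3, n),    # daily brushing count
    rng.integers(24, 72, n),  # age in months
    rng.integers(0, 2, n),    # fluoride exposure: 0 = no, 1 = yes
])
# Synthetic outcome: 1 = treatment success, correlated with hygiene factors
y = (X[:, 1] + X[:, 3] + rng.normal(0, 0.8, n) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")
```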

Use machine learning to predict treatment outcome of early childhood caries