A team from Kırıkkale University systematically evaluated ScholarGPT, ChatGPT-4o, and Google Gemini on 30 endodontic apical surgery questions sourced from Cohen’s Pathways of the Pulp. Analyzing 5,400 responses, they found ScholarGPT achieved 97.7% accuracy, markedly higher than ChatGPT-4o’s 90.1% and Gemini’s 59.5%.
Key points
5,400 responses to 30 endodontic apical surgery questions (12 dichotomous, 18 open-ended) drawn from Cohen’s Pathways of the Pulp.
ScholarGPT (academic-tuned LLM) attains 97.7% accuracy versus ChatGPT-4o’s 90.1% and Gemini’s 59.5% (χ² = 22.61, p < 0.05).
High inter-rater reliability confirmed by weighted Cohen’s kappa (κ=0.85) for coding correctness.
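For readers unfamiliar with the statistic, the following is a minimal sketch of a linearly weighted Cohen’s kappa for two raters. The three-level scoring scale, the rater labels, and the function name `weighted_kappa` are assumptions made for illustration; the summary does not state which scale or weighting scheme the study used.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters on an ordered scale."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed proportion of items falling in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(r) for r in observed]        # rater A marginals
    col = [sum(c) for c in zip(*observed)]  # rater B marginals

    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)   # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]
    return 1 - weighted_observed / weighted_expected


# Hypothetical ratings, NOT data from the study.
scale = ["incorrect", "partial", "correct"]
rater_a = ["correct", "correct", "partial", "incorrect", "correct", "partial"]
rater_b = ["correct", "partial", "partial", "incorrect", "correct", "correct"]
kappa = weighted_kappa(rater_a, rater_b, scale)
print(round(kappa, 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so the reported κ = 0.85 sits close to the top of that range.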
Why it matters:
The higher accuracy of an academic-tuned GPT suggests that specialized LLMs may be better suited than general-purpose models for reliable clinical decision support in dentistry.
Q&A
What makes ScholarGPT different?
How was model performance evaluated?
What are limitations of this study?
Why use both dichotomous and open-ended questions?
What is endodontic apical surgery?
Academy
Introduction to Large Language Models in Clinical Dentistry
Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand and generate human-like text. In clinical dentistry, they offer the potential to summarize research, provide diagnostic guidance, and assist with treatment planning. This course explores how LLMs work, their applications in endodontics, and considerations for safe use.
How LLMs Work
LLMs like GPT-4 are trained on massive datasets comprising books, articles, and web content. During training, a transformer-based model learns to predict the next token in a sequence, which lets it capture statistical patterns in language and generate coherent text. Key concepts include:
- Tokens: The smallest units of text (words or subwords) processed by the model.
- Transformer Architecture: Uses self-attention mechanisms to weigh relationships between tokens in a sequence.
- Fine-Tuning: Further training a general model on domain-specific text, such as academic literature in dentistry, which can improve accuracy in that domain.
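To make the self-attention idea concrete, here is a minimal single-head sketch in plain Python. It is a simplification: real transformers apply learned query, key, and value projections, which are omitted here. The token list and the two-dimensional embeddings are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical tokens and embeddings, for illustration only.
tokens = ["root", "canal", "apex"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. In this toy input, "root" gives more weight to "canal" than to "apex" because their vectors point in similar directions.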
Applications in Endodontic Dentistry
Endodontic procedures involve diagnosing and treating diseases of the dental pulp and periapical tissues. LLMs can support clinicians by:
- Information Retrieval: Summarizing guidelines from authoritative texts like Cohen’s Pathways of the Pulp.
- Decision Support: Comparing treatment options and suggesting materials based on evidence.
- Patient Communication: Generating clear explanations of procedures and aftercare instructions.
Case Study: ScholarGPT vs. ChatGPT-4o vs. Gemini
A recent study from Kırıkkale University compared three LLMs on 30 endodontic apical surgery questions. ScholarGPT, an academic-tuned model, achieved 97.7% accuracy, outperforming ChatGPT-4o (90.1%) and Google Gemini (59.5%). The result suggests that a model specialized for academic literature can answer such questions more reliably than general-purpose models.
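A comparison like this is typically tested with a chi-square test of independence on counts of correct and incorrect answers per model. The sketch below shows the calculation under stated assumptions: the model names and counts are hypothetical placeholders, not the study’s data, and the function name `chi_square` is illustrative.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if model and correctness were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical [correct, incorrect] counts per model, NOT the study's data.
counts = {
    "Model A": [45, 5],
    "Model B": [40, 10],
    "Model C": [30, 20],
}
for name, (correct, incorrect) in counts.items():
    print(name, f"accuracy = {correct / (correct + incorrect):.1%}")

statistic, dof = chi_square(list(counts.values()))
CRITICAL_VALUE = 5.991  # standard chi-square table value for dof = 2, alpha = 0.05
print(f"chi2 = {statistic:.2f}, dof = {dof}, significant = {statistic > CRITICAL_VALUE}")
```

A statistic above the critical value indicates that accuracy differs across the models by more than chance alone would explain. It does not identify which pair of models differs, which would require follow-up pairwise comparisons.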
Benefits and Limitations
Benefits:
- Rapid access to summarized evidence.
- Consistent decision support for common procedures.
- Scalable training materials for dental education.
Limitations:
- Reliance on available training data; may omit paywalled studies.
- Potential for outdated or incomplete information.
- Need for human oversight to catch errors and address ethical considerations.
Guidelines for Safe Use
- Verify Citations: Cross-check AI-generated references with primary literature.
- Limit Scope: Use LLMs as adjuncts, not sole decision-makers.
- Maintain Privacy: Do not share patient-identifiable data with AI services.
- Continuing Education: Stay informed about model updates and validation studies.
Future Directions
Advancements in domain-specific training may yield even higher accuracy for dental subspecialties. Combining LLMs with imaging analysis tools and electronic health records could create integrated clinical AI systems, further enhancing patient care and research translation.