A team led by Duke-NUS Medical School conducted a comprehensive scoping review of 467 clinical AI fairness studies. They catalogued medical fields, bias-relevant attributes, and fairness metrics, exposing narrow focus areas and methodological gaps, and offered actionable strategies to advance equitable AI integration across healthcare contexts.

Key points

  • Reviewed 467 clinical AI fairness studies, mapping applications across 28 medical fields and seven data types.
  • Identified that group fairness metrics (e.g., equalized odds) dominate the literature, while individual and distribution fairness approaches remain underexplored.
  • Found limited clinician-in-the-loop involvement and proposed integration strategies to bridge technical solutions with clinical contexts.

Why it matters: Addressing identified fairness gaps is crucial to ensure equitable AI-driven diagnoses and treatment decisions across all patient populations.


AI Fairness in Healthcare

Introduction
The rapid adoption of artificial intelligence (AI) in healthcare promises faster diagnoses, personalized treatment plans, and more efficient resource allocation. However, without careful oversight, AI systems risk perpetuating or amplifying biases that exist in medical data, leading to unfair treatment recommendations and health inequities.

Defining Key Concepts

  • Bias: A systematic deviation in AI decisions favoring or disadvantaging specific patient groups, often due to unbalanced or nonrepresentative data.
  • Fairness: The principle that AI models should produce equitable outcomes across different populations, avoiding disparity in performance or resource distribution.
  • Group Fairness: Ensuring parity of a chosen metric (e.g., accuracy, false positive rate) across predefined subgroups (such as race or gender).
  • Individual Fairness: Guaranteeing that similar patients receive similar AI-driven recommendations, using well-defined clinical similarity measures (see the sketch after this list).
  • Distribution Fairness: Fair allocation of limited resources (e.g., vaccines, organ transplants) based on need and contribution, not just algorithmic output.
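
To make the group/individual distinction concrete, here is a minimal Python sketch of an individual-fairness check, framed as a Lipschitz-style consistency condition: predictions for similar patients should not diverge faster than their clinical distance. This is illustrative only, not a method from the review; in particular, Euclidean distance on scaled features stands in for a real clinical similarity measure, which would need to be defined with clinician input.

```python
import numpy as np

def individual_fairness_violations(features, predictions, lipschitz=1.0):
    """Count patient pairs whose predictions differ more than their
    clinical similarity allows (a Lipschitz-style consistency check).

    `features` is an (n, d) array assumed to be scaled so that
    Euclidean distance approximates clinical similarity.
    """
    features = np.asarray(features, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    violations = 0
    for i in range(len(predictions)):
        for j in range(i + 1, len(predictions)):
            distance = np.linalg.norm(features[i] - features[j])
            if abs(predictions[i] - predictions[j]) > lipschitz * distance:
                violations += 1
    return violations

# Two near-identical patients with very different risk scores
# count as one violation.
patients = [[0.70, 1.0], [0.71, 1.0], [0.10, 0.0]]
risks = [0.90, 0.20, 0.15]
print(individual_fairness_violations(patients, risks))  # -> 1
```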

Common Sources of Bias

  1. Data Imbalance: Minority populations may be underrepresented in training data, leading to poor model performance for those groups (a quick audit sketch follows this list).
  2. Sensitive Attributes: Attributes like race, gender, or age may correlate with outcomes; direct or proxy use without proper adjustment introduces bias.
  3. Measurement Variability: Medical devices and protocols may perform differently across patient subgroups (e.g., pulse oximeters overestimating blood oxygen saturation in patients with darker skin tones, which can mask hypoxemia).
  4. Clinical Practice Differences: Variations in clinician coding, diagnostic criteria, or treatment protocols can embed bias into electronic health records.
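
A simple way to surface the first of these sources is a representation audit before training. The sketch below uses pandas on a hypothetical cohort; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical cohort; column names and values are illustrative only.
cohort = pd.DataFrame({
    "race":    ["White", "White", "White", "White", "Black", "Asian"],
    "outcome": [1, 0, 1, 0, 1, 0],
})

# Representation audit: each subgroup's share of the data and its
# positive-outcome rate. Large gaps in either column flag the data
# imbalance described in point 1 above.
audit = cohort.groupby("race")["outcome"].agg(n="size", positive_rate="mean")
audit["share"] = audit["n"] / len(cohort)
print(audit)
```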

Approaches to Mitigate Bias

  • Pre-processing Methods: Balance data distributions by resampling, reweighting, or augmenting underrepresented groups (see the reweighing sketch after this list).
  • In-processing Methods: Incorporate fairness constraints into training objectives, such as adversarial debiasing or regularization penalties for subgroup disparity.
  • Post-processing Methods: Adjust model outputs, for example by recalibrating decision thresholds for different subgroups to improve parity metrics.
  • Clinician-in-the-Loop: Engage healthcare professionals throughout model design, evaluation, and deployment to identify context-specific biases and refine solutions.
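
As one concrete pre-processing example, the sketch below implements reweighing in the style of Kamiran and Calders, a standard technique rather than one prescribed by the review: each (subgroup, label) cell is weighted so that subgroup membership and outcome become statistically independent in the weighted training set.

```python
import numpy as np

def reweighing_weights(groups, labels):
    """Kamiran & Calders-style reweighing: weight each (group, label)
    cell by w(g, y) = P(g) * P(y) / P(g, y), so that subgroup and
    outcome are statistically independent in the weighted data."""
    groups = np.asarray(groups)
    labels = np.asarray(labels)
    n = len(labels)
    weights = np.zeros(n, dtype=float)
    for g in np.unique(groups):
        for y in np.unique(labels):
            mask = (groups == g) & (labels == y)
            if not mask.any():
                continue
            p_joint = mask.sum() / n
            p_group = (groups == g).mean()
            p_label = (labels == y).mean()
            weights[mask] = p_group * p_label / p_joint
    return weights

# Example: in this imbalanced cohort the rare (A, 0) cell gets the
# largest weight. The result can be passed to most scikit-learn
# estimators via fit(..., sample_weight=weights).
w = reweighing_weights(
    groups=["A", "A", "A", "A", "B", "B"],
    labels=[1, 1, 1, 0, 0, 0],
)
print(w.round(2))  # -> [0.67 0.67 0.67 2.   0.5  0.5 ]
```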

Fairness Metrics

  • Demographic Parity: The proportion of positive predictions is equal across subgroups (computed, together with the equalized-odds rates, in the sketch after this list).
  • Equalized Odds: True positive and false positive rates are equal among all subgroups, ensuring balanced sensitivity and specificity.
  • Calibration: The predicted risk probabilities accurately reflect observed outcomes for each subgroup.
  • Individual Similarity: A distance-based measure ensuring that patients with similar clinical profiles receive comparable recommendations.
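
The group metrics above reduce to comparing a few per-subgroup rates. The following hypothetical helper, not taken from the review, computes the rates behind demographic parity and equalized odds; calibration would additionally require binning predicted probabilities against observed outcomes.

```python
import numpy as np

def group_fairness_report(y_true, y_pred, groups):
    """Per-subgroup rates behind the metrics above: positive prediction
    rate (demographic parity) plus true/false positive rates
    (equalized odds)."""
    y_true, y_pred, groups = (np.asarray(a) for a in (y_true, y_pred, groups))
    report = {}
    for g in np.unique(groups):
        m = groups == g
        positives = (y_true == 1) & m
        negatives = (y_true == 0) & m
        report[g] = {
            "positive_rate": float(y_pred[m].mean()),
            "tpr": float(y_pred[positives].mean()) if positives.any() else float("nan"),
            "fpr": float(y_pred[negatives].mean()) if negatives.any() else float("nan"),
        }
    return report

# Equalized odds asks whether tpr and fpr match across subgroups;
# demographic parity asks the same of positive_rate.
print(group_fairness_report(
    y_true=[1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 1, 1, 0, 0],
    groups=["A", "A", "A", "B", "B", "B"],
))
```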

Challenges and Future Directions

Although group fairness remains well studied, research on individual and distribution fairness in clinical contexts is scarce. The integration of large language models and federated learning introduces new bias risks. Future efforts should develop nuanced metrics, leverage intersectional analyses, and enrich datasets with diverse patient cohorts.

Conclusion: Building fair AI systems in healthcare demands a holistic approach—combining robust data strategies, algorithmic safeguards, and active clinical partnership—to achieve health equity and transform patient care.

Source: A scoping review and evidence gap analysis of clinical AI fairness