Researchers at the NIH Clinical Center and the University of Oxford built a pipeline using OpenAI’s Whisper for transcription and the o1 model for summarization. They embedded the filtered summaries and trained a compact neural network to classify COVID-19 variants, achieving an AUROC of 0.823 without using date or vaccination data.

Key points

  • Whisper-Large transcribes user-recorded COVID-19 accounts; the o1 LLM then filters out non-clinical details.
  • Text embeddings of the LLM summaries feed a 787K-parameter neural network trained on a CPU under nested k-fold cross-validation.
  • The model classifies Omicron vs. pre-Omicron infections with an AUROC of 0.823 and 0.70 specificity at 0.80 sensitivity.
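Nested k-fold cross-validation, used to train the compact network, tunes hyperparameters on inner folds while reserving each outer fold for a single unbiased evaluation. A minimal sketch of that structure (the fold counts, toy data, and `candidate_params` below are illustrative, not the study's actual setup):

```python
def k_folds(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    idx = list(range(n))
    size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + size + (1 if i < rem else 0)
        folds.append(idx[start:stop])
        start = stop
    return folds

def nested_cv(data, labels, candidate_params, train_fn, score_fn,
              outer_k=5, inner_k=3):
    """Return the mean outer-fold score under nested cross-validation."""
    outer_scores = []
    for test_idx in k_folds(len(data), outer_k):
        train_idx = [i for i in range(len(data)) if i not in set(test_idx)]

        # Inner loop: pick the hyperparameter with the best mean inner-fold score.
        def inner_score(param):
            scores = []
            for val_pos in k_folds(len(train_idx), inner_k):
                val = [train_idx[i] for i in val_pos]
                fit = [i for i in train_idx if i not in set(val)]
                model = train_fn(param, [data[i] for i in fit],
                                 [labels[i] for i in fit])
                scores.append(score_fn(model, [data[i] for i in val],
                                       [labels[i] for i in val]))
            return sum(scores) / len(scores)

        best = max(candidate_params, key=inner_score)
        # Outer loop: retrain on all outer-train data, score once on held-out fold.
        model = train_fn(best, [data[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        outer_scores.append(score_fn(model, [data[i] for i in test_idx],
                                     [labels[i] for i in test_idx]))
    return sum(outer_scores) / len(outer_scores)
```

The key property is that the data scoring each outer fold never influences the hyperparameter choice for that fold, which keeps the reported performance estimate honest on small datasets.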

Why it matters: Demonstrates that LLM-driven audio analysis can rapidly yield low-resource diagnostic tools for emerging pathogens when conventional data is scarce.

Q&A

  • What is Whisper-Large?
  • Why remove dates and vaccination details?
  • What does AUROC of 0.823 mean?
  • How was variant status labeled?
  • What is nested k-fold cross-validation?
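On the AUROC question above: AUROC has a direct probabilistic reading. It is the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, so 0.823 means the model ranks a true Omicron case above a pre-Omicron case roughly 82% of the time. A minimal sketch of that equivalence, using made-up scores for illustration:

```python
def auroc(pos_scores, neg_scores):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    random positive outscores a random negative (ties count half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores, for illustration only.
omicron = [0.91, 0.84, 0.45]      # positive class
pre_omicron = [0.72, 0.33, 0.20]  # negative class
print(auroc(omicron, pre_omicron))  # 8 of 9 pairs ranked correctly
```

Because AUROC is computed over all score thresholds, it complements single-threshold figures like the reported 0.70 specificity at 0.80 sensitivity.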

Audiomics and Generative AI in Healthcare

Audiomics refers to extracting clinically relevant information from audio recordings of human voices, coughs, or breathing. Unlike traditional acoustic biomarkers that focus on pitch or frequency, audiomics leverages spoken language and symptom narratives—free speech recordings from patients—to uncover complex health signals.

In recent research, teams have used audiomics to classify COVID-19 variants by first transcribing patient videos with automatic speech recognition, then summarizing the transcripts with large language models (LLMs). This pipeline, combining speech recognition, LLM filtering, and neural network classification, has broadened the scope of digital health tools.

Key Components

  1. Automatic Speech Recognition (ASR): Models like Whisper-Large transcribe raw audio into text, handling diverse accents and environments.
  2. Generative AI Summarization: LLMs such as the o1 model filter transcripts to remove non-clinical details (e.g., dates, vaccination status), keeping the focus on health-related language.
  3. Embeddings and Classification: Bio-aware embedding models convert summaries into vectors for training compact neural networks that detect disease phenotypes.
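The embedding-and-classification step (component 3) can be sketched with a toy bag-of-words embedding and a nearest-centroid rule; the actual study used a bio-aware embedding model feeding a 787K-parameter neural network, so the function names, vocabulary, and labels below are purely illustrative:

```python
from collections import Counter
import math

def embed(summary):
    # Toy bag-of-words embedding; a real pipeline would call a
    # bio-aware sentence-embedding model here instead.
    return Counter(summary.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_centroids(labeled_summaries):
    # Sum the embeddings of each class's training summaries.
    centroids = {}
    for label, text in labeled_summaries:
        centroids.setdefault(label, Counter()).update(embed(text))
    return centroids

def classify(summary, centroids):
    # Assign the class whose centroid is most similar to the summary.
    vec = embed(summary)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

train = [("omicron", "sore throat congestion runny nose"),
         ("pre-omicron", "loss of smell and loss of taste")]
centroids = build_centroids(train)
print(classify("lost my sense of smell and taste", centroids))
```

The point of the sketch is the shape of the pipeline, filtered text in, vector out, label from the vector, rather than the specific similarity rule, which a trained neural classifier replaces in practice.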

Applications Beyond COVID-19

  • Early Symptom Monitoring: Audiomics can flag emerging infectious outbreaks by capturing subtle shifts in symptom descriptions.
  • Neurological Assessments: Free speech patterns may help diagnose conditions like Parkinson’s or Alzheimer’s based on language biomarkers.
  • Remote Patient Monitoring: Mobile apps can collect ambient audio, enabling non-invasive, low-cost follow-ups for chronic illnesses.

Future Directions: Integrating audiomics with electronic health records and wearable sensors could enrich multimodal patient profiles. As LLMs become more accessible, audiomics pipelines may power real-time public health surveillance and personalized care.

Generative AI and unstructured audio data for precision public health