A team from Bar-Ilan University and Tel Aviv Medical Center applies machine learning to peripheral blood T cell receptor sequencing data, achieving an average AUC of 0.96 in distinguishing breast cancer patients from healthy donors and showcasing potential for minimally invasive diagnostics.
Key points
High-throughput TCR-seq of PBMCs from 47 breast cancer patients and 51 healthy donors generates 1.16 million unique CDR3 clonotypes.
Select From Model feature selection identifies 10 public TCR clonotypes, which are used to train an XGBoost classifier achieving an average test AUC of 0.96.
Bootstrap evaluation and multiple subsamplings confirm model stability and support feasibility of a liquid biopsy for non-invasive breast cancer detection.
Why it matters:
This AI-driven liquid biopsy approach enables non-invasive, accurate breast cancer detection, potentially transforming early screening and improving patient outcomes.
Q&A
What is T cell receptor sequencing?
How does the machine learning model classify samples?
What does AUC of 0.96 indicate?
Why is subsampling used in analysis?
Read full article
Academy
T Cell Receptor Sequencing for Cancer Detection
T cell receptors (TCRs) are proteins on the surface of T lymphocytes that recognize specific antigens. Each T cell clone carries a unique TCR, defined largely by the highly variable Complementarity Determining Region 3 (CDR3) of its α and β chains. By sequencing these CDR3 regions across thousands of T cells in a blood sample, researchers can map the adaptive immune repertoire—a comprehensive record of immune exposures and responses.
What Are T Cell Receptors?
TCRs consist of two protein chains, α and β, that together form a binding site for antigen fragments presented by major histocompatibility complex (MHC) molecules. Diversity in TCRs arises from V(D)J recombination, where variable (V), diversity (D), and joining (J) gene segments shuffle and introduce random nucleotides at junctions, producing millions of possible CDR3 sequences.
How Is TCR Sequencing Performed?
- Sample Collection: Peripheral blood mononuclear cells (PBMCs) are isolated from whole blood using density-gradient centrifugation.
- RNA Extraction: Total RNA is purified from PBMCs, ensuring high integrity (RNA Integrity Number).
- Library Preparation: A targeted amplification kit selectively enriches TCR α and β transcripts, appends sequencing adapters, and introduces molecular barcodes.
- High-Throughput Sequencing: Prepared libraries are sequenced on Illumina platforms (e.g., MiSeq), generating millions of paired-end reads covering CDR3 regions.
- Data Processing: Bioinformatics tools like MiXCR align reads, assemble clonotypes, and quantify abundances, producing a frequency table of CDR3 sequences per sample.
Machine Learning Analysis
After sequencing, TCR frequency tables are input into a machine learning pipeline. Feature selection algorithms (e.g., Select From Model) pick the most informative public clonotypes. Classifiers such as XGBoost then learn patterns that distinguish disease from healthy repertoires, validated by metrics like the Area Under the ROC Curve (AUC).
Applications in Oncology
- Liquid Biopsy: Detect tumor-induced immune signatures in blood without invasive tissue sampling.
- Early Detection: Recognize cancer-associated TCR patterns at pre-symptomatic stages.
- Minimal Residual Disease: Monitor real-time immune responses after treatment to detect recurrence.
By providing a high-resolution view of the immune repertoire, TCR sequencing combined with AI classification opens a new frontier in precision oncology, offering minimally invasive, sensitive, and dynamic tools for cancer detection and monitoring.