A team from Kyoto University and Osaka University, working with US collaborators, introduces MLOmics, an open-access cancer multi-omics database. It integrates mRNA, miRNA, DNA methylation, and copy number variation (CNV) datasets through standardized preprocessing, feature alignment, and statistical feature selection. The resource supports pan-cancer classification, subtype clustering, and imputation, with uniform datasets intended to make benchmarking fair across methods.
Key points
- Integrates 8,314 TCGA patient samples across 32 cancer types with mRNA, miRNA, methylation, and CNV omics profiles.
- Implements standardized preprocessing including FPKM conversion, limma normalization, GAIA CNV annotation, and unified gene ID alignment.
- Delivers 20 ready-to-use datasets for classification, clustering, and imputation with rigorous benchmarking using statistical and deep learning baselines.
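To make the alignment and selection steps above more concrete, here is a minimal sketch in plain Python. It is not the MLOmics pipeline: the table layout, the helper names (`align_features`, `anova_f`, `top_features`), and the toy values are all hypothetical, and a one-way ANOVA F statistic is assumed as the selection criterion because the summary does not say which statistic is used.

```python
from statistics import mean


def align_features(tables):
    """Keep only the features present in every table, in a shared sorted order."""
    shared = set.intersection(*(set(t["features"]) for t in tables))
    order = sorted(shared)
    aligned = []
    for t in tables:
        idx = [t["features"].index(f) for f in order]
        rows = [[row[i] for i in idx] for row in t["rows"]]
        aligned.append({"label": t["label"], "features": order, "rows": rows})
    return aligned


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        # Degenerate case: no variation inside any group.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_features(aligned, k):
    """Rank shared features by F statistic across cancer types; keep the top k."""
    features = aligned[0]["features"]
    scores = {}
    for j, name in enumerate(features):
        groups = [[row[j] for row in t["rows"]] for t in aligned]
        scores[name] = anova_f(groups)
    return sorted(features, key=lambda f: scores[f], reverse=True)[:k]


# Toy input: two cancer types, two samples each, made-up expression values.
brca = {"label": "BRCA", "features": ["TP53", "BRCA1", "MYC"],
        "rows": [[1.0, 5.0, 2.0], [1.2, 5.5, 2.1]]}
luad = {"label": "LUAD", "features": ["MYC", "TP53", "EGFR"],
        "rows": [[2.0, 3.0, 9.0], [2.2, 3.1, 8.0]]}

aligned = align_features([brca, luad])
selected = top_features(aligned, k=1)
```

In this toy input only `MYC` and `TP53` appear in both tables, so the aligned tables keep those two columns, and the F statistic then ranks them by how strongly they separate the two cancer types. A real pipeline would typically also correct for multiple testing and normalize the selected features; both are left out here to keep the sketch short.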
Why it matters: By providing uniform, task-ready multi-omics datasets, MLOmics accelerates reproducible cancer ML research and enables robust model evaluation.
Q&A
- What is multi-omics?
- How does MLOmics preprocess omics data?
- What are the Original, Aligned, and Top feature scales?
- Which machine learning tasks does MLOmics support?