Foundation Model

scBERT

Yang F, Wang W, Wang F, Fang Y, Tang D, Huang J, Lu H, Yao J · 2022-09-26 · Nature Machine Intelligence

BERT-based pre-trained model for single-cell transcriptomics with gene-level tokenization.

Overview

scBERT adapts the BERT architecture for single-cell transcriptomics by treating each gene as a token. It is pre-trained on large-scale scRNA-seq data using masked gene expression prediction and achieves strong performance on cell type annotation tasks.
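The gene-as-token idea plus the masked-expression objective can be sketched in a few lines. The helpers below are a minimal illustration, not scBERT's actual preprocessing code: `bin_expression` discretizes a (log-normalized) expression value into integer bins, and `mask_for_pretraining` hides a fraction of gene positions so a model could be trained to recover the original binned values at those positions. The bin count, mask rate, and `MASK` sentinel are assumptions for this sketch.

```python
import random

MASK = -1  # assumed sentinel for a masked gene position


def bin_expression(value, n_bins=5, max_value=10.0):
    """Discretize a log-normalized expression value into one of n_bins bins.

    Binning turns a continuous expression level into a small vocabulary,
    analogous to how scBERT represents each gene's expression as a token value.
    """
    clipped = max(0.0, min(value, max_value))
    bin_idx = int(clipped / max_value * n_bins)
    return min(bin_idx, n_bins - 1)  # keep the maximum value in the top bin


def mask_for_pretraining(binned, mask_rate=0.15, rng=None):
    """Randomly mask ~mask_rate of gene positions for masked-expression prediction.

    Returns (inputs, targets): inputs with masked positions set to MASK, and
    targets holding the original bin only at masked positions (None elsewhere),
    so a loss would be computed only where the model must reconstruct.
    """
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for v in binned:
        if rng.random() < mask_rate:
            inputs.append(MASK)
            targets.append(v)      # supervise reconstruction here
        else:
            inputs.append(v)
            targets.append(None)   # no loss at unmasked positions
    return inputs, targets


# Example: one cell's expression vector over four genes
cell = [0.0, 2.3, 7.8, 10.0]
binned = [bin_expression(v) for v in cell]
inputs, targets = mask_for_pretraining(binned, mask_rate=0.3)
```

During pre-training the transformer encoder would consume `inputs` (with gene-identity embeddings added per position) and be penalized only where `targets` is not `None`; the listing stops before the model itself.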

Publication

scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data

DOI: 10.1038/s42256-022-00534-z

Specifications

  • Architecture: Transformer Encoder (BERT-style)
  • Parameters: ~100M
  • Pretraining Data: PanglaoDB (1M+ cells)
  • Modality: scRNA-seq
