SyNG-BTS Documentation

SyNG-BTS (Synthesis of Next Generation Bulk Transcriptomic Sequencing) is a Python package for data augmentation of bulk transcriptomic sequencing data using deep generative models.

Requires Python 3.10+. License: AGPL v3.

Overview

SyNG-BTS synthesizes transcriptomics data with realistic distributions without relying on predefined formulas. It supports three types of deep generative models:

  • Variational Autoencoders (VAE/CVAE) - For general data augmentation

  • Generative Adversarial Networks (GAN/WGANGP) - Alternative generative approach

  • Flow-based Models (MAF) - For transfer learning scenarios

Each model is trained on a pilot dataset and can then generate any desired number of synthetic samples.

Quick Start

Install SyNG-BTS:

pip install syng-bts

Generate synthetic data with generate():

from syng_bts import generate

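# Train the "VAE1-10" model on the "SKCMPositive_4" dataset for 5 epochs
result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=5)
print(result.generated_data.shape)  # shape of the generated data
figs = result.plot_loss()  # training-loss figures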

Run a pilot study with pilot_study():

from syng_bts import pilot_study

result = pilot_study(
    data="SKCMPositive_4",
    pilot_size=[50, 100],
    model="VAE1-10",
    early_stop_patience=30,
)
print(result.summary())
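
The subsampling idea behind a pilot study can be sketched with the standard library alone. This is a minimal illustration, not how syng_bts implements pilot_study(): it assumes that each pilot size is evaluated on several random subsets drawn without replacement. The helper name draw_pilot_indices, the default of 5 draws, and the 400-sample dataset size are all hypothetical.

```python
import random


def draw_pilot_indices(n_samples, pilot_sizes, n_draws=5, seed=0):
    """Hypothetical helper (not part of syng_bts): for each pilot size,
    draw `n_draws` random subsets of sample indices without replacement."""
    rng = random.Random(seed)
    draws = {}
    for size in pilot_sizes:
        if size > n_samples:
            raise ValueError(f"pilot size {size} exceeds {n_samples} samples")
        # Each draw is an independent random subset of row indices.
        draws[size] = [
            sorted(rng.sample(range(n_samples), size)) for _ in range(n_draws)
        ]
    return draws


# Illustrative numbers only: 400 samples is an assumption, not a dataset property.
pilots = draw_pilot_indices(n_samples=400, pilot_sizes=[50, 100])
for size, subsets in pilots.items():
    print(size, len(subsets), len(subsets[0]))
```

In a real workflow, each subset would be used to train a model and to generate synthetic samples, so that results can be compared across pilot sizes.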

Browse and load full TCGA cohorts with list_tcga_datasets() and load_tcga_dataset():

from syng_bts import list_tcga_datasets, load_tcga_dataset

list_tcga_datasets(short=True)
ds = load_tcga_dataset("BRCA")
real_df, real_groups = ds.real("TC")
print(real_df.shape)

For more details, see the Usage Guide, or the TCGA Datasets guide for full TCGA cohort access. If you are upgrading from v2.x, see the Migration Guide / Changelog.

Citation

If you use SyNG-BTS in your research, please cite:

Qi Y, Wang X, Qin LX. Optimizing sample size for supervised machine learning with bulk transcriptomic sequencing: a learning curve approach. Brief Bioinform. 2025 Mar 4;26(2):bbaf097. doi: 10.1093/bib/bbaf097. PMID: 40072846; PMCID: PMC11899567. https://doi.org/10.1093/bib/bbaf097
