SyNG-BTS Documentation
======================

**SyNG-BTS** (Synthesis of Next Generation Bulk Transcriptomic Sequencing) is a Python package
for data augmentation of bulk transcriptomic sequencing data using deep generative models.

.. image:: https://badge.fury.io/py/syng-bts.svg
   :target: https://badge.fury.io/py/syng-bts
   :alt: PyPI version

.. image:: https://img.shields.io/badge/python-3.10+-blue.svg
   :target: https://www.python.org/downloads/
   :alt: Python 3.10+

.. image:: https://img.shields.io/badge/License-AGPL%20v3-blue.svg
   :target: https://www.gnu.org/licenses/agpl-3.0
   :alt: License: AGPL v3

Overview
--------

SyNG-BTS synthesizes transcriptomics data with realistic distributions without relying on
predefined formulas. It supports three types of deep generative models:

- **Variational Autoencoders (VAE/CVAE)** - For general data augmentation
- **Generative Adversarial Networks (GAN/WGANGP)** - Alternative generative approach
- **Flow-based Models (MAF)** - For transfer learning scenarios

Each model is trained on a pilot dataset and then used to generate as many
synthetic samples as needed.

Quick Start
-----------

Install SyNG-BTS:

.. code-block:: bash

   pip install syng-bts

Generate synthetic data with :func:`~syng_bts.generate`:

.. code-block:: python

   from syng_bts import generate

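   # Train the model on the SKCMPositive_4 dataset and generate synthetic samples.
   result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=5)
   print(result.generated_data.shape)
   figs = result.plot_loss()  # plot the training loss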

Run a pilot study with :func:`~syng_bts.pilot_study`:

.. code-block:: python

   from syng_bts import pilot_study

   result = pilot_study(
       data="SKCMPositive_4",
       pilot_size=[50, 100],
       model="VAE1-10",
       early_stop_patience=30,
   )
   print(result.summary())

Browse and load full TCGA cohorts with :func:`~syng_bts.list_tcga_datasets` and :func:`~syng_bts.load_tcga_dataset`:

.. code-block:: python

   from syng_bts import list_tcga_datasets, load_tcga_dataset

   list_tcga_datasets(short=True)  # browse the available cohorts
   ds = load_tcga_dataset("BRCA")  # load the BRCA cohort
   real_df, real_groups = ds.real("TC")
   print(real_df.shape)

For more details, see the :doc:`usage` guide. For full TCGA cohort access, see the :doc:`tcga` guide.
If you are upgrading from v2.x, see the :doc:`migration` guide.

Citation
--------

If you use SyNG-BTS in your research, please cite:

   Qi Y, Wang X, Qin LX. Optimizing sample size for supervised machine
   learning with bulk transcriptomic sequencing: a learning curve approach.
   Brief Bioinform. 2025 Mar 4;26(2):bbaf097. doi: 10.1093/bib/bbaf097.
   PMID: 40072846; PMCID: PMC11899567. https://doi.org/10.1093/bib/bbaf097

Contents
--------

.. toctree::
   :maxdepth: 2
   :caption: Getting Started

   usage
   migration

.. toctree::
   :maxdepth: 2
   :caption: User Guide

   methods
   evals
   synthesize
   tcga
   datasets
   configuration

.. toctree::
   :maxdepth: 2
   :caption: Reference

   api

Links
-----

- **GitHub Repository**: https://github.com/Omics-Data-Synthesis/SyNG-BTS
- **Documentation**: https://syng-bts.readthedocs.io/
- **PyPI Package**: https://pypi.org/project/syng-bts/
- **Issue Tracker**: https://github.com/Omics-Data-Synthesis/SyNG-BTS/issues

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`