Configuration Reference
This page documents all configuration parameters available in SyNG-BTS.
Available Models
SyNG-BTS supports several deep generative models for data augmentation:
| Model Code | Description |
|---|---|
| VAE1-10 | Variational Autoencoder with 1:10 reconstruction/KL loss ratio |
| | VAE with 1:20 loss ratio |
| | Conditional VAE with 1:10 loss ratio |
| | Conditional VAE with 1:20 loss ratio |
| | Generative Adversarial Network |
| | Wasserstein GAN with Gradient Penalty |
| | Masked Autoregressive Flow |
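As a rough illustration of what a reconstruction/KL loss ratio means, the sketch below combines a mean-squared reconstruction error with the closed-form Gaussian KL term using fixed weights. It is not taken from the SyNG-BTS source: the function names are hypothetical, the choice of mean squared error is an assumption, and reading "1:10" as weight 1 on reconstruction and weight 10 on KL is also an assumption.

```python
import math


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(x, x_hat):
    """Mean squared reconstruction error for a single sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def weighted_vae_loss(x, x_hat, mu, logvar, recon_weight=1.0, kl_weight=10.0):
    # ASSUMPTION: the ratio is interpreted as recon_weight : kl_weight.
    # The library may define the ratio the other way round.
    return recon_weight * mse(x, x_hat) + kl_weight * gaussian_kl(mu, logvar)


# Toy inputs: one sample with two features and a two-dimensional latent code.
x, x_hat = [1.0, 2.0], [1.5, 2.5]
mu, logvar = [1.0, 0.0], [0.0, 0.0]

loss_1_10 = weighted_vae_loss(x, x_hat, mu, logvar, kl_weight=10.0)
loss_1_20 = weighted_vae_loss(x, x_hat, mu, logvar, kl_weight=20.0)
print(loss_1_10, loss_1_20)
```

Under this reading, moving from 1:10 to 1:20 doubles the penalty on the latent code drifting away from the standard normal prior while leaving the reconstruction term unchanged.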
Common Parameters
These parameters are shared across all experiment functions (generate,
pilot_study, transfer):
Data Parameters
| Parameter | Type | Description |
|---|---|---|
| data | DataFrame, str, or Path | Input data: a pandas DataFrame, a path to a CSV file, or the name of a bundled dataset (e.g. …) |
| name | str or None | Short name for output filenames. Derived automatically from data when … |
| output_dir | str, Path, or None | If set, save results to this directory. When … |
Training Parameters
| Parameter | Type | Description |
|---|---|---|
| model | str | The generative model to use (e.g. …) |
| batch_frac | float | Batch size as a fraction of training data (default: 0.1) |
| learning_rate | float | Learning rate for optimizer (default: 0.0005) |
| epoch | int or None | Number of training epochs. If … |
| early_stop_patience | int or None | Stop if loss does not improve for this many epochs. |
| apply_log | bool | Apply … |
| random_seed | int | Random seed for reproducibility (default: 123). |
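The batch-fraction and early-stopping patience settings above can be pictured with a small, library-independent sketch. Both helper names are hypothetical, and the rounding rule and the "strictly lower loss counts as improvement" rule are assumptions rather than documented SyNG-BTS behaviour.

```python
import math


def batch_size_from_fraction(n_samples, batch_frac=0.1):
    # ASSUMPTION: round to the nearest integer and never go below one sample.
    return max(1, round(n_samples * batch_frac))


def epochs_until_stop(losses, patience):
    """Return how many epochs run before patience-based stopping triggers.

    `losses` is the per-epoch loss history; training stops once the loss has
    failed to improve on its best value for `patience` consecutive epochs.
    """
    best = math.inf
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(losses)  # patience never exhausted


print(batch_size_from_fraction(200, 0.1))
print(epochs_until_stop([1.0, 0.8, 0.9, 0.85, 0.81, 0.7], patience=3))
```

In the second call the loss last improves at epoch 2, so with a patience of 3 the loop stops after epoch 5 and never sees the later value of 0.7. A larger patience trades longer training for robustness against such temporary plateaus.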
Generation Parameters
| Parameter | Type | Description |
|---|---|---|
| new_size | int or list[int] | Generation size (default: 500). |
| pilot_size | list[int] | Sample sizes to evaluate (only for …) |
| n_draws | int | Number of replicated random draws per pilot size (default: 5). Used in … |
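To make "replicated random draws per pilot size" concrete, the following sketch draws several independent subsamples of row indices for each requested size. It is an illustration only: `draw_pilot_indices` is a hypothetical helper, and sampling uniformly without replacement is an assumption (the library may, for example, stratify by group).

```python
import random


def draw_pilot_indices(n_samples, pilot_sizes, n_draws=5, seed=123):
    """Return {pilot_size: [sorted index list, ...]} with n_draws lists per size."""
    rng = random.Random(seed)  # local RNG so the draws are reproducible
    draws = {}
    for size in pilot_sizes:
        if size > n_samples:
            raise ValueError(f"pilot size {size} exceeds {n_samples} samples")
        # Each draw is an independent subsample without replacement.
        draws[size] = [
            sorted(rng.sample(range(n_samples), size)) for _ in range(n_draws)
        ]
    return draws


pilot_draws = draw_pilot_indices(n_samples=200, pilot_sizes=[50, 100])
for size, index_lists in pilot_draws.items():
    print(size, len(index_lists), len(index_lists[0]))
```

With two pilot sizes and five draws each, this yields ten subsamples in total, which is why the number of models to train grows with both settings.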
Augmentation Parameters
| Parameter | Type | Description |
|---|---|---|
| off_aug | str or None | Offline augmentation mode: … |
| AE_head_num | int | Fold multiplier for AE-head augmentation (default: 2). |
| Gaussian_head_num | int | Fold multiplier for Gaussian-head augmentation (default: 9). |
Advanced Parameters (generate only)
| Parameter | Type | Description |
|---|---|---|
| | float | Validation split ratio for AE family (default: 0.2). |
| use_scheduler | bool | Enable learning-rate scheduler for AE family (default: …) |
| | int | Scheduler step size (default: 10). |
| | float | Scheduler gamma (default: 0.5). |
| cap | bool | Cap generated values to observed range (default: …) |
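A scheduler described by a step size and a gamma usually denotes step decay, where the learning rate is multiplied by gamma every step-size epochs. The sketch below assumes that interpretation; the page does not name the scheduler class, and `stepped_learning_rate` is a hypothetical helper rather than part of SyNG-BTS.

```python
def stepped_learning_rate(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate at a zero-based epoch under step decay."""
    # ASSUMPTION: lr = base_lr * gamma ** floor(epoch / step_size)
    return base_lr * gamma ** (epoch // step_size)


# With the documented defaults, the rate halves every 10 epochs.
for epoch in (0, 9, 10, 25):
    print(epoch, stepped_learning_rate(0.0005, epoch))
```

Under these assumptions the default learning rate of 0.0005 stays constant for epochs 0 to 9, drops to 0.00025 at epoch 10, and reaches 0.000125 by epoch 20.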
Model Architecture Parameters
| Parameter | Type | Description |
|---|---|---|
| | bool | Use a wider encoder/decoder for the CVAE model (default: …) |
generate() Parameters
from syng_bts import generate
result = generate(
    data="SKCMPositive_4",     # Data input (required)
    name=None,                 # Output name (auto-derived)
    new_size=500,              # Samples to generate
    model="VAE1-10",           # Model specification
    apply_log=True,            # Log-transform data
    batch_frac=0.1,            # Batch fraction
    learning_rate=0.0005,      # Learning rate
    epoch=None,                # Epochs (None=early stopping)
    early_stop_patience=None,  # Early stopping patience
    off_aug=None,              # Offline augmentation
    AE_head_num=2,             # AE-head folds
    Gaussian_head_num=9,       # Gaussian-head folds
    use_scheduler=False,       # LR scheduler
    cap=False,                 # Cap generated values
    random_seed=123,           # Random seed
    output_dir=None,           # Output directory
)
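The cap option is described as capping generated values to the observed range. One plausible reading, sketched below, is a per-feature clip to the minimum and maximum seen in the training data. The helper name is hypothetical, and whether the library clips per feature or against a single global range is an assumption.

```python
def cap_to_observed_range(generated, observed):
    """Clip each column of `generated` to the [min, max] of that column in `observed`.

    Both arguments are lists of rows (samples), each row a list of feature values.
    """
    n_cols = len(observed[0])
    # ASSUMPTION: bounds are computed per feature, not over the whole matrix.
    lows = [min(row[j] for row in observed) for j in range(n_cols)]
    highs = [max(row[j] for row in observed) for j in range(n_cols)]
    return [
        [min(max(value, lows[j]), highs[j]) for j, value in enumerate(row)]
        for row in generated
    ]


observed = [[0.0, 10.0], [5.0, 20.0]]
generated = [[-1.0, 25.0], [3.0, 15.0]]
capped = cap_to_observed_range(generated, observed)
print(capped)
```

Values already inside the observed range pass through unchanged, so capping only affects samples that the model extrapolated beyond the training data.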
pilot_study() Parameters
from syng_bts import pilot_study
result = pilot_study(
    data="SKCMPositive_4",   # Data input (required)
    pilot_size=[50, 100],    # Pilot sizes (required)
    name=None,               # Output name (auto-derived)
    n_draws=5,               # Draws per pilot size
    model="VAE1-10",         # Model specification
    batch_frac=0.1,          # Batch fraction
    learning_rate=0.0005,    # Learning rate
    epoch=None,              # Epochs (None=early stopping)
    early_stop_patience=30,  # Early stopping patience
    off_aug=None,            # Offline augmentation
    AE_head_num=2,           # AE-head folds
    Gaussian_head_num=9,     # Gaussian-head folds
    random_seed=123,         # Random seed
    output_dir=None,         # Output directory
)
transfer() Parameters
from syng_bts import transfer
result = transfer(
    source_data="PRAD",      # Source dataset (required)
    target_data="BRCA",      # Target dataset (required)
    source_name=None,        # Source name (auto-derived)
    target_name=None,        # Target name (auto-derived)
    new_size=500,            # Target generation size
    model="maf",             # Model specification
    apply_log=True,          # Log-transform data
    batch_frac=0.1,          # Batch fraction
    learning_rate=0.0005,    # Learning rate
    epoch=None,              # Epochs (None=early stopping)
    early_stop_patience=30,  # Early stopping patience
    off_aug=None,            # Offline augmentation
    random_seed=123,         # Random seed
    output_dir=None,         # Output directory
)
Output and Saving
In v3.0, no files are written by default. Results stay in memory as
SyngResult or PilotResult objects. To persist results to disk, either:
- Pass output_dir to the experiment function, or
- Call result.save(output_dir) on the returned object.
result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=5)
# Option 1: Save later
paths = result.save("./my_output/")
print(paths)
# {'generated': PosixPath('./my_output/SKCMPositive_4_VAE1-10_generated.csv'),
# 'loss': PosixPath('./my_output/SKCMPositive_4_VAE1-10_loss.csv'), ...}
# Option 2: Save automatically
result = generate(
    data="SKCMPositive_4", model="VAE1-10", epoch=5,
    output_dir="./auto_output/",
)
Bundled Datasets
SyNG-BTS includes several bundled datasets for testing and examples:
from syng_bts import list_bundled_datasets, resolve_data
# List all available datasets
print(list_bundled_datasets())
# ['SKCMPositive_4', 'BRCA', 'PRAD', 'BRCASubtypeSel', ...]
# Load a bundled dataset as a DataFrame
data, groups = resolve_data("SKCMPositive_4")
print(f"Shape: {data.shape}")
Available bundled datasets (see Example Datasets for details):
- Examples: SKCMPositive_4
- Transfer Learning: BRCA, PRAD
- BRCA Subtype: BRCASubtypeSel, BRCASubtypeSel_train, BRCASubtypeSel_test
- LIHC Subtype: LIHCSubtypeFamInd, LIHCSubtypeFamInd_DESeq, and more