Migration Guide / Changelog

SyNG-BTS v3.0 is a breaking release that replaces the file-centric API with a Pythonic, DataFrame-friendly interface. This guide covers the key changes and all subsequent version updates. See also Changes in v3.1 and Changes in v3.2 below.

Function Renames 

v2.x	v3.0	Returns
`PilotExperiment(...)`	`pilot_study(...)`	`PilotResult`
`ApplyExperiment(...)`	`generate(...)`	`SyngResult`
`TransferExperiment(...)`	`transfer(...)`	`SyngResult`

Parameter Changes 

v2.x Parameter	v3.0 Parameter	Notes
`dataname="SKCMPositive_4"`	`data="SKCMPositive_4"`	Also accepts a DataFrame or file path
`data_dir="./input/"`	(removed)	Pass the full file path via `data`
`path="./input/"`	(removed)	Was deprecated in v2; removed in v3
`early_stop_num=30`	`early_stop_patience=30`	Renamed for clarity
`fromname` / `toname`	`source_data` / `target_data`	Now accept DataFrames too
`fromsize`	(removed)	`transfer()` no longer generates from the source phase; source is pre-training only
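
For codebases with many v2.x call sites, the renames in the table above can be applied mechanically. The following is a minimal sketch only: `migrate_kwargs`, `RENAMED`, and `REMOVED` are hypothetical names that are not part of SyNG-BTS, and the mapping simply restates the table. Removed parameters are returned separately rather than guessed at, because how a `data_dir` or `path` value should be folded into `data` depends on your file layout.

```python
# Hypothetical helper, NOT part of SyNG-BTS: translate v2.x keyword
# arguments into their v3.0 names using the parameter table above.
RENAMED = {
    "dataname": "data",
    "early_stop_num": "early_stop_patience",
    "fromname": "source_data",
    "toname": "target_data",
}
REMOVED = {"data_dir", "path", "fromsize"}


def migrate_kwargs(v2_kwargs):
    """Return (v3_kwargs, dropped) for a dict of v2.x keyword arguments."""
    v3_kwargs, dropped = {}, {}
    for key, value in v2_kwargs.items():
        if key in REMOVED:
            dropped[key] = value  # no v3.0 equivalent; handle manually
        else:
            v3_kwargs[RENAMED.get(key, key)] = value
    return v3_kwargs, dropped


old = {"dataname": "SKCMPositive_4", "data_dir": "./input/", "early_stop_num": 30}
new, dropped = migrate_kwargs(old)
print(new)      # {'data': 'SKCMPositive_4', 'early_stop_patience': 30}
print(dropped)  # {'data_dir': './input/'}
```

Anything that ends up in `dropped` needs a manual decision, for example building a full file path and passing it via `data`.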

Data Input 

v2.x — String name only, with optional directory

v3.0 — DataFrame, file path, or bundled name:

# v3.0 — bundled dataset (same as before, shorter param name)
result = generate(data="SKCMPositive_4", ...)

# v3.0 — file path
result = generate(data="./my_data/my_data.csv", ...)

# v3.0 — DataFrame (new!)
import pandas as pd
df = pd.read_csv("./my_data/my_data.csv")
result = generate(data=df, name="my_data", ...)

Output Handling 

v2.x — Results always written to disk

v3.0 — Results returned in memory; disk write is optional:

# v3.0 — no files written by default
result = generate(data="SKCMPositive_4", ...)

# Access data directly
generated = result.generated_data         # pd.DataFrame
loss_log  = result.loss                   # pd.DataFrame
recons    = result.reconstructed_data     # pd.DataFrame or None
state     = result.model_state            # dict (state_dict)

# Optional: save to disk
result.save("./my_output/")

# Or pass output_dir to save automatically
result = generate(data="SKCMPositive_4", ..., output_dir="./my_output/")

Plotting 

v2.x — Standalone functions

v3.0: methods on result objects. The methods return figures and never call plt.show(), so displaying or saving them is up to the caller:

# v3.0
result = generate(data="SKCMPositive_4", ...)

figs = result.plot_loss()        # dict[str, Figure] per loss column
fig = result.plot_heatmap()      # heatmap of generated data

# For pilot studies
pilot = pilot_study(data="SKCMPositive_4", pilot_size=[50, 100], ...)
figs = pilot.plot_loss()                      # dict[(pilot, draw)] -> dict[str, Figure]
figs = pilot.plot_loss(style="overlay_runs")  # dict[str, Figure], all runs overlaid
figs = pilot.plot_loss(style="mean_band")     # dict[str, Figure], mean ± std
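
Because the plot methods hand back figures instead of showing them, a small helper can write a whole `dict[str, Figure]` to disk. This is a sketch under assumptions: `save_figures` is a hypothetical name, not a SyNG-BTS function, and it assumes the dictionary keys are safe to use as file names. It relies only on the standard matplotlib `Figure.savefig()` method.

```python
from pathlib import Path


def save_figures(figs, out_dir, ext="png"):
    """Save each figure in a {name: Figure} mapping as out_dir/<name>.<ext>.

    Hypothetical helper, not part of SyNG-BTS. Works with any object
    exposing savefig(path), such as a matplotlib Figure.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    paths = []
    for name, fig in figs.items():
        path = out / f"{name}.{ext}"
        fig.savefig(path)
        paths.append(path)
    return paths
```

A call might look like `save_figures(result.plot_loss(), "./plots/")`. The nested dictionary returned by the default `pilot.plot_loss()` would need one call per `(pilot, draw)` key. For interactive display, call `plt.show()` yourself after creating the figures.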

Evaluation Functions 

The evaluation() function now accepts DataFrames or dataset names:

# v3.0
figs = evaluation(
    real_data="SKCMPositive_4",
    generated_data=result.generated_data,
)

Quick Migration Checklist 

Update imports: PilotExperiment → pilot_study, ApplyExperiment → generate, TransferExperiment → transfer.
Replace dataname= with data=.
Replace early_stop_num= with early_stop_patience=.
Remove data_dir / path parameters; pass full paths via data.
Capture the return value (SyngResult / PilotResult).
Access generated data via result.generated_data instead of reading CSVs.
Use result.save(output_dir) when you need files on disk.
Replace standalone plot calls with result.plot_loss() / result.plot_heatmap().
Update evaluation() calls: dat_real → real_data, random_state → random_seed.

Changes in v3.1 

SyNG-BTS v3.1 mainly tightens data and evaluation contracts.

resolve_data() now returns (dataframe, groups_or_none). Update callers to unpack both values.
Group labels are explicit API inputs (groups=, source_groups=, target_groups=); do not pass metadata columns inside feature data.
User feature DataFrames are strict: numeric columns only; groups and samples columns are rejected.
Bundled datasets are stored as Parquet (transparent via loader APIs).
Generated/reconstructed outputs are returned in count scale when apply_log=True.
SyngResult/PilotResult include original_data.
transfer() is single-run only and returns a SyngResult; pilot controls (pilot_size, n_draws) and source_size are removed.
evaluation() uses real_groups/generated_groups; load_data() and load_dataset() are removed.

# v3.0
data = resolve_data("SKCMPositive_4")

# v3.1
data, groups = resolve_data("SKCMPositive_4")
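
Code that must run against both v3.0 and v3.1 during a transition can normalize the return shape before using it. This is only a sketch: `unpack_resolved` is a hypothetical name, not part of SyNG-BTS, and it assumes that v3.0 returns a single non-tuple object, as the snippet above suggests.

```python
def unpack_resolved(result):
    """Normalize a resolve_data() return value to (data, groups).

    Hypothetical compatibility shim, not part of SyNG-BTS.
    """
    if isinstance(result, tuple):
        data, groups = result  # v3.1 shape: (dataframe, groups_or_none)
        return data, groups
    return result, None        # assumed v3.0 shape: the data object alone


# Stand-in values instead of real DataFrames, to keep the sketch self-contained.
data, groups = unpack_resolved(("frame", ["A", "B"]))  # v3.1-style input
legacy_data, legacy_groups = unpack_resolved("frame")  # v3.0-style input
```

In real code the argument would be the result of `resolve_data(...)`. Once every environment is on v3.1 or later, the shim can be replaced by plain tuple unpacking.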

v3.1 Quick Migration Checklist 

Unpack loader returns: data, groups = resolve_data(...).
Pass labels via groups= (or transfer equivalents), not DataFrame columns.
Remove groups / samples columns from user input DataFrames.
Replace group_names= with real_groups= / generated_groups=.
Replace load_data() / load_dataset() with resolve_data().
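
The checklist items about group labels and metadata columns can be handled in one preprocessing step. The sketch below uses plain pandas. `split_features_and_groups` is a hypothetical helper, not part of SyNG-BTS, and the column names `groups` and `samples` are taken from the rule stated above. The toy data is made up for illustration.

```python
import pandas as pd


def split_features_and_groups(df, group_col="groups", drop_cols=("samples",)):
    """Return (numeric feature DataFrame, group labels or None).

    Hypothetical helper, not part of SyNG-BTS.
    """
    groups = df[group_col] if group_col in df.columns else None
    meta = [c for c in (group_col, *drop_cols) if c in df.columns]
    features = df.drop(columns=meta)
    # Fail early if anything non-numeric is left in the feature table.
    non_numeric = features.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        raise ValueError(f"Non-numeric feature columns: {non_numeric}")
    return features, groups


# Toy input that mixes metadata and feature columns (illustrative values).
raw = pd.DataFrame({
    "samples": ["s1", "s2", "s3"],
    "groups": ["A", "B", "A"],
    "gene_1": [10, 0, 7],
    "gene_2": [3.0, 5.5, 1.0],
})
features, groups = split_features_and_groups(raw)
print(list(features.columns))  # ['gene_1', 'gene_2']
```

The two results can then be passed separately, presumably as `data=features` and `groups=groups` (or the `source_groups=` / `target_groups=` equivalents for transfer).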

Changes in v3.2 

SyNG-BTS v3.2 is an internal refactor with no breaking changes to public API outputs or result schemas.

transfer() is now single-run only: the parameters pilot_size, n_draws, and source_size are removed from its signature. Use pilot_study() for multi-draw sweeps.
Training orchestration is centralized in an internal orchestrate_training() helper that consolidates early-stop resolution, data augmentation, batch-size computation, and model dispatch — no change in training behavior.
The private helpers _train_model(), _run_generate(), and _run_pilot() are removed; all code paths now use the unified orchestrator.

v3.2 Migration Checklist 

Remove pilot_size, n_draws, and source_size arguments from any transfer() calls; these parameters no longer exist.
No other public API changes — all other call sites are unaffected.