Evaluation Functions
This page documents the evaluation and visualization functions in SyNG-BTS.
Overview
SyNG-BTS provides multiple ways to evaluate generated data:
On result objects (recommended):
- result.plot_loss(): dict of per-column training loss figures
- result.plot_heatmap(): heatmap of generated or reconstructed data
Standalone functions for comparing real vs. generated data:
- heatmap_eval(): side-by-side heatmap comparison
- UMAP_eval(): 2D UMAP scatter plot comparison
- evaluation(): combined heatmap + UMAP pipeline
Result Object Plotting
The simplest way to visualize results is through the
SyngResult methods:
from syng_bts import generate
result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=500)
# Training loss (one figure per loss column)
figs = result.plot_loss(running_average_window=50)
# Heatmap of generated data
fig_heat = result.plot_heatmap(which="generated")
# Heatmap of reconstructed data (AE/VAE/CVAE only)
fig_recon = result.plot_heatmap(which="reconstructed")
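The running_average_window argument above suggests that the plotted loss is smoothed over a window of training steps. The exact smoothing SyNG-BTS applies is not documented on this page, so the following is only a generic trailing-window sketch in plain Python. The running_average helper is hypothetical and not part of syng_bts.

```python
def running_average(values, window):
    """Trailing moving average; early points average over fewer values.

    Hypothetical helper for illustration only, not the SyNG-BTS implementation.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # Drop the value that just slid out of the window
            total -= values[i - window]
        smoothed.append(total / min(i + 1, window))
    return smoothed


raw_loss = [4, 2, 6, 8]
print(running_average(raw_loss, window=2))  # [4.0, 3.0, 4.0, 7.0]
```

A larger window gives a smoother curve at the cost of lagging behind sudden changes in the raw loss.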
For pilot studies, plot all runs at once:
from syng_bts import pilot_study
pilot = pilot_study(data="SKCMPositive_4", pilot_size=[50, 100], model="VAE1-10")
# All runs overlaid, one figure per loss column (default)
figs = pilot.plot_loss() # style="overlay_runs" (default)
# Per-run dicts of per-column figures
figs = pilot.plot_loss(style="per_run")
# Mean ± std across runs
figs = pilot.plot_loss(style="mean_band")
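To make the "mean ± std" idea concrete, the sketch below computes a per-step mean and a one-standard-deviation band across several loss curves. It assumes all runs have the same number of steps and uses the population standard deviation; whether SyNG-BTS uses the population or sample estimate is not stated here. The mean_band helper is hypothetical.

```python
from statistics import mean, pstdev


def mean_band(runs):
    """Return (means, lower, upper) across equally long loss curves.

    Hypothetical helper for illustration only; assumes population std.
    """
    if not runs:
        raise ValueError("at least one run is required")
    if len({len(run) for run in runs}) != 1:
        raise ValueError("all runs must have the same length")
    means, lower, upper = [], [], []
    for step_values in zip(*runs):  # one tuple of losses per training step
        m = mean(step_values)
        s = pstdev(step_values)
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper


# Two toy runs with two recorded steps each
means, lower, upper = mean_band([[1.0, 2.0], [3.0, 4.0]])
print(means, lower, upper)
```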
Heatmap Evaluation
Compare real and generated data with heatmaps.
- syng_bts.heatmap_eval(real_data: DataFrame, generated_data: DataFrame | None = None, *, apply_log: bool = True, cmap: str = 'YlGnBu') -> matplotlib.figure.Figure
Create a heatmap visualization comparing real and generated data.
If only one dataset is provided, displays a single heatmap. If both real and generated data are provided, displays them side by side.
- Parameters:
real_data (pd.DataFrame) – The original/real data.
generated_data (pd.DataFrame or None, optional) – The generated/synthetic data. If None, only real_data is plotted.
apply_log (bool, default True) – Whether to apply log2(x + 1) transformation to both real and generated data before visualization.
cmap (str, default "YlGnBu") – Colormap passed to seaborn.heatmap().
- Returns:
The matplotlib Figure containing the heatmap(s).
- Return type:
Figure
Examples
Example 1: Visualize only real data
from syng_bts import resolve_data, heatmap_eval
real_data, _groups = resolve_data("SKCMPositive_4")
fig = heatmap_eval(real_data=real_data.head(50))
Example 2: Compare real and generated data
from syng_bts import generate, resolve_data, heatmap_eval
result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=5)
real_data, _groups = resolve_data("SKCMPositive_4")
fig = heatmap_eval(
    real_data=real_data.head(50),
    generated_data=result.generated_data.head(50),
)
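The apply_log option is described as a log2(x + 1) transformation. As a rough illustration of what that does to count-like values, here is a dependency-free sketch on a small nested list. The log2_plus_one helper is hypothetical and only mirrors the formula, not how SyNG-BTS applies it to a DataFrame.

```python
import math


def log2_plus_one(rows):
    """Apply log2(x + 1) element-wise to a list of numeric rows.

    Hypothetical helper for illustration only.
    """
    return [[math.log2(value + 1) for value in row] for row in rows]


counts = [[0, 1, 3], [7, 15, 1023]]
print(log2_plus_one(counts))  # [[0.0, 1.0, 2.0], [3.0, 4.0, 10.0]]
```

Zero stays at zero because of the + 1 offset, while large values are compressed, which keeps a few very high counts from dominating the colour scale.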
UMAP Visualization
Visualize real and generated data distributions using UMAP.
- syng_bts.UMAP_eval(real_data: DataFrame, generated_data: DataFrame | None = None, *, apply_log: bool = True, groups_real: Series | None = None, groups_generated: Series | None = None, random_seed: int = 42, legend_pos: str = 'best') -> matplotlib.figure.Figure
Create a UMAP visualization comparing real and generated data.
Uses UMAP dimensionality reduction to visualize high-dimensional data in 2D, with optional group colouring.
- Parameters:
real_data (pd.DataFrame) – The original/real data.
generated_data (pd.DataFrame or None, optional) – The generated/synthetic data. If None, only real_data is visualised.
apply_log (bool, default True) – Whether to apply log2(x + 1) transformation to both real and generated data before dimensionality reduction.
groups_real (pd.Series or None, optional) – Group labels for real samples. Used for styling.
groups_generated (pd.Series or None, optional) – Group labels for generated samples. Used for styling.
random_seed (int, default 42) – Random seed for UMAP reproducibility.
legend_pos (str, default "best") – Legend position ("best", "upper right", "lower left", …).
- Returns:
The matplotlib Figure containing the UMAP scatter plot.
- Return type:
Figure
Examples
Example 1: Compare real and generated data
from syng_bts import generate, resolve_data, UMAP_eval
result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=500)
real_data, _groups = resolve_data("SKCMPositive_4")
fig = UMAP_eval(
    real_data=real_data,
    generated_data=result.generated_data,
    random_seed=42,
)
Example 2: UMAP with group labels
import pandas as pd
from syng_bts import resolve_data, UMAP_eval
real_data, _groups = resolve_data("SKCMPositive_4")
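# Placeholder labels for illustration: exactly one label per sample,
# even when the number of samples is odd
labels = (["Group A", "Group B"] * len(real_data))[: len(real_data)]
groups_real = pd.Series(labels, index=real_data.index)
fig = UMAP_eval(
    real_data=real_data,
    groups_real=groups_real,
    random_seed=42,
    legend_pos="best",
)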
Comprehensive Evaluation
Run combined heatmap + UMAP evaluation in a single call.
- syng_bts.evaluation(real_data: DataFrame | str | Path, generated_data: DataFrame | str | Path, *, real_groups: Series | ndarray | list | tuple | Index | None = None, generated_groups: Series | ndarray | list | tuple | Index | None = None, n_samples: int | None = 200, apply_log: bool = True, random_seed: int = 42) -> dict[str, matplotlib.figure.Figure]
Preprocessing and visualization of generated vs real data.
Loads and preprocesses the input data, then creates heatmap and UMAP visualizations comparing generated and real datasets.
- Parameters:
real_data (pd.DataFrame, str, or Path) – The original/real dataset. Accepts a DataFrame, a file path, or a bundled dataset name (resolved via resolve_data()).
generated_data (pd.DataFrame, str, or Path) – The generated/synthetic dataset. Same input types as real_data.
real_groups (pd.Series, np.ndarray, list, tuple, pd.Index, or None, optional) – Group labels for the real samples. When provided, takes precedence over any bundled groups resolved from real_data. Values are used as-is for plot labels (converted to str).
generated_groups (pd.Series, np.ndarray, list, tuple, pd.Index, or None, optional) – Group labels for the generated samples. When provided, takes precedence over any bundled groups resolved from generated_data. Values are used as-is for plot labels (converted to str).
n_samples (int or None, default 200) – Number of samples from each end of the dataset to use for visualization (to keep UMAP fast). If None, all samples are used.
apply_log (bool, default True) – Whether to apply log2(x + 1) transformation to both real and generated data before comparison.
random_seed (int, default 42) – Random seed for UMAP reproducibility.
- Returns:
{"heatmap": <Figure>, "umap": <Figure>}: the two evaluation figures. Neither figure has been displayed; the caller decides when to call plt.show() or fig.savefig().
- Return type:
dict[str, Figure]
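The precedence rule for real_groups and generated_groups can be pictured with a small sketch: explicit labels win over bundled ones, and whatever is chosen is converted to str. The choose_group_labels helper below is hypothetical and only restates the rule described in the parameter list; it is not the library's internal code.

```python
def choose_group_labels(explicit, bundled):
    """Pick explicit labels when given, else bundled ones, as strings.

    Hypothetical helper for illustration only.
    """
    chosen = explicit if explicit is not None else bundled
    if chosen is None:
        return None  # no grouping information available
    return [str(label) for label in chosen]


# Explicit labels take precedence over bundled labels
print(choose_group_labels([1, 0, 1], ["a", "b", "c"]))  # ['1', '0', '1']
# Without explicit labels, the bundled ones are used
print(choose_group_labels(None, ["a", "b", "c"]))       # ['a', 'b', 'c']
```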
Example
The evaluation function accepts DataFrames, file paths, or bundled dataset
names (via resolve_data) and returns a dict of figures:
from syng_bts import generate, evaluation
result = generate(data="SKCMPositive_4", model="VAE1-10", epoch=5)
figs = evaluation(
    real_data="SKCMPositive_4",
    generated_data=result.generated_data,
    n_samples=200,
)
figs["heatmap"].savefig("heatmap.png")
figs["umap"].savefig("umap.png")
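Because the functions on this page return figures (or dicts of figures) without displaying them, it can be convenient to save a whole dict in one go. The sketch below assumes only that each value exposes a savefig(path) method, as matplotlib Figure objects do. The save_figures helper and the output directory name are hypothetical.

```python
from pathlib import Path


def save_figures(figs, out_dir, ext="png"):
    """Save every figure in a {name: figure} dict as <out_dir>/<name>.<ext>.

    Hypothetical helper for illustration only; relies on fig.savefig(path).
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = {}
    for name, fig in figs.items():
        target = out_path / f"{name}.{ext}"
        fig.savefig(target)
        saved[name] = target
    return saved


# Usage with the dict returned above (directory name is an example):
# paths = save_figures(figs, "evaluation_plots")
```

The same helper would work for any other dict of figures with string keys, provided the keys are valid file names.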
Evaluation Workflow
A typical end-to-end workflow:
from syng_bts import generate, resolve_data, heatmap_eval, UMAP_eval
# Step 1: Generate synthetic data
result = generate(
    data="SKCMPositive_4",
    model="VAE1-10",
    new_size=500,
    batch_frac=0.1,
    learning_rate=0.0005,
)
# Step 2: Load original data for comparison
real_data, _groups = resolve_data("SKCMPositive_4")
# Step 3: Visualize training loss (one figure per loss column)
figs_loss = result.plot_loss()
# Step 4: Compare with UMAP
fig_umap = UMAP_eval(
    real_data=real_data,
    generated_data=result.generated_data,
    random_seed=42,
)
# Step 5: Compare with heatmap
fig_heatmap = heatmap_eval(
    real_data=real_data.head(50),
    generated_data=result.generated_data.head(50),
)
See Synthetic Data Generation for more information on running experiments.