Regression Testing¤

Detect performance regressions over time.

Overview¤

The detect_regressions() function compares a current Run against a baseline Run and flags metrics that degraded beyond a configurable relative threshold (default 0.05, i.e. 5%). Detection is direction-aware: for a higher-is-better metric such as throughput, a decrease is a regression; for a lower-is-better metric such as latency, a decrease is an improvement and an increase is a regression. Metrics whose direction is info are skipped.
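
The direction-aware rule can be illustrated with a minimal standalone sketch. It does not use calibrax; the helper names relative_change and is_regression are hypothetical, and measuring the change relative to the baseline, the strict comparison against the threshold, and the handling of a zero baseline are all assumptions rather than the library's documented behavior.

```python
def relative_change(baseline: float, current: float) -> float:
    """Signed change of current relative to baseline (0.05 means +5%)."""
    return (current - baseline) / abs(baseline)


def is_regression(
    baseline: float,
    current: float,
    direction: str,
    threshold: float = 0.05,
) -> bool:
    """Return True when the metric moved in its bad direction beyond threshold.

    direction is one of "higher", "lower", or "info" (hypothetical string
    stand-ins for the MetricDirection values).
    """
    # Assumption: info metrics and zero baselines are never flagged.
    if direction == "info" or baseline == 0:
        return False
    change = relative_change(baseline, current)
    if direction == "higher":
        # Higher is better, so a large enough drop is a regression.
        return change < -threshold
    if direction == "lower":
        # Lower is better, so a large enough rise is a regression.
        return change > threshold
    raise ValueError(f"unknown direction: {direction!r}")


# Throughput fell from 20000 to 18000 (a 10% drop): flagged.
throughput_flagged = is_regression(20000.0, 18000.0, "higher")
# Latency fell from 10 to 9 (an improvement): not flagged.
latency_flagged = is_regression(10.0, 9.0, "lower")
```

Under these assumptions the same 10% decrease is flagged for a "higher" metric and ignored for a "lower" one, which is the asymmetry described above.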

Points are matched between runs using a composite key of (name, tags), so that the "CV-1/small" point tagged with framework Datarax is compared against its own baseline even when multiple frameworks share the same point name.
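
A minimal sketch of this kind of composite-key matching follows. It uses plain dicts instead of calibrax Point objects; point_key and match_points are hypothetical helpers, "OtherLoader" is a made-up second framework, and its throughput values are invented for illustration. How calibrax builds its key internally is not shown in this page, so the frozenset-of-tags approach is an assumption.

```python
from collections.abc import Mapping

Key = tuple[str, frozenset[tuple[str, str]]]


def point_key(name: str, tags: Mapping[str, str]) -> Key:
    """Build a hashable composite key; frozenset makes tag order irrelevant."""
    return (name, frozenset(tags.items()))


def match_points(current: list[dict], baseline: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each current point with the baseline point sharing its key."""
    baseline_by_key = {point_key(p["name"], p["tags"]): p for p in baseline}
    pairs = []
    for point in current:
        match = baseline_by_key.get(point_key(point["name"], point["tags"]))
        if match is not None:  # points with no baseline counterpart stay unpaired
            pairs.append((point, match))
    return pairs


# Two frameworks share the point name "CV-1/small"; the tags tell them apart.
baseline_points = [
    {"name": "CV-1/small", "tags": {"framework": "Datarax"}, "throughput": 20000.0},
    {"name": "CV-1/small", "tags": {"framework": "OtherLoader"}, "throughput": 15000.0},
]
current_points = [
    {"name": "CV-1/small", "tags": {"framework": "Datarax"}, "throughput": 18000.0},
    {"name": "CV-1/small", "tags": {"framework": "OtherLoader"}, "throughput": 15100.0},
]

pairs = match_points(current_points, baseline_points)
```

Keying on the name alone would let the second baseline entry overwrite the first in the lookup table, so both current points would be compared against the same baseline.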

Quick Start¤

from calibrax.analysis import detect_regressions
from calibrax.core import Metric, MetricDef, MetricDirection, Point, Run

baseline = Run(
    points=(
        Point(
            name="CV-1/small",
            scenario="CV-1",
            tags={"framework": "Datarax"},
            metrics={"throughput": Metric(value=20000.0)},
        ),
    ),
    metric_defs={
        "throughput": MetricDef(
            name="throughput",
            unit="elem/s",
            direction=MetricDirection.HIGHER,
        ),
    },
)

current = Run(
    points=(
        Point(
            name="CV-1/small",
            scenario="CV-1",
            tags={"framework": "Datarax"},
            metrics={"throughput": Metric(value=18000.0)},
        ),
    ),
    metric_defs=baseline.metric_defs,
)

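# Throughput fell from 20000 to 18000 elem/s, a 10% drop in a
# higher-is-better metric, which exceeds the 5% threshold.
regressions = detect_regressions(current, baseline, threshold=0.05)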
for r in regressions:
    print(f"  {r.metric} on {r.point_name}: {r.delta_pct:+.1f}%")
    print(f"    baseline={r.baseline_value:.0f} -> current={r.current_value:.0f}")

CI Integration¤

The calibrax check CLI command wraps detect_regressions() for CI pipelines:

calibrax check --data benchmark-data/ --threshold 0.05

The command exits with code 1 if any regression exceeds the threshold.
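
If a pipeline step is itself written in Python, the exit code can be propagated with the standard library. The sketch below is only an illustration: gate_on_regressions is a hypothetical helper, and treating every nonzero exit status as a failure is an assumption, since this page documents only exit code 1.

```python
import subprocess
from collections.abc import Sequence


def gate_on_regressions(command: Sequence[str]) -> bool:
    """Run a check command and return True only when it exits with code 0."""
    # check=False: inspect the return code instead of raising on failure.
    result = subprocess.run(command, check=False)
    return result.returncode == 0


# Intended use in a CI script (requires calibrax on PATH):
#   ok = gate_on_regressions(
#       ["calibrax", "check", "--data", "benchmark-data/", "--threshold", "0.05"]
#   )
#   raise SystemExit(0 if ok else 1)
```

Most CI systems already fail a step on a nonzero exit code, so a wrapper like this is only useful when extra logic needs to run around the check.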


calibrax.analysis.regression ¤

Regression detection for benchmark runs.

Compares a current run against a baseline to flag metrics that degraded beyond a specified threshold.

detect_regressions ¤

detect_regressions(run: Run, baseline: Run, threshold: float = 0.05) -> list[Regression]

Flag metrics that degraded beyond threshold.

Uses MetricDef.direction: 'higher' metrics regress when they decrease, 'lower' metrics regress when they increase. 'info' metrics are skipped.

Parameters:

Name       Type   Description                                  Default
run        Run    Current benchmark run.                       required
baseline   Run    Baseline run to compare against.             required
threshold  float  Relative change threshold (e.g. 0.05 = 5%).  0.05

Returns:

Type              Description
list[Regression]  List of detected regressions.