Regression Testing
Detect performance regressions over time.
See Also
- Benchmarking Overview - All benchmarking tools
- Comparative - Compare configurations
- Testing Guide - Test infrastructure
- Benchmarking Guide
Overview
The detect_regressions() function compares a current Run against a baseline Run and flags metrics that degraded beyond a configurable relative threshold (default 0.05, i.e. 5%). Detection is direction-aware: for a higher-is-better metric such as throughput, a decrease is a regression; for a lower-is-better metric such as latency, a decrease is an improvement and an increase is a regression. Metrics whose direction is info are skipped.
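The direction-aware rule can be sketched in plain Python. This is an illustrative sketch, not the calibrax implementation: the helper names `relative_change` and `is_regression`, the use of plain strings for the direction, and the strict `>` comparison against the threshold are all assumptions.

```python
def relative_change(baseline: float, current: float) -> float:
    """Signed relative change of `current` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    return (current - baseline) / abs(baseline)


def is_regression(
    baseline: float,
    current: float,
    direction: str,
    threshold: float = 0.05,
) -> bool:
    """Return True when the metric degraded by more than `threshold`.

    `direction` is "higher", "lower", or "info" (assumed string spelling).
    """
    if direction == "info":
        return False  # informational metrics are never flagged
    change = relative_change(baseline, current)
    if direction == "higher":
        return -change > threshold  # a drop is a degradation
    if direction == "lower":
        return change > threshold  # a rise is a degradation
    raise ValueError(f"unknown direction: {direction!r}")


print(is_regression(20000.0, 18000.0, "higher"))  # throughput fell by 10%
print(is_regression(12.0, 10.0, "lower"))  # latency fell: an improvement
print(is_regression(10.0, 12.0, "lower"))  # latency rose by 20%
```

Whether a change exactly equal to the threshold counts as a regression depends on the library; the sketch treats it as passing.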
Points are matched between runs using a composite key of (name, tags), ensuring that "CV-1/small for Datarax" is compared against the correct baseline even when multiple frameworks share the same point name.
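A composite key of this kind could look like the sketch below. It only illustrates the idea of keying on (name, tags); the helper `point_key`, the second framework name "OtherLib", and its value are hypothetical, and calibrax may build its key differently.

```python
from typing import Mapping


def point_key(name: str, tags: Mapping[str, str]) -> tuple:
    """Build a hashable key; the frozenset makes tag order irrelevant."""
    return (name, frozenset(tags.items()))


# Baseline values indexed by composite key (illustrative numbers).
baseline_index = {
    point_key("CV-1/small", {"framework": "Datarax"}): 20000.0,
    point_key("CV-1/small", {"framework": "OtherLib"}): 15000.0,
}

# Both entries share the point name, but the tags select the right one.
current_key = point_key("CV-1/small", {"framework": "Datarax"})
matched = baseline_index.get(current_key)
print(matched)
```

Keying on the name alone would make the two "CV-1/small" entries collide, which is the situation the composite key avoids.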
Quick Start
from calibrax.analysis import detect_regressions
from calibrax.core import Metric, MetricDef, MetricDirection, Point, Run

baseline = Run(
    points=(
        Point(
            name="CV-1/small",
            scenario="CV-1",
            tags={"framework": "Datarax"},
            metrics={"throughput": Metric(value=20000.0)},
        ),
    ),
    metric_defs={
        "throughput": MetricDef(
            name="throughput",
            unit="elem/s",
            direction=MetricDirection.HIGHER,
        ),
    },
)

current = Run(
    points=(
        Point(
            name="CV-1/small",
            scenario="CV-1",
            tags={"framework": "Datarax"},
            metrics={"throughput": Metric(value=18000.0)},
        ),
    ),
    metric_defs=baseline.metric_defs,
)

regressions = detect_regressions(current, baseline, threshold=0.05)
for r in regressions:
    print(f" {r.metric} on {r.point_name}: {r.delta_pct:+.1f}%")
    print(f" baseline={r.baseline_value:.0f} -> current={r.current_value:.0f}")
CI Integration
The calibrax check CLI command wraps detect_regressions() for CI pipelines. It exits with code 1 if any regression exceeds the threshold.
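The same exit-code convention can be reproduced in a custom Python script. The sketch below is not the calibrax CLI: `RegressionRecord` is a hypothetical stand-in that only mirrors the field names used in the Quick Start, `gate` is an illustrative helper, and the sample values (including `delta_pct` expressed in percent) are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RegressionRecord:
    """Hypothetical stand-in with the fields used in the Quick Start."""

    metric: str
    point_name: str
    delta_pct: float
    baseline_value: float
    current_value: float


def gate(regressions: Sequence[RegressionRecord]) -> int:
    """Print one line per regression and return a process exit code."""
    for r in regressions:
        print(f"REGRESSION {r.metric} on {r.point_name}: {r.delta_pct:+.1f}%")
    return 1 if regressions else 0


# In a real script the list would come from detect_regressions(...),
# and the return value would be passed to sys.exit().
example = [RegressionRecord("throughput", "CV-1/small", -10.0, 20000.0, 18000.0)]
print(gate(example))
print(gate([]))
```

Returning the code from a function instead of calling `sys.exit()` directly keeps the gate easy to unit test.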
calibrax.analysis.regression
Regression detection for benchmark runs.
Compares a current run against a baseline to flag metrics that degraded beyond a specified threshold.
detect_regressions
Flag metrics that degraded beyond threshold.
Uses MetricDef.direction: 'higher' metrics regress when they decrease, 'lower' metrics regress when they increase. 'info' metrics are skipped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| run | Run | Current benchmark run. | required |
| baseline | Run | Baseline run to compare against. | required |
| threshold | float | Relative change threshold (e.g. 0.05 = 5%). | 0.05 |
Returns:
| Type | Description |
|---|---|
| list[Regression] | List of detected regressions. |