Benchmarking

External package

This page documents calibrax, the benchmarking library datarax depends on.

Performance measurement and analysis tools for data pipelines. Use these tools to measure throughput, identify bottlenecks, and track performance regressions.

Tools Overview

Tool	Purpose	Output
TimingCollector	Measure samples/sec with GPU sync	Throughput metrics
GPUMemoryProfiler	GPU memory profiling	Memory usage stats
MemoryOptimizer	Pipeline memory analysis	Optimization suggestions
detect_regressions	Track performance over time	Regression alerts
rank_table	Compare frameworks	Ranked performance tables
AdvancedMonitor	Real-time monitoring	Live metrics + alerts

Benchmarking best practices

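Always warm up pipelines before benchmarking so that JIT compilation is excluded from the timed region
Call block_until_ready() on outputs before stopping the timer; JAX dispatches work asynchronously, so timings are otherwise too short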
Attach confidence bounds via Metric(lower=, upper=, samples=) and use calibrax.statistics for significance testing
Profile first, optimize second
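The first two practices can be sketched with a small framework-agnostic helper. This is an illustration only, not calibrax's API: `time_steady_state`, its parameters, and the `step` workload are hypothetical names. The `sync` argument is where `jax.block_until_ready` would go when timing JAX code.

```python
import time
from statistics import median


def time_steady_state(fn, arg, warmup=3, iters=20, sync=lambda out: out):
    """Return per-call wall-clock times in seconds, excluding warm-up calls.

    `sync` must block until the result of `fn` is actually computed.
    For JAX, pass `jax.block_until_ready`; the default is a no-op that
    suits eager CPU code.
    """
    # Warm-up calls absorb one-time costs such as JIT compilation.
    for _ in range(warmup):
        sync(fn(arg))

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        sync(fn(arg))  # Without syncing, only the async dispatch is timed.
        times.append(time.perf_counter() - start)
    return times


# Hypothetical workload standing in for one pipeline step.
def step(n):
    return sum(i * i for i in range(n))


times = time_steady_state(step, 10_000)
print(f"median: {median(times) * 1e3:.3f} ms over {len(times)} timed calls")
```

Reporting the median rather than the mean keeps a single slow iteration (for example, a garbage-collection pause) from dominating the result.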

Quick Start

from calibrax.profiling import TimingCollector

# `pipeline` is an existing iterable data pipeline that yields dict batches.
# Measure throughput (CPU; pass sync_fn for GPU)
timer = TimingCollector()
result = timer.measure_iteration(
    iter(pipeline),
    num_batches=100,
    count_fn=lambda batch: batch["image"].shape[0],
)
throughput = result.num_elements / result.wall_clock_sec
print(f"Throughput: {throughput:.2f} samples/sec")
print(f"First batch: {result.first_batch_time:.4f}s (includes JIT)")
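To make the reported quantities concrete, the following plain-Python sketch computes the same kind of totals by hand. It is not calibrax's implementation: `measure_throughput` and the returned dictionary keys are hypothetical and only mirror the field names used in the Quick Start above.

```python
import time
from itertools import islice


def measure_throughput(iterator, num_batches, count_fn):
    """Consume up to `num_batches` batches and return simple timing totals."""
    num_elements = 0
    batches_seen = 0
    first_batch_time = None
    start = time.perf_counter()
    # islice stops after `num_batches` items without pulling an extra batch.
    for batch in islice(iterator, num_batches):
        if batches_seen == 0:
            # Time to first batch; in a JAX pipeline this would include JIT.
            first_batch_time = time.perf_counter() - start
        num_elements += count_fn(batch)
        batches_seen += 1
    wall_clock_sec = time.perf_counter() - start
    return {
        "num_elements": num_elements,
        "num_batches": batches_seen,
        "first_batch_time": first_batch_time,
        "wall_clock_sec": wall_clock_sec,
    }


# Hypothetical stand-in pipeline: 10 batches of 32 "images" each.
batches = ({"image": [0] * 32} for _ in range(10))
stats = measure_throughput(
    batches, num_batches=5, count_fn=lambda b: len(b["image"])
)
if stats["wall_clock_sec"] > 0:
    rate = stats["num_elements"] / stats["wall_clock_sec"]
    print(f"Throughput: {rate:.2f} samples/sec")
```

Counting elements through `count_fn` rather than counting batches keeps the samples/sec figure comparable across runs with different batch sizes.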

Modules

profiler - GPU memory profiling and hardware-adaptive optimization
comparative - Compare configurations side-by-side
regression - Detect performance regressions over time
monitor - Real-time performance monitoring and alerting
timing - Framework-agnostic timing with GPU sync
statistics - Statistical analysis with bootstrap CI
resource_monitor - Background resource sampling
results - Serializable benchmark result containers
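As background for the bootstrap confidence intervals mentioned for the statistics module, here is a minimal percentile-bootstrap sketch using only the standard library. It shows the general technique, not calibrax's code: `bootstrap_ci`, its parameters, and the sample values are assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(samples, num_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # Seeded so repeated runs give the same bounds.
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(samples, k=n)) for _ in range(num_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * num_resamples)]
    upper_index = min(int((1.0 - alpha) * num_resamples), num_resamples - 1)
    return lower, means[upper_index]


# Hypothetical throughput measurements (samples/sec) from repeated runs.
samples = [980.0, 1010.0, 995.0, 1002.0, 988.0, 1015.0, 990.0, 1005.0]
lower, upper = bootstrap_ci(samples)
print(f"mean={mean(samples):.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

Bounds computed this way are the kind of values one would attach to a measurement as lower and upper limits. With only a handful of runs the interval is rough, so more repetitions give more trustworthy bounds.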

GPU Memory Profiling

from calibrax.profiling import GPUMemoryProfiler, MemoryOptimizer

# Check GPU memory usage
profiler = GPUMemoryProfiler()
usage = profiler.get_memory_usage()
print(f"GPU memory: {usage['gpu_memory_used_mb']:.1f} MB used")

# Analyze pipeline memory patterns
# (pipeline_fn and sample_data must be defined by the caller)
optimizer = MemoryOptimizer()
analysis = optimizer.analyze_pipeline_memory(pipeline_fn, sample_data)
if analysis is not None:
    for suggestion in analysis.suggestions:
        print(f"  Suggestion: {suggestion}")