
Benchmarking

Performance measurement and analysis tools for data pipelines. Use these tools to measure throughput, identify bottlenecks, and track performance regressions.

Tools Overview

Tool                 Purpose                              Output
TimingCollector      Measure samples/sec with GPU sync    Throughput metrics
GPUMemoryProfiler    GPU memory profiling                 Memory usage stats
MemoryOptimizer      Pipeline memory analysis             Optimization suggestions
detect_regressions   Track performance over time          Regression alerts
rank_table           Compare frameworks                   Ranked performance tables
AdvancedMonitor      Real-time monitoring                 Live metrics + alerts
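The idea behind regression alerts can be sketched in a few lines of plain Python. The helper below, `find_regressions`, and its `threshold_pct` and `higher_is_better` parameters are hypothetical and are not the calibrax `detect_regressions` API; the sketch simply compares a current run against a stored baseline and flags any metric that got worse by more than a relative threshold.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    metric: str
    baseline: float
    current: float
    change_pct: float  # signed relative change, current vs. baseline


def find_regressions(baseline, current, threshold_pct=5.0, higher_is_better=True):
    """Flag metrics that got worse than the baseline by more than threshold_pct.

    baseline, current: dicts mapping metric name -> value (e.g. samples/sec).
    Hypothetical helper for illustration, not the calibrax detect_regressions API.
    """
    regressions = []
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # no comparable value in the current run
        change_pct = (current[name] - base) / abs(base) * 100.0
        # A drop is bad for throughput; a rise is bad for latency.
        worsening = -change_pct if higher_is_better else change_pct
        if worsening > threshold_pct:
            regressions.append(Regression(name, base, current[name], change_pct))
    return regressions


baseline = {"image_loader": 1200.0, "text_loader": 850.0}  # samples/sec
current = {"image_loader": 1100.0, "text_loader": 860.0}
for r in find_regressions(baseline, current):
    print(f"{r.metric}: {r.change_pct:+.1f}%")  # prints "image_loader: -8.3%"
```

A relative threshold keeps the check meaningful across metrics of very different magnitudes; in practice it should be set above the run-to-run noise of the benchmark.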

Benchmarking best practices

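  • Always warm up pipelines before benchmarking so that JIT compilation is not counted in the measurement
  • Call block_until_ready() on JAX outputs before stopping the timer; JAX dispatches work asynchronously, so without it you time only the dispatch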
  • Comparative benchmarks control for variance automatically
  • Profile first, optimize second
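The first two practices can be combined into a small timing helper. This is a minimal sketch, assuming you time a single callable; `time_call` and its `warmup`, `repeats`, and `sync` parameters are illustrative names, not part of calibrax. For JAX, `sync` would be `lambda out: out.block_until_ready()`, which blocks until the array has actually been computed.

```python
import statistics
import time


def time_call(fn, *args, warmup=3, repeats=10, sync=lambda out: out):
    """Return per-call wall-clock times (seconds) for fn(*args), measured after warm-up.

    `sync` must block until the result really exists. For JAX, pass
    sync=lambda out: out.block_until_ready(); otherwise the timer only
    measures asynchronous dispatch. Hypothetical helper, not calibrax API.
    """
    # Warm-up runs absorb one-time costs such as JIT compilation.
    for _ in range(warmup):
        sync(fn(*args))
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn(*args))  # stop the clock only after the work has finished
        times.append(time.perf_counter() - start)
    return times


times = time_call(sorted, list(range(100_000, 0, -1)))
print(f"median: {statistics.median(times) * 1e3:.3f} ms over {len(times)} runs")
```

Reporting the median rather than the mean keeps a single slow outlier run (for example, one interrupted by the OS) from dominating the result.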

Quick Start

from calibrax.profiling import TimingCollector

# Measure throughput on CPU (pass sync_fn for GPU)
# `pipeline` is your data pipeline: any iterable that yields batches
timer = TimingCollector()
result = timer.measure_iteration(
    iter(pipeline),
    num_batches=100,
    count_fn=lambda batch: batch["image"].shape[0],
)
throughput = result.num_elements / result.wall_clock_sec
print(f"Throughput: {throughput:.2f} samples/sec")
print(f"First batch: {result.first_batch_time:.4f}s (includes JIT)")
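To make the arithmetic behind `num_elements`, `wall_clock_sec`, and `first_batch_time` concrete, here is a hypothetical, stdlib-only sketch of what a measurement like `measure_iteration` computes. It is not calibrax's implementation. In particular, this sketch includes the first batch in `wall_clock_sec`; whether `TimingCollector` does the same is an assumption to check against its API reference.

```python
import time
from dataclasses import dataclass
from itertools import islice


@dataclass
class IterationResult:
    num_batches: int
    num_elements: int
    wall_clock_sec: float
    first_batch_time: float


def measure_iteration_sketch(batches, num_batches, count_fn, sync_fn=None):
    """Iterate up to num_batches batches, counting elements and timing the loop.

    Hypothetical illustration only; not calibrax's TimingCollector.measure_iteration.
    """
    seen = 0
    num_elements = 0
    first_batch_time = 0.0
    start = time.perf_counter()
    for batch in islice(batches, num_batches):
        if sync_fn is not None:
            sync_fn(batch)  # e.g. block until device work for this batch is done
        if seen == 0:
            # Time to the first batch includes one-time costs such as JIT compilation.
            first_batch_time = time.perf_counter() - start
        num_elements += count_fn(batch)
        seen += 1
    wall_clock_sec = time.perf_counter() - start
    return IterationResult(seen, num_elements, wall_clock_sec, first_batch_time)


# A stand-in pipeline: 200 batches of 32 "images" each.
pipeline = [{"image": [0] * 32} for _ in range(200)]
result = measure_iteration_sketch(
    iter(pipeline), num_batches=100, count_fn=lambda batch: len(batch["image"])
)
print(result.num_batches, result.num_elements)  # prints "100 3200"
print(f"Throughput: {result.num_elements / result.wall_clock_sec:.2f} samples/sec")
```

Counting elements through `count_fn` rather than assuming a fixed batch size keeps the throughput correct when the final batch is smaller than the rest.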

Modules

  • profiler - GPU memory profiling and hardware-adaptive optimization
  • comparative - Compare configurations side-by-side
  • regression - Detect performance regressions over time
  • monitor - Real-time performance monitoring and alerting
  • timing - Framework-agnostic timing with GPU sync
  • statistics - Statistical analysis with bootstrap confidence intervals
  • resource_monitor - Background resource sampling
  • results - Serializable benchmark result containers
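Bootstrap confidence intervals, as used for statistical analysis, can be illustrated with a percentile bootstrap over repeated throughput measurements. This sketch is an assumption about the general technique, not the calibrax statistics API (the `bootstrap_ci` name and its defaults are hypothetical): it resamples the measurements with replacement, recomputes the mean each time, and reads the interval off the sorted resampled means.

```python
import random
import statistics


def bootstrap_ci(samples, stat=statistics.mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI: return (point_estimate, lower, upper).

    Hypothetical helper for illustration, not the calibrax statistics API.
    """
    if len(samples) < 2:
        raise ValueError("need at least two measurements")
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(samples, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - confidence) / 2.0
    lower = estimates[int(round(alpha * (n_resamples - 1)))]
    upper = estimates[int(round((1.0 - alpha) * (n_resamples - 1)))]
    return stat(samples), lower, upper


# Throughput (samples/sec) from repeated benchmark runs.
runs = [1012.0, 998.5, 1005.2, 1020.8, 990.1, 1008.7, 1001.3, 1015.4]
mean, lo, hi = bootstrap_ci(runs)
print(f"mean {mean:.1f} samples/sec, 95% CI [{lo:.1f}, {hi:.1f}]")
```

The bootstrap makes no normality assumption about run times, which is useful because benchmark timings are often skewed by occasional slow runs. A fixed seed makes the reported interval reproducible.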

GPU Memory Profiling

from calibrax.profiling import GPUMemoryProfiler, MemoryOptimizer

# Check GPU memory usage
profiler = GPUMemoryProfiler()
usage = profiler.get_memory_usage()
print(f"GPU memory: {usage['gpu_memory_used_mb']:.1f} MB used")

# Analyze pipeline memory patterns
# (pipeline_fn and sample_data are your own pipeline function and a representative input)
optimizer = MemoryOptimizer()
analysis = optimizer.analyze_pipeline_memory(pipeline_fn, sample_data)
for suggestion in analysis["suggestions"]:
    print(f"  Suggestion: {suggestion}")
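GPUMemoryProfiler reports device memory. For the host side of a pipeline, a comparable measure-around-a-call pattern can be sketched with Python's standard `tracemalloc` module. `peak_host_memory_mb` is a hypothetical helper, not part of calibrax, and it only sees allocations that tracemalloc traces, never GPU memory.

```python
import tracemalloc


def peak_host_memory_mb(fn, *args, **kwargs):
    """Call fn and return (result, extra peak traced memory in MB during the call).

    Host-side analogue only: tracemalloc sees allocations traced by Python,
    not GPU memory. Hypothetical helper, not part of calibrax.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        tracemalloc.reset_peak()  # requires Python 3.9+
        before, _ = tracemalloc.get_traced_memory()
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if started_here:
            tracemalloc.stop()
    return result, (peak - before) / (1024 * 1024)


def build_batches(n):
    # Stand-in for a pipeline step that materializes n batches of 1 KiB each.
    return [bytes(1024) for _ in range(n)]


batches, peak_mb = peak_host_memory_mb(build_batches, 10_000)
print(f"{len(batches)} batches, peak extra host memory: {peak_mb:.1f} MB")
```

Subtracting the traced size at the start isolates the memory the call itself needed, which is the number to compare when trying out a lower-memory variant of a pipeline step.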

See Also