Roofline Model
Analyze whether operations are compute-bound or memory-bound.
See Also
- Performance Overview - All performance tools
- XLA Optimization - XLA tuning
- Benchmarking - Measure performance
- NNX Best Practices
datarax.performance.roofline
Roofline analysis for hardware-aware performance optimization.
This module provides tools for analyzing operations based on the roofline model to identify performance bottlenecks and suggest optimizations.
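For orientation, the roofline model bounds attainable throughput by the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). The sketch below is a standalone illustration, not this module's implementation; `attainable_flops` and `is_memory_bound` are hypothetical helper names, and the numbers are taken from the `cpu` entry of `HARDWARE_SPECS`.

```python
def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, bandwidth * intensity)


def is_memory_bound(intensity: float, peak_flops: float, bandwidth: float) -> bool:
    """True when intensity lies below the ridge point peak_flops / bandwidth."""
    return intensity < peak_flops / bandwidth


# Values copied from the 'cpu' entry of HARDWARE_SPECS.
PEAK = 1e12   # peak_flops_bf16, FLOPs/s
BW = 100e9    # hbm_bandwidth, bytes/s

low = attainable_flops(2.0, PEAK, BW)    # limited by the bandwidth term
high = attainable_flops(50.0, PEAK, BW)  # limited by peak compute

print(f"intensity 2.0  -> {low:.3e} FLOPs/s, memory-bound: {is_memory_bound(2.0, PEAK, BW)}")
print(f"intensity 50.0 -> {high:.3e} FLOPs/s, memory-bound: {is_memory_bound(50.0, PEAK, BW)}")
```

An operation below the ridge point gains from moving less data (fusion, tiling, lower precision); one above it gains only from more or better-utilized compute.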
HARDWARE_SPECS
module-attribute
HARDWARE_SPECS = {
    'tpu_v5e': HardwareSpecs(peak_flops_bf16=197000000000000.0, hbm_bandwidth=820000000000.0, vmem_bandwidth=18000000000000.0, critical_intensity=240, matrix_unit_size=(128, 128), optimal_batch_size=240, memory_layout='row_major', use_vmem_optimization=True, preferred_tile_size=128),
    'h100': HardwareSpecs(peak_flops_bf16=989000000000000.0, hbm_bandwidth=3350000000000.0, critical_intensity=298, tensor_core_shapes=[(16, 16, 8), (32, 8, 16)], optimal_batch_size=298, memory_layout='NHWC', use_vmem_optimization=False, preferred_tile_size=16),
    'a100': HardwareSpecs(peak_flops_bf16=312000000000000.0, hbm_bandwidth=1550000000000.0, critical_intensity=201, tensor_core_shapes=[(16, 16, 8)], optimal_batch_size=128, memory_layout='NHWC', use_vmem_optimization=False, preferred_tile_size=16),
    'cpu': HardwareSpecs(peak_flops_bf16=1000000000000.0, hbm_bandwidth=100000000000.0, critical_intensity=10, optimal_batch_size=32, memory_layout='row_major', use_vmem_optimization=False, preferred_tile_size=64),
}
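The `critical_intensity` values can be compared with the ridge point of the roofline, `peak_flops_bf16 / hbm_bandwidth`. The sketch below recomputes that ratio from the numbers listed in `HARDWARE_SPECS`. Whether the module derives `critical_intensity` this way is an assumption, not something the source states; the `listed` dict is a hand-copied stand-in so the snippet runs without importing `datarax`.

```python
# (peak_flops_bf16, hbm_bandwidth, critical_intensity) copied from HARDWARE_SPECS.
listed = {
    "tpu_v5e": (197e12, 820e9, 240),
    "h100": (989e12, 3350e9, 298),
    "a100": (312e12, 1550e9, 201),
    "cpu": (1e12, 100e9, 10),
}

# Ridge point of the roofline: FLOPs/s divided by bytes/s gives FLOPs per byte.
ridge_points = {name: peak / bw for name, (peak, bw, _) in listed.items()}

for name, (_, _, critical) in listed.items():
    print(
        f"{name}: peak/bandwidth = {ridge_points[name]:.1f} FLOPs/byte, "
        f"listed critical_intensity = {critical}"
    )
```

For every entry the computed ratio falls within 3 FLOPs/byte of the listed value, so `critical_intensity` can reasonably be read as the threshold separating memory-bound from compute-bound operations on that device.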
HardwareSpecs
dataclass
HardwareSpecs(peak_flops_bf16: float, hbm_bandwidth: float, critical_intensity: float, optimal_batch_size: int, matrix_unit_size: tuple[int, int] | None = None, vmem_bandwidth: float | None = None, memory_layout: str = 'row_major', use_vmem_optimization: bool = False, tensor_core_shapes: list[tuple[int, int, int]] | None = None, preferred_tile_size: int = 128)
RooflineAnalyzer
RooflineAnalyzer(hardware: str = 'auto')
Analyze operations based on roofline model for performance optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hardware` | `str` | Target hardware ('tpu_v5e', 'h100', 'a100', 'cpu', 'auto') | `'auto'` |
analyze_operation
analyze_operation(func: Callable, *args: Any, output_shape: tuple | None = None, **kwargs: Any) -> dict[str, Any]
Analyze a JAX operation using the roofline model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `func` | `Callable` | Function to analyze | required |
| `args` | `Any` | Arguments to the function | `()` |
| `output_shape` | `tuple \| None` | Optional output shape for memory estimation | `None` |
| `kwargs` | `Any` | Keyword arguments to the function | `{}` |
Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Analysis dict with performance metrics and recommendations |
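
To illustrate the kind of classification a roofline analysis produces, the sketch below estimates the arithmetic intensity of a dense matmul by hand and compares it with the `a100` `critical_intensity` of 201 listed in `HARDWARE_SPECS`. It does not call `analyze_operation`, whose returned dict keys are not documented on this page. `matmul_intensity` is a hypothetical helper, and the counts assume bf16 storage (2 bytes per element) with each matrix moved through memory exactly once.

```python
def matmul_intensity(m: int, k: int, n: int, bytes_per_elem: int = 2) -> float:
    """Estimated FLOPs per byte for an (m, k) @ (k, n) matmul.

    Assumes one multiply and one add per inner-product term, and that both
    inputs are read once and the output is written once.
    """
    flops = 2 * m * k * n
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


A100_CRITICAL_INTENSITY = 201  # copied from HARDWARE_SPECS['a100']

small = matmul_intensity(128, 128, 128)
large = matmul_intensity(4096, 4096, 4096)

for label, value in (("128^3 matmul", small), ("4096^3 matmul", large)):
    regime = "memory-bound" if value < A100_CRITICAL_INTENSITY else "compute-bound"
    print(f"{label}: {value:.1f} FLOPs/byte -> {regime}")
```

Under these assumptions a square matmul of side n has intensity n / 3, so the small case falls below the threshold while the large one sits well above it.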