Framework Comparison
This page describes the comparative analysis framework. Live results are available on the W&B dashboard with interactive charts, comparison tables, and filtering.
Metrics
Every benchmark measures three primary metric groups:
| Group | Metrics | Direction |
|---|---|---|
| Throughput | Elements processed per second | Higher is better |
| Latency | Per-batch time (p50, p95, p99) | Lower is better |
| Memory | Peak RSS, GPU memory | Lower is better |
The W&B dashboard groups these automatically using slash notation, for example `Throughput/throughput/Datarax`.
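As a rough illustration of how these metric groups could be derived from raw per-batch timings, the sketch below computes throughput and the p50/p95/p99 latencies, then builds a slash-notation key. The function names and the `Group/metric/Framework` key layout are assumptions inferred from the example key above, not the benchmark suite's actual API.

```python
import statistics


def summarize_batches(batch_seconds, elements_per_batch):
    """Summarize one run: throughput in elem/s and latency percentiles in ms.

    Hypothetical helper; assumes every batch holds `elements_per_batch` elements
    and that at least two batch timings were recorded.
    """
    total_elements = elements_per_batch * len(batch_seconds)
    throughput = total_elements / sum(batch_seconds)
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(batch_seconds, n=100, method="inclusive")
    return {
        "throughput": throughput,
        "p50_ms": cuts[49] * 1000.0,
        "p95_ms": cuts[94] * 1000.0,
        "p99_ms": cuts[98] * 1000.0,
    }


def metric_key(group, metric, framework):
    """Build a slash-notation key such as 'Throughput/throughput/Datarax'."""
    return f"{group}/{metric}/{framework}"


# Placeholder timings for illustration only; not measured results.
summary = summarize_batches([0.25, 0.25, 0.25, 0.25], elements_per_batch=8)
print(metric_key("Throughput", "throughput", "Datarax"), summary["throughput"])
```

Keeping the group as the first path segment is what lets a dashboard collect related metrics into one panel section.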
Scenario Coverage
Each adapter supports only the scenarios for which it implements the required transforms. This keeps comparisons fair: every framework performs equivalent work on each scenario.
| Framework | CV-1 | NLP-1 | TAB-1 | MM-1 | DIST-1 | PR-1 | Total |
|---|---|---|---|---|---|---|---|
| Datarax | | | | | | | 28 |
| Google Grain | | | | | | | 5 |
| tf.data | | | | | | | 5 |
| PyTorch DataLoader | | | | | | | 5 |
| NVIDIA DALI | | | | | | | 4 |
| SPDL | | | | | | | 4 |
| MosaicML Streaming | | | | | | | 2 |
| WebDataset | | | | | | | 2 |
| HuggingFace Datasets | | | | | | | 2 |
| Ray Data | | | | | | | 2 |
| jax-dataloader | | | | | | | 1 |
| FFCV | | | | | | | 1 |
| LitData | | | | | | | 1 |
| Deep Lake | | | | | | | 1 |
| Energon | | | | | | | 1 |
Datarax supports all 28 scenarios
The table above shows the 6 most widely supported scenarios. Datarax supports the full 28-scenario catalog, including AUG-1/AUG-2/AUG-3, PC-1 through PC-5, IO-1 through IO-4, and NNX-1/XFMR-1.
Fair Comparison Views
Use two complementary views to avoid misleading conclusions:
- Same-backend, shared-coverage view (canonical): compare frameworks only on the scenario intersection they all support on the same backend/hardware profile.
- Native-optimal view: compare frameworks on scenarios that represent each framework's strongest native capabilities.
Canonical published cloud reports use on-demand Vast A100 runs, profile-controlled scenario sets, and manifest/backend-truth validation.
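The shared-coverage view amounts to a set intersection over each framework's supported scenarios. The following is a minimal sketch, assuming coverage is available as a mapping from framework name to a set of scenario IDs. The function names, the shapes of `coverage` and `results`, and the placeholder framework names are illustrative and are not part of the benchmark suite's API.

```python
def shared_scenarios(coverage):
    """Return the scenario IDs that every framework in `coverage` supports."""
    scenario_sets = [set(ids) for ids in coverage.values()]
    if not scenario_sets:
        return set()
    return set.intersection(*scenario_sets)


def restrict_to_shared(results, coverage):
    """Drop any per-scenario result that falls outside the shared intersection."""
    shared = shared_scenarios(coverage)
    return {
        framework: {sid: value for sid, value in per_scenario.items() if sid in shared}
        for framework, per_scenario in results.items()
    }


# Placeholder data for illustration only; not measured coverage or results.
coverage = {
    "framework_a": {"CV-1", "NLP-1", "TAB-1"},
    "framework_b": {"CV-1", "NLP-1"},
    "framework_c": {"CV-1", "NLP-1", "MM-1"},
}
print(sorted(shared_scenarios(coverage)))
```

Filtering results through the intersection before ranking prevents a framework from being penalized or rewarded on a scenario that another framework never ran.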
Visualization
The W&B dashboard provides interactive versions of these chart types:
| Chart | What It Shows |
|---|---|
| Comparison Table | All frameworks side-by-side with best values highlighted |
| Throughput Bars | Grouped bar chart: elem/s per scenario per framework |
| Latency Distribution | Per-batch latency distribution across frameworks |
| Memory Profile | Peak RSS comparison across frameworks |
| Ranking Tables | Per-metric rankings with delta-from-best percentages |
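To make the delta-from-best column in the ranking tables concrete, here is a small sketch of one way such a ranking could be computed. It assumes the delta is the absolute difference from the best value expressed as a percentage of the best value. That definition and the `rank_with_delta` name are assumptions, not the dashboard's documented formula.

```python
def rank_with_delta(values, higher_is_better=True):
    """Rank frameworks best-first and attach a delta-from-best percentage.

    `values` maps framework name to a single metric value. Hypothetical helper;
    the delta is |value - best| / |best| * 100, which is an assumed definition.
    """
    if not values:
        return []
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=higher_is_better)
    best = ordered[0][1]
    ranked = []
    for name, value in ordered:
        # A best value of zero has no meaningful relative delta.
        delta_pct = abs(value - best) / abs(best) * 100.0 if best else float("nan")
        ranked.append((name, value, delta_pct))
    return ranked


# Placeholder values for illustration only; not measured results.
for row in rank_with_delta({"framework_a": 200.0, "framework_b": 100.0, "framework_c": 150.0}):
    print(row)
```

Passing `higher_is_better=False` applies the same ranking to latency and memory metrics, where lower values win.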
For local chart generation (offline use), the `benchmarks.visualization.charts` module can produce static plots:
```python
from pathlib import Path

from benchmarks.runners.full_runner import ComparativeResults
from benchmarks.visualization.charts import ChartGenerator

# Load the comparative results stored under the "latest" report directory.
results = ComparativeResults.load(Path("benchmark-data/reports/latest"))
# Generate the static charts into the output directory.
gen = ChartGenerator(results, Path("benchmark-data/charts"))
gen.generate_all()
```
Comparative Analysis
Strengths
Scenarios where Datarax leads other frameworks by >1.2x. These represent areas where the JAX-native architecture provides clear advantages:
- Pipeline Complexity (PC-*): Datarax's DAG execution engine handles complex multi-branch pipelines that other frameworks cannot express
- Datarax Unique (NNX-1, XFMR-1): Features like Flax NNX module integration and JIT+vmap transform acceleration are exclusive to Datarax
Comparable Performance
Scenarios where performance is within 0.8x-1.2x of the closest alternative.
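The ratio bands used in this analysis (above 1.2x for strengths, 0.8x to 1.2x for comparable performance) can be expressed as a small classifier. This is a sketch under stated assumptions: the `classify` function is hypothetical, ratios below 0.8x are treated as optimization opportunities, and for lower-is-better metrics the ratio is inverted so that values above 1 always favor Datarax.

```python
def classify(datarax_value, best_other_value, higher_is_better=True):
    """Place one scenario result into a comparison band.

    Hypothetical helper; assumes both values are positive. The ratio is
    oriented so that a ratio above 1.0 means Datarax is ahead.
    """
    if higher_is_better:
        ratio = datarax_value / best_other_value
    else:
        # For latency or memory, a smaller Datarax value is the better result.
        ratio = best_other_value / datarax_value
    if ratio > 1.2:
        return "strength"
    if ratio >= 0.8:
        return "comparable"
    return "opportunity"


# Placeholder values for illustration only; not measured results.
print(classify(150.0, 100.0))                        # throughput-style metric
print(classify(10.0, 20.0, higher_is_better=False))  # latency-style metric
```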
Optimization Opportunities
Scenarios where other frameworks lead. Each gap is mapped to a prioritized optimization target. The gap detector generates an optimization backlog:
```python
from pathlib import Path

from benchmarks.analysis.gap_detection import GapDetector
from benchmarks.runners.full_runner import ComparativeResults

# Load the comparative results stored under the "latest" report directory.
results = ComparativeResults.load(Path("benchmark-data/reports/latest"))
# Write the optimization backlog as a Markdown file.
detector = GapDetector(results)
detector.generate_backlog(Path("benchmark-data/optimization_backlog.md"))
```
Viewing Results
W&B Dashboard
After running benchmarks, export to W&B for interactive exploration:
See Dashboard & calibrax for setup details.
Terminal Summary
For a quick local overview without W&B: