Control
Utilities for pipeline control flow and execution management. These modules handle asynchronous operations and data prefetching so that data loading can overlap with computation.
Components
| Component | Purpose | Benefit |
|---|---|---|
| Prefetcher | Async data loading | Hide I/O latency |
★ Insight ─────────────────────────────────────
- Prefetching loads next batch while GPU processes current
- Overlaps I/O and compute for better throughput
- Most useful when I/O is the bottleneck
- Works automatically with Pipeline
─────────────────────────────────────────────────
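The overlap of I/O and compute described in the box above can be put into a rough back-of-the-envelope model. The sketch below is an idealization, not a measurement of datarax: it assumes a single loader, constant per-batch load and compute times, and perfect overlap. The function names and timings are illustrative only.

```python
def sequential_step_time(load_s: float, compute_s: float) -> float:
    """Per-batch time when loading and compute run back to back."""
    return load_s + compute_s


def overlapped_step_time(load_s: float, compute_s: float) -> float:
    """Steady-state per-batch time when the next load overlaps the current compute."""
    return max(load_s, compute_s)


def ideal_speedup(load_s: float, compute_s: float) -> float:
    """Upper bound on the throughput gain from prefetching under this model."""
    return sequential_step_time(load_s, compute_s) / overlapped_step_time(load_s, compute_s)


# Hypothetical timings in seconds per batch, chosen only for illustration.
for load_s, compute_s in [(0.25, 0.75), (0.5, 0.5), (1.5, 0.5)]:
    print(load_s, compute_s, round(ideal_speedup(load_s, compute_s), 2))
```

In this model the load time is fully hidden whenever it is no longer than the compute time. If loading is slower than compute, the loop stays I/O-bound at `load_s` per batch, so prefetching still helps but cannot make up the whole gap on its own.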
Quick Start
```python
from datarax.control import Prefetcher

# Wrap iterator with prefetching
prefetcher = Prefetcher(
    iterator=pipeline,
    prefetch_count=2,  # Keep 2 batches ready
)

for batch in prefetcher:
    # Next batch loads while this one processes
    train_step(batch)
```
Modules
- prefetcher - Asynchronous data prefetching for pipeline optimization
How Prefetching Works
```text
Without prefetching:
[Load B1] [Process B1] [Load B2] [Process B2] ...
                       ^-- GPU idle during load

With prefetching:
[Load B1] [Load B2   ] [Load B3   ] ...
          [Process B1] [Process B2] ...
          ^-- GPU always busy
```
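The timeline above can be reproduced with a background thread and a bounded queue. The following is a minimal sketch using only the Python standard library. It is not datarax's `Prefetcher` implementation, and the names `prefetch`, `_DONE`, and `_Failure` are hypothetical helpers introduced for illustration.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_DONE = object()  # sentinel marking the end of the source iterator


class _Failure:
    """Wraps an exception raised in the producer so the consumer can re-raise it."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def prefetch(source: Iterable[T], prefetch_count: int = 2) -> Iterator[T]:
    """Yield items from `source`, loading up to `prefetch_count` items ahead."""
    if prefetch_count < 1:
        # Checked on first iteration, because this function is a generator.
        raise ValueError("prefetch_count must be >= 1")

    buffer: queue.Queue = queue.Queue(maxsize=prefetch_count)

    def producer() -> None:
        try:
            for item in source:
                buffer.put(item)  # blocks while the buffer is full
            buffer.put(_DONE)
        except Exception as exc:
            buffer.put(_Failure(exc))  # hand the error to the consumer

    # Daemon thread: it will not keep the process alive on exit.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer has something ready
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item
```

Iterating with `for batch in prefetch(batches, prefetch_count=2): ...` yields items in their original order while the producer thread fills the buffer in the background. The bounded queue caps memory use at `prefetch_count` pending items. This sketch does not stop the producer if the consumer abandons the loop early; a production version would need a cancellation signal.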
Integration with DAG
```python
from flax import nnx

from datarax.pipeline import Pipeline

# Prefetching is built into Pipeline
# `source` is a data source defined elsewhere
pipeline = Pipeline(source=source, stages=[], batch_size=32, rngs=nnx.Rngs(0))
```
See Also
- DAG Executor - Pipeline execution
- Performance - Optimization tools
- Benchmarking - Measure improvements