DAG Construction Guide
Datarax uses a Directed Acyclic Graph (DAG) model to represent data pipelines. This guide explains how to construct and execute DAGs.
Introduction
A DAG in Datarax consists of Nodes representing operations (data sources, transformations, batching) and Edges representing data flow.
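To make the node and edge model concrete, here is a minimal sketch in plain Python that does not depend on Datarax. The node names are hypothetical, and the predecessor mapping only illustrates the DAG idea; it is not Datarax's internal representation. The standard library's graphlib module derives an order in which every node runs after the nodes it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical node names, for illustration only (not Datarax identifiers).
# Each key is a node; its value lists the nodes it depends on (incoming edges).
predecessors = {
    "source": [],
    "augment": ["source"],
    "normalize": ["source"],
    "merge": ["augment", "normalize"],
    "batch": ["merge"],
}


def execution_order(graph):
    """Return one valid order in which every node follows its predecessors."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """Return True if the graph has no cycle, i.e. it is a valid DAG."""
    try:
        execution_order(graph)
    except CycleError:
        return False
    return True


order = execution_order(predecessors)
print(order)
```

A graph such as {"a": ["b"], "b": ["a"]} has no valid order, which is exactly what the "acyclic" requirement rules out.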
Building a DAG
You build a pipeline by instantiating the Pipeline class with a
source, a list of stages, a batch size, and an nnx.Rngs instance.
For DAG-shaped pipelines (branching or merging), use
Pipeline.from_dag(...).
1. Linear Pipeline
    from flax import nnx
    from datarax.pipeline import Pipeline
    from datarax.sources import MemorySource, MemorySourceConfig

    # `data`, `op1`, and `op2` are placeholders for your own dataset and stage modules.
    source = MemorySource(MemorySourceConfig(), data=data, rngs=nnx.Rngs(0))
    pipeline = Pipeline(
        source=source,
        stages=[op1, op2],
        batch_size=32,
        rngs=nnx.Rngs(0),
    )
2. Execute
The pipeline is iterable and supports pipeline.scan(...) for
whole-epoch JIT compilation.
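Because the pipeline is iterable, consuming it looks like an ordinary Python loop. The sketch below uses a toy stand-in class, ToyPipeline (hypothetical, not part of Datarax), so that it runs without the library; with a real Pipeline, only the iteration pattern is expected to carry over. The signature of pipeline.scan(...) is not described in this guide, so it is not shown.

```python
class ToyPipeline:
    """Hypothetical stand-in for an iterable pipeline (not the Datarax class)."""

    def __init__(self, data, stages, batch_size):
        self.data = data
        self.stages = stages
        self.batch_size = batch_size

    def __iter__(self):
        # Slice the data into consecutive batches and run each stage in order.
        # This toy keeps the final partial batch; whether Datarax does the
        # same is not stated in this guide.
        for start in range(0, len(self.data), self.batch_size):
            batch = {"x": self.data[start:start + self.batch_size]}
            for stage in self.stages:
                batch = stage(batch)
            yield batch


def double(batch):
    # Example stage: returns a new dict batch with every value doubled.
    return {"x": [v * 2 for v in batch["x"]]}


pipeline = ToyPipeline(data=list(range(5)), stages=[double], batch_size=2)
batches = [batch for batch in pipeline]
```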
3. Branching DAG
Use Pipeline.from_dag when stages need to branch or merge. Each
node declares its predecessors in the edges mapping, and sink
names the node whose output the pipeline returns.
    # `aug`, `norm`, and `merge` are placeholders for your own stage modules.
    pipeline = Pipeline.from_dag(
        source=source,
        nodes={"augment": aug, "normalize": norm, "merge": merge},
        edges={"augment": [], "normalize": [], "merge": ["augment", "normalize"]},
        sink="merge",
        batch_size=32,
        rngs=nnx.Rngs(0),
    )
For runnable recipes covering parallel branches, merge variants
(stack, average, concat), and conditional Branch, see the
Branching DAG Cookbook.
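The roles of the nodes, edges, and sink arguments described above can be sketched with a small evaluator in plain Python. This is not how Pipeline.from_dag is implemented. The function run_dag and the toy stages are hypothetical, and the sketch assumes that a node with an empty predecessor list reads the source batch while a node with several predecessors receives a list of their outputs.

```python
from graphlib import TopologicalSorter


def run_dag(source_batch, nodes, edges, sink):
    """Evaluate `nodes` in dependency order and return the sink's output.

    Assumptions of this sketch: no predecessors means the node reads the
    source batch; one predecessor passes its output through; several
    predecessors are passed as a list in declared order.
    """
    outputs = {}
    for name in TopologicalSorter(edges).static_order():
        preds = edges[name]
        if not preds:
            arg = source_batch
        elif len(preds) == 1:
            arg = outputs[preds[0]]
        else:
            arg = [outputs[p] for p in preds]
        outputs[name] = nodes[name](arg)
    return outputs[sink]


def aug(batch):
    # Toy augmentation: shift every value by one.
    return {"x": [v + 1 for v in batch["x"]]}


def norm(batch):
    # Toy normalization: scale every value down by ten.
    return {"x": [v / 10 for v in batch["x"]]}


def merge(branches):
    # Toy merge: element-wise average of the two branch outputs.
    a, b = branches
    return {"x": [(p + q) / 2 for p, q in zip(a["x"], b["x"])]}


result = run_dag(
    {"x": [10, 20]},
    nodes={"augment": aug, "normalize": norm, "merge": merge},
    edges={"augment": [], "normalize": [], "merge": ["augment", "normalize"]},
    sink="merge",
)
```

Both branches see the same source batch, and only the node named by sink determines what the caller receives.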
Stage Types
Any nnx.Module whose __call__(batch) -> batch transforms the
batch can be used as a stage. Stages fall into two kinds:
- OperatorModule subclasses (e.g. BrightnessOperator, NoiseOperator): receive an Element and return an updated Element. Pipeline detects these and uses an optimized fast path.
- Plain nnx.Module: receives the dict batch directly. Use this for user-defined transforms.
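The batch-in, batch-out contract for a plain stage can be sketched as follows. To stay runnable without Flax, the class below is a plain Python callable; a real Datarax stage would subclass nnx.Module as described above. The class name ScaleStage, the "image" and "label" keys, and the list-valued batch are illustrative assumptions.

```python
class ScaleStage:
    """Illustrative stage: multiplies one field of a dict batch by a constant.

    Plain Python stand-in; a real Datarax stage would subclass nnx.Module.
    """

    def __init__(self, key, factor):
        self.key = key
        self.factor = factor

    def __call__(self, batch):
        # Return a new dict rather than mutating the input batch in place.
        updated = dict(batch)
        updated[self.key] = [v * self.factor for v in batch[self.key]]
        return updated


stage = ScaleStage(key="image", factor=0.5)
batch = {"image": [2.0, 4.0], "label": [0, 1]}
out = stage(batch)
```

Returning a new dict keeps the stage free of side effects, which tends to fit JIT-compiled code better than in-place mutation.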
API Reference
For full details on available nodes and execution options, see the DAG API Reference.