
Configuration System

Datarax uses a typed configuration system based on Python dataclasses. All module configurations inherit from a common base class hierarchy that provides validation at construction time, immutability for structural configs, and consistent behavior across modules.

Configuration Hierarchy

DataraxModuleConfig (base)
├── OperatorConfig (mutable, learnable)
│   ├── MapOperatorConfig
│   ├── ElementOperatorConfig
│   └── BatchMixOperatorConfig
└── StructuralConfig (immutable, compile-time)
    ├── HFEagerConfig
    └── TFDSEagerConfig

Key Points

  • OperatorConfig: for modules with learnable parameters (mutable state)
  • StructuralConfig: for modules with fixed behavior (frozen after creation)
  • All configs are validated on construction, in __post_init__
  • For shuffle configurations, set the seed parameter (e.g., seed=42)

Base Configuration

All configs inherit common settings:

from datarax.core.config import DataraxModuleConfig

# Base attributes available to all configs:
config = DataraxModuleConfig(
    cacheable=False,           # Whether to enable caching for this module
    batch_stats_fn=None,       # Function or nnx.Module that computes batch statistics dynamically
    precomputed_stats=None,    # Static, precomputed statistics (dict)
)

Mutual Exclusivity

batch_stats_fn and precomputed_stats cannot both be set.
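A minimal sketch of how this rule can be enforced in __post_init__, using a hypothetical stand-in class (BaseConfigSketch is not part of Datarax). The exception type Datarax actually raises for this rule is not documented on this page; ValueError is an assumption here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BaseConfigSketch:
    """Hypothetical stand-in for DataraxModuleConfig (not the real class)."""

    cacheable: bool = False
    batch_stats_fn: Callable[..., Any] | None = None
    precomputed_stats: dict[str, Any] | None = None

    def __post_init__(self) -> None:
        # Fail fast: dynamic and static statistics are mutually exclusive.
        # ValueError is an assumed exception type for this sketch.
        if self.batch_stats_fn is not None and self.precomputed_stats is not None:
            raise ValueError(
                "batch_stats_fn and precomputed_stats are mutually exclusive"
            )


def mean_stats(batch: list[float]) -> dict[str, float]:
    """Example dynamic statistics function."""
    return {"mean": sum(batch) / len(batch)}


# Either source on its own is accepted.
dynamic = BaseConfigSketch(batch_stats_fn=mean_stats)
static = BaseConfigSketch(precomputed_stats={"mean": 0.5, "std": 0.25})
```

Setting both fields on the same instance raises immediately, before the config ever reaches a module.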

Operator Configuration

For operators with mutable state and potential randomness:

from datarax.core.config import OperatorConfig

# Deterministic operator
config = OperatorConfig(
    stochastic=False,
)

# Stochastic operator (requires stream_name)
config = OperatorConfig(
    stochastic=True,
    stream_name="augment",  # Required!
)

Validation Rules

| stochastic | stream_name | Result |
|---|---|---|
| False | None | ✅ Valid deterministic |
| True | "name" | ✅ Valid stochastic |
| True | None | ❌ Error: stream_name required |
| False | "name" | ❌ Error: stream_name forbidden |
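The four rows of the table reduce to one rule: stream_name must be set exactly when stochastic is True. The sketch below is a hypothetical re-implementation of that rule (check_stochastic_config and is_valid are illustrative names, not Datarax functions), raising ValueError as documented for validate_stochastic_config in the API reference.

```python
from __future__ import annotations


def check_stochastic_config(stochastic: bool, stream_name: str | None) -> None:
    """Hypothetical re-implementation of the stochastic/stream_name rules.

    Mirrors the documented signature of validate_stochastic_config, but is a
    sketch, not Datarax's code. Raises ValueError on an invalid combination.
    """
    if stochastic and stream_name is None:
        raise ValueError("stochastic=True requires a stream_name")
    if not stochastic and stream_name is not None:
        raise ValueError("deterministic configs must not set stream_name")


def is_valid(stochastic: bool, stream_name: str | None) -> bool:
    """Return True if the combination passes the sketch's validation."""
    try:
        check_stochastic_config(stochastic, stream_name)
    except ValueError:
        return False
    return True


# One call per table row: (stochastic, stream_name) -> valid?
for row in [(False, None), (True, "augment"), (True, None), (False, "augment")]:
    print(row, is_valid(*row))
```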

Specific Operator Configs

ElementOperatorConfig

For element-level transformations:

from datarax.core.config import ElementOperatorConfig

# Deterministic
config = ElementOperatorConfig(stochastic=False)

# Stochastic
config = ElementOperatorConfig(
    stochastic=True,
    stream_name="element_aug",
)

MapOperatorConfig

For per-array-leaf transformations (currently deterministic only: stochastic=True raises NotImplementedError):

from datarax.core.config import MapOperatorConfig

# Full-tree mode (transform entire element)
config = MapOperatorConfig(subtree=None)

# Subtree mode (transform specific fields)
config = MapOperatorConfig(
    subtree={"image": None, "mask": None},
)
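A subtree acts as a mask over element.data: None leaves mark the fields to transform, and everything else passes through unchanged. The following plain-Python sketch illustrates those semantics under stated assumptions: apply_on_subtree and map_leaves are hypothetical helpers, plain lists stand in for arrays, only nested dicts (not arbitrary PyTrees) are handled, and, following the "per-array-leaf" description, fn is assumed to be applied leaf by leaf in both modes.

```python
from __future__ import annotations

from typing import Any, Callable


def map_leaves(fn: Callable[[Any], Any], tree: Any) -> Any:
    """Apply fn to every non-dict leaf of a nested dict."""
    if isinstance(tree, dict):
        return {key: map_leaves(fn, value) for key, value in tree.items()}
    return fn(tree)


def apply_on_subtree(
    fn: Callable[[Any], Any], data: dict[str, Any], subtree: dict[str, Any] | None
) -> dict[str, Any]:
    """Hypothetical sketch: transform only the fields marked None in subtree."""
    if subtree is None:
        return map_leaves(fn, data)  # full-tree mode: every leaf is transformed
    out = dict(data)  # unselected fields pass through unchanged
    for key, mask in subtree.items():
        if mask is None:
            out[key] = map_leaves(fn, data[key])  # None leaf: transform this field
        else:
            out[key] = apply_on_subtree(fn, data[key], mask)  # recurse into nested mask
    return out


def double(values: list[float]) -> list[float]:
    return [2 * v for v in values]


data = {"image": [1.0, 2.0], "mask": [0.0, 1.0], "label": [3.0]}
print(apply_on_subtree(double, data, {"image": None, "mask": None}))
# {'image': [2.0, 4.0], 'mask': [0.0, 2.0], 'label': [3.0]}
```

In subtree mode the label field is left untouched; passing subtree=None would double it as well.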

BatchMixOperatorConfig

For batch-level mixing (MixUp/CutMix):

from datarax.core.config import BatchMixOperatorConfig

# MixUp
config = BatchMixOperatorConfig(
    mode="mixup",
    alpha=0.4,
    label_field="label",
)

# CutMix
config = BatchMixOperatorConfig(
    mode="cutmix",
    alpha=1.0,
    data_field="image",
)

Note

BatchMixOperatorConfig is always stochastic: stochastic=True is forced, and stream_name defaults to "batch_mix".
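The documented rules for this config (mode must be "mixup" or "cutmix", alpha must be positive, stochastic forced to True) can be sketched with a hypothetical stand-in dataclass. Whether Datarax silently overrides stochastic=False, as this sketch does, or rejects it is an assumption, as is the use of ValueError.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BatchMixConfigSketch:
    """Hypothetical stand-in mirroring BatchMixOperatorConfig's documented rules."""

    stochastic: bool = True
    stream_name: str | None = "batch_mix"
    mode: str = "mixup"
    alpha: float = 1.0
    data_field: str = "image"
    label_field: str = "label"

    def __post_init__(self) -> None:
        if self.mode not in ("mixup", "cutmix"):
            raise ValueError(f"mode must be 'mixup' or 'cutmix', got {self.mode!r}")
        if self.alpha <= 0:
            raise ValueError(f"alpha must be positive, got {self.alpha}")
        # Batch mixing is always random: force stochastic mode on.
        self.stochastic = True


cfg = BatchMixConfigSketch(mode="cutmix", stochastic=False)
print(cfg.stochastic)  # True: the flag is forced regardless of the argument
```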

Structural Configuration

For modules with frozen configuration (compile-time constants):

from datarax.core.config import StructuralConfig

config = StructuralConfig(
    stochastic=False,
    stream_name=None,
)

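# After __post_init__ completes, the config is frozen
try:
    config.stochastic = True  # Raises FrozenInstanceError
except AttributeError as err:  # FrozenInstanceError is a subclass of AttributeError
    print(f"Config is immutable: {err}")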

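Why Frozen?

Structural configs are frozen because:

  1. They represent compile-time constants for JIT compilation
  2. Changing a config after construction could bypass the invariants checked in __post_init__
  3. Immutability prevents subtle bugs caused by modifying a config after a module has been built with it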

Creating Custom Configs

Extend the base classes for custom modules:

from dataclasses import dataclass
from datarax.core.config import OperatorConfig

@dataclass  # OperatorConfig is mutable, so frozen=True would raise TypeError
class MyCustomOperatorConfig(OperatorConfig):
    """Configuration for MyCustomOperator."""

    # Custom fields
    strength: float = 1.0
    mode: str = "default"

    def __post_init__(self):
        # Validate custom fields
        if self.strength <= 0:
            raise ValueError("strength must be positive")
        if self.mode not in ("default", "advanced"):
            raise ValueError(f"Unknown mode: {self.mode}")

        # Call parent validation
        super().__post_init__()
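The plain @dataclass decorator matters here: Python's dataclasses module refuses to derive a frozen dataclass from a non-frozen one, and OperatorConfig is documented as mutable. This self-contained sketch demonstrates both that rule and the super().__post_init__() chaining, using a hypothetical stand-in parent (ParentConfigSketch) instead of the real OperatorConfig.

```python
from dataclasses import dataclass


@dataclass
class ParentConfigSketch:
    """Hypothetical mutable parent standing in for OperatorConfig."""

    stochastic: bool = False

    def __post_init__(self) -> None:
        # Parent-level validation would go here; record that it ran.
        self.parent_validated = True


@dataclass  # no frozen=True: the parent is not frozen
class ChildConfigSketch(ParentConfigSketch):
    strength: float = 1.0

    def __post_init__(self) -> None:
        super().__post_init__()  # run the parent's checks too
        if self.strength <= 0:
            raise ValueError("strength must be positive")


frozen_error = None
try:

    @dataclass(frozen=True)
    class FrozenChildSketch(ParentConfigSketch):
        strength: float = 1.0

except TypeError as err:
    frozen_error = str(err)

print(frozen_error)  # cannot inherit frozen dataclass from a non-frozen one
```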

Using with Modules

from flax import nnx
from datarax.operators import ElementOperator
from datarax.core.config import ElementOperatorConfig

# Create config
config = ElementOperatorConfig(
    stochastic=True,
    stream_name="noise",
)

# Pass to module; my_transform_fn is your own element-level transform,
# defined elsewhere
operator = ElementOperator(
    config,
    fn=my_transform_fn,
    rngs=nnx.Rngs(42),
)

API Reference

datarax.core.config

Configuration dataclasses for Datarax modules.

This module provides typed configuration classes for all Datarax modules:

  • DataraxModuleConfig: Base configuration for all modules
  • OperatorConfig: Configuration for parametric operators
  • StructuralConfig: Configuration for structural processors (runtime immutable)

All configs are dataclasses with __post_init__ validation, so configuration errors fail fast.

logger module-attribute

logger = getLogger(__name__)

DataraxModuleConfig dataclass

DataraxModuleConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None)

Base configuration for all Datarax modules.

All module configs are dataclasses with __post_init__ validation. Configuration is validated at construction, before it is passed to a module.

Child classes inherit from this and add their specific configuration.

Attributes:

- cacheable (bool): Whether to enable caching for this module
- batch_stats_fn (Callable | Module | None): Function or module to compute batch statistics dynamically
- precomputed_stats (dict[str, Any] | None): Static precomputed statistics

Validation Rules:

- batch_stats_fn and precomputed_stats are mutually exclusive

cacheable class-attribute instance-attribute

cacheable: bool = False

batch_stats_fn class-attribute instance-attribute

batch_stats_fn: Callable | Module | None = None

precomputed_stats class-attribute instance-attribute

precomputed_stats: dict[str, Any] | None = None

OperatorConfig dataclass

OperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap')

Bases: DataraxModuleConfig

Configuration for OperatorModule (mutable, learnable).

Inherits from DataraxModuleConfig:

- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None

Adds operator-specific configuration:

- stochastic: bool
- stream_name: str | None
- batch_strategy: str (default 'vmap')

Validation Rules:

- Inherits mutual exclusivity of statistics from parent
- Stochastic operators require stream_name for RNG management
- Deterministic operators should not specify stream_name

Attributes:

- stochastic (bool): Whether this operator uses randomness
- stream_name (str | None): RNG stream name (required if stochastic=True)

stochastic class-attribute instance-attribute

stochastic: bool = False

stream_name class-attribute instance-attribute

stream_name: str | None = None

batch_strategy class-attribute instance-attribute

batch_strategy: str = 'vmap'

cacheable class-attribute instance-attribute

cacheable: bool = False

batch_stats_fn class-attribute instance-attribute

batch_stats_fn: Callable | Module | None = None

precomputed_stats class-attribute instance-attribute

precomputed_stats: dict[str, Any] | None = None

MapOperatorConfig dataclass

MapOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap', subtree: PyTree | None = None)

Bases: OperatorConfig

Configuration for MapOperator - unified deterministic/stochastic operator.

Inherits from OperatorConfig:

- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
- stochastic: bool (currently must be False - stochastic mode not yet implemented)
- stream_name: str | None

Adds MapOperator-specific configuration:

- subtree: PyTree | None, a nested dict matching the element.data structure:
  - If None, the user fn is applied to the full element (full-tree mode)
  - If specified, only the specified subtree is affected (subtree mode)

Validation Rules:

- Inherits all validation from OperatorConfig
- Currently enforces stochastic=False (NotImplementedError if True)

Attributes:

- subtree (PyTree | None): Optional PyTree mask specifying which parts of the data to transform. Its structure must match element.data; use None as a leaf to mark a field for transformation. Example: {"image": None, "mask": None}

Examples:

Full-tree mode:

from datarax.core.config import MapOperatorConfig

config = MapOperatorConfig(subtree=None, stochastic=False)

Subtree mode (single field):

config = MapOperatorConfig(subtree={"image": None}, stochastic=False)

Subtree mode (multiple fields):

config = MapOperatorConfig(
    subtree={"image": None, "mask": None},
    stochastic=False
)

subtree class-attribute instance-attribute

subtree: PyTree | None = None

cacheable class-attribute instance-attribute

cacheable: bool = False

batch_stats_fn class-attribute instance-attribute

batch_stats_fn: Callable | Module | None = None

precomputed_stats class-attribute instance-attribute

precomputed_stats: dict[str, Any] | None = None

stochastic class-attribute instance-attribute

stochastic: bool = False

stream_name class-attribute instance-attribute

stream_name: str | None = None

batch_strategy class-attribute instance-attribute

batch_strategy: str = 'vmap'

ElementOperatorConfig dataclass

ElementOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap')

Bases: OperatorConfig

Configuration for ElementOperator - element-level transformation operator.

Inherits from OperatorConfig:

- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
- stochastic: bool
- stream_name: str | None

ElementOperator applies user-provided functions to entire Element structures (data + state + metadata), enabling coordinated transformations across multiple fields and access to element state.
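As an illustration of a coordinated transformation, the sketch below flips an image and its mask with a single shared random decision and records the choice in the element's state. ElementSketch and paired_flip are hypothetical: the real Element type and the exact signature ElementOperator expects for fn are not documented on this page, and random.Random stands in for the operator's RNG stream.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class ElementSketch:
    """Hypothetical stand-in for a Datarax Element (data + state + metadata)."""

    data: dict[str, list[float]]
    state: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


def paired_flip(element: ElementSketch, rng: random.Random) -> ElementSketch:
    """Flip image and mask together so they stay aligned; record it in state."""
    flip = rng.random() < 0.5  # one decision shared by both fields
    data = dict(element.data)
    if flip:
        data["image"] = data["image"][::-1]
        data["mask"] = data["mask"][::-1]
    state = {**element.state, "flipped": flip}
    return replace(element, data=data, state=state)


elem = ElementSketch(data={"image": [1.0, 2.0, 3.0], "mask": [0.0, 0.0, 1.0]})
out = paired_flip(elem, random.Random(0))
print(out.state["flipped"], out.data)
```

Because the function sees the whole element, the image and mask can never receive different flip decisions, which is the kind of cross-field consistency a per-leaf operator cannot guarantee on its own.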

Validation Rules:

- Inherits all validation from OperatorConfig

Examples:

Deterministic element transformation:

from datarax.core.config import ElementOperatorConfig

config = ElementOperatorConfig(stochastic=False)

Stochastic element augmentation:

config = ElementOperatorConfig(stochastic=True, stream_name="augment")

cacheable class-attribute instance-attribute

cacheable: bool = False

batch_stats_fn class-attribute instance-attribute

batch_stats_fn: Callable | Module | None = None

precomputed_stats class-attribute instance-attribute

precomputed_stats: dict[str, Any] | None = None

stochastic class-attribute instance-attribute

stochastic: bool = False

stream_name class-attribute instance-attribute

stream_name: str | None = None

batch_strategy class-attribute instance-attribute

batch_strategy: str = 'vmap'

BatchMixOperatorConfig dataclass

BatchMixOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = True, stream_name: str | None = 'batch_mix', batch_strategy: str = 'vmap', mode: str = 'mixup', alpha: float = 1.0, data_field: str = 'image', label_field: str = 'label')

Bases: OperatorConfig

Configuration for BatchMixOperator - unified MixUp and CutMix batch augmentation.

Inherits from OperatorConfig:

- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
- stochastic: bool (always True for BatchMixOperator)
- stream_name: str | None

BatchMixOperator performs batch-level sample mixing that cannot be decomposed into element-level operations. It mixes samples across the batch, either through linear interpolation (MixUp) or patch cutting/pasting (CutMix).
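For reference, the standard MixUp formulation draws a mixing ratio λ (lam in the code) from Beta(alpha, alpha) and blends each sample, and its one-hot label, with a randomly chosen partner from the same batch. The sketch below implements that textbook technique on plain nested lists; mixup_batch is a hypothetical name, and the sketch is not Datarax's implementation. CutMix differs in that it pastes a rectangular patch covering a fraction 1 − λ of the image from the partner sample, then mixes the labels by the actual patch area.

```python
from __future__ import annotations

import random


def mixup_batch(
    images: list[list[float]],
    labels: list[list[float]],
    alpha: float,
    rng: random.Random,
) -> tuple[list[list[float]], list[list[float]], float]:
    """Standard MixUp sketch: blend each sample with a shuffled partner.

    A single mixing ratio lam ~ Beta(alpha, alpha) is shared by the batch,
    and the same ratio is applied to inputs and (one-hot) labels.
    """
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(images)))
    rng.shuffle(perm)  # partner index for each sample

    def blend(a: list[float], b: list[float]) -> list[float]:
        return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

    mixed_images = [blend(images[i], images[j]) for i, j in enumerate(perm)]
    mixed_labels = [blend(labels[i], labels[j]) for i, j in enumerate(perm)]
    return mixed_images, mixed_labels, lam


images = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mixed_x, mixed_y, lam = mixup_batch(images, labels, alpha=0.4, rng=random.Random(0))
print(round(lam, 3), [round(sum(y), 6) for y in mixed_y])  # each label row still sums to 1
```

Smaller alpha values push λ toward 0 or 1, so most mixed samples stay close to one of the two originals.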

Attributes:

- mode (str): Mixing mode, "mixup" or "cutmix"
- alpha (float): Beta distribution parameter for the mixing ratio (default: 1.0)
- data_field (str): Field name containing the data to mix (default: "image" for cutmix)
- label_field (str): Field name containing the labels to mix (default: "label")

Validation Rules:

- mode must be "mixup" or "cutmix"
- alpha must be positive
- Always stochastic (forced to True)

Examples:

MixUp augmentation:

from datarax.core.config import BatchMixOperatorConfig

config = BatchMixOperatorConfig(mode="mixup", alpha=0.4)

CutMix augmentation:

config = BatchMixOperatorConfig(mode="cutmix", alpha=1.0)

mode class-attribute instance-attribute

mode: str = 'mixup'

alpha class-attribute instance-attribute

alpha: float = 1.0

data_field class-attribute instance-attribute

data_field: str = 'image'

label_field class-attribute instance-attribute

label_field: str = 'label'

stochastic class-attribute instance-attribute

stochastic: bool = True

stream_name class-attribute instance-attribute

stream_name: str | None = 'batch_mix'

cacheable class-attribute instance-attribute

cacheable: bool = False

batch_stats_fn class-attribute instance-attribute

batch_stats_fn: Callable | Module | None = None

precomputed_stats class-attribute instance-attribute

precomputed_stats: dict[str, Any] | None = None

batch_strategy class-attribute instance-attribute

batch_strategy: str = 'vmap'

StructuralConfig dataclass

StructuralConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None)

Bases: DataraxModuleConfig

Configuration for StructuralModule (runtime immutable, compile-time constants).

Inherits from DataraxModuleConfig:

- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None

Adds structural-specific configuration:

- stochastic: bool
- stream_name: str | None

Note: This config enforces runtime immutability through a __setattr__ override. After __post_init__ completes, the instance is frozen and cannot be modified. All configuration must be known at module construction time for JIT compilation.
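A minimal sketch of the __setattr__-override technique described in this note. FrozenAfterInitSketch is hypothetical and is not StructuralConfig's actual source. One plausible motivation for this approach over @dataclass(frozen=True) is that __post_init__ can still assign fields normally before the freeze takes effect, although this sketch's __post_init__ only sets the guard flag.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass
class FrozenAfterInitSketch:
    """Hypothetical config that becomes read-only once __post_init__ finishes."""

    stochastic: bool = False
    stream_name: str | None = None

    def __post_init__(self) -> None:
        # Validation would run here, while normal assignment still works.
        # object.__setattr__ bypasses the override below to set the guard flag.
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name: str, value: object) -> None:
        # Before the flag exists (i.e. inside __init__), assignment is allowed.
        if getattr(self, "_frozen", False):
            raise FrozenInstanceError(f"cannot assign to field {name!r}")
        super().__setattr__(name, value)


cfg = FrozenAfterInitSketch(stochastic=True, stream_name="sample")
try:
    cfg.stochastic = False
except FrozenInstanceError as err:
    print("blocked:", err)
```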

Validation Rules:

- Inherits mutual exclusivity of statistics from parent
- Stochastic structural modules require stream_name for RNG management
- Deterministic structural modules should not specify stream_name

Attributes:

- stochastic (bool): Whether this structural module uses randomness (e.g., sampling)
- stream_name (str | None): RNG stream name (required if stochastic=True)

stochastic class-attribute instance-attribute

stochastic: bool = False

stream_name class-attribute instance-attribute

stream_name: str | None = None

cacheable class-attribute instance-attribute

cacheable: bool = False

batch_stats_fn class-attribute instance-attribute

batch_stats_fn: Callable | Module | None = None

precomputed_stats class-attribute instance-attribute

precomputed_stats: dict[str, Any] | None = None

validate_stochastic_config

validate_stochastic_config(stochastic: bool, stream_name: str | None) -> None

Validate stochastic configuration rules.

Parameters:

- stochastic (bool, required): Whether the module uses randomness
- stream_name (str | None, required): RNG stream name (required if stochastic=True)

Raises:

- ValueError: If validation rules are violated