Configuration System¤
Datarax uses a typed configuration system based on Python dataclasses. All module configurations inherit from a base class hierarchy that provides validation, immutability guarantees, and consistent behavior.
Configuration Hierarchy¤
DataraxModuleConfig (base)
├── OperatorConfig (mutable, learnable)
│   ├── MapOperatorConfig
│   ├── ElementOperatorConfig
│   └── BatchMixOperatorConfig
└── StructuralConfig (immutable, compile-time)
    ├── HFEagerConfig
    └── TFDSEagerConfig
★ Insight ─────────────────────────────────────
- OperatorConfig: For modules with learnable parameters (mutable state)
- StructuralConfig: For modules with fixed behavior (frozen after creation)
- All configs validate on construction (__post_init__)
- For shuffle configurations, use the seed=42 parameter
─────────────────────────────────────────────────
Base Configuration¤
All configs inherit common settings:
from datarax.core.config import DataraxModuleConfig

# Base attributes available to all configs:
config = DataraxModuleConfig(
    cacheable=False,         # Whether to enable caching
    batch_stats_fn=None,     # Dynamic statistics function
    precomputed_stats=None,  # Static statistics
)
Mutual Exclusivity
batch_stats_fn and precomputed_stats cannot both be set.
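The mutual-exclusivity rule can be illustrated with a small stand-in dataclass. This is a minimal sketch, not Datarax's implementation: the class name SketchModuleConfig and the error message are assumptions, and only the three field names come from the text above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class SketchModuleConfig:
    """Hypothetical stand-in for DataraxModuleConfig (not the library class)."""

    cacheable: bool = False
    batch_stats_fn: Optional[Callable[..., Any]] = None
    precomputed_stats: Optional[dict] = None

    def __post_init__(self) -> None:
        # Reject configs that request both dynamic and static statistics.
        if self.batch_stats_fn is not None and self.precomputed_stats is not None:
            raise ValueError(
                "batch_stats_fn and precomputed_stats are mutually exclusive"
            )


# Either source of statistics on its own is accepted.
static_cfg = SketchModuleConfig(precomputed_stats={"mean": 0.5})

# Setting both fails at construction time, before any module sees the config.
try:
    SketchModuleConfig(batch_stats_fn=len, precomputed_stats={"mean": 0.5})
    both_rejected = False
except ValueError:
    both_rejected = True
```

Because the check runs in __post_init__, an invalid combination never produces a usable config object.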
Operator Configuration¤
For operators with mutable state and potential randomness:
from datarax.core.config import OperatorConfig

# Deterministic operator
config = OperatorConfig(
    stochastic=False,
)

# Stochastic operator (requires stream_name)
config = OperatorConfig(
    stochastic=True,
    stream_name="augment",  # Required!
)
Validation Rules¤
| stochastic | stream_name | Result |
|---|---|---|
| False | None | ✅ Valid deterministic |
| True | "name" | ✅ Valid stochastic |
| True | None | ❌ Error: stream_name required |
| False | "name" | ❌ Error: stream_name forbidden |
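The four combinations can be written out as a small check. This is a hedged sketch: check_stochastic and is_valid are hypothetical helpers, and the error messages are invented for illustration. Only the rules themselves and the use of ValueError come from this page.

```python
from typing import Optional


def check_stochastic(stochastic: bool, stream_name: Optional[str]) -> None:
    """Hypothetical restatement of the validation rules; not Datarax's code."""
    if stochastic and stream_name is None:
        raise ValueError("stochastic=True requires stream_name")
    if not stochastic and stream_name is not None:
        raise ValueError("stochastic=False forbids stream_name")


def is_valid(stochastic: bool, stream_name: Optional[str]) -> bool:
    """Return True when the combination passes the check."""
    try:
        check_stochastic(stochastic, stream_name)
        return True
    except ValueError:
        return False


# Same order as the rows of the validation table.
results = [
    is_valid(False, None),
    is_valid(True, "name"),
    is_valid(True, None),
    is_valid(False, "name"),
]
```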
Specific Operator Configs¤
ElementOperatorConfig¤
For element-level transformations:
from datarax.core.config import ElementOperatorConfig

# Deterministic
config = ElementOperatorConfig(stochastic=False)

# Stochastic
config = ElementOperatorConfig(
    stochastic=True,
    stream_name="element_aug",
)
MapOperatorConfig¤
For per-array-leaf transformations:
from datarax.core.config import MapOperatorConfig

# Full-tree mode (transform entire element)
config = MapOperatorConfig(subtree=None)

# Subtree mode (transform specific fields)
config = MapOperatorConfig(
    subtree={"image": None, "mask": None},
)
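To make the subtree convention concrete, the sketch below applies a function only to the fields that a mask marks with None. The helper apply_to_subtree is hypothetical and works on plain nested dicts. How MapOperator actually traverses element data is not shown on this page, so treat this as an illustration of the idea only.

```python
from typing import Any, Callable


def apply_to_subtree(data: dict, mask: dict, fn: Callable[[Any], Any]) -> dict:
    """Apply fn to fields selected by mask; pass everything else through.

    Hypothetical helper: a None leaf in mask marks a field to transform.
    """
    out = {}
    for key, value in data.items():
        if key not in mask:
            out[key] = value  # not selected: unchanged
        elif mask[key] is None:
            out[key] = fn(value)  # selected leaf: transform
        else:
            out[key] = apply_to_subtree(value, mask[key], fn)  # nested mask
    return out


element_data = {"image": [1, 2], "mask": [0, 1], "label": 7}
subtree = {"image": None, "mask": None}

# Only "image" and "mask" are scaled; "label" is left alone.
result = apply_to_subtree(element_data, subtree, lambda xs: [x * 10 for x in xs])
```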
BatchMixOperatorConfig¤
For batch-level mixing (MixUp/CutMix):
from datarax.core.config import BatchMixOperatorConfig

# MixUp
config = BatchMixOperatorConfig(
    mode="mixup",
    alpha=0.4,
    label_field="label",
)

# CutMix
config = BatchMixOperatorConfig(
    mode="cutmix",
    alpha=1.0,
    data_field="image",
)
Note
BatchMixOperatorConfig is always stochastic - stochastic=True is forced.
Structural Configuration¤
For modules with frozen configuration (compile-time constants):
from datarax.core.config import StructuralConfig

config = StructuralConfig(
    stochastic=False,
    stream_name=None,
)

# After creation, config is frozen
config.stochastic = True  # Raises FrozenInstanceError!
Why Frozen?¤
Structural configs are frozen because:
- They represent compile-time constants for JIT
- Changing config after construction breaks invariants
- Immutability prevents subtle bugs
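One way to get this freeze-after-validation behavior is a __setattr__ override that starts rejecting assignments once __post_init__ has finished. The sketch below is a stand-in under that assumption. SketchStructuralConfig and the _frozen flag are hypothetical names, and the library's actual mechanism may differ in detail.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass
class SketchStructuralConfig:
    """Hypothetical stand-in that freezes itself after validation."""

    stochastic: bool = False
    stream_name: Optional[str] = None

    def __post_init__(self) -> None:
        if self.stochastic and self.stream_name is None:
            raise ValueError("stochastic=True requires stream_name")
        # Bypass our own __setattr__ to record the frozen flag.
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name: str, value: object) -> None:
        # Assignments made by the generated __init__ happen before _frozen
        # exists, so they pass; any later assignment is rejected.
        if getattr(self, "_frozen", False):
            raise FrozenInstanceError(f"cannot assign to field {name!r}")
        object.__setattr__(self, name, value)


config = SketchStructuralConfig(stochastic=False)
try:
    config.stochastic = True
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The original value survives the failed assignment, so code that captured the config at construction time keeps seeing the same constants.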
Creating Custom Configs¤
Extend the base classes for custom modules:
from dataclasses import dataclass

from datarax.core.config import OperatorConfig


@dataclass(frozen=True)
class MyCustomOperatorConfig(OperatorConfig):
    """Configuration for MyCustomOperator."""

    # Custom fields
    strength: float = 1.0
    mode: str = "default"

    def __post_init__(self):
        # Validate custom fields
        if self.strength <= 0:
            raise ValueError("strength must be positive")
        if self.mode not in ("default", "advanced"):
            raise ValueError(f"Unknown mode: {self.mode}")
        # Call parent validation
        super().__post_init__()
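The pattern of validating custom fields first and then delegating to the parent can be tried without Datarax installed. The sketch below uses hypothetical stand-ins (SketchOperatorConfig, SketchCustomConfig, error_for) whose parent check mirrors the stochastic/stream_name rule described earlier. It is not the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchOperatorConfig:
    """Hypothetical stand-in for OperatorConfig."""

    stochastic: bool = False
    stream_name: Optional[str] = None

    def __post_init__(self) -> None:
        if self.stochastic and self.stream_name is None:
            raise ValueError("stochastic=True requires stream_name")


@dataclass
class SketchCustomConfig(SketchOperatorConfig):
    """Adds one custom field and chains to the parent validation."""

    strength: float = 1.0

    def __post_init__(self) -> None:
        if self.strength <= 0:
            raise ValueError("strength must be positive")
        super().__post_init__()  # parent rules still apply


def error_for(**kwargs) -> Optional[str]:
    """Return the validation message, or None if construction succeeds."""
    try:
        SketchCustomConfig(**kwargs)
        return None
    except ValueError as exc:
        return str(exc)


ok = error_for(stochastic=True, stream_name="noise", strength=0.5)
bad_strength = error_for(strength=0.0)
bad_parent = error_for(stochastic=True)
```

Python's dataclasses do not allow a frozen dataclass to inherit from a non-frozen one (or the reverse), so whether frozen=True belongs on a custom config depends on how the base class is declared. The stand-ins here are left non-frozen for simplicity.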
Using with Modules¤
from flax import nnx

from datarax.operators import ElementOperator
from datarax.core.config import ElementOperatorConfig

# Create config
config = ElementOperatorConfig(
    stochastic=True,
    stream_name="noise",
)

# Pass to module (my_transform_fn is a user-defined transform, not shown here)
operator = ElementOperator(
    config,
    fn=my_transform_fn,
    rngs=nnx.Rngs(42),
)
See Also¤
- Element Operator - Using ElementOperatorConfig
- Composite Operator - CompositeOperatorConfig
- HF Source - HFEagerConfig
- TFDS Source - TFDSEagerConfig
API Reference¤
datarax.core.config ¤
Configuration dataclasses for Datarax modules.
This module provides typed configuration classes for all Datarax modules:
- DataraxModuleConfig: Base configuration for all modules
- OperatorConfig: Configuration for parametric operators
- StructuralConfig: Configuration for structural processors (runtime immutable)
All configs are dataclasses with __post_init__ validation, so configuration errors fail fast.
DataraxModuleConfig dataclass ¤
DataraxModuleConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None)
Base configuration for all Datarax modules.
All module configs are dataclasses with __post_init__ validation. Configuration is validated at construction, before it is passed to the module.
Child classes inherit from this and add their specific configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| cacheable | bool | Whether to enable caching for this module |
| batch_stats_fn | Callable \| Module \| None | Function or module to compute batch statistics dynamically |
| precomputed_stats | dict[str, Any] \| None | Static precomputed statistics |
Validation Rules:
- batch_stats_fn and precomputed_stats are mutually exclusive
OperatorConfig dataclass ¤
OperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap')
Bases: DataraxModuleConfig
Configuration for OperatorModule (mutable, learnable).
Inherits from DataraxModuleConfig:
- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
Adds operator-specific configuration:
- stochastic: bool
- stream_name: str | None
Validation Rules:
- Inherits mutual exclusivity of statistics from parent
- Stochastic operators require stream_name for RNG management
- Deterministic operators should not specify stream_name
Attributes:
| Name | Type | Description |
|---|---|---|
| stochastic | bool | Whether this operator uses randomness |
| stream_name | str \| None | RNG stream name (required if stochastic=True) |
MapOperatorConfig dataclass ¤
MapOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap', subtree: PyTree | None = None)
Bases: OperatorConfig
Configuration for MapOperator - unified deterministic/stochastic operator.
Inherits from OperatorConfig:
- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
- stochastic: bool (currently must be False - stochastic mode not yet implemented)
- stream_name: str | None
Adds MapOperator-specific configuration:
- subtree: PyTree | None - Nested dict matching element.data structure
    - If None, user fn is applied to full element (full-tree mode)
    - If specified, only the specified subtree is affected (subtree mode)
Validation Rules:
- Inherits all validation from OperatorConfig
- Currently enforces stochastic=False (NotImplementedError if True)
Attributes:
| Name | Type | Description |
|---|---|---|
| subtree | PyTree \| None | Optional PyTree mask specifying which parts of data to transform. Structure must match element.data. Use None as leaf to indicate field should be transformed. Example: {"image": None, "mask": None} |
Examples:
Full-tree mode:
from datarax.core.config import MapOperatorConfig
config = MapOperatorConfig(subtree=None, stochastic=False)
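Subtree mode (single field):
config = MapOperatorConfig(subtree={"image": None})
Subtree mode (multiple fields):
config = MapOperatorConfig(subtree={"image": None, "mask": None})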
precomputed_stats class-attribute instance-attribute ¤
ElementOperatorConfig dataclass ¤
ElementOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap')
Bases: OperatorConfig
Configuration for ElementOperator - element-level transformation operator.
Inherits from OperatorConfig:
- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
- stochastic: bool
- stream_name: str | None
ElementOperator applies user-provided functions to entire Element structures (data + state + metadata), enabling coordinated transformations across multiple fields and access to element state.
Validation Rules:
- Inherits all validation from OperatorConfig
Examples:
Deterministic element transformation:
from datarax.core.config import ElementOperatorConfig
config = ElementOperatorConfig(stochastic=False)
Stochastic element augmentation:
config = ElementOperatorConfig(stochastic=True, stream_name="element_aug")
precomputed_stats class-attribute instance-attribute ¤
BatchMixOperatorConfig dataclass ¤
BatchMixOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = True, stream_name: str | None = 'batch_mix', batch_strategy: str = 'vmap', mode: str = 'mixup', alpha: float = 1.0, data_field: str = 'image', label_field: str = 'label')
Bases: OperatorConfig
Configuration for BatchMixOperator - unified MixUp and CutMix batch augmentation.
Inherits from OperatorConfig:
- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
- stochastic: bool (always True for BatchMixOperator)
- stream_name: str | None
BatchMixOperator performs batch-level sample mixing that cannot be decomposed into element-level operations. It mixes samples across the batch, either through linear interpolation (MixUp) or patch cutting/pasting (CutMix).
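The MixUp half of that description, linear interpolation between samples of the same batch, can be sketched in plain Python. This is an illustration of the general technique and not BatchMixOperator's code: the function mixup_batch is hypothetical, it works on lists rather than arrays, and drawing a single mixing ratio per batch is an assumption of the sketch.

```python
import random


def mixup_batch(images, labels, alpha, rng):
    """Blend every sample with a randomly chosen partner from the batch."""
    # One mixing ratio for the whole batch (an assumption of this sketch),
    # drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(images)))
    rng.shuffle(partners)  # partner index for every sample

    def blend(a, b):
        return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

    mixed_images = [blend(images[i], images[j]) for i, j in enumerate(partners)]
    mixed_labels = [blend(labels[i], labels[j]) for i, j in enumerate(partners)]
    return mixed_images, mixed_labels, lam


rng = random.Random(42)
images = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot
mixed_images, mixed_labels, lam = mixup_batch(images, labels, alpha=0.4, rng=rng)
```

Each sample needs a partner drawn from the same batch, which is why this kind of mixing cannot be expressed as an independent per-element transformation.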
Attributes:
| Name | Type | Description |
|---|---|---|
| mode | str | Mixing mode - "mixup" or "cutmix" |
| alpha | float | Beta distribution parameter for mixing ratio (default: 1.0) |
| data_field | str | Field name containing data to mix (default: "image" for cutmix) |
| label_field | str | Field name containing labels to mix (default: "label") |
Validation Rules:
- mode must be "mixup" or "cutmix"
- alpha must be positive
- Always stochastic (forced to True)
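These three rules can be sketched with a stand-in dataclass. SketchBatchMixConfig is hypothetical. Its defaults are copied from the signature shown above, but the error messages are invented, and silently overriding stochastic is an assumption about what "forced" means here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchBatchMixConfig:
    """Hypothetical stand-in for BatchMixOperatorConfig."""

    stochastic: bool = True
    stream_name: Optional[str] = "batch_mix"
    mode: str = "mixup"
    alpha: float = 1.0

    def __post_init__(self) -> None:
        if self.mode not in ("mixup", "cutmix"):
            raise ValueError(f"mode must be 'mixup' or 'cutmix', got {self.mode!r}")
        if self.alpha <= 0:
            raise ValueError("alpha must be positive")
        # Assumption: a caller-supplied stochastic=False is overridden.
        self.stochastic = True


def rejects(**kwargs) -> bool:
    """Return True when construction raises ValueError."""
    try:
        SketchBatchMixConfig(**kwargs)
        return False
    except ValueError:
        return True


forced = SketchBatchMixConfig(stochastic=False, mode="cutmix")
bad_mode = rejects(mode="blend")
bad_alpha = rejects(alpha=0.0)
```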
Examples:
MixUp augmentation:
from datarax.core.config import BatchMixOperatorConfig
config = BatchMixOperatorConfig(mode="mixup", alpha=0.4)
CutMix augmentation:
config = BatchMixOperatorConfig(mode="cutmix", alpha=1.0, data_field="image")
precomputed_stats class-attribute instance-attribute ¤
StructuralConfig dataclass ¤
StructuralConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None)
Bases: DataraxModuleConfig
Configuration for StructuralModule (runtime immutable, compile-time constants).
Inherits from DataraxModuleConfig:
- cacheable: bool
- batch_stats_fn: Callable | nnx.Module | None
- precomputed_stats: dict[str, Any] | None
Adds structural-specific configuration:
- stochastic: bool
- stream_name: str | None
Note: This config enforces runtime immutability through a __setattr__ override. After __post_init__ completes, the instance is frozen and cannot be modified. All configuration must be known at module construction time for JIT compilation.
Validation Rules:
- Inherits mutual exclusivity of statistics from parent
- Stochastic structural modules require stream_name for RNG management
- Deterministic structural modules should not specify stream_name
Attributes:
| Name | Type | Description |
|---|---|---|
| stochastic | bool | Whether this structural module uses randomness (e.g., sampling) |
| stream_name | str \| None | RNG stream name (required if stochastic=True) |
validate_stochastic_config ¤
Validate stochastic configuration rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stochastic | bool | Whether the module uses randomness | required |
| stream_name | str \| None | RNG stream name (required if stochastic=True) | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If validation rules are violated |