Patch Dropout Operator¤

Drop rectangular patches from images for regularization.

datarax.operators.modality.image.patch_dropout_operator ¤

PatchDropoutOperator - Operator for patch-based occlusion augmentation.

This operator extends ModalityOperator to provide patch-based dropout (occlusion).

Key Features:

Drops random rectangular patches from images
Configurable number of patches and patch size
Deterministic mode with fixed patch positions
Stochastic mode with random patch positions per sample
Full JAX compatibility with JIT compilation

Examples:

Basic usage:

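from flax import nnx
from datarax.operators.modality.image.patch_dropout_operator import (
    PatchDropoutOperator,
    PatchDropoutOperatorConfig,
)

rngs = nnx.Rngs(0)  # RNG streams for stochastic operations
config = PatchDropoutOperatorConfig(
    field_key="image",  # key of the image field in the data dict
    num_patches=4,
    patch_size=(8, 8),  # (height, width)
    drop_value=0.0,
)
op = PatchDropoutOperator(config, rngs=rngs)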

logger `module-attribute` ¤

logger = logging.getLogger(__name__)

PatchDropoutOperatorConfig `dataclass` ¤

PatchDropoutOperatorConfig(cacheable: bool = False, batch_stats_fn: Callable | Module | None = None, precomputed_stats: dict[str, Any] | None = None, stochastic: bool = False, stream_name: str | None = None, batch_strategy: str = 'vmap', *, field_key: str, target_key: str | None = None, auxiliary_fields: list[str] | None = None, clip_range: tuple[float, float] | None = None, preserve_auxiliary: bool = True, validate_domain_constraints: bool = True, num_patches: int = 4, patch_size: tuple[int, int] = (8, 8), drop_value: float = 0.0)

Bases: ModalityOperatorConfig

Configuration for PatchDropoutOperator.

Extends ModalityOperatorConfig with patch dropout-specific parameters.

Attributes:

Name	Type	Description
`num_patches`	`int`	Number of rectangular patches to drop from each image. Default: 4
`patch_size`	`tuple[int, int]`	Size of each patch as (height, width) tuple. Default: (8, 8)
`drop_value`	`float`	Value to fill dropped patches with. Typically 0.0 for black or the mean image value. Default: 0.0
`clip_range`	`tuple[float, float] \| None`	Range for clipping output values. None means no clipping. Default: None (patch dropout preserves valid ranges)

num_patches `class-attribute` `instance-attribute` ¤

num_patches: int = field(default=4, kw_only=True)

patch_size `class-attribute` `instance-attribute` ¤

patch_size: tuple[int, int] = field(default=(8, 8), kw_only=True)

drop_value `class-attribute` `instance-attribute` ¤

drop_value: float = field(default=0.0, kw_only=True)

cacheable `class-attribute` `instance-attribute` ¤

cacheable: bool = False

batch_stats_fn `class-attribute` `instance-attribute` ¤

batch_stats_fn: Callable | Module | None = None

precomputed_stats `class-attribute` `instance-attribute` ¤

precomputed_stats: dict[str, Any] | None = None

stochastic `class-attribute` `instance-attribute` ¤

stochastic: bool = False

stream_name `class-attribute` `instance-attribute` ¤

stream_name: str | None = None

batch_strategy `class-attribute` `instance-attribute` ¤

batch_strategy: str = 'vmap'

field_key `class-attribute` `instance-attribute` ¤

field_key: str = field(kw_only=True)

target_key `class-attribute` `instance-attribute` ¤

target_key: str | None = field(default=None, kw_only=True)

auxiliary_fields `class-attribute` `instance-attribute` ¤

auxiliary_fields: list[str] | None = field(default=None, kw_only=True)

clip_range `class-attribute` `instance-attribute` ¤

clip_range: tuple[float, float] | None = field(default=None, kw_only=True)

preserve_auxiliary `class-attribute` `instance-attribute` ¤

preserve_auxiliary: bool = field(default=True, kw_only=True)

validate_domain_constraints `class-attribute` `instance-attribute` ¤

validate_domain_constraints: bool = field(default=True, kw_only=True)

PatchDropoutOperator ¤

PatchDropoutOperator(config: PatchDropoutOperatorConfig, *, rngs: Rngs)

Bases: ModalityOperator

Image patch dropout transformation operator.

Applies patch dropout by randomly dropping rectangular regions from images:

- Selects num_patches random positions
- Replaces each patch with drop_value
- Useful for occlusion robustness training

Supports three modes:

1. Deterministic: fixed patch positions derived from a fixed seed
2. Stochastic: per-sample random patch positions from generate_random_params()
3. External params: accepts pre-generated random parameters

The operator works on single elements (H, W, C images) and is composed into batch processing via apply_batch() from the base class.
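The single-element behavior described above can be sketched in plain JAX. This is a minimal illustration under stated assumptions, not the datarax implementation: the helper name `drop_patches` and its argument layout are hypothetical, and positions are assumed to be (y, x) top-left corners.

```python
import jax
import jax.numpy as jnp


def drop_patches(image, positions, patch_size, drop_value=0.0):
    """Fill fixed-size patches of one (H, W, C) image with drop_value.

    positions: int array of shape (num_patches, 2); each row is a (y, x) top-left corner.
    patch_size: static (height, width) tuple.
    """
    ph, pw = patch_size
    patch = jnp.full((ph, pw, image.shape[-1]), drop_value, dtype=image.dtype)

    def body(i, img):
        y, x = positions[i, 0], positions[i, 1]
        # dynamic_update_slice accepts traced start indices, so the loop works under jit.
        return jax.lax.dynamic_update_slice(img, patch, (y, x, 0))

    return jax.lax.fori_loop(0, positions.shape[0], body, image)


image = jnp.ones((32, 32, 3))
positions = jnp.array([[0, 0], [10, 20]])
jitted = jax.jit(drop_patches, static_argnames=("patch_size",))
out = jitted(image, positions, patch_size=(8, 8))
```

In this sketch the patch size is static while the positions are ordinary array values, so the same compiled function can be reused for different positions. `jax.lax.dynamic_update_slice` clamps start indices so the patch always stays inside the image.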

Examples:

Deterministic patch dropout:

config = PatchDropoutOperatorConfig(
    field_key="image",
    num_patches=4,
    patch_size=(16, 16),
    drop_value=0.0,
    stochastic=False
)
operator = PatchDropoutOperator(config, rngs=nnx.Rngs(0))
result, state, metadata = operator.apply(data, state, metadata)

Stochastic patch dropout with random positions:

config = PatchDropoutOperatorConfig(
    field_key="image",
    num_patches=8,
    patch_size=(8, 8),
    drop_value=0.5,
    stochastic=True
)
operator = PatchDropoutOperator(config, rngs=nnx.Rngs(0))
# apply_batch() takes a Batch and generates per-sample random params automatically
result_batch = operator.apply_batch(batch)

Parameters:

Name	Type	Description	Default
`config`	`PatchDropoutOperatorConfig`	Configuration for patch dropout operation	required
`rngs`	`Rngs`	RNG streams for stochastic operations	required

config `instance-attribute` ¤

config: PatchDropoutOperatorConfig = config

rngs `instance-attribute` ¤

rngs = rngs

name `instance-attribute` ¤

name = nnx.static(name)

stochastic `instance-attribute` ¤

stochastic = nnx.static(config.stochastic)

stream_name `instance-attribute` ¤

stream_name = nnx.static(config.stream_name)

generate_random_params ¤

generate_random_params(element_keys: Array, data_shapes: dict[str, tuple[int, ...]]) -> dict[str, Array]

Generate per-record patch positions from per-record PRNG keys.

Each record's patch positions are drawn from its own key (fold_in(base_key, global_index)), so they are reproducible per record regardless of batch composition, shuffle, host count, or resume.

Parameters:

Name	Type	Description	Default
`element_keys`	`Array`	`(batch_size,)` per-record PRNG keys.	required
`data_shapes`	`dict[str, tuple[int, ...]]`	Dictionary mapping field keys to their shapes (image dims).	required

Returns:

Type	Description
`dict[str, Array]`	Dictionary with the key "patch_positions": an array of patch top-left positions with shape (batch_size, num_patches, 2), where the last dimension is (y, x).

Raises:

Type	Description
`KeyError`	If field_key not in data_shapes
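The per-record key scheme can be illustrated with a short JAX sketch. It assumes positions are sampled uniformly so that each patch fits inside the image; the helpers `keys_for` and `sample_patch_positions` are hypothetical names, and the actual sampling logic in datarax may differ.

```python
import jax
import jax.numpy as jnp


def keys_for(base_key, global_indices):
    # One key per record, derived only from the record's global index.
    return jax.vmap(lambda i: jax.random.fold_in(base_key, i))(global_indices)


def sample_patch_positions(element_keys, image_shape, num_patches, patch_size):
    """Draw (y, x) top-left corners for each record from that record's own key."""
    h, w = image_shape[0], image_shape[1]
    ph, pw = patch_size

    def one_record(key):
        key_y, key_x = jax.random.split(key)
        # maxval is exclusive, so corners land in [0, h - ph] and [0, w - pw].
        ys = jax.random.randint(key_y, (num_patches,), 0, h - ph + 1)
        xs = jax.random.randint(key_x, (num_patches,), 0, w - pw + 1)
        return jnp.stack([ys, xs], axis=-1)  # (num_patches, 2)

    return jax.vmap(one_record)(element_keys)


base_key = jax.random.key(0)
element_keys = keys_for(base_key, jnp.arange(4))
positions = sample_patch_positions(
    element_keys, (32, 32, 3), num_patches=4, patch_size=(8, 8)
)  # shape (4, 4, 2)
```

Because each record's key depends only on its global index, regrouping or reordering records into different batches leaves each record's positions unchanged in this sketch.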

apply ¤

apply(data: dict[str, Array], state: dict[str, Any], metadata: dict[str, Any], random_params: dict[str, Array] | None = None, stats: dict[str, Any] | None = None) -> tuple[dict[str, Array], dict[str, Any], dict[str, Any]]

Apply patch dropout transformation to a single element.

This operates on single elements (e.g., one image of shape [H, W, C]). For batch processing, use apply_batch() which handles random param generation.

Parameters:

Name	Type	Description	Default
`data`	`dict[str, Array]`	Input data dictionary. Must contain field specified by config.field_key	required
`state`	`dict[str, Any]`	Operator state (unused for patch dropout, passed through)	required
`metadata`	`dict[str, Any]`	Metadata dictionary (passed through unchanged)	required
`random_params`	`dict[str, Array] \| None`	Optional random parameters from generate_random_params(). If config.stochastic=True and this is provided, uses random_params["patch_positions"] for patch locations.	`None`
`stats`	`dict[str, Any] \| None`	Optional statistics dictionary (unused)	`None`

Returns:

Type	Description
`tuple[dict[str, Array], dict[str, Any], dict[str, Any]]`	Tuple of (transformed_data, state, metadata): transformed_data is the data dict with patches dropped from the target field; state and metadata are returned unchanged.

Note

CRITICAL: Always branch on the config.stochastic flag, not on whether random_params is None, because apply_batch() always passes random_params, even in deterministic mode.
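The pitfall in the note above can be shown with a small, library-free sketch. The function names and the `fixed_positions` argument are hypothetical and only illustrate the branching rule.

```python
def choose_positions_wrong(stochastic, random_params, fixed_positions):
    # Buggy: random_params is non-None even in deterministic mode.
    if random_params is not None:
        return random_params["patch_positions"]
    return fixed_positions


def choose_positions(stochastic, random_params, fixed_positions):
    # Correct: branch on the static config flag only.
    if stochastic:
        return random_params["patch_positions"]
    return fixed_positions


params = {"patch_positions": "random"}
deterministic_choice = choose_positions(False, params, "fixed")
buggy_choice = choose_positions_wrong(False, params, "fixed")
```

Since the flag is a plain Python bool, the branch is resolved at trace time and does not introduce data-dependent control flow under JIT.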

get_operation_stats ¤

get_operation_stats() -> dict[str, int]

Get operation statistics.

Note: This method converts JAX arrays to Python ints for introspection. It is intended for use outside of JIT-compiled functions.

Returns:

Type	Description
`dict[str, int]`	Dictionary with 'applied_count' and 'skipped_count'

reset_operation_stats ¤

reset_operation_stats() -> None

Reset operation statistics to zero.

Note: Creates new JAX arrays to reset the counters.

compute_statistics ¤

compute_statistics(data: Any) -> dict[str, Any] | None

Compute statistics from data using batch_stats_fn.

If batch_stats_fn is not configured, returns None. Computed statistics are cached in _computed_stats.

Parameters:

Name	Type	Description	Default
`data`	`Any`	Input data to compute statistics from	required

Returns:

Type	Description
`dict[str, Any] \| None`	Dictionary of statistics, or None if no batch_stats_fn configured

get_statistics ¤

get_statistics() -> dict[str, Any] | None

Get current statistics.

Returns precomputed_stats if configured (unless reset was called), otherwise returns cached computed statistics, or None if no statistics available.

Returns:

Type	Description
`dict[str, Any] \| None`	Dictionary of statistics, or None if no statistics available

set_statistics ¤

set_statistics(stats: dict[str, Any]) -> None

Manually set statistics.

This overwrites any previously computed statistics and clears the reset flag.

Parameters:

Name	Type	Description	Default
`stats`	`dict[str, Any]`	Dictionary of statistics to set	required

reset_statistics ¤

reset_statistics() -> None

Reset all statistics to None.

This clears both computed statistics and marks that precomputed_stats should be ignored (via internal flag). After reset, get_statistics() will return None until new statistics are set or computed.

reset_cache ¤

reset_cache() -> None

Clear the cache.

Only has effect if cacheable=True in config.

copy ¤

copy(*, config: DataraxModuleConfig | None = None, rngs: Rngs | None = None, name: str | None = None) -> DataraxModule

Create a copy of this module with optional config/parameter changes.

This allows creating a new module instance with modified configuration while preserving other attributes. Useful for hyperparameter tuning.

Parameters:

Name	Type	Description	Default
`config`	`DataraxModuleConfig \| None`	New config (if None, uses current config)	`None`
`rngs`	`Rngs \| None`	New RNG state (if None, uses current rngs)	`None`
`name`	`str \| None`	New name (if None, uses current name)	`None`

Returns:

Type	Description
`DataraxModule`	New module instance with updated parameters

Examples:

# Change configuration
new_config = DataraxModuleConfig(cacheable=True)
new_module = module.copy(config=new_config)

# Change name only
renamed = module.copy(name="new_name")

Note

Subclasses can override this method to provide more fine-grained control over copying, such as allowing individual config field updates without requiring dataclass replace().

get_state ¤

get_state() -> dict[str, Any]

Get module state for checkpointing.

This method implements the Checkpointable protocol using NNX state management. It extracts all state variables from the module and converts them to a serializable format.

Returns:

Type	Description
`dict[str, Any]`	A dictionary containing the internal state of the component.

set_state ¤

set_state(state: dict[str, Any]) -> None

Restore module state from a checkpoint.

This method implements the Checkpointable protocol using NNX state management. It restores the module state from a serialized format. Restoration is strict: checkpoint structure must match module state.

Parameters:

Name	Type	Description	Default
`state`	`dict[str, Any]`	A dictionary containing the internal state to restore.	required

Raises:

Type	Description
`TypeError`	If state is not a dictionary.
`ValueError`	If checkpoint structure does not match module state.

clone ¤

clone() -> DataraxModule

Create a new instance with the same state as this module.

Uses NNX's clone function for proper deep cloning of all state.

Returns:

Type	Description
`DataraxModule`	A new module instance with the same state.

requires_rng_streams ¤

requires_rng_streams() -> list[str] | None

Get the list of RNG streams required by this module.

Returns:

Type	Description
`list[str] \| None`	A list of required RNG stream names, or None if no RNG streams are required.

ensure_rng_streams ¤

ensure_rng_streams(stream_names: list[str]) -> None

Ensure that the required RNG streams are available.

Parameters:

Name	Type	Description	Default
`stream_names`	`list[str]`	A list of available RNG stream names.	required

Raises:

Type	Description
`ValueError`	If a required RNG stream is not available.

get_output_structure ¤

get_output_structure(sample_data: PyTree, sample_state: PyTree) -> tuple[PyTree, PyTree]

Declare output PyTree structure for vmap axis specification.

Default uses jax.eval_shape to discover structure automatically. Override for efficiency or when eval_shape doesn't work (e.g., data-dependent shapes).

Parameters:

Name	Type	Description	Default
`sample_data`	`PyTree`	Single element data (not batched)	required
`sample_state`	`PyTree`	Single element state (not batched)	required

Returns:

Type	Description
`tuple[PyTree, PyTree]`	Tuple of (output_data_structure, output_state_structure) with None leaves. The structure (keys/nesting) matters; leaf values are ignored.

Example override for an operator that adds keys:

def get_output_structure(self, sample_data, sample_state):
    out_data = {
        **jax.tree.map(lambda _: None, sample_data),
        "score": None,
        "alignment": None,
    }
    return out_data, sample_state
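The automatic discovery mentioned above relies on `jax.eval_shape`, which traces a function abstractly without running any computation. A minimal sketch, assuming a hypothetical element function `add_score` that adds one key:

```python
import jax
import jax.numpy as jnp


def add_score(data):
    # Hypothetical element-wise function that adds a "score" key.
    return {**data, "score": jnp.mean(data["image"])}


sample = {"image": jnp.zeros((32, 32, 3))}

# eval_shape returns ShapeDtypeStruct leaves; no FLOPs are executed.
out_shapes = jax.eval_shape(add_score, sample)

# Only keys and nesting matter for the structure, so leaves are replaced with None.
structure = jax.tree.map(lambda _: None, out_shapes)
```

An explicit override is mainly useful when tracing is expensive or impossible, for example when output shapes depend on the data.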

apply_batch ¤

apply_batch(batch: Batch, stats: dict[str, Any] | None = None) -> Batch

Process entire batch with vmap and optional RNG generation.

This method implements the batch processing logic for both stochastic and deterministic modes. It uses static branching on self.stochastic for JIT compilation efficiency.

The implementation delegates to _vmap_apply() for the shared computational core, then wraps the result in a Batch object.

Parameters:

Name	Type	Description	Default
`batch`	`Batch`	Input batch (Batch[Element] structure)	required
`stats`	`dict[str, Any] \| None`	Optional statistics (if None, uses get_statistics())	`None`

Returns:

Type	Description
`Batch`	Transformed batch with same structure

Note

This method is concrete (not abstract). Subclasses typically don't override it, but can if they need custom batch processing logic.
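The combination of static branching and vmap can be sketched as follows. This is an illustration under assumptions, not the datarax `_vmap_apply()` code: `apply_one`, `apply_batch_sketch`, and the fixed `PATCH` size are hypothetical, and plain arrays stand in for the Batch structure.

```python
import jax
import jax.numpy as jnp

PATCH = (8, 8)  # static (height, width) of each dropped patch


def apply_one(image, positions):
    """Zero out patches of one (H, W, C) image at the given (y, x) corners."""
    patch = jnp.zeros((PATCH[0], PATCH[1], image.shape[-1]), image.dtype)
    for i in range(positions.shape[0]):  # unrolled: num_patches is static
        start = (positions[i, 0], positions[i, 1], 0)
        image = jax.lax.dynamic_update_slice(image, patch, start)
    return image


def apply_batch_sketch(images, positions, stochastic):
    if stochastic:
        # Per-sample positions: map over axis 0 of both arguments.
        return jax.vmap(apply_one)(images, positions)
    # Shared positions: map over images only and broadcast positions.
    return jax.vmap(apply_one, in_axes=(0, None))(images, positions)


batched = jax.jit(apply_batch_sketch, static_argnames=("stochastic",))

images = jnp.ones((2, 16, 16, 1))
per_sample = jnp.array([[[0, 0]], [[8, 8]]])  # (batch, num_patches, 2)
shared = jnp.array([[0, 0]])                  # (num_patches, 2)
out_stochastic = batched(images, per_sample, stochastic=True)
out_deterministic = batched(images, shared, stochastic=False)
```

Marking `stochastic` as a static argument means each mode is compiled as its own specialization, so neither branch appears in the other's traced program.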

output_spec ¤

output_spec(input_spec: PyTree) -> PyTree

Return the operator's output spec given an input spec.

Most operators (normalization, additive noise, simple element-wise transforms) do not change shape; the default returns input_spec unchanged. Shape-changing operators (Resize, Crop, Reshape) MUST override this method.

Parameters:

Name	Type	Description	Default
`input_spec`	`PyTree`	PyTree of `jax.ShapeDtypeStruct` describing the input element (matching the upstream `DataSourceModule.element_spec()` or another operator's `output_spec`).	required

Returns:

Type	Description
`PyTree`	PyTree of `jax.ShapeDtypeStruct` describing the operator's output. By default, equal to `input_spec`.
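The difference between a shape-preserving operator and a shape-changing one can be sketched with `jax.ShapeDtypeStruct`. Both functions below are hypothetical illustrations: the first mirrors the documented default, and the second shows what an override for a crop-like operator might compute, assuming every leaf has leading (H, W) dimensions.

```python
import jax
import jax.numpy as jnp


def identity_output_spec(input_spec):
    # Shape-preserving operators (patch dropout only overwrites values) return the spec as-is.
    return input_spec


def crop_output_spec(input_spec, crop_hw):
    # Hypothetical shape-changing operator: replace the leading (H, W) of every leaf.
    def crop_leaf(leaf):
        return jax.ShapeDtypeStruct((*crop_hw, *leaf.shape[2:]), leaf.dtype)

    return jax.tree.map(crop_leaf, input_spec)


spec = {"image": jax.ShapeDtypeStruct((32, 32, 3), jnp.float32)}
same = identity_output_spec(spec)
cropped = crop_output_spec(spec, (24, 24))
```

Specs computed this way can be chained through a pipeline, with each operator's output spec feeding the next operator's input spec, without materializing any data.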
