Examples Overview¤
Datarax provides a full set of examples organized by complexity and topic. Each example follows a consistent structure with learning goals, prerequisites, and expected outcomes.
Quick Start¤
- Simple Pipeline - Build your first data pipeline in 5 minutes
- HuggingFace Integration - Load datasets from HuggingFace Hub
Example Categories¤
Core Pipeline¤
Essential examples for understanding Datarax fundamentals.
| Example | Level | Description |
|---|---|---|
| Simple Pipeline | Beginner | Basic pipeline with memory source and operators |
| Pipeline Tutorial | Intermediate | Thorough guide to operators and composition |
| Operators Tutorial | Intermediate | Deep dive into operator types and patterns |
| CIFAR-10 Quick Reference | Beginner | CIFAR-10 dataset loading and preprocessing |
| Augmentation Quick Reference | Beginner | Image augmentation techniques |
| MNIST Tutorial | Intermediate | Complete MNIST training pipeline |
| Fashion Augmentation | Intermediate | Fashion-MNIST with advanced augmentation |
| Composition Strategies | Intermediate | All 11 operator composition patterns |
| Advanced Operators | Intermediate | Probabilistic, selector, and patch dropout operators |
Integration¤
Connect Datarax with external data sources and libraries.
| Example | Level | Description |
|---|---|---|
| HuggingFace Quick Reference | Beginner | Load datasets from HuggingFace Hub |
| HuggingFace Tutorial | Intermediate | Advanced HF usage and training pipelines |
| IMDB Example | Beginner | Text classification with IMDB dataset |
| TFDS Quick Reference | Beginner | Load datasets from TensorFlow Datasets |
| ArrayRecord Quick Reference | Intermediate | Google's ArrayRecord format integration |
Differentiable Pipelines (Why Datarax)¤
Flagship examples demonstrating Datarax's unique differentiable pipeline capabilities.
- Learned Augmentation (DADA) - 10,000x faster augmentation policy search via gradient descent
- Learned ISP for Detection - End-to-end differentiable image signal processing pipeline
- DDSP Audio Synthesis - Custom operators for differentiable digital signal processing
Advanced¤
Production-ready patterns and optimization techniques.
| Example | Level | Description |
|---|---|---|
| MixUp & CutMix Tutorial | Intermediate | Batch-level mixing augmentations |
| Checkpoint Quick Reference | Intermediate | Save and restore pipeline state |
| Resumable Training Guide | Advanced | Full checkpointing workflow |
| DAG Fundamentals Guide | Advanced | Deep dive into DAG pipeline architecture |
| Branching DAG Cookbook | Intermediate | Branch / Merge / Parallel recipes via Pipeline.from_dag |
| Sharding Quick Reference | Intermediate | Multi-device data distribution |
| Sharding Guide | Advanced | Advanced distributed training patterns |
| Interleaved Tutorial | Intermediate | Multiple data source mixing |
| Optimization Guide | Advanced | Performance tuning and profiling |
| Sampling Tutorial | Intermediate | Sequential, shuffle, range, and epoch-aware samplers |
| End-to-End CIFAR-10 | Advanced | Complete training pipeline with all features |
| DADA Learned Augmentation | Advanced | Differentiable augmentation policy search |
| Learned ISP Guide | Advanced | End-to-end ISP optimization for object detection |
| DDSP Audio Synthesis | Advanced | Custom operators for differentiable audio processing |
Documentation Tiers¤
Datarax examples follow a three-tier documentation pattern:
Tier 1: Quick Reference (~5-10 min)¤
- Minimal code, maximum clarity
- Single focused concept
- Copy-paste ready snippets
- Ideal for: Getting started, quick lookups
Tier 2: Tutorial (~30-60 min)¤
- Step-by-step instruction
- Multiple related concepts
- Hands-on practice exercises
- Ideal for: Learning new features
Tier 3: Advanced Guide (~60+ min)¤
- Deep dive into internals
- Performance optimization
- Production considerations
- Ideal for: Expert users, complex use cases
Feature Coverage¤
The examples cover all major Datarax features:
| Feature Area | Examples | Coverage |
|---|---|---|
| Data Sources | Memory, HuggingFace, TFDS, ArrayRecord | Complete |
| Operators | Element, Batch, Probabilistic, Selector, Patch Dropout | Complete |
| Composition | Linear stages and branching DAGs via Pipeline and Pipeline.from_dag | Complete |
| Samplers | Sequential, Shuffle, Range, EpochAware | Complete |
| DAG Pipeline | Linear stages=[...] and Pipeline.from_dag topologies | Complete |
| Distributed | Sharding, Multi-device | Complete |
| Checkpointing | State save/restore, Resumable training | Complete |
| Monitoring | Metrics, Reporters, Callbacks | Complete |
| Differentiable Pipelines | DADA, ISP, DDSP | Complete |
Running Examples¤
All examples are available as both Python scripts and Jupyter notebooks.
As Python Scripts¤
# Activate environment
source activate.sh
# Run any example
python examples/core/01_simple_pipeline.py
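To run a whole directory of examples rather than a single script, a small driver can help. The following is a minimal sketch, not part of Datarax: `find_example_scripts` and `run_example` are hypothetical helper names. It assumes the environment is already activated and that any path component starting with an underscore (such as `_templates`) holds templates rather than runnable examples.

```python
import subprocess
import sys
from pathlib import Path


def find_example_scripts(root: str) -> list[Path]:
    """Return the .py files under `root`, sorted by path.

    Paths with any component starting with "_" are skipped, on the
    assumption that they are templates or helpers, not examples.
    """
    base = Path(root)
    return sorted(
        path
        for path in base.rglob("*.py")
        if not any(part.startswith("_") for part in path.relative_to(base).parts)
    )


def run_example(script: Path) -> bool:
    """Run one script with the current interpreter; True if it exits with 0."""
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,  # keep the example's output out of our own stdout
        text=True,
    )
    return result.returncode == 0
```

For instance, `[s for s in find_example_scripts("examples/core") if not run_example(s)]` would collect the scripts that exited with a non-zero status.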
As Jupyter Notebooks¤
Generating Notebooks from Scripts¤
# Convert a single file
python scripts/jupytext_converter.py py-to-nb examples/core/01_simple_pipeline.py
# Batch convert directory
python scripts/jupytext_converter.py batch-py-to-nb examples/core/
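The same conversion can be sketched directly against the Jupytext Python API. This assumes, without confirmation from the repository, that `scripts/jupytext_converter.py` wraps Jupytext and that notebooks are written next to their source scripts. The function names below are hypothetical; only `jupytext.read` and `jupytext.write` come from the Jupytext library, which must be installed separately.

```python
from pathlib import Path


def notebook_path(script: Path) -> Path:
    """Map a script to a notebook path beside it (assumed layout)."""
    return script.with_suffix(".ipynb")


def convert_script(script: Path) -> Path:
    """Convert one script to a notebook with the Jupytext Python API."""
    import jupytext  # third-party; imported lazily so this module loads without it

    notebook = jupytext.read(str(script))  # input format inferred from the .py file
    target = notebook_path(script)
    jupytext.write(notebook, str(target))  # output format inferred from ".ipynb"
    return target


def convert_directory(directory: str) -> list[Path]:
    """Convert every .py file directly inside `directory`, in sorted order."""
    return [convert_script(script) for script in sorted(Path(directory).glob("*.py"))]
```

Calling `convert_directory("examples/core")` would then play roughly the role of the batch command above, under the stated assumptions.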
Prerequisites¤
Before running examples, ensure you have:
- Datarax installed: uv pip install datarax
- JAX configured: GPU support recommended for performance
- Environment activated: source activate.sh
For external data sources:
- HuggingFace: uv pip install "datarax[data]"
- TFDS: uv pip install "datarax[data]"
- ArrayRecord: uv pip install "datarax[data]" array-record
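A quick way to confirm these prerequisites before running an example is to check which modules can be imported. This is a minimal sketch with hypothetical helper names. The optional module names (`datasets`, `tensorflow_datasets`, `array_record`) are the usual import names for the HuggingFace Datasets, TensorFlow Datasets, and ArrayRecord packages; that the `datarax[data]` extra installs exactly these is an assumption.

```python
from importlib.util import find_spec

# Import names to look for. REQUIRED mirrors the prerequisites list;
# OPTIONAL holds the assumed import names for the external data sources.
REQUIRED = ("datarax", "jax")
OPTIONAL = ("datasets", "tensorflow_datasets", "array_record")


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def jax_backend():
    """Return JAX's default backend name, or None if JAX is not installed."""
    if find_spec("jax") is None:
        return None
    import jax  # imported only when present

    return jax.default_backend()  # e.g. "cpu", "gpu", or "tpu"
```

An empty result from `missing_modules(REQUIRED)` suggests the core install is in place, and `jax_backend()` indicates whether JAX is defaulting to an accelerator.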
For Contributors¤
Want to add your own examples? We welcome contributions!
- Documentation Design Guide - Complete standards for creating educational examples and tutorials
- Example Template - Start from our template with proper structure and formatting
Quick Start for Contributors¤
- Read the Example Documentation Design Guide
- Copy the template from examples/_templates/example_template.py
- Follow the 7-part structure and quality checklist
- Submit a PR with the .py, generated .ipynb, and .md documentation files
Next Steps¤
- API Reference - Detailed API documentation
- Contributing Guide - General contribution guidelines