Description
I would like to contribute an Automatic Domain Randomization (ADR) framework to dm_control that adaptively adjusts environment parameters during training based on policy performance. This feature is available in Isaac Gym and is critical for robust sim-to-real transfer, but is currently missing from dm_control.
Problem Statement
Currently, dm_control lacks built-in support for domain randomization, forcing researchers to:
- Manually tune randomization ranges through trial-and-error, which is time-consuming and suboptimal
- Use fixed randomization distributions that don't adapt to the learning agent's capabilities
- Implement custom solutions for each project, leading to inconsistent and non-reusable code
- Miss out on robust sim-to-real transfer that ADR enables
Competitors like Isaac Gym provide ADR out-of-the-box, giving them a significant advantage for robotics research. As documented in the "Solving Rubik's Cube with a Robot Hand" paper (OpenAI 2019), ADR enables successful zero-shot sim-to-real transfer by automatically expanding randomization ranges when agents achieve consistent performance.
What is ADR?
Automatic Domain Randomization progressively increases environment randomization difficulty based on agent performance:
- Start with minimal randomization (near-nominal physics)
- Test on boundary conditions of randomization ranges
- Expand ranges when agent succeeds consistently on boundaries
- Contract ranges when agent fails consistently
- Result: Maximally robust policy without manual tuning
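To make the expand/contract rule above concrete, here is a minimal sketch in the spirit of the OpenAI ADR update. The function and argument names are illustrative only (not an existing dm_control API), and a full version would also clamp the bound to configured min/max ranges:

```python
import collections


def update_bound(bound_value, successes, delta, direction,
                 high=0.95, low=0.70, buffer_size=100):
  """Returns (new_bound, new_buffer) for one boundary of one parameter.

  Args:
    bound_value: current value of the lower or upper bound.
    successes: deque of 0/1 outcomes from episodes run at this boundary.
    delta: step size for expanding/contracting the range.
    direction: +1 for the upper bound, -1 for the lower bound.
  """
  if len(successes) < buffer_size:
    return bound_value, successes  # Not enough evidence yet; keep collecting.
  mean_success = sum(successes) / len(successes)
  # Clear the buffer whenever a decision is made, as in the OpenAI ADR paper.
  successes = collections.deque(maxlen=buffer_size)
  if mean_success > high:
    return bound_value + direction * delta, successes  # Expand outward.
  if mean_success < low:
    return bound_value - direction * delta, successes  # Contract inward.
  return bound_value, successes
```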
Proposed Solution
I will implement a modular ADR framework for dm_control consisting of:
1. Core ADR Manager

```python
class ADRManager:
  """Manages automatic domain randomization for dm_control environments.

  Attributes:
    randomization_params: Dictionary of randomizable parameters and their bounds.
    performance_buffer: Tracks success rates at boundary conditions.
    thresholds: High/low thresholds for expanding/contracting ranges.
  """

  def __init__(self, config):
    self.randomization_params = {}          # e.g. {'friction': [0.5, 1.5]}
    self.performance_threshold_high = 0.95  # Expand when boundary success > 95%.
    self.performance_threshold_low = 0.7    # Contract when boundary success < 70%.
    self.buffer_size = 100                  # Episodes to average over.

  def get_randomized_params(self, mode='training'):
    """Returns physics parameters for the current episode.

    Args:
      mode: 'training' (sample from current ranges) or 'boundary' (test range limits).
    """

  def update_ranges(self, boundary_results):
    """Adjusts randomization ranges based on boundary performance."""

  def should_expand(self, param_name, boundary):
    """Checks whether the range should expand for the given parameter."""
```

2. Randomizable Parameters
Support randomization of key physics properties:
- Dynamics: Mass, inertia, friction, damping, armature
- Actuation: Motor gains (kp, kd), force limits, control noise
- Observation: Sensor noise, latency, dropouts
- Geometry: Link lengths, COM positions (where applicable)
- External forces: Random pushes, wind, ground perturbations
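As a sketch of how the dynamics group could be applied to a dm_control environment: the MjModel fields below (geom_friction, body_mass, dof_damping) are standard MuJoCo model arrays exposed through physics.model, but the helper function itself is hypothetical and not part of dm_control today.

```python
import numpy as np


def apply_dynamics_randomization(physics, ranges, rng=None):
  """Scales selected MjModel fields by factors sampled from `ranges`.

  Args:
    physics: a dm_control `mujoco.Physics` instance (e.g. env.physics).
    ranges: dict mapping parameter name to a [low, high] multiplicative range,
      e.g. {'friction': [0.8, 1.2], 'mass': [0.9, 1.1]}.
    rng: optional numpy random Generator.
  """
  # Note: a real randomizer would cache the nominal model values once and
  # rescale from those at every reset, so scales do not compound across episodes.
  rng = rng or np.random.default_rng()
  if 'friction' in ranges:
    scale = rng.uniform(*ranges['friction'])
    physics.model.geom_friction[:, 0] *= scale  # Sliding friction coefficient.
  if 'mass' in ranges:
    scale = rng.uniform(*ranges['mass'])
    physics.model.body_mass[:] *= scale
  if 'damping' in ranges:
    scale = rng.uniform(*ranges['damping'])
    physics.model.dof_damping[:] *= scale
```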
3. ADR-Compatible Environment Wrapper
```python
class ADRWrapper:
  """Wraps dm_control environments to support ADR.

  Automatically applies randomization at reset and tracks performance.
  """

  def __init__(self, env, adr_manager, eval_fraction=0.1):
    self.env = env
    self.adr = adr_manager
    self.eval_fraction = eval_fraction  # Fraction of envs used for boundary testing.

  def reset(self, env_idx=None):
    """Reset with ADR parameters."""
    mode = 'boundary' if self._is_eval_env(env_idx) else 'training'
    params = self.adr.get_randomized_params(mode=mode)
    self._apply_randomization(params)
    return self.env.reset()

  def step(self, action):
    """Step and track performance for ADR."""
    timestep = self.env.step(action)
    self._record_performance(timestep)
    return timestep
```

4. Configuration System
YAML-based configuration for easy setup:
```yaml
adr_config:
  enabled: true
  performance_thresholds:
    high: 0.95            # Expand ranges
    low: 0.70             # Contract ranges
  buffer_size: 100        # Episodes for averaging
  evaluation_fraction: 0.1  # 10% of envs test boundaries
  randomization_params:
    dynamics:
      friction:
        initial_range: [0.8, 1.2]
        min_range: [0.5, 1.5]
        max_range: [0.1, 3.0]
        delta: 0.05       # Step size for expansion
      mass:
        initial_range: [0.9, 1.1]
        min_range: [0.5, 1.5]
        max_range: [0.3, 2.0]
        delta: 0.05
    actuation:
      kp_scale:
        initial_range: [0.95, 1.05]
        max_range: [0.5, 1.5]
        delta: 0.02
    observation:
      noise_std:
        initial_range: [0.0, 0.01]
        max_range: [0.0, 0.1]
        delta: 0.005
```

5. Integration with dm_control Suite
```python
from dm_control import suite
from dm_control.rl.adr import ADRManager, ADRWrapper
import yaml

# Load ADR configuration.
with open('adr_config.yaml') as f:
  adr_config = yaml.safe_load(f)

# Create base environment.
base_env = suite.load('walker', 'walk')

# Wrap with ADR.
adr_manager = ADRManager(adr_config['adr_config'])
env = ADRWrapper(base_env, adr_manager)

# Training loop (`policy` is the user's agent, not shown here).
for episode in range(10000):
  timestep = env.reset()
  while not timestep.last():
    action = policy(timestep.observation)
    timestep = env.step(action)

# ADR automatically adjusts ranges based on performance.
```

Technical Implementation Details
File Structure:
```
dm_control/
├── rl/
│   └── adr/
│       ├── __init__.py
│       ├── adr_manager.py          # Core ADR logic
│       ├── adr_wrapper.py          # Environment wrapper
│       ├── randomizers.py          # Parameter randomization functions
│       ├── performance_tracker.py  # Track boundary test results
│       └── configs/
│           └── default_adr.yaml
├── examples/
│   └── adr_training_example.py
└── tests/
    └── adr_test.py
```
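To make the intended role of performance_tracker.py concrete, here is a minimal sketch of how boundary results could be buffered per (parameter, bound) pair; class and method names are placeholders, not a committed interface.

```python
import collections


class PerformanceTracker:
  """Buffers boundary-episode outcomes per (parameter, bound) pair."""

  def __init__(self, buffer_size=100):
    self._buffer_size = buffer_size
    self._buffers = collections.defaultdict(
        lambda: collections.deque(maxlen=buffer_size))

  def record(self, param_name, bound, success):
    """Records a 0/1 outcome for an episode run at the given bound."""
    self._buffers[(param_name, bound)].append(float(success))

  def mean_success(self, param_name, bound):
    """Returns the average success rate, or None if the buffer is not yet full."""
    buf = self._buffers[(param_name, bound)]
    if len(buf) < self._buffer_size:
      return None
    return sum(buf) / len(buf)

  def reset(self, param_name, bound):
    """Clears the buffer after a range-adjustment decision."""
    self._buffers[(param_name, bound)].clear()
```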
Core Skills Used:
- Python class design and OOP
- NumPy for parameter sampling and statistics
- YAML for configuration management
- MuJoCo physics property manipulation
- Statistical performance tracking
- Clean API design for extensibility
Benefits
- Enables robust sim-to-real transfer without manual tuning
- Competitive with Isaac Gym - brings dm_control to feature parity
- Reusable across all dm_control tasks - works with suite, composer, locomotion
- Reduces research iteration time - no need to manually tune DR ranges
- Improves policy robustness - automatically finds optimal randomization
- Well-documented approach - based on established OpenAI research
Success Metrics
- ADR successfully expands randomization ranges during training
- Policies trained with ADR show better robustness to parameter variations
- Performance on boundary tests guides automatic range adjustments
- Works across dm_control suite - walker, humanoid, quadruped, manipulator
- Minimal overhead - <5% slowdown compared to fixed randomization
Example Use Case
Before ADR (manual):

```python
# Researcher manually tunes these... takes days of trial-and-error.
friction_range = [0.5, 1.5]  # Too wide? Too narrow? Who knows?
mass_range = [0.8, 1.2]
```

After ADR (automatic):

```python
# ADR automatically finds optimal ranges during training.
env = ADRWrapper(base_env, adr_manager)
# Trains robustly without manual tuning!
```

Testing Plan
- Unit tests for ADR range expansion/contraction logic
- Integration tests with walker, humanoid tasks
- Benchmark tests comparing fixed DR vs ADR
- Robustness tests - policy performance under parameter variations
- Performance tests - overhead measurement
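As a concrete starting point for the unit tests, a test of the expansion logic might look like the sketch below. It uses absltest, which dm_control's existing tests rely on; the ADRManager constructor, config format, and update_ranges call are assumptions about the API proposed above.

```python
from absl.testing import absltest

# Proposed module from this issue; does not exist in dm_control yet.
from dm_control.rl.adr import adr_manager


class AdrRangeUpdateTest(absltest.TestCase):

  def test_range_expands_after_consistent_boundary_success(self):
    manager = adr_manager.ADRManager(
        {'friction': {'initial_range': [0.8, 1.2],
                      'max_range': [0.1, 3.0],
                      'delta': 0.05}})
    # 100 successful episodes at the upper friction boundary should trigger
    # an expansion of the upper bound by `delta`.
    manager.update_ranges({'friction': {'upper': [1.0] * 100}})
    self.assertAlmostEqual(
        manager.randomization_params['friction'][1], 1.25)


if __name__ == '__main__':
  absltest.main()
```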
Why I Want to Fix This
This contribution would:
- Address a major gap vs Isaac Gym and other simulators
- Enable cutting-edge research in sim-to-real transfer
- Use core ML/Python skills - statistics, numpy, clean APIs
- Have clear success criteria - ADR should adapt ranges automatically
- Benefit the entire community - usable across all dm_control tasks
I have experience with RL, sim-to-real transfer, and dm_control environments. I've implemented similar domain randomization systems before and understand the theoretical foundations from the OpenAI ADR paper. I'm excited to bring this critical feature to dm_control and make it competitive with other leading simulators.
Implementation Timeline
- Week 1-2: Implement core `ADRManager` and performance tracking
- Week 3: Build `ADRWrapper` with environment integration
- Week 4: Add configuration system and parameter randomizers
- Week 5: Comprehensive testing across dm_control suite
- Week 6: Documentation, examples, and tutorials
- Week 7: Performance optimization and edge case handling
- Week 8: Address review feedback
I'm ready to start immediately and would appreciate guidance on dm_control-specific implementation details and preferred code style.