Add Automatic Domain Randomization (ADR) Framework for Robust Sim-to-Real Transfer #527

@Aaryan-549

Description

I would like to contribute an Automatic Domain Randomization (ADR) framework to dm_control that adaptively adjusts environment parameters during training based on policy performance. This feature is available in Isaac Gym and is critical for robust sim-to-real transfer, but is currently missing from dm_control.

Problem Statement

Currently, dm_control lacks built-in support for domain randomization, forcing researchers to:

  • Manually tune randomization ranges through trial-and-error, which is time-consuming and suboptimal
  • Use fixed randomization distributions that don't adapt to the learning agent's capabilities
  • Implement custom solutions for each project, leading to inconsistent and non-reusable code
  • Miss out on robust sim-to-real transfer that ADR enables

Simulators such as Isaac Gym provide ADR out of the box, giving them a significant advantage for robotics research. As documented in OpenAI's "Solving Rubik's Cube with a Robot Hand" paper (2019), ADR enables zero-shot sim-to-real transfer by automatically expanding randomization ranges once the agent performs consistently well at the current boundaries.

What is ADR?

Automatic Domain Randomization progressively increases the difficulty of environment randomization based on agent performance; a minimal sketch of the underlying update rule follows the numbered steps below:

  1. Start with minimal randomization (near-nominal physics)
  2. Test on boundary conditions of randomization ranges
  3. Expand ranges when agent succeeds consistently on boundaries
  4. Contract ranges when agent fails consistently
  5. Result: Maximally robust policy without manual tuning
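
The expand/contract decision in steps 3 and 4 is the core of ADR. Below is a minimal sketch of that rule following the OpenAI ADR formulation; ParamRange and update_boundary are illustrative names only, not part of the proposed API:

import dataclasses

@dataclasses.dataclass
class ParamRange:
    low: float
    high: float
    delta: float       # Step size for expansion/contraction
    hard_low: float    # Widest value the lower bound may ever reach
    hard_high: float   # Widest value the upper bound may ever reach

def update_boundary(rng, side, successes, t_low=0.70, t_high=0.95):
    """Expands or contracts one boundary given a full buffer of boundary outcomes.

    Args:
        rng: The ParamRange being adapted.
        side: 'low' or 'high', the boundary that was evaluated.
        successes: List of 0/1 outcomes from episodes run at that boundary.
        t_low: Contract the range when the boundary success rate is below this.
        t_high: Expand the range when the boundary success rate is above this.
    """
    success_rate = sum(successes) / len(successes)
    if side == 'high':
        if success_rate >= t_high:
            rng.high = min(rng.high + rng.delta, rng.hard_high)  # Agent copes: widen
        elif success_rate <= t_low:
            rng.high = max(rng.high - rng.delta, rng.low)        # Agent struggles: narrow
    else:
        if success_rate >= t_high:
            rng.low = max(rng.low - rng.delta, rng.hard_low)
        elif success_rate <= t_low:
            rng.low = min(rng.low + rng.delta, rng.high)

Each boundary keeps its own buffer of outcomes, and the rule fires only once that buffer is full; episodes in between sample parameters uniformly from the current ranges.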

Proposed Solution

I will implement a modular ADR framework for dm_control consisting of:

1. Core ADR Manager

class ADRManager:
    """Manages automatic domain randomization for dm_control environments.

    Attributes:
        randomization_params: Dictionary of randomizable parameters and their
            current ranges.
        performance_threshold_high: Expand a range when boundary success
            exceeds this value.
        performance_threshold_low: Contract a range when boundary success
            falls below this value.
        buffer_size: Number of boundary episodes to average over.
    """

    def __init__(self, config):
        # Per-parameter ranges, e.g. {'friction': [0.5, 1.5]}.
        self.randomization_params = config.get('randomization_params', {})
        thresholds = config.get('performance_thresholds', {})
        self.performance_threshold_high = thresholds.get('high', 0.95)  # Expand when success > 95%
        self.performance_threshold_low = thresholds.get('low', 0.70)    # Contract when success < 70%
        self.buffer_size = config.get('buffer_size', 100)  # Episodes to average over

    def get_randomized_params(self, mode='training'):
        """Returns physics parameters for the current episode.

        Args:
            mode: 'training' (sample from ranges) or 'boundary' (test limits).
        """

    def update_ranges(self, boundary_results):
        """Adjusts randomization ranges based on boundary performance."""

    def should_expand(self, param_name, boundary):
        """Checks whether the range should expand for the given parameter."""

2. Randomizable Parameters

Support randomization of key physics properties (a sketch of writing these into the MuJoCo model follows the list):

  • Dynamics: Mass, inertia, friction, damping, armature
  • Actuation: Motor gains (kp, kd), force limits, control noise
  • Observation: Sensor noise, latency, dropouts
  • Geometry: Link lengths, COM positions (where applicable)
  • External forces: Random pushes, wind, ground perturbations
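
To show how sampled scales could be written back into the simulation, here is a hedged sketch that assumes dm_control's Physics.model exposes the underlying MuJoCo arrays (body_mass, geom_friction, dof_damping); the scales dictionary and the nominal-value caching are illustrative, not an existing API:

import numpy as np

def cache_nominal(physics):
    """Copies the unrandomized model arrays once, at wrapper construction."""
    return {
        'body_mass': np.copy(physics.model.body_mass),
        'geom_friction': np.copy(physics.model.geom_friction),
        'dof_damping': np.copy(physics.model.dof_damping),
    }

def apply_dynamics_randomization(physics, scales, nominal):
    """Scales MuJoCo model fields in place before an episode starts.

    Scaling from cached nominal values (rather than the current model) keeps
    repeated resets from compounding earlier randomizations.
    """
    physics.model.body_mass[:] = nominal['body_mass'] * scales.get('mass', 1.0)
    physics.model.geom_friction[:] = nominal['geom_friction'] * scales.get('friction', 1.0)
    physics.model.dof_damping[:] = nominal['dof_damping'] * scales.get('damping', 1.0)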

3. ADR-Compatible Environment Wrapper

class ADRWrapper:
    """Wraps dm_control environments to support ADR.
    
    Automatically applies randomization at reset and tracks performance.
    """
    
    def __init__(self, env, adr_manager, eval_fraction=0.1):
        self.env = env
        self.adr = adr_manager
        self.eval_fraction = eval_fraction  # Fraction of envs for boundary testing
        
    def reset(self, env_idx=None):
        """Reset with ADR parameters."""
        mode = 'boundary' if self._is_eval_env(env_idx) else 'training'
        params = self.adr.get_randomized_params(mode=mode)
        self._apply_randomization(params)
        return self.env.reset()
    
    def step(self, action):
        """Step and track performance for ADR."""
        timestep = self.env.step(action)
        self._record_performance(timestep)
        return timestep
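
The _record_performance hook above needs somewhere to accumulate boundary results; below is a minimal sketch of the per-boundary buffer that performance_tracker.py could provide (PerformanceTracker and its method names are illustrative):

import collections

class PerformanceTracker:
    """Keeps a fixed-size buffer of outcomes for each (parameter, boundary) pair."""

    def __init__(self, buffer_size=100):
        self._buffer_size = buffer_size
        self._buffers = collections.defaultdict(
            lambda: collections.deque(maxlen=buffer_size))

    def record(self, param_name, side, success):
        """Appends one boundary-episode outcome (True/False)."""
        self._buffers[(param_name, side)].append(float(success))

    def ready(self, param_name, side):
        """Returns True once enough boundary episodes have been collected."""
        return len(self._buffers[(param_name, side)]) >= self._buffer_size

    def consume(self, param_name, side):
        """Returns the mean success rate and clears the buffer."""
        buffer = self._buffers[(param_name, side)]
        mean_success = sum(buffer) / len(buffer)
        buffer.clear()
        return mean_success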

4. Configuration System

YAML-based configuration for easy setup:

adr_config:
  enabled: true
  performance_thresholds:
    high: 0.95  # Expand ranges
    low: 0.70   # Contract ranges
  buffer_size: 100  # Episodes for averaging
  evaluation_fraction: 0.1  # 10% of envs test boundaries
  
  randomization_params:
    dynamics:
      friction:
        initial_range: [0.8, 1.2]
        min_range: [0.8, 1.2]   # Narrowest allowed range (never contract past this)
        max_range: [0.1, 3.0]   # Widest allowed range (never expand past this)
        delta: 0.05             # Step size for expansion/contraction

      mass:
        initial_range: [0.9, 1.1]
        min_range: [0.9, 1.1]
        max_range: [0.3, 2.0]
        delta: 0.05
        
    actuation:
      kp_scale:
        initial_range: [0.95, 1.05]
        max_range: [0.5, 1.5]
        delta: 0.02
        
    observation:
      noise_std:
        initial_range: [0.0, 0.01]
        max_range: [0.0, 0.1]
        delta: 0.005

5. Integration with dm_control Suite

from dm_control import suite
from dm_control.rl.adr import ADRManager, ADRWrapper
import yaml

# Load ADR configuration
with open('adr_config.yaml') as f:
    adr_config = yaml.safe_load(f)

# Create base environment
base_env = suite.load('walker', 'walk')

# Wrap with ADR
adr_manager = ADRManager(adr_config['adr_config'])
env = ADRWrapper(base_env, adr_manager)

# Training loop
for episode in range(10000):
    timestep = env.reset()
    while not timestep.last():
        action = policy(timestep.observation)
        timestep = env.step(action)
    
    # ADR automatically adjusts ranges based on performance

Technical Implementation Details

File Structure:

dm_control/
├── rl/
│   └── adr/
│       ├── __init__.py
│       ├── adr_manager.py       # Core ADR logic
│       ├── adr_wrapper.py       # Environment wrapper
│       ├── randomizers.py       # Parameter randomization functions
│       ├── performance_tracker.py  # Track boundary test results
│       └── configs/
│           └── default_adr.yaml
├── examples/
│   └── adr_training_example.py
└── tests/
    └── adr_test.py

Core Skills Used:

  • Python class design and OOP
  • NumPy for parameter sampling and statistics
  • YAML for configuration management
  • MuJoCo physics property manipulation
  • Statistical performance tracking
  • Clean API design for extensibility

Benefits

  1. Enables robust sim-to-real transfer without manual tuning
  2. Competitive with Isaac Gym - brings dm_control to feature parity
  3. Reusable across all dm_control tasks - works with suite, composer, locomotion
  4. Reduces research iteration time - no need to manually tune DR ranges
  5. Improves policy robustness - automatically finds optimal randomization
  6. Well-documented approach - based on established OpenAI research

Success Metrics

  • ADR successfully expands randomization ranges during training
  • Policies trained with ADR show better robustness to parameter variations
  • Performance on boundary tests guides automatic range adjustments
  • Works across dm_control suite - walker, humanoid, quadruped, manipulator
  • Minimal overhead - <5% slowdown compared to fixed randomization

Example Use Case

Before ADR (manual):

# Researcher manually tunes these... takes days of trial-and-error
friction_range = [0.5, 1.5]  # Too wide? Too narrow? Who knows?
mass_range = [0.8, 1.2]

After ADR (automatic):

# ADR automatically finds optimal ranges during training
env = ADRWrapper(base_env, ADRManager(adr_config))
# Trains robustly without manual tuning!

Testing Plan

  1. Unit tests for ADR range expansion/contraction logic (sketched after this list)
  2. Integration tests with walker, humanoid tasks
  3. Benchmark tests comparing fixed DR vs ADR
  4. Robustness tests - policy performance under parameter variations
  5. Performance tests - overhead measurement
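
To make item 1 concrete, here is a sketch of what the expansion/contraction unit tests might look like, using absl's absltest as in the rest of dm_control's test suite; the update_ranges argument format and the current_range accessor are assumptions made purely for illustration:

from absl.testing import absltest

# Proposed module from this issue; its internals are assumed below.
from dm_control.rl.adr import ADRManager


class ADRRangeAdaptationTest(absltest.TestCase):

    def _make_manager(self):
        return ADRManager({
            'performance_thresholds': {'high': 0.95, 'low': 0.70},
            'buffer_size': 10,
            'randomization_params': {
                'friction': {'initial_range': [0.8, 1.2],
                             'max_range': [0.1, 3.0],
                             'delta': 0.05},
            },
        })

    def test_consistent_success_expands_upper_boundary(self):
        manager = self._make_manager()
        # A full buffer of successful episodes at the upper friction boundary.
        manager.update_ranges({('friction', 'high'): [1.0] * 10})
        self.assertAlmostEqual(manager.current_range('friction')[1], 1.25)

    def test_consistent_failure_contracts_upper_boundary(self):
        manager = self._make_manager()
        manager.update_ranges({('friction', 'high'): [0.0] * 10})
        self.assertAlmostEqual(manager.current_range('friction')[1], 1.15)


if __name__ == '__main__':
    absltest.main()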

Why I Want to Fix This

This contribution would:

  • Address a major gap vs Isaac Gym and other simulators
  • Enable cutting-edge research in sim-to-real transfer
  • Use core ML/Python skills - statistics, numpy, clean APIs
  • Have clear success criteria - ADR should adapt ranges automatically
  • Benefit the entire community - usable across all dm_control tasks

I have experience with RL, sim-to-real transfer, and dm_control environments. I've implemented similar domain randomization systems before and understand the theoretical foundations from the OpenAI ADR paper. I'm excited to bring this critical feature to dm_control and make it competitive with other leading simulators.

Implementation Timeline

  • Week 1-2: Implement core ADRManager and performance tracking
  • Week 3: Build ADRWrapper with environment integration
  • Week 4: Add configuration system and parameter randomizers
  • Week 5: Comprehensive testing across dm_control suite
  • Week 6: Documentation, examples, and tutorials
  • Week 7: Performance optimization and edge case handling
  • Week 8: Address review feedback

I'm ready to start immediately and would appreciate guidance on dm_control-specific implementation details and preferred code style.
