Description
I would like to contribute an Automatic Domain Randomization (ADR) framework to dm_control that adaptively adjusts environment parameters during training based on policy performance. This feature is available in Isaac Gym and is critical for robust sim-to-real transfer, but is currently missing from dm_control.
Problem Statement
Currently, dm_control lacks built-in support for domain randomization, forcing researchers to:
- Manually tune randomization ranges through trial-and-error, which is time-consuming and suboptimal
- Use fixed randomization distributions that don't adapt to the learning agent's capabilities
- Implement custom solutions for each project, leading to inconsistent and non-reusable code
- Miss out on robust sim-to-real transfer that ADR enables
Competitors like Isaac Gym provide ADR out-of-the-box, giving them a significant advantage for robotics research. As documented in the "Solving Rubik's Cube with a Robot Hand" paper (OpenAI 2019), ADR enables successful zero-shot sim-to-real transfer by automatically expanding randomization ranges when agents achieve consistent performance.
What is ADR?
Automatic Domain Randomization progressively increases environment randomization difficulty based on agent performance:
- Start with minimal randomization (near-nominal physics)
- Test on boundary conditions of randomization ranges
- Expand ranges when agent succeeds consistently on boundaries
- Contract ranges when agent fails consistently
- Result: Maximally robust policy without manual tuning
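To make the expand/contract rule above concrete, here is a minimal sketch in the spirit of the OpenAI ADR update. The function and argument names are illustrative only (not an existing dm_control API), and a full version would also clamp the bound to configured min/max ranges:

```python
import collections


def update_bound(bound_value, successes, delta, direction,
                 high=0.95, low=0.70, buffer_size=100):
  """Returns (new_bound, new_buffer) for one boundary of one parameter.

  Args:
    bound_value: current value of the lower or upper bound.
    successes: deque of 0/1 outcomes from episodes run at this boundary.
    delta: step size for expanding/contracting the range.
    direction: +1 for the upper bound, -1 for the lower bound.
  """
  if len(successes) < buffer_size:
    return bound_value, successes  # Not enough evidence yet; keep collecting.
  mean_success = sum(successes) / len(successes)
  # Clear the buffer whenever a decision is made, as in the OpenAI ADR paper.
  successes = collections.deque(maxlen=buffer_size)
  if mean_success > high:
    return bound_value + direction * delta, successes  # Expand outward.
  if mean_success < low:
    return bound_value - direction * delta, successes  # Contract inward.
  return bound_value, successes
```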
Proposed Solution
I will implement a modular ADR framework for dm_control consisting of:
1. Core ADR Manager

```python
class ADRManager:
  """Manages automatic domain randomization for dm_control environments.

  Attributes:
    randomization_params: Dictionary of randomizable parameters and their bounds.
    performance_buffer: Tracks success rates at boundary conditions.
    thresholds: High/low thresholds for expanding/contracting ranges.
  """

  def __init__(self, config):
    self.randomization_params = {}          # e.g. {'friction': [0.5, 1.5]}
    self.performance_threshold_high = 0.95  # Expand when boundary success > 95%.
    self.performance_threshold_low = 0.7    # Contract when boundary success < 70%.
    self.buffer_size = 100                  # Episodes to average over.

  def get_randomized_params(self, mode='training'):
    """Returns physics parameters for the current episode.

    Args:
      mode: 'training' (sample from current ranges) or 'boundary' (test range limits).
    """

  def update_ranges(self, boundary_results):
    """Adjusts randomization ranges based on boundary performance."""

  def should_expand(self, param_name, boundary):
    """Checks whether the range should expand for the given parameter."""
```

2. Randomizable Parameters
Support randomization of key physics properties:
- Dynamics: Mass, inertia, friction, damping, armature
- Actuation: Motor gains (kp, kd), force limits, control noise
- Observation: Sensor noise, latency, dropouts
- Geometry: Link lengths, COM positions (where applicable)
- External forces: Random pushes, wind, ground perturbations
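As a sketch of how the dynamics group could be applied to a dm_control environment: the MjModel fields below (geom_friction, body_mass, dof_damping) are standard MuJoCo model arrays exposed through physics.model, but the helper function itself is hypothetical and not part of dm_control today.

```python
import numpy as np


def apply_dynamics_randomization(physics, ranges, rng=None):
  """Scales selected MjModel fields by factors sampled from `ranges`.

  Args:
    physics: a dm_control `mujoco.Physics` instance (e.g. env.physics).
    ranges: dict mapping parameter name to a [low, high] multiplicative range,
      e.g. {'friction': [0.8, 1.2], 'mass': [0.9, 1.1]}.
    rng: optional numpy random Generator.
  """
  # Note: a real randomizer would cache the nominal model values once and
  # rescale from those at every reset, so scales do not compound across episodes.
  rng = rng or np.random.default_rng()
  if 'friction' in ranges:
    scale = rng.uniform(*ranges['friction'])
    physics.model.geom_friction[:, 0] *= scale  # Sliding friction coefficient.
  if 'mass' in ranges:
    scale = rng.uniform(*ranges['mass'])
    physics.model.body_mass[:] *= scale
  if 'damping' in ranges:
    scale = rng.uniform(*ranges['damping'])
    physics.model.dof_damping[:] *= scale
```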
3. ADR-Compatible Environment Wrapper
```python
class ADRWrapper:
  """Wraps dm_control environments to support ADR.

  Automatically applies randomization at reset and tracks performance.
  """

  def __init__(self, env, adr_manager, eval_fraction=0.1):
    self.env = env
    self.adr = adr_manager
    self.eval_fraction = eval_fraction  # Fraction of envs used for boundary testing.

  def reset(self, env_idx=None):
    """Reset with ADR parameters."""
    mode = 'boundary' if self._is_eval_env(env_idx) else 'training'
    params = self.adr.get_randomized_params(mode=mode)
    self._apply_randomization(params)
    return self.env.reset()

  def step(self, action):
    """Step and track performance for ADR."""
    timestep = self.env.step(action)
    self._record_performance(timestep)
    return timestep
```

4. Configuration System
YAML-based configuration for easy setup:
```yaml
adr_config:
  enabled: true
  performance_thresholds:
    high: 0.95            # Expand ranges
    low: 0.70             # Contract ranges
  buffer_size: 100        # Episodes for averaging
  evaluation_fraction: 0.1  # 10% of envs test boundaries
  randomization_params:
    dynamics:
      friction:
        initial_range: [0.8, 1.2]
        min_range: [0.5, 1.5]
        max_range: [0.1, 3.0]
        delta: 0.05       # Step size for expansion
      mass:
        initial_range: [0.9, 1.1]
        min_range: [0.5, 1.5]
        max_range: [0.3, 2.0]
        delta: 0.05
    actuation:
      kp_scale:
        initial_range: [0.95, 1.05]
        max_range: [0.5, 1.5]
        delta: 0.02
    observation:
      noise_std:
        initial_range: [0.0, 0.01]
        max_range: [0.0, 0.1]
        delta: 0.005
```

5. Integration with dm_control Suite
```python
from dm_control import suite
from dm_control.rl.adr import ADRManager, ADRWrapper
import yaml

# Load ADR configuration.
with open('adr_config.yaml') as f:
  adr_config = yaml.safe_load(f)

# Create base environment.
base_env = suite.load('walker', 'walk')

# Wrap with ADR.
adr_manager = ADRManager(adr_config['adr_config'])
env = ADRWrapper(base_env, adr_manager)

# Training loop (`policy` is the user's agent, not shown here).
for episode in range(10000):
  timestep = env.reset()
  while not timestep.last():
    action = policy(timestep.observation)
    timestep = env.step(action)

# ADR automatically adjusts ranges based on performance.
```

Technical Implementation Details
File Structure:
```
dm_control/
├── rl/
│   └── adr/
│       ├── __init__.py
│       ├── adr_manager.py          # Core ADR logic
│       ├── adr_wrapper.py          # Environment wrapper
│       ├── randomizers.py          # Parameter randomization functions
│       ├── performance_tracker.py  # Track boundary test results
│       └── configs/
│           └── default_adr.yaml
├── examples/
│   └── adr_training_example.py
└── tests/
    └── adr_test.py
```
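To make the intended role of performance_tracker.py concrete, here is a minimal sketch of how boundary results could be buffered per (parameter, bound) pair; class and method names are placeholders, not a committed interface.

```python
import collections


class PerformanceTracker:
  """Buffers boundary-episode outcomes per (parameter, bound) pair."""

  def __init__(self, buffer_size=100):
    self._buffer_size = buffer_size
    self._buffers = collections.defaultdict(
        lambda: collections.deque(maxlen=buffer_size))

  def record(self, param_name, bound, success):
    """Records a 0/1 outcome for an episode run at the given bound."""
    self._buffers[(param_name, bound)].append(float(success))

  def mean_success(self, param_name, bound):
    """Returns the average success rate, or None if the buffer is not yet full."""
    buf = self._buffers[(param_name, bound)]
    if len(buf) < self._buffer_size:
      return None
    return sum(buf) / len(buf)

  def reset(self, param_name, bound):
    """Clears the buffer after a range-adjustment decision."""
    self._buffers[(param_name, bound)].clear()
```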
Core Skills Used:
- Python class design and OOP
- NumPy for parameter sampling and statistics
- YAML for configuration management
- MuJoCo physics property manipulation
- Statistical performance tracking
- Clean API design for extensibility
Benefits
- Enables robust sim-to-real transfer without manual tuning
- Competitive with Isaac Gym - brings dm_control to feature parity
- Reusable across all dm_control tasks - works with suite, composer, locomotion
- Reduces research iteration time - no need to manually tune DR ranges
- Improves policy robustness - automatically finds optimal randomization
- Well-documented approach - based on established OpenAI research
Success Metrics
- ADR successfully expands randomization ranges during training
- Policies trained with ADR show better robustness to parameter variations
- Performance on boundary tests guides automatic range adjustments
- Works across dm_control suite - walker, humanoid, quadruped, manipulator
- Minimal overhead - <5% slowdown compared to fixed randomization
Example Use Case
Before ADR (manual):

```python
# Researcher manually tunes these... takes days of trial-and-error.
friction_range = [0.5, 1.5]  # Too wide? Too narrow? Who knows?
mass_range = [0.8, 1.2]
```

After ADR (automatic):

```python
# ADR automatically finds optimal ranges during training.
env = ADRWrapper(base_env, adr_manager)
# Trains robustly without manual tuning!
```

Testing Plan
- Unit tests for ADR range expansion/contraction logic
- Integration tests with walker, humanoid tasks
- Benchmark tests comparing fixed DR vs ADR
- Robustness tests - policy performance under parameter variations
- Performance tests - overhead measurement
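As a concrete starting point for the unit tests, a test of the expansion logic might look like the sketch below. It uses absltest, which dm_control's existing tests rely on; the ADRManager constructor, config format, and update_ranges call are assumptions about the API proposed above.

```python
from absl.testing import absltest

# Proposed module from this issue; does not exist in dm_control yet.
from dm_control.rl.adr import adr_manager


class AdrRangeUpdateTest(absltest.TestCase):

  def test_range_expands_after_consistent_boundary_success(self):
    manager = adr_manager.ADRManager(
        {'friction': {'initial_range': [0.8, 1.2],
                      'max_range': [0.1, 3.0],
                      'delta': 0.05}})
    # 100 successful episodes at the upper friction boundary should trigger
    # an expansion of the upper bound by `delta`.
    manager.update_ranges({'friction': {'upper': [1.0] * 100}})
    self.assertAlmostEqual(
        manager.randomization_params['friction'][1], 1.25)


if __name__ == '__main__':
  absltest.main()
```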
Why I Want to Fix This
This contribution would:
- Address a major gap vs Isaac Gym and other simulators
- Enable cutting-edge research in sim-to-real transfer
- Use core ML/Python skills - statistics, numpy, clean APIs
- Have clear success criteria - ADR should adapt ranges automatically
- Benefit the entire community - usable across all dm_control tasks
I have experience with RL, sim-to-real transfer, and dm_control environments. I've implemented similar domain randomization systems before and understand the theoretical foundations from the OpenAI ADR paper. I'm excited to bring this critical feature to dm_control and make it competitive with other leading simulators.
Implementation Timeline
- Week 1-2: Implement core `ADRManager` and performance tracking
- Week 3: Build `ADRWrapper` with environment integration
- Week 4: Add configuration system and parameter randomizers
- Week 5: Comprehensive testing across dm_control suite
- Week 6: Documentation, examples, and tutorials
- Week 7: Performance optimization and edge case handling
- Week 8: Address review feedback
I'm ready to start immediately and would appreciate guidance on dm_control-specific implementation details and preferred code style.