This project presents a comparative study of Reinforcement Learning (RL) and Behaviour Cloning (BC) for humanoid locomotion control using the NVIDIA Isaac Lab simulation environment.
Two independent pipelines are implemented:
- RL → BC Pipeline: Train an RL policy, generate expert data, then train a BC model.
- AMASS → BC Pipeline: Train a BC model directly from motion capture data.
The goal is to evaluate:
- Whether BC can replicate RL performance
- How dataset source affects control performance
- Limitations of imitation learning in high-dimensional control
humanoid-il-project/
│
├── src/
│   ├── bc_model.py              # Behaviour Cloning neural network
│   ├── train_bc_rl.py           # BC training on RL expert data
│   ├── train_bc_amass.py        # BC training on AMASS data
│   └── isaaclab_integration/
│       ├── rl_evaluate_env.py
│       ├── rl_evaluate_env_rough.py
│       ├── bc_evaluate_env.py
│       └── bc_evaluate_env_rough.py
│
├── scripts/
│   ├── process_amass.py         # Converts AMASS npz → states/actions
│   └── plot_results.py          # Generates plots
│
├── data/
│   └── processed/
│       ├── amass_states.npy
│       └── amass_actions.npy
│
├── results/
│   ├── *.csv
│   ├── *.pt
│   └── plots/
│
└── report/
    └── report.pdf

RL → BC Pipeline:
- Train PPO-based RL policy in Isaac Lab
- Generate expert trajectories (state-action pairs)
- Train BC model using supervised learning
- Evaluate both RL and BC in environment
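The expert-data step of the RL → BC pipeline can be sketched as a rollout loop that records each observation together with the action the expert chose. This is a minimal illustration, not the repository's implementation: `ToyEnv`, `toy_expert`, and the simplified `reset()`/`step()` interface are hypothetical stand-ins for the Isaac Lab environment and the trained PPO policy. Only the 87-dim observation and 21-dim action sizes come from this README.

```python
import numpy as np


def collect_expert_data(env, policy, num_steps):
    """Roll out `policy` in `env` and record (state, action) pairs."""
    states, actions = [], []
    obs = env.reset()
    for _ in range(num_steps):
        action = policy(obs)
        states.append(obs)       # state seen by the expert
        actions.append(action)   # action the expert took in that state
        obs, done = env.step(action)
        if done:
            obs = env.reset()
    return (np.asarray(states, dtype=np.float32),
            np.asarray(actions, dtype=np.float32))


class ToyEnv:
    """Hypothetical stand-in environment with random observations."""

    def __init__(self, obs_dim=87, episode_len=50, seed=0):
        self.obs_dim = obs_dim
        self.episode_len = episode_len
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.standard_normal(self.obs_dim)

    def step(self, action):
        self.t += 1
        obs = self.rng.standard_normal(self.obs_dim)
        return obs, self.t >= self.episode_len


# Fixed linear map used by the placeholder expert (87 inputs, 21 outputs).
EXPERT_WEIGHTS = np.linspace(-1.0, 1.0, 87 * 21).reshape(87, 21)


def toy_expert(obs):
    """Placeholder for the trained PPO policy."""
    return np.tanh(obs @ EXPERT_WEIGHTS)


states, actions = collect_expert_data(ToyEnv(), toy_expert, num_steps=200)
print(states.shape, actions.shape)
```

The two arrays share their first dimension, so row `i` of `states` pairs with row `i` of `actions` when they are later used as supervised training data.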
AMASS → BC Pipeline:
- Download and process the AMASS dataset
- Convert pose sequences into (state, action) pairs
- Train BC model
- Attempt evaluation in simulation (expected mismatch)
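One plausible way to turn a pose sequence into (state, action) pairs is sketched below. It is an assumption for illustration, not necessarily what `scripts/process_amass.py` does: the state is the current pose plus a backward finite-difference velocity, and the action is the next frame's pose. The function name, the frame rate, and the synthetic 6-dimensional poses are all hypothetical.

```python
import numpy as np


def poses_to_pairs(poses, fps):
    """Build (state, action) pairs from a (T, D) pose sequence.

    state_t  = [pose_t, (pose_t - pose_{t-1}) * fps]
    action_t = pose_{t+1}  (next-frame pose used as the target)
    """
    poses = np.asarray(poses, dtype=np.float32)
    if poses.ndim != 2 or poses.shape[0] < 3:
        raise ValueError("expected an array of shape (T, D) with T >= 3")
    # velocities[i] is the backward difference at frame i + 1.
    velocities = np.diff(poses, axis=0) * fps
    states = np.concatenate([poses[1:-1], velocities[:-1]], axis=1)
    actions = poses[2:]
    return states, actions


# Synthetic stand-in for a motion-capture clip: 100 frames, 6 pose values.
rng = np.random.default_rng(0)
demo_poses = np.cumsum(rng.standard_normal((100, 6)) * 0.01, axis=0)
demo_states, demo_actions = poses_to_pairs(demo_poses, fps=30.0)
print(demo_states.shape, demo_actions.shape)
```

Under this assumed scheme the actions are pose targets, whereas the simulated humanoid expects 21 joint torques, which is one way the mismatch expected at evaluation time could arise.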
Requirements:
- Python 3.10
- PyTorch 2.x
- NVIDIA GPU (recommended)
- CUDA 12.x
- NVIDIA Driver ≥ 591.xx
- Isaac Lab / Isaac Sim 5.1
pip install torch numpy matplotlib gymnasium
Isaac Lab: follow the official installation instructions at https://github.com/NVIDIA-Omniverse/IsaacLab
AMASS: download from https://amass.is.tue.mpg.de/
Use:
- CMU subset (e.g. folder 01/)
@misc{AMASS_CMU,
title = {CMU MoCap Dataset},
author = {Carnegie Mellon University},
url = {http://mocap.cs.cmu.edu}
}
python scripts/process_amass.py
Outputs:
data/processed/amass_states.npy
data/processed/amass_actions.npy
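Before training, it can help to check that the two output arrays line up. The helper below is a hypothetical sketch (the name `load_pairs` is not part of the repository). It loads both files and verifies that they have the same number of rows and contain only finite values. The demonstration writes small throwaway arrays to a temporary directory so that it runs without the real dataset.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_pairs(states_path, actions_path):
    """Load state/action arrays and check that they are consistent."""
    states = np.load(states_path)
    actions = np.load(actions_path)
    if states.shape[0] != actions.shape[0]:
        raise ValueError(
            f"row mismatch: {states.shape[0]} states vs "
            f"{actions.shape[0]} actions"
        )
    if not (np.isfinite(states).all() and np.isfinite(actions).all()):
        raise ValueError("found NaN or infinite values")
    return states, actions


# Demonstration on throwaway files with made-up shapes.
with tempfile.TemporaryDirectory() as tmp:
    states_path = Path(tmp) / "amass_states.npy"
    actions_path = Path(tmp) / "amass_actions.npy"
    np.save(states_path, np.zeros((10, 4), dtype=np.float32))
    np.save(actions_path, np.ones((10, 2), dtype=np.float32))
    demo_states, demo_actions = load_pairs(states_path, actions_path)

print(demo_states.shape, demo_actions.shape)
```

For the real data, the same function would be called with `data/processed/amass_states.npy` and `data/processed/amass_actions.npy`.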
python src/train_bc_rl.py
python src/train_bc_amass.py
isaaclab.bat -p src/isaaclab_integration/rl_evaluate_env.py
isaaclab.bat -p src/isaaclab_integration/rl_evaluate_env_rough.py
isaaclab.bat -p src/isaaclab_integration/bc_evaluate_env.py
python scripts/plot_results.py
Observation Space (87-dim):
- Base height, velocity, orientation
- Joint positions and velocities
- Contact forces
- Previous actions
Action Space (21-dim):
- Continuous joint torques
- Controls hips, knees, ankles, arms, torso
BC Model:
- Input: 87-dimensional state vector
- Hidden Layers: 2 × 256 units (ReLU)
- Output: 21-dimensional action vector
- Loss: Mean Squared Error (MSE)
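The architecture listed above can be sketched in PyTorch as follows. This is a minimal illustration rather than the contents of `src/bc_model.py`: the class name `BCPolicy`, the Adam optimiser, and the synthetic training batch are assumptions. The layer sizes, ReLU activations, MSE loss, learning rate of 1e-3, and batch size of 256 come from this README.

```python
import torch
from torch import nn


class BCPolicy(nn.Module):
    """MLP mapping an 87-dim state to a 21-dim action (2 x 256 hidden, ReLU)."""

    def __init__(self, obs_dim=87, act_dim=21, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)


torch.manual_seed(0)
model = BCPolicy()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimiser is assumed
loss_fn = nn.MSELoss()

# Synthetic batch standing in for expert (state, action) pairs.
states = torch.randn(256, 87)
actions = torch.tanh(states[:, :21])

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(states), actions)  # regress onto expert actions
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Behaviour cloning here is plain supervised regression: the network never interacts with the simulator during training, so a low loss only shows that it fits the recorded actions.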
| Model | Avg Reward | Stability |
|---|---|---|
| RL Policy | ~0.09–0.11 | High |
| BC (RL Data) | ~0.03–0.05 | Moderate |
| BC (AMASS) | ~0.03–0.04 | Low |
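- RL outperforms both BC variants (average reward of ~0.09–0.11 versus ~0.03–0.05)
- BC suffers from compounding errors: small action errors move the policy into states that are absent from its training data
- AMASS BC achieves low training loss but fails in simulation
- Representation mismatch is a critical limitation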
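The compounding-error effect can be illustrated with a toy one-dimensional system. This is a deliberately simplified sketch and not a model of the humanoid: it assumes the deviation from the expert trajectory is amplified by a constant factor each step and that the cloned policy adds a constant small action error without ever correcting the drift. The function name and all numeric values are hypothetical.

```python
def rollout_deviation(action_bias, growth=1.05, steps=50):
    """Track deviation from the expert under d_{t+1} = growth * d_t + action_bias."""
    deviation = 0.0
    history = []
    for _ in range(steps):
        # The existing deviation is amplified, then the new error is added.
        deviation = growth * deviation + action_bias
        history.append(deviation)
    return history


history = rollout_deviation(action_bias=0.01)
print(f"after 1 step: {history[0]:.3f}, after {len(history)} steps: {history[-1]:.3f}")
```

In this toy setting the final deviation is far larger than the per-step error multiplied by the number of steps, which is the qualitative pattern behind a policy that scores well on per-sample loss yet fails over a long rollout.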
Limitations:
- AMASS data is not aligned with the simulator's action space
- No domain adaptation or retargeting
- Limited terrain generalisation
- BC evaluated without closed-loop correction
Hardware:
- GPU: RTX 3080 Ti
- CPU: Ryzen 7 5800X
- RAM: 32GB
Training:
- PPO learning rate: 3e-4
- BC learning rate: 1e-3
- BC epochs: 50
- Batch size: 256
Data licensing:
- AMASS: Non-commercial research use only
- CMU MoCap: Free for research use
- Data not redistributed
Acknowledgements:
- NVIDIA Isaac Lab
- AMASS dataset
- CMU Motion Capture Dataset
Huzaifa, University of York, BSc Computer Science
This repository is designed for academic research and reproducibility. All results reported in the dissertation can be reproduced using the provided scripts.