brueing/humanoid-il-project

Humanoid Control: Reinforcement Learning vs Behaviour Cloning

Overview

This project presents a comparative study of Reinforcement Learning (RL) and Behaviour Cloning (BC) for humanoid locomotion control using the NVIDIA Isaac Lab simulation environment.

Two independent pipelines are implemented:

  1. RL → BC Pipeline: Train an RL policy, generate expert data, then train a BC model.
  2. AMASS → BC Pipeline: Train a BC model directly from motion capture data.

The goal is to evaluate:

  • Whether BC can replicate RL performance
  • How dataset source affects control performance
  • Limitations of imitation learning in high-dimensional control

Project Structure

humanoid-il-project/
│
├── src/
│   ├── bc_model.py                  # Behaviour Cloning neural network
│   ├── train_bc_rl.py              # BC training on RL expert data
│   ├── train_bc_amass.py           # BC training on AMASS data
│   └── isaaclab_integration/
│       ├── rl_evaluate_env.py
│       ├── rl_evaluate_env_rough.py
│       ├── bc_evaluate_env.py
│       └── bc_evaluate_env_rough.py
│
├── scripts/
│   ├── process_amass.py            # Converts AMASS npz → states/actions
│   └── plot_results.py             # Generates plots
│
├── data/
│   └── processed/
│       ├── amass_states.npy
│       └── amass_actions.npy
│
├── results/
│   ├── *.csv
│   ├── *.pt
│   └── plots/
│
└── report/
    └── report.pdf

Pipelines

Pipeline 1: RL → BC

  1. Train PPO-based RL policy in Isaac Lab
  2. Generate expert trajectories (state-action pairs)
  3. Train BC model using supervised learning
  4. Evaluate both RL and BC in environment
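
Step 2 above (recording state-action pairs from the trained policy) can be sketched as follows. This is a minimal illustration, not the project's code: `StubEnv`, `expert_policy`, and `collect_expert_data` are hypothetical names, the stub follows the single-environment Gymnasium `reset`/`step` convention, and the placeholder policy stands in for the trained PPO policy. Isaac Lab environments are typically vectorised and tensor-based, so the real loop would differ in detail.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for the Isaac Lab task (Gymnasium-style API)."""

    def __init__(self, obs_dim=87, act_dim=21, episode_len=100, seed=0):
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.episode_len = episode_len
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.standard_normal(self.obs_dim).astype(np.float32), {}

    def step(self, action):
        self.t += 1
        obs = self.rng.standard_normal(self.obs_dim).astype(np.float32)
        truncated = self.t >= self.episode_len  # time limit only, no falls
        return obs, 0.0, False, truncated, {}


# Placeholder for the trained PPO policy: a fixed linear map squashed by tanh.
_W = np.random.default_rng(1).standard_normal((21, 87)).astype(np.float32) * 0.1


def expert_policy(obs):
    return np.tanh(_W @ obs).astype(np.float32)


def collect_expert_data(env, policy, num_steps):
    """Roll out `policy` and record the (state, action) pair at every step."""
    states, actions = [], []
    obs, _ = env.reset()
    for _ in range(num_steps):
        action = policy(obs)
        states.append(obs)
        actions.append(action)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    return np.stack(states), np.stack(actions)


states, actions = collect_expert_data(StubEnv(), expert_policy, num_steps=250)
```

The two returned arrays have one row per step, which is the layout a supervised BC trainer would consume.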

Pipeline 2: AMASS → BC

  1. Download and process AMASS dataset
  2. Convert pose sequences into (state, action) pairs
  3. Train BC model
  4. Attempt evaluation in simulation (a mismatch is expected, since AMASS poses are not aligned with the simulator's action space)
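
One way step 2 could work is sketched below. It is an assumption, not a description of `scripts/process_amass.py`: the state is taken to be the root translation concatenated with the pose vector at frame t, and the action is taken to be the frame-to-frame pose difference. The function name `poses_to_pairs` and the argument names `poses` and `trans` are illustrative. The resulting dimensions depend on the pose representation and will not in general match the 87-dimensional observations and 21-dimensional actions used in simulation.

```python
import numpy as np


def poses_to_pairs(poses, trans):
    """Turn a pose sequence into (state, action) pairs.

    Assumed convention (hypothetical): state_t = [trans_t, poses_t],
    action_t = poses_{t+1} - poses_t.
    """
    poses = np.asarray(poses, dtype=np.float32)
    trans = np.asarray(trans, dtype=np.float32)
    if len(poses) != len(trans):
        raise ValueError("poses and trans must have the same number of frames")
    if len(poses) < 2:
        raise ValueError("need at least two frames to form one pair")
    # The last frame has no successor, so it yields no pair.
    states = np.concatenate([trans[:-1], poses[:-1]], axis=1)
    actions = poses[1:] - poses[:-1]
    return states, actions


# Tiny synthetic sequence: 5 frames, 4 pose values, 3 translation values.
demo_poses = np.arange(20, dtype=np.float32).reshape(5, 4)
demo_trans = np.zeros((5, 3), dtype=np.float32)
demo_states, demo_actions = poses_to_pairs(demo_poses, demo_trans)
```

Under this convention the "actions" are pose changes, not joint torques, which is one concrete form of the mismatch mentioned in step 4.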

Environment Setup

Requirements

  • Python 3.10
  • PyTorch 2.x
  • NVIDIA GPU (recommended)
  • CUDA 12.x
  • NVIDIA Driver ≥ 591.xx
  • Isaac Lab / Isaac Sim 5.1

Install Dependencies

pip install torch numpy matplotlib gymnasium

Isaac Lab Setup

Follow official instructions: https://github.com/NVIDIA-Omniverse/IsaacLab


Dataset

AMASS Dataset

Download from: https://amass.is.tue.mpg.de/

Use:

  • CMU subset (e.g. folder 01/)

Citation (CMU MoCap subset of AMASS)

@misc{AMASS_CMU,
  title  = {CMU MoCap Dataset},
  author = {Carnegie Mellon University},
  url    = {http://mocap.cs.cmu.edu}
}

Usage

1. Process AMASS Data

python scripts/process_amass.py

Outputs:

data/processed/amass_states.npy
data/processed/amass_actions.npy
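
Before training, it can help to confirm that the two output files are consistent with each other. The check below is a minimal sketch; `load_pairs` is a hypothetical helper and is not part of the repository.

```python
import numpy as np


def load_pairs(states_path, actions_path):
    """Load the processed arrays and check that they form valid pairs."""
    states = np.load(states_path)
    actions = np.load(actions_path)
    if states.ndim != 2 or actions.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (num_samples, dim)")
    if states.shape[0] != actions.shape[0]:
        raise ValueError("states and actions must have the same number of rows")
    if not (np.isfinite(states).all() and np.isfinite(actions).all()):
        raise ValueError("arrays contain NaN or infinite values")
    return states, actions
```

A typical call would be `load_pairs("data/processed/amass_states.npy", "data/processed/amass_actions.npy")`.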

2. Train Behaviour Cloning (RL Data)

python src/train_bc_rl.py

3. Train Behaviour Cloning (AMASS)

python src/train_bc_amass.py

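4. Evaluate Models

The commands below use the Windows launcher isaaclab.bat; on Linux, use ./isaaclab.sh with the same arguments.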

RL Evaluation

isaaclab.bat -p src/isaaclab_integration/rl_evaluate_env.py

RL (Rough Terrain)

isaaclab.bat -p src/isaaclab_integration/rl_evaluate_env_rough.py

BC Evaluation

isaaclab.bat -p src/isaaclab_integration/bc_evaluate_env.py

BC (Rough Terrain)

isaaclab.bat -p src/isaaclab_integration/bc_evaluate_env_rough.py

5. Generate Plots

python scripts/plot_results.py

Observation and Action Spaces

  • Observation Space (87-dim):

    • Base height, velocity, orientation
    • Joint positions and velocities
    • Contact forces
    • Previous actions
  • Action Space (21-dim):

    • Continuous joint torques
    • Controls hips, knees, ankles, arms, torso

Model Architecture

Behaviour Cloning Model

  • Input: 87-dimensional state vector
  • Hidden Layers: 2 × 256 units (ReLU)
  • Output: 21-dimensional action vector
  • Loss: Mean Squared Error (MSE)
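
The architecture listed above corresponds to a small multilayer perceptron. The following PyTorch sketch shows one way to write it; the class name `BCPolicy` and its constructor arguments are illustrative, and the actual definition in `src/bc_model.py` may differ.

```python
import torch
from torch import nn


class BCPolicy(nn.Module):
    """MLP mapping an 87-dim state to a 21-dim action (illustrative sketch)."""

    def __init__(self, obs_dim=87, act_dim=21, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, act_dim),  # linear output: unbounded actions
        )

    def forward(self, obs):
        return self.net(obs)


model = BCPolicy()
batch = torch.randn(4, 87)                 # four dummy states
pred = model(batch)                        # shape (4, 21)
loss = nn.functional.mse_loss(pred, torch.zeros(4, 21))
```

With these sizes the network has 93,717 trainable parameters, so it is cheap to train and to evaluate inside the simulation loop.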

Results Summary

Model          Avg Reward    Stability
RL Policy      ~0.09–0.11    High
BC (RL Data)   ~0.03–0.05    Moderate
BC (AMASS)     ~0.03–0.04    Low

Key Findings

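  • RL outperforms BC on this control task: average reward is ~0.09–0.11 for the RL policy versus ~0.03–0.05 for BC trained on RL data
  • BC suffers from compounding errors: small per-step action errors push the policy into states absent from its training data
  • BC trained on AMASS achieves a low imitation (MSE) loss but fails in simulation
  • The representation mismatch between AMASS poses and the simulator's action space is a critical limitation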
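
The compounding-error finding matches a standard result from the imitation learning literature (Ross and Bagnell, 2010), given here as background and not as a result of this project. The symbols are not from the project: J is the expected total cost over a horizon of T steps with per-step cost in [0, 1], π* is the expert, π̂ is the cloned policy, and ε is the probability that π̂ deviates from π* on states drawn from the expert's own state distribution.

```latex
J(\hat{\pi}) \;\le\; J(\pi^{*}) + T^{2}\,\epsilon
```

The bound grows quadratically in T because a single mistake can move the learner into states the expert never visited, where further mistakes become more likely. It is stated for a 0-1 imitation loss, so it is only indicative for the continuous MSE setting used here, but it illustrates why a low training loss does not guarantee good closed-loop performance.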

Known Limitations

  • AMASS data not aligned with action space
  • No domain adaptation or retargeting
  • Limited terrain generalisation
  • BC evaluated without closed-loop correction

Reproducibility

Hardware:

  • GPU: RTX 3080 Ti
  • CPU: Ryzen 7 5800X
  • RAM: 32GB

Training:

  • PPO learning rate: 3e-4
  • BC learning rate: 1e-3
  • BC epochs: 50
  • Batch size: 256
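
To show how the BC hyperparameters above fit together, here is a minimal training loop on synthetic data. It is a sketch under stated assumptions: the Adam optimiser is an assumption (the optimiser is not specified here), `train_bc` is a hypothetical function name, and the random tensors stand in for the real expert dataset.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_bc(states, actions, epochs=50, batch_size=256, lr=1e-3):
    """Supervised BC training with MSE loss; returns the model and loss history."""
    model = nn.Sequential(
        nn.Linear(states.shape[1], 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, actions.shape[1]),
    )
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimiser
    loader = DataLoader(TensorDataset(states, actions),
                        batch_size=batch_size, shuffle=True)
    history = []
    for _ in range(epochs):
        total = 0.0
        for s, a in loader:
            optimiser.zero_grad()
            loss = nn.functional.mse_loss(model(s), a)
            loss.backward()
            optimiser.step()
            total += loss.item() * len(s)      # weight by batch size
        history.append(total / len(states))    # mean loss over the epoch
    return model, history


torch.manual_seed(0)
# Synthetic stand-in for expert data: 1024 states and smooth target actions.
demo_states = torch.randn(1024, 87)
demo_actions = torch.tanh(demo_states @ torch.randn(87, 21) * 0.1)
bc_model, loss_history = train_bc(demo_states, demo_actions)
```

`loss_history` holds one mean MSE value per epoch, which is the kind of curve a plotting script could display.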

License and Data Usage

  • AMASS: Non-commercial research use only
  • CMU MoCap: Free for research use
  • Data not redistributed

Acknowledgements

  • NVIDIA Isaac Lab
  • AMASS dataset
  • CMU Motion Capture Dataset

Author

Huzaifa, University of York, BSc Computer Science


Notes

This repository is designed for academic research and reproducibility. All results reported in the dissertation can be reproduced using the provided scripts.

About

Final Year Project - Imitation Learning for Humanoid Robot Training
