🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3537
Note: Links to docs will display an error until the docs builds have been completed.
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |
Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
vmoens left a comment:
Excellent first attempt!
Let's try to move most of this to core!
```
torchrl/modules/models/
    gp.py              # BoTorchGPWorldModel (renamed: GPWorldModel?)
    rbf_controller.py  # RBFController
torchrl/objectives/
    pilco.py           # SaturatingCost (the generic cost module)
sota-implementations/pilco/
    pilco.py           # Training loop (stays here)
    utils.py           # make_env, pendulum_cost (thin wrappers, stay here)
    config.yaml        # Config (stays here)
```
Missing tests:
- Unit tests for RBFController moment matching (forward pass, squash_sin)
- Unit tests for BoTorchGPWorldModel (fit, deterministic_forward, uncertain_forward)
- At minimum, a smoke test for the full PILCO loop (see the sota-implementations CI workflow)
- Numerical validation against the reference implementation (the author credits nrontsis/PILCO) if possible; ok if not
There are no docs: no docstrings on any class or method beyond the one-line pendulum_cost docstring. For core components, all public methods need proper docstrings with shapes documented (especially the moment-matching formulas, which are dense linear algebra). Docs must be linked in docs/source/reference/...
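For illustration, a hypothetical helper documented in the style the review asks for, with every tensor shape spelled out (the name and formula below are placeholders, not the PR's actual SaturatingCost):

```python
import torch


def saturating_cost(state: torch.Tensor, target: torch.Tensor, width: float = 0.25) -> torch.Tensor:
    """Saturating cost ``1 - exp(-||s - t||^2 / (2 * width**2))``.

    Args:
        state: batch of states. Shape: ``[batch, obs_dim]``.
        target: goal state. Shape: ``[obs_dim]``.
        width: length-scale of the cost bowl; smaller means sharper.

    Returns:
        Per-sample cost in ``[0, 1)``. Shape: ``[batch]``.
    """
    sq_dist = ((state - target) ** 2).sum(dim=-1)
    return 1.0 - torch.exp(-sq_dist / (2 * width**2))
```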
Avoid single-letter variables unless they're indices (for i in range(...)), which are heavily used throughout the moment-matching code (m, s, c, B, D, L, U, Q, t, z). These follow the paper notation, which is fine, but in core they should have comments referencing which equation in the paper each block corresponds to.
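For instance, each block could carry a descriptive name plus a pointer to the derivation it implements (the function name and the equation reference below are placeholders):

```python
import torch


def predictive_mean(beta: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # Moment-matched GP predictive mean, mu* = beta^T q
    # (reference the exact equation number of the PILCO paper here).
    # beta: [num_train], q: [batch, num_train] -> [batch]
    return q @ beta
```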
The policy_for_env closure (pilco.py lines 166-200) is an ad-hoc bridge between the Gaussian policy interface and a standard env that expects deterministic actions. In core, this should be a proper transform or wrapper (e.g., MeanActionSelector or similar) rather than a closure rebuilt every epoch.
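In torchrl this would likely live as an `envs.transforms.Transform`; a framework-free sketch of the idea, with all names hypothetical:

```python
import torch
from torch import nn


class MeanActionSelector(nn.Module):
    """Hypothetical wrapper: expose a Gaussian (mean, var) policy to an
    env that expects deterministic actions by forwarding only the mean."""

    def __init__(self, gaussian_policy: nn.Module):
        super().__init__()
        self.gaussian_policy = gaussian_policy

    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        mean, _var = self.gaussian_policy(observation)
        return mean  # the env consumes only the deterministic mean action


class ToyGaussianPolicy(nn.Module):
    # stand-in for the PR's RBF policy: returns (mean, var)
    def forward(self, obs: torch.Tensor):
        return 2.0 * obs, torch.ones_like(obs)


policy = MeanActionSelector(ToyGaussianPolicy())
```

Built once, it can be handed to any collector or rollout loop instead of being rebuilt every epoch.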
sota-implementations/pilco/utils.py (outdated)

```python
    return (1.0 - det_term * torch.exp(exp_term)).sum(dim=1)
...
class BoTorchGPWorldModel(nn.Module):
```
If properly documented I'm happy with having this in core!
sota-implementations/pilco/utils.py (outdated)

```python
    return observation_mean + delta_mean, torch.diag_embed(delta_std**2)
...
class ImaginedEnv(ModelBasedEnvBase):
```
Ditto, maybe we want this in core.
How different is it from DreamerEnv? Can we blend the two together? (ok if we want to keep them separated)
sota-implementations/pilco/utils.py (outdated)

```python
    return out
...
class RBFController(nn.Module):
```
Ditto, happy to have it in core.
sota-implementations/pilco/utils.py (outdated)

```python
for a in range(self.obs_dim):
    for b in range(self.obs_dim):
```
A lot in here can be vectorized.
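As a sketch of the kind of rewrite meant here (the toy reduction below is not the PR's actual moment-matching body, but the loop-to-einsum pattern is the same):

```python
import torch

torch.manual_seed(0)
obs_dim, n = 3, 5
K = torch.randn(obs_dim, n, n)

# Loop version: one (n, n) elementwise product per (a, b) pair.
loop_out = torch.empty(obs_dim, obs_dim)
for a in range(obs_dim):
    for b in range(obs_dim):
        loop_out[a, b] = (K[a] * K[b]).sum()

# Vectorized: broadcast both output dimensions through a single einsum.
vec_out = torch.einsum("aij,bij->ab", K, K)
```

Both produce the same `[obs_dim, obs_dim]` matrix, but the einsum dispatches one batched kernel instead of `obs_dim**2` small ones.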
sota-implementations/pilco/utils.py (outdated)

```python
else:
    return self.deterministic_forward(action, observation)
...
def freeze_and_detach(self) -> None:
```
sota-implementations/pilco/utils.py (outdated)

```python
invK_Q = torch.matmul(inv_K[a].unsqueeze(0), Q_ab)
trace_val = torch.diagonal(invK_Q, dim1=-2, dim2=-1).sum(-1)
...
pred_cov[:, a, a] += variances[a] - trace_val + noises[a].item()
```
Avoid .item(). Call .tolist() first if absolutely necessary; I think a plain tensor should work here. .item() breaks compile and requires a CUDA sync.
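A minimal sketch of the suggested change, using placeholder values rather than the PR's real intermediates:

```python
import torch

pred_cov = torch.zeros(2, 3, 3)
variances = torch.tensor([0.1, 0.2, 0.3])
noises = torch.tensor([0.01, 0.02, 0.03])
trace_val = torch.tensor(0.05)
a = 0

# Before (breaks torch.compile, forces a device sync on CUDA):
# pred_cov[:, a, a] += variances[a] - trace_val + noises[a].item()

# After: keep everything as tensors; indexing a 1-D tensor already
# yields a 0-dim tensor that broadcasts into the batched diagonal.
pred_cov[:, a, a] += variances[a] - trace_val + noises[a]
```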
sota-implementations/pilco/utils.py (outdated)

```python
t = torch.linalg.solve(B_mat, iN.mT).mT
...
exp_term = torch.exp(-0.5 * torch.sum(iN * t, dim=-1))
detB = torch.linalg.det(B_mat)
```
torch.linalg.slogdet would be numerically more stable (and the Cholesky log-det pattern is already partially used elsewhere in the same code).
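A sketch of the suggested rewrite, assuming det(B) enters as the det^(-1/2) factor of a Gaussian normalizer as in the surrounding moment-matching code (the matrices here are stand-ins):

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4)
B_mat = A @ A.mT + 4.0 * torch.eye(4)  # SPD stand-in for the PILCO B matrix
iN = torch.randn(2, 4)

t = torch.linalg.solve(B_mat, iN.mT).mT

# det-based version: det(B) can under/overflow long before the result does.
naive = torch.exp(-0.5 * torch.sum(iN * t, dim=-1)) / torch.sqrt(torch.linalg.det(B_mat))

# slogdet-based version: stay in log space until the single final exp.
sign, logdet = torch.linalg.slogdet(B_mat)
stable = torch.exp(-0.5 * torch.sum(iN * t, dim=-1) - 0.5 * logdet)
```

For an SPD matrix the sign is always positive, so only `logdet` is needed.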
sota-implementations/pilco/utils.py (outdated)

```python
scaled_exp = torch.exp(-torch.sum(inv_N * t, dim=-1) / 2)
lb = scaled_exp * beta.unsqueeze(0)
...
det_B = torch.linalg.det(B_mat)
```
Ditto: let's think in logs!
sota-implementations/pilco/utils.py (outdated)

```python
    batch_size, num_train_pts, num_train_pts
)
...
det_R_ab = torch.linalg.det(R_ab)
```
sota-implementations/pilco/utils.py (outdated)

```python
from botorch.fit import fit_gpytorch_mll
...
from botorch.models import ModelListGP, SingleTaskGP
```
botorch / gpytorch need to be added to the repo's dependencies as optional deps
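This later lands in the PR as an optional `pilco` dependency group; a sketch of what the pyproject.toml entry could look like (version pins not specified here):

```toml
[project.optional-dependencies]
pilco = [
    "botorch",
    "gpytorch",
]
```

Installed via `pip install torchrl[pilco]`; the botorch imports should then be guarded so the rest of the library works without it.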
Hi @PSXBRosa, thank you for your work implementing PILCO for TorchRL. A few days ago I opened a discussion about adding MC-PILCO (discussion #3538), and reading through @vmoens' review it's clear we're both going to depend on the same core primitives (…). A couple of options as I see it:
Happy to go with whatever works best for you. If you want to discuss further, feel free to reply here or reach out on Discord (cabesamotora)!
Hi @alektebel, thanks for reaching out. #1 seems like the cleanest approach. I'm in the middle of a move right now, so I haven't had much time for the PR this week, but I've started on vmoens' comments and plan to have the current issues resolved by the end of next week. My only progress so far is moving the loss to core; I can push this as a WIP if you'd like to take a look. How do you envision building on top of the current classes? Do you have a specific inheritance plan in mind? I'll ping you on Discord.
@PSXBRosa I can help port stuff to core if you want me to :)
@vmoens, I think that would help! I ended up having less time than I expected to work on the PR this last week. |
Port RBFController, ImaginedEnv, and MeanActionSelector from sota-implementations/pilco/utils.py to torchrl core. Add unit tests, documentation entries, and a botorch CI job in test-linux-libs.

- torchrl/modules/models/rbf_controller.py: RBF controller for moment-matching policy search with full docstrings
- torchrl/envs/model_based/imagined.py: general-purpose imagination env for model-based policy search
- torchrl/envs/transforms/mean_action_selector.py: transform bridging Gaussian belief-space policies with standard environments
- Improve GPWorldModel: slogdet for numerical stability, remove .item()
- Register GPWorldModel and RBFController in module exports
- Add all new components to docs
- Add 39 unit tests in test/test_objectives.py
- Add botorch CI job to test-linux-libs workflow
- Update sota-implementations/pilco to import from core

Made-with: Cursor
```
# Conflicts:
#   sota-implementations/pilco/pilco.py
#   sota-implementations/pilco/utils.py
#   torchrl/modules/models/__init__.py
#   torchrl/modules/models/rbf_controller.py
```
torchrl/modules/models/gp.py (outdated)

```python
for a in range(self.obs_dim):
    for b in range(self.obs_dim):
```
Let's optimize this when we can!
Sounds good! I'll work on this after work today.
…mpatibility
- Added optional `pilco` dependency group (botorch, gpytorch) to `pyproject.toml`.
- Refactored `GPWorldModel` to follow the TensorDict module interface:
- `forward` now accepts and returns a `TensorDict` instead of tuples.
- Added configurable `in_keys` and `out_keys` for flexible integration.
- Default keys support probabilistic inputs (`action.mean`, `action.var`,
`action.cross_covariance`, `observation.mean`, `observation.var`).
- Deterministic and uncertain forward passes now write results directly into
the TensorDict.
- Removed `freeze_and_detach` utility as it is no longer required.
- Updated internal logic to read/write through TensorDict keys.
- Updated `ImaginedEnv`:
- Added `next_observation_key` argument to specify where the world model writes
predicted observations.
- Default key is `("next", "observation")`.
- Adjusted environment step to read observations using this configurable key.
- Updated documentation examples to reflect new TensorDict conventions.
- Simplified PILCO implementation:
- World model is now used directly instead of wrapping it with `TensorDictModule`.
- Replaced `freeze_and_detach()` with `eval()` mode.
- Adapted rollout handling to align keys between real and imagined trajectories
(selecting mean values for observation/action and matching rollout structure).
- Ensured compatibility when concatenating evaluation rollouts into the dataset.
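The in_keys/out_keys pattern described above can be sketched framework-free, with a plain dict standing in for the TensorDict and toy dynamics in place of the GP (all names here are illustrative, not the PR's API):

```python
import torch
from torch import nn


class DictWorldModel(nn.Module):
    """Toy stand-in for the TensorDict interface: read inputs from
    configurable keys, write outputs back into the same mapping."""

    def __init__(self, in_keys=("observation.mean", "action.mean"),
                 out_keys=("next.observation.mean",)):
        super().__init__()
        self.in_keys, self.out_keys = in_keys, out_keys

    def forward(self, td: dict) -> dict:
        obs, act = (td[k] for k in self.in_keys)
        td[self.out_keys[0]] = obs + act  # toy dynamics: next = obs + act
        return td


td = {"observation.mean": torch.ones(2), "action.mean": torch.ones(2)}
td = DictWorldModel()(td)
```

Because keys are configurable, the same module can target `("next", "observation")` or any other destination without changing its body.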
```diff
-    def forward(
-        self, action: TensorDictBase, observation: TensorDictBase
-    ) -> tuple[torch.Tensor, torch.Tensor]:
+    def forward(self, tensordict: TensorDictBase) -> TensorDictBase:
```
Based on a previous comment in the PR, I thought changing this method signature would be better.
```python
test_rollout = test_rollout.select(
    *rollout.keys(include_nested=True, leaves_only=True)
)
```
Due to the changes to the GP class, this was necessary. I'm not sure if there's a cleaner way to handle this.
Description
This PR introduces the implementation of the PILCO (Probabilistic Inference for Learning Control) algorithm to TorchRL.
Key details of the implementation:
Motivation and Context
PILCO is a highly sample-efficient model-based reinforcement learning algorithm, making it a valuable addition to the library's algorithm suite.
close #3513
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!