This repository contains implementations of CT-SAC and CT-TD3, along with benchmarks for other Continuous-Time Reinforcement Learning (CT-RL) algorithms and standard Discrete-Time RL baselines. It includes environments for control tasks (DeepMind Control Suite) and financial trading.
It is recommended to use a Conda environment to manage dependencies.

- Create and activate a Conda environment:

  ```bash
  conda create -n ct-rl python=3.9
  conda activate ct-rl
  ```

- Install dependencies using the provided `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```

  Note: If you need to generate a new requirements file based on the project's imports, you can use `pipreqs`.
The repository provides two main entry points for training: one for Continuous-Time algorithms and one for Discrete-Time algorithms.
To train a continuous-time algorithm (e.g., CT-SAC), use the benchmarks.run_ct_rl module:
```bash
python -m benchmarks.run_ct_rl --algo ct_sac --env_id cheetah-run --log_root log_test --save_root save_test --total_timesteps 80000
```

Parameters:

- `--algo`: The algorithm to use. Options: `ct_sac`, `ct_td3`, `cppo`, `q_learning`.
- `--env_id`: The environment ID. Options: `cheetah-run`, `walker-run`, `humanoid-walk`, `quadruped-run`, `trading`.
- `--log_root`: Directory to store TensorBoard logs and CSV metrics.
- `--save_root`: Directory to save trained models.
- `--total_timesteps`: Total number of timesteps to train.
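For sweeps over several algorithms or seeds, the training module can be launched programmatically. The sketch below builds the command lines using the flags documented above; the seed-specific directory names are an illustrative convention, not something the script requires.

```python
import subprocess

# Hypothetical sweep over algorithms and seeds. The flag names follow the
# CLI shown above; seeds here only affect the output directory names.
ALGOS = ["ct_sac", "ct_td3"]
SEEDS = [0, 1, 2]

def build_command(algo: str, seed: int, env_id: str = "cheetah-run") -> list:
    """Assemble the argv list for one training run."""
    return [
        "python", "-m", "benchmarks.run_ct_rl",
        "--algo", algo,
        "--env_id", env_id,
        "--log_root", f"log_{algo}_seed{seed}",
        "--save_root", f"save_{algo}_seed{seed}",
        "--total_timesteps", "80000",
    ]

if __name__ == "__main__":
    for algo in ALGOS:
        for seed in SEEDS:
            cmd = build_command(algo, seed)
            print(" ".join(cmd))
            # Uncomment to actually launch the runs sequentially:
            # subprocess.run(cmd, check=True)
```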
To train a standard discrete-time algorithm (e.g., SAC) using Stable-Baselines3, use the benchmarks.run_discrete_rl module:
```bash
python -m benchmarks.run_discrete_rl --algo sac --env_id cheetah-run --log_root log_test --save_root save_test --total_timesteps 80000
```

Parameters:

- `--algo`: The algorithm to use. Options: `sac`, `td3`, `ppo`, `trpo`.
- `--env_id`: Same options as above.
This repository includes a comprehensive evaluation pipeline to generate plots, tables, and statistical tests.
The `logs`, `saved_models`, and `data/trading/processed_data` folders are too large to be included directly in the repository. They are hosted on SwissTransfer via two links. Because these links expire every month under SwissTransfer's policy, they will be updated here regularly for reproducibility:
Link1: https://www.swisstransfer.com/d/6579e0c9-ee61-4bba-a010-933d22ff242b
Link2: https://www.swisstransfer.com/d/146de302-80a7-4b20-920c-e89474e030c2
Password: icml2026
Please download the `logs`, `saved_models`, and `data/trading/processed_data` folders from these SwissTransfer links to obtain the training logs, saved checkpoints, and processed trading features. Then proceed to the steps below to reproduce our reports.
To generate the full performance report from trained logs:
```bash
python -m evaluations.performance_report
```

This script will read from your log directories and produce the following in `out/final_reports`:
- RL Evaluation Plots: Learning curves comparing algorithms.
- Ablation Table: Analysis of top performing hyperparameters.
- Significance Testing: Statistical analysis using Welch's t-test and the paired t-test.
- Runtime Analysis: Execution time statistics.
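As an illustration of the statistics involved (not the repository's actual evaluation code), the two tests can be run with `scipy` on per-seed final returns. The numbers below are made up for demonstration only.

```python
import numpy as np
from scipy import stats

# Illustrative final-return samples for two algorithms across five seeds
# (made-up numbers, not results from this repository).
ct_sac = np.array([810.0, 795.0, 830.0, 805.0, 820.0])
sac = np.array([760.0, 742.0, 788.0, 770.0, 755.0])

# Welch's t-test: unpaired, does not assume equal variances.
welch = stats.ttest_ind(ct_sac, sac, equal_var=False)

# Paired t-test: appropriate when the two runs share seeds one-to-one.
paired = stats.ttest_rel(ct_sac, sac)

print(f"Welch:  t={welch.statistic:.3f}, p={welch.pvalue:.4f}")
print(f"Paired: t={paired.statistic:.3f}, p={paired.pvalue:.4f}")
```

The paired test is typically more powerful when runs are matched by seed, since it cancels seed-level variance.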
Hyperparameters:
Detailed hyperparameter tuning spaces for each of the 5 environments and 8 algorithms (one file per algorithm) can be found in:
out/final_report/hyperparam_spaces
To evaluate the performance of models trained under irregular time settings when deployed in a regular time setting:
```bash
python -m evaluations.evaluation_on_regular
```

If using the trading environment, you need to download and preprocess the data first.
- Download Data: Use `data/trading/download_data.py` to fetch data using the Alpaca API.
- Process Data: Use `data/trading/preprocess_data.py` to generate the feature sets required for the environment.
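The exact features are defined in `preprocess_data.py`; as a hedged illustration of the kind of feature engineering such a step performs (the column names and window length below are assumptions, not the repository's actual schema):

```python
import numpy as np
import pandas as pd

# Toy close-price frame standing in for downloaded Alpaca bars
# (illustrative values only).
prices = pd.DataFrame({"close": [100.0, 101.5, 100.8, 102.3, 103.0, 102.1]})

features = pd.DataFrame(index=prices.index)
# Log returns: scale-free price changes, a common trading feature.
features["log_ret"] = np.log(prices["close"]).diff()
# Rolling volatility over a short window (window length is arbitrary here).
features["vol_3"] = features["log_ret"].rolling(3).std()

# Drop warm-up rows where the rolling statistics are undefined.
features = features.dropna()
print(features)
```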
- `algorithms/`: Implementations of all main continuous-time RL algorithms (CT-SAC, CT-TD3, CPPO, etc.).
- `benchmarks/`: Contains the main training scripts (`run_ct_rl.py` and `run_discrete_rl.py`) and hyperparameter configurations.
- `common/`: Shared utilities, including callbacks, replay buffers, and logger configurations.
- `data/`: Logic for downloading financial data and processing it for the trading environment.
- `environment/`: Implementations of the environments (DMC wrappers, Trading environment).
- `evaluations/`: Scripts for generating reports, plots, and statistical analysis from saved logs and models.
- `models/`: Definitions of neural networks (stochastic policies; value functions V, q, and Q = V + q) used by the algorithms.
- `tests/`: Unit and integration tests.
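The critic decomposition mentioned for `models/` composes a state-value function with a state-action correction term. A minimal, framework-agnostic sketch (plain linear maps standing in for the repository's actual network classes) of how Q = V + q fits together:

```python
import numpy as np

# Illustrative linear "networks"; the repository's models are neural
# networks, but the composition Q(s, a) = V(s) + q(s, a) is the same.
rng = np.random.default_rng(0)
w_v = rng.normal(size=4)      # state-value weights (state dim 4, assumed)
w_q = rng.normal(size=4 + 2)  # correction weights over (state, action)

def V(s: np.ndarray) -> float:
    """State value."""
    return float(w_v @ s)

def q(s: np.ndarray, a: np.ndarray) -> float:
    """State-action correction term."""
    return float(w_q @ np.concatenate([s, a]))

def Q(s: np.ndarray, a: np.ndarray) -> float:
    """Composed critic: Q = V + q."""
    return V(s) + q(s, a)

s, a = rng.normal(size=4), rng.normal(size=2)
print(Q(s, a))
```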
Visualizations of agent performance (videos and images) are generated in out/final_reports/media/images or out/final_reports/media/videos.
Walker Sample Run
The image below shows CT-SAC's performance over a single Walker episode, compared against continuous-time baselines.
Cheetah Sample Run
The image below shows CT-SAC's performance over a single Cheetah episode, compared against discrete-time baselines.
Trading Sample Run
The image below shows CT-SAC's performance over a single two-week Trading episode, compared against SAC, with PnLs of $15,000 and $5,500 respectively.


