LLM-Powered Agents for Modeling Thermal Dynamics of Buildings
ThermalForge is a framework for automatically generating hybrid neuro-physical thermal dynamics models for residential buildings, with the modeling decisions made by a Large Language Model (LLM). Specifically, ThermalForge implements a bi-level optimization approach in which an LLM proposes model structures (physics-based equations or neural architectures) whose parameters are then calibrated against smart thermostat data.
This repository contains the source code for ThermalForge and the configurations needed to replicate the experiments in the accompanying paper:
Exploring LLM-Powered Agents for Modeling Thermal Dynamics of Buildings in Proceedings of ACM BuildSys 2026
ThermalForge runs a two-phase modeling pipeline for each building:
- Physics-Based Modeling: The LLM generates grey-box thermal models (RC networks) as PyTorch code. Each candidate is calibrated against observational data, and the LLM receives feedback to improve subsequent proposals.
- Neural Residual Modeling: The best physics model's rolled-out predictions become input to a neural model that learns to model the residual. The LLM can generate full architectures, modify a U-Net template, or tune hyperparameters.
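To make the two phases concrete, here is a minimal sketch of the simplest grey-box model (a single-resistance, single-capacitance network) together with the residual that the second phase would learn. It is written in plain Python rather than PyTorch so that it runs without dependencies. The function names, parameter values, and the forward-Euler discretization are all assumptions for illustration and are not taken from the ThermalForge code.

```python
# Hypothetical 1R1C grey-box model; names and values are illustrative only.

def rollout_1r1c(t_in0, t_out, hvac_on, r, c, q_hvac, dt_hours=1.0):
    """Forward-Euler rollout of C * dT/dt = (T_out - T_in) / R + q_hvac * u."""
    temps = [t_in0]
    for t_o, u in zip(t_out, hvac_on):
        t_in = temps[-1]
        d_t = ((t_o - t_in) / r + q_hvac * u) / c
        temps.append(t_in + dt_hours * d_t)
    return temps[1:]  # predictions for each step after the initial state


def residuals(observed, predicted):
    """Target for the neural phase: what the physics model failed to explain."""
    return [o - p for o, p in zip(observed, predicted)]


# Three hourly steps: outdoor temperature 0 degC, heating off then on.
t_out = [0.0, 0.0, 0.0]
hvac_on = [0.0, 1.0, 1.0]
physics_pred = rollout_1r1c(20.0, t_out, hvac_on, r=2.0, c=5.0, q_hvac=10.0)

observed = [18.5, 19.0, 19.5]  # made-up sensor readings
residual_targets = residuals(observed, physics_pred)
```

In this sketch, calibration would mean fitting `r`, `c`, and `q_hvac` to thermostat data, and the neural model would then be trained on `residual_targets`.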
The framework is implemented as a LangGraph state graph with 7 nodes:
create_phy_dataloader → generate_phy_model ⇄ evaluate_phy_model
↓ (converged)
select_phy_model → generate_nn_model ⇄ evaluate_nn_model
↓ (converged)
select_nn_model → END
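The transitions in the diagram can be sketched as a small state machine. This is plain Python rather than the LangGraph API. The convergence rule (a fixed per-phase iteration budget) and the helper names `next_node` and `trace` are assumptions made for illustration.

```python
# Plain-Python sketch of the node transitions; not the LangGraph API.
# Assumption: a phase "converges" once its iteration budget is used up.

def next_node(node, iteration, max_iters):
    """Return the name of the next node, or None when the graph reaches END."""
    linear = {
        "create_phy_dataloader": "generate_phy_model",
        "generate_phy_model": "evaluate_phy_model",
        "select_phy_model": "generate_nn_model",
        "generate_nn_model": "evaluate_nn_model",
        "select_nn_model": None,  # END
    }
    if node in linear:
        return linear[node]
    converged = iteration >= max_iters
    if node == "evaluate_phy_model":
        return "select_phy_model" if converged else "generate_phy_model"
    if node == "evaluate_nn_model":
        return "select_nn_model" if converged else "generate_nn_model"
    raise ValueError(f"unknown node: {node}")


def trace(max_phy_iters, max_nn_iters):
    """Walk the graph from the first node to END and record the visited nodes."""
    node, path, iteration = "create_phy_dataloader", [], 0
    while node is not None:
        path.append(node)
        if node.startswith("generate_"):
            iteration += 1  # one more candidate proposed in this phase
        if node.startswith("select_"):
            iteration = 0  # the next phase starts with a fresh budget
        budget = max_phy_iters if "phy" in node else max_nn_iters
        node = next_node(node, iteration, budget)
    return path


path = trace(max_phy_iters=2, max_nn_iters=1)
```

With these budgets the physics loop runs twice and the neural loop once before the graph ends.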
Requires Python ≥ 3.12. From the repository root:
# Option 1: conda + pip
conda create -n thermal_forge python=3.12 -y
conda activate thermal_forge
pip install -e .
# Option 2: venv + pip
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e .
# Option 3: uv
uv sync
All ThermalForge settings (experiment parameters, LLM options, and infrastructure paths) are controlled through a single YAML config file. See src/thermal_forge/experiments/default.yaml for the full template with all available fields. The Configuration section has details on priority and overrides.
To use AWS services (such as S3 for model input/output storage, or SageMaker for running experiments at scale in the cloud), first configure a personal AWS account as follows. All steps assume you are signed in as the account root user or an existing admin.
- Choose a region. Pick a single AWS region and use it consistently throughout these instructions. Make sure the region selector in the AWS console (top-right) is set accordingly.
- Grant permissions for an IAM user. In the IAM console, go to Users → Create user (unless you have one already). On the permissions page of that user, choose Attach policies directly and attach `AdministratorAccess`. This single policy covers everything the experiments need (Bedrock, S3, SageMaker, ECR, and IAM role passing for the Docker and SageMaker steps below).
- Install the AWS CLI. Download and install the AWS CLI on your local machine before continuing.
- Create an access key for CLI use. On the new user's page, go to Security credentials → Create access key and select Command Line Interface (CLI) as the use case. Acknowledge the warning and click Next. Copy the Access Key ID and Secret Access Key, since the secret is shown only once. Then run `aws configure` on your local machine and provide the keys, the region you chose in step 1, and `json` as the default output format.
- Create an IAM role for SageMaker. When SageMaker runs your training jobs, it assumes a separate IAM role (distinct from your user permissions). To create it: in the IAM console, go to Roles → Create role, select AWS service as the trusted entity type, then choose SageMaker as the use case (SageMaker - Execution, specifically). On the permissions page, `AmazonSageMakerFullAccess` will be pre-selected, but after creating the role we will need to add one more permission. Name the role (e.g., `SageMakerThermalForgeRole`) and create it. Note the Role ARN; you will need it when launching jobs. Once done, go back to the role and also attach the `AmazonBedrockFullAccess` policy (needed for LLM calls from within the training jobs).
- Create an S3 bucket. In the S3 console, click Create bucket, give it a name that contains the string `sagemaker` (e.g., `sagemaker-thermalforge-<your-suffix>`), leave the defaults, and click Create. The SageMaker execution role (created in the previous step) is granted S3 access by default only on buckets whose names contain `sagemaker`. If you prefer a different name, you will need to attach an inline policy granting the execution role `s3:GetObject` and `s3:PutObject` on that bucket once it exists.
- Build a custom Docker image. The Dockerfile extends an AWS Deep Learning Container (DLC) base image hosted in an AWS-managed ECR registry. You must authenticate Docker to that registry before building so that it can pull the base image. First, run the configuration script to set the correct base image for your region:
# Configure the Dockerfile (prompts for region and DLC account ID)
python scripts/configure_docker.py
# Or specify explicitly (DLC_ACCOUNT is the registry account for your region,
# found at https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/sagemaker-algo-docker-registry-paths.html)
python scripts/configure_docker.py --region REGION --account DLC_ACCOUNT
# Authenticate Docker to the DLC ECR registry (use the same DLC_ACCOUNT and REGION from above)
aws ecr get-login-password --region REGION | \
docker login --username AWS --password-stdin DLC_ACCOUNT.dkr.ecr.REGION.amazonaws.com
# Build the image
cd docker/
docker build -t thermalforge .
cd ..
- Push to ECR. Authenticate Docker to your own account's Amazon Elastic Container Registry (ECR) and push the image. This is a different registry than the DLC one used in step 7, so a separate `docker login` is required. Replace `ACCOUNT` with your 12-digit AWS account ID (run `aws sts get-caller-identity --query Account --output text` to find it) and `REGION` with the region you chose in step 1:
# Authenticate Docker to your account's ECR registry
aws ecr get-login-password --region REGION | \
docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com
# Create the repository (first time only)
aws ecr create-repository --repository-name thermalforge --region REGION
# Tag and push
docker tag thermalforge ACCOUNT.dkr.ecr.REGION.amazonaws.com/thermalforge
docker push ACCOUNT.dkr.ecr.REGION.amazonaws.com/thermalforge
The resulting image URI (`ACCOUNT.dkr.ecr.REGION.amazonaws.com/thermalforge`) is what you'll set as `ecr_image_uri` in your config YAML (or via the `ECR_IMAGE_URI` environment variable) when launching SageMaker jobs.
The paper uses a subset of the Ecobee Donate Your Data dataset, which contains data from Ecobee smart thermostats in North American homes. The specific subset used in the paper includes:
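If you script the setup, the image URI can be assembled from its parts. The helper below is a hypothetical convenience and not part of ThermalForge. It assumes the `ACCOUNT.dkr.ecr.REGION.amazonaws.com/REPOSITORY` pattern used in the commands above and checks only that the account ID has 12 digits.

```python
import re


def ecr_image_uri(account, region, repository="thermalforge"):
    """Build an ECR image URI following the pattern used in the push commands."""
    if not re.fullmatch(r"\d{12}", account):
        raise ValueError("AWS account IDs are exactly 12 digits")
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}"


# Placeholder account ID and region, for illustration only.
uri = ecr_image_uri("123456789012", "us-east-1")
```

Passing the account ID as a string preserves any leading zeros.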
- 300 homes with ≥180 heating days
- Variables: indoor temperature (from 1 to 6 sensors), outdoor temperature, HVAC run time (stages 1–3), fan run time
- Resolution: 5-minute intervals, downsampled to 60-minute for modeling
- Split: 80% train / 10% validation / 10% test (by day)
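The resolution change and the day-wise split can be sketched as follows. This is an illustration under stated assumptions: averaging is assumed as the downsampling rule, the split is assumed to shuffle day indices with a seeded generator, and the function names are hypothetical. The actual preprocessing and dataset code may differ on all three points.

```python
import random


def downsample_mean(values, factor=12):
    """Average each run of `factor` samples (12 x 5 min = 60 min); drop a partial tail."""
    usable = len(values) - len(values) % factor
    return [sum(values[i:i + factor]) / factor for i in range(0, usable, factor)]


def split_by_day(num_days, seed, train_pct=80, val_pct=10):
    """Shuffle day indices reproducibly, then cut them into train/val/test."""
    days = list(range(num_days))
    random.Random(seed).shuffle(days)
    n_train = num_days * train_pct // 100
    n_val = num_days * val_pct // 100
    return days[:n_train], days[n_train:n_train + n_val], days[n_train + n_val:]


five_minute = [float(i) for i in range(24)]  # two hours of 5-minute samples
hourly = downsample_mean(five_minute)

train, val, test = split_by_day(190, seed=42)
```

Splitting by whole days keeps every sample of a given day in the same subset.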
To prepare the dataset, first install the data preprocessing dependency:
# If using pip/conda
pip install -e ".[data]"
# If using uv
uv sync --extra data
Then:
- Create an account at Building Benchmark Datasets and download the "processed" dataset (~282 MB), which arrives as `ecobee.zip`.
- Unzip `ecobee.zip` and extract the `clean_data.rar` archive inside it to obtain the `*.nc` (netCDF) files.
- Run the preprocessing script, which takes two arguments: the path to the local folder containing the netCDF files, and an S3 path for uploading the output `.npz` files:
python -m thermal_forge.dataset.run_data_prep /path/to/netcdf/files/ s3://your-bucket/ecobee_data/
This filters for homes with ≥180 heating-only days, removes days with missing data, and uploads the resulting `.npz` files to S3.
python -m thermal_forge.agent.agent \
--config default.yaml \
--datafile home_190.npz \
--seed 42
This requires:
- AWS credentials configured for Bedrock access
- `sm_channel_train` set in the YAML config (or via the `SM_CHANNEL_TRAIN` environment variable) pointing to the directory containing the `.npz` data file
- `sm_output_data_dir` set in the YAML config (or via `SM_OUTPUT_DATA_DIR`); defaults to `"output"`
- `--datafile` is the filename only (not a full path), and must follow the naming convention `<prefix>_<num_days>.npz`
Note: The `sm_` prefix on these field names reflects SageMaker's conventions: SageMaker containers automatically set `SM_CHANNEL_TRAIN` and `SM_OUTPUT_DATA_DIR` at runtime. Using the same names locally means the same config works in both environments without modification.
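The `--datafile` naming convention listed above can be checked before launching a run. The validator below is a hypothetical sketch, not ThermalForge's own parsing code. It assumes the number of days is the part after the last underscore.

```python
from pathlib import PurePath


def parse_datafile(name):
    """Split '<prefix>_<num_days>.npz' into (prefix, num_days) or raise ValueError."""
    path = PurePath(name)
    if path.name != name:
        raise ValueError("--datafile must be a bare filename, not a path")
    if path.suffix != ".npz":
        raise ValueError("expected a .npz file")
    prefix, sep, days = path.stem.rpartition("_")
    if not sep or not prefix or not days.isdigit():
        raise ValueError("expected <prefix>_<num_days>.npz")
    return prefix, int(days)


prefix, num_days = parse_datafile("home_190.npz")
```

A name such as `data/home_190.npz` is rejected because it includes a directory component.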
To run across hundreds of homes in parallel, we suggest using SageMaker. The config YAML must be placed in src/thermal_forge/experiments/ so that it is bundled and available inside each SageMaker container. This command must be run from the src/ directory:
cd src/
# Launch jobs (config and seed are passed to each training job)
python -m thermal_forge.sm.run_agent --config default.yaml --seed 42
Infrastructure settings (`sagemaker_role`, `ecr_image_uri`, `s3_data_folder`, `s3_output_base`) can be set either as environment variables or in the YAML config file (see Configuration).
The Jupyter Notebook in scripts/plot_results.ipynb can be used as a reference for how to evaluate the performance of the resulting models using statistics and graphs like those shown in the paper (e.g., percentile of RMSE for 24-hour roll-outs).
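As a dependency-free illustration of the kind of statistic mentioned above, the sketch below computes one RMSE per roll-out and then a percentile across roll-outs. The data are made up and the helper names are hypothetical. The notebook's own computation may differ, for example in how percentiles are interpolated.

```python
import math


def rmse(pred, target):
    """Root-mean-square error over one roll-out."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Three made-up (prediction, target) roll-outs, shortened to two steps each.
rollouts = [
    ([20.0, 21.0], [20.0, 21.0]),
    ([20.0, 21.0], [21.0, 22.0]),
    ([20.0, 21.0], [22.0, 23.0]),
]
scores = [rmse(pred, target) for pred, target in rollouts]
median_rmse = percentile(scores, 50)
p90_rmse = percentile(scores, 90)
```

Reporting several percentiles rather than a single mean shows how the error is distributed across homes and days.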
All settings are controlled via YAML files in src/thermal_forge/experiments/. The default configuration matches the paper's primary experiment.
Settings are resolved in this order (highest priority first):
- YAML config file: the experiment definition (selected via `--config`)
- Environment variables: for infrastructure fields (see below)
- Defaults: sensible values built into the code
The --seed CLI flag is the one exception that overrides the YAML value directly.
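The resolution order can be sketched as a small function. This is an illustration of the documented priority (YAML, then environment variable, then built-in default, with `--seed` overriding the YAML value) and not the actual `config.py` implementation. The `resolve` helper and the reduced field set are assumptions.

```python
import os

# A reduced set of fields, with the documented defaults.
DEFAULTS = {"sm_channel_train": "", "sm_output_data_dir": "output", "seed": None}
ENV_VARS = {
    "sm_channel_train": "SM_CHANNEL_TRAIN",
    "sm_output_data_dir": "SM_OUTPUT_DATA_DIR",
}


def resolve(yaml_values, environ=os.environ, cli_seed=None):
    """Apply the priority order: YAML > environment variable > default."""
    resolved = {}
    for field, default in DEFAULTS.items():
        if field in yaml_values:
            resolved[field] = yaml_values[field]
        elif field in ENV_VARS and ENV_VARS[field] in environ:
            resolved[field] = environ[ENV_VARS[field]]
        else:
            resolved[field] = default
    if cli_seed is not None:
        resolved["seed"] = cli_seed  # --seed overrides the YAML value
    return resolved


config = resolve(
    {"sm_channel_train": "/yaml/data", "seed": 1},
    environ={"SM_CHANNEL_TRAIN": "/env/data", "SM_OUTPUT_DATA_DIR": "/env/out"},
    cli_seed=42,
)
```

Here `sm_channel_train` comes from the YAML values, `sm_output_data_dir` falls back to the environment, and `seed` comes from the command line.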
| Config File | Paper Section | Description |
|---|---|---|
| `default.yaml` | §4.2–4.3 | Expert physics prompt + full NN architecture search |
| `basic_prompt.yaml` | §4.2 | Basic physics prompt (no expert knowledge) |
| `agent_control.yaml` | §4.4 | LLM-controlled transitions (δ=1) |
| `unet_hparam.yaml` | §4.3 | U-Net hyperparameter optimization only |
| `unet_code.yaml` | §4.3 | U-Net code generation via LLM |
| `fixed_model.yaml` | §4.3 | Baseline RC model (no LLM physics generation) |
| Parameter | Default | Description |
|---|---|---|
| `fixed_model` | `false` | Skip LLM physics generation, use baseline RC model |
| `nn_arch_search` | `true` | LLM generates full NN architectures (vs. hparam tuning) |
| `agent_control` | `false` | LLM decides when to stop iterating (δ parameter) |
| `llm_model_id` | Claude Sonnet 4.5 | Anthropic model via Bedrock |
| `llm_temperature` | `0.5` | LLM sampling temperature |
| `max_phy_gen_extra` | `10` | Physics model generation iterations |
| `max_nn_gen_extra` | `5` | Neural model generation iterations |
| `downsampling_factor` | `12` | 5-min → 60-min resolution (12×) |
| `seed` | `null` | Random seed for reproducible data splits |
These fields default to their corresponding environment variable, so they work automatically in SageMaker containers. Setting the same field in a YAML file takes precedence over the environment variable.
| Parameter | Environment Variable | Default | Description |
|---|---|---|---|
| `sm_channel_train` | `SM_CHANNEL_TRAIN` | `""` | Directory containing .npz data files |
| `sm_output_data_dir` | `SM_OUTPUT_DATA_DIR` | `"output"` | Directory for saving outputs |
| `s3_data_folder` | `S3_DATA_FOLDER` | `""` | S3 path to .npz data files |
| `s3_output_base` | `S3_OUTPUT_BASE` | `""` | S3 path for job outputs |
| `sagemaker_role` | `SAGEMAKER_ROLE` | `""` | IAM role ARN for SageMaker execution |
| `ecr_image_uri` | `ECR_IMAGE_URI` | `""` | Docker image URI in ECR |
Create a custom YAML with both experiment and infrastructure settings:
exp_id: "my-experiment"
agent_control: true
max_phy_gen_extra: 5
llm_temperature: 0.8
# Infrastructure (overrides environment variables if set)
sm_channel_train: "/path/to/data"
sm_output_data_dir: "./output"
Or use environment variables for infrastructure and YAML for experiments:
export SM_CHANNEL_TRAIN="/path/to/data"
export SM_OUTPUT_DATA_DIR="./output"
python -m thermal_forge.agent.agent --config default.yaml --datafile home_190.npz --seed 42
├── docker/
│   └── Dockerfile  # SageMaker-compatible container
├── src/thermal_forge/
│   ├── agent/
│   │   ├── agent.py  # Entry point (CLI)
│   │   ├── graph.py  # LangGraph agent (Algorithm 1)
│   │   ├── llm.py  # Bedrock LLM client
│   │   ├── prompts.py  # All prompt templates (see prompts/ folder)
│   │   └── state.py  # Agent state definition
│   ├── config/
│   │   └── config.py  # ThermalForgeConfig dataclass + YAML loading
│   ├── experiments/  # YAML experiment configurations
│   │   ├── default.yaml  # Paper's primary experiment
│   │   ├── basic_prompt.yaml  # §4.2 basic prompt
│   │   ├── agent_control.yaml  # §4.4 δ=1
│   │   ├── unet_hparam.yaml  # §4.3 option 1
│   │   ├── unet_code.yaml  # §4.3 option 2
│   │   └── fixed_model.yaml  # Baseline
│   ├── prompts/  # Full prompt text (readable markdown)
│   │   ├── README.md  # Index with paper notation mapping
│   │   ├── gen_phy_basic.md  # P_pg: physics model generation
│   │   ├── gen_phy_expert.md  # P_pg: physics generation with expert knowledge
│   │   ├── eval_phy.md  # P_pz: physics feedback elicitation
│   │   ├── route_phy.md  # P_pa: physics convergence check
│   │   ├── gen_nn_full_search.md  # P_ng: full neural architecture search
│   │   ├── gen_nn_unet_search.md  # P_ng: U-Net code generation
│   │   ├── eval_nn_hparams.md  # P_ng: U-Net hyperparameter optimization
│   │   ├── eval_nn_full_search.md  # P_nz: neural feedback (full search)
│   │   ├── eval_nn_unet_search.md  # P_nz: neural feedback (U-Net)
│   │   └── route_nn.md  # P_na: neural convergence check
│   ├── dataset/
│   │   ├── ecobee_phy_dataset.py  # Physics-phase dataset
│   │   ├── ecobee_nn_dataset.py  # Neural-phase dataset
│   │   └── run_data_prep.py  # netCDF → .npz preprocessing + S3 upload
│   ├── model/
│   │   ├── rc_thermal/  # Baseline RC thermal model
│   │   └── unet1d/  # 1D U-Net (encoder, decoder)
│   ├── train/
│   │   ├── train_phy.py  # Physics model training loop
│   │   └── train_nn.py  # Neural model training loop
│   ├── sm/
│   │   └── run_agent.py  # SageMaker job launcher
│   └── utils/
│       ├── agent_utils.py  # Model instantiation, predictions, token stats
│       └── data_utils.py  # S3 file listing
├── pyproject.toml  # Package metadata and dependencies
└── LICENSE  # CC-BY-NC-4.0
| Paper Concept | Code |
|---|---|
| Algorithm 1 (agent loop) | agent/graph.py — LangGraph StateGraph with 7 nodes |
| Bi-level optimization (Eq. 1–2) | Outer: LLM calls in generate_* nodes; Inner: train/train_*.py |
| Prompt P_pg (physics generation) | agent/prompts.py::gen_phy_instructions / gen_phy_instructions_expert |
| Prompt P_ng (neural generation) | agent/prompts.py::gen_nn_instructions_full_search / _unet_search |
| Prompt P_pz / P_nz (feedback) | agent/prompts.py::eval_phy_instructions / eval_nn_instructions_* |
| Prompt P_pa / P_na (agent check) | agent/prompts.py::route_phy_instructions / route_nn_instructions |
| δ parameter (agent control) | config: agent_control → graph.py::route_phy_eval / route_nn_eval |
| Figure 2 (prediction example) | Generated from outputs.npy / targets.npy saved by agent_utils.py |
| Table 1 (token usage) | token_stats.pkl saved by agent_utils.py::save_token_stats |
If you use ThermalForge in your research, please cite it as follows:
@inproceedings{thermalforge2026,
title = {Exploring LLM-Powered Agents for Modeling Thermal Dynamics of Buildings},
author = {Walczak, Krzysztof and Bergés, Mario},
booktitle = {Proceedings of the 13th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys '26)},
year = {2026},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3744256.3812580},
url = {https://doi.org/10.1145/3744256.3812580},
location = {Banff, AB, Canada},
series = {BuildSys '26}
}
CC-BY-NC-4.0 (see LICENSE).