From 3f81bfbdbf835e5c4ef8b400bb49b80b43d91d6e Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 23 Feb 2026 22:31:49 +0000
Subject: [PATCH 1/8] Initial plan

From 4a44dbcd58da861dfa330ec9e527f391ff3a503b Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Mon, 23 Feb 2026 22:35:14 +0000
Subject: [PATCH 2/8] Add .github/copilot-instructions.md for coding agent onboarding

Co-authored-by: EZoni <59625522+EZoni@users.noreply.github.com>
---
 .github/copilot-instructions.md | 119 ++++++++++++++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 00000000..9839f7bd
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,119 @@
+# Copilot Coding Agent Instructions for Synapse
+
+## Project Overview
+
+Synapse (**Synergistic Software Platform for AI, Physics Simulations, and Experiments**) is a modular framework for building digital twin components at Lawrence Berkeley National Laboratory. It couples experimental data, simulations, and ML models trained on combined data. The platform targets NERSC infrastructure (Spin for cloud services, Superfacility API for HPC on Perlmutter).
+
+## Repository Structure
+
+```
+synapse/
+├── dashboard/                  # Trame-based web GUI application
+│   ├── app.py                  # Main entry point (Trame web app)
+│   ├── *_manager.py            # Feature managers (model, parameters, outputs, calibration, optimization, state, sfapi, error)
+│   ├── utils.py                # Shared utilities (DB access, plotting, config)
+│   ├── environment.yml         # Conda dependencies for GUI
+│   └── environment-lock.yml
+├── ml/                         # ML training module
+│   ├── train_model.py          # Main training script (GP, NN, ensemble)
+│   ├── Neural_Net_Classes.py   # PyTorch neural network classes
+│   ├── training_pm.sbatch      # SLURM batch script for Perlmutter
+│   ├── environment.yml         # Conda dependencies for ML
+│   └── environment-lock.yml
+├── experiments/                # Experiment configs (cloned from private repos)
+├── dashboard.Dockerfile        # Docker image for the GUI
+├── ml.Dockerfile               # Docker image for ML training (CUDA 12.4)
+├── publish_container.py        # Script to build & push Docker containers to NERSC registry
+├── .pre-commit-config.yaml     # Ruff linter/formatter hooks
+└── .github/workflows/codeql.yml # CodeQL security scanning
+```
+
+## Language and Dependencies
+
+- **Language**: Python 3.12 (managed via Conda)
+- **Dashboard dependencies**: trame (web framework), plotly, pymongo, botorch, pytorch, lume-model, sfapi_client, mlflow
+- **ML dependencies**: pytorch (CUDA 12.4), gpytorch, botorch, lume-model, mlflow, pymongo, scikit-learn
+- **Environment management**: Conda with `conda-lock` for reproducible environments. Each component (`dashboard/`, `ml/`) has its own `environment.yml` and `environment-lock.yml`.
+
+## Linting and Formatting
+
+This project uses **Ruff** (v0.15.2) for linting and formatting, configured via `.pre-commit-config.yaml`. There is no `ruff.toml` or `pyproject.toml` — Ruff runs with default rules.
+
+```bash
+# Run the linter (with auto-fix)
+ruff check --fix .
+
+# Run the formatter
+ruff format .
+
+# Run both via pre-commit (if installed)
+pre-commit run --all-files
+```
+
+Always run `ruff check` and `ruff format` before committing changes.
+
+## Building
+
+There is no traditional build step (no `setup.py`, `pyproject.toml`, or `Makefile`). The project runs directly as Python scripts within Conda environments and is containerized via Docker for deployment.
+
+### Docker builds (from repository root)
+
+```bash
+# Build the dashboard container
+docker build --platform linux/amd64 --output type=image,oci-mediatypes=true -t synapse-gui -f dashboard.Dockerfile .
+
+# Build the ML training container
+docker build --platform linux/amd64 --output type=image,oci-mediatypes=true -t synapse-ml -f ml.Dockerfile .
+
+# Automated build and publish (interactive)
+python publish_container.py --gui --ml
+```
+
+## Testing
+
+There is **no automated test suite** in this repository — no pytest, unittest, or similar framework is configured. There are no test files. Validation is done manually by running the dashboard or ML training scripts.
+
+## CI/CD
+
+The only CI workflow is **CodeQL Advanced** (`.github/workflows/codeql.yml`), which runs security scanning on Python code for pushes and PRs to `main`.
+
+## Key Architecture Patterns
+
+### Dashboard (Trame GUI)
+
+- Built on [Trame](https://kitware.github.io/trame/) — a Python framework for interactive web applications.
+- Uses the **manager pattern**: each feature area has a dedicated `*_manager.py` class that handles its UI components and business logic.
+- `state_manager.py` manages the global Trame server, state, and controller.
+- Data flows through MongoDB (PyMongo) for experiment/simulation data and ML models.
+- NERSC Superfacility API integration is in `sfapi_manager.py`.
+
+### ML Training
+
+- `train_model.py` supports three model types: Gaussian Process (GP), Neural Network (NN), and Ensemble.
+- Uses PyTorch, BoTorch, and GPyTorch for model training.
+- CUDA is auto-detected for GPU acceleration.
+- Models are serialized and stored in MongoDB.
+- MLflow is used for experiment tracking.
+
+### Data Storage
+
+- **MongoDB** is used for all persistent data (experiments, simulation data, ML models).
+- Database access requires SSH tunneling to NERSC when running locally.
+- Environment variables: `SF_DB_HOST`, `SF_DB_READONLY_PASSWORD` (dashboard), `SF_DB_ADMIN_PASSWORD` (ML training).
+
+## Common Pitfalls and Workarounds
+
+1. **No `pyproject.toml` or `ruff.toml`**: Ruff uses default rules. Do not create these files unless the project explicitly adopts them.
+2. **Conda, not pip**: Dependencies are managed via `conda` and `conda-lock`, not pip. Do not add `requirements.txt` or modify `pyproject.toml` for dependencies. Update `environment.yml` in the relevant component directory and regenerate the lock file.
+3. **Separate environments**: The dashboard and ML components have independent Conda environments (`synapse-gui` and `synapse-ml`). Changes to dependencies must be made in the correct `environment.yml`.
+4. **Docker builds from root**: Dockerfiles reference paths relative to the repository root. Always run `docker build` from the repository root directory.
+5. **No test infrastructure**: Since there is no test framework, validate changes by running the linter (`ruff check .`) and verifying logic through code review.
+6. **Experiment configs are external**: The `experiments/` directory contains cloned private repositories. These are not checked into this repository (excluded via `.gitignore`).
+7. **NERSC-specific infrastructure**: Much of the deployment depends on NERSC services (Spin, Superfacility API, Perlmutter, MongoDB). Code changes affecting deployment or data access should be tested against NERSC services when possible.
+
+## Making Changes
+
+- **Python code**: Edit files directly in `dashboard/` or `ml/`. Run `ruff check --fix .` and `ruff format .` after changes.
+- **Dependencies**: Edit the appropriate `environment.yml` file. Regenerate the lock file with `conda-lock`.
+- **Docker**: Modify `dashboard.Dockerfile` or `ml.Dockerfile`. Rebuild with the commands above.
+- **New features**: Follow the manager pattern for dashboard features — create a new `*_manager.py` file and integrate it with `app.py` and `state_manager.py`.

From 2e9cdeba67330894a5718d08edc4ed2ac03b2851 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 24 Feb 2026 18:43:14 +0000
Subject: [PATCH 3/8] Apply review suggestions: remove Ruff version, format pip as code, remove MongoDB from NERSC services

Co-authored-by: EZoni <59625522+EZoni@users.noreply.github.com>
---
 .github/copilot-instructions.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 9839f7bd..1c9b43f2 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -37,7 +37,7 @@ synapse/
 
 ## Linting and Formatting
 
-This project uses **Ruff** (v0.15.2) for linting and formatting, configured via `.pre-commit-config.yaml`. There is no `ruff.toml` or `pyproject.toml` — Ruff runs with default rules.
+This project uses **Ruff** for linting and formatting, configured via `.pre-commit-config.yaml`. There is no `ruff.toml` or `pyproject.toml` — Ruff runs with default rules.
 
 ```bash
 # Run the linter (with auto-fix)
@@ -104,12 +104,12 @@ The only CI workflow is **CodeQL Advanced** (`.github/workflows/codeql.yml`), wh
 ## Common Pitfalls and Workarounds
 
 1. **No `pyproject.toml` or `ruff.toml`**: Ruff uses default rules. Do not create these files unless the project explicitly adopts them.
-2. **Conda, not pip**: Dependencies are managed via `conda` and `conda-lock`, not pip. Do not add `requirements.txt` or modify `pyproject.toml` for dependencies. Update `environment.yml` in the relevant component directory and regenerate the lock file.
+2. **Conda, not pip**: Dependencies are managed via `conda` and `conda-lock`, not `pip`. Do not add `requirements.txt` or modify `pyproject.toml` for dependencies. Update `environment.yml` in the relevant component directory and regenerate the lock file.
 3. **Separate environments**: The dashboard and ML components have independent Conda environments (`synapse-gui` and `synapse-ml`). Changes to dependencies must be made in the correct `environment.yml`.
 4. **Docker builds from root**: Dockerfiles reference paths relative to the repository root. Always run `docker build` from the repository root directory.
 5. **No test infrastructure**: Since there is no test framework, validate changes by running the linter (`ruff check .`) and verifying logic through code review.
 6. **Experiment configs are external**: The `experiments/` directory contains cloned private repositories. These are not checked into this repository (excluded via `.gitignore`).
-7. **NERSC-specific infrastructure**: Much of the deployment depends on NERSC services (Spin, Superfacility API, Perlmutter, MongoDB). Code changes affecting deployment or data access should be tested against NERSC services when possible.
+7. **NERSC-specific infrastructure**: Much of the deployment depends on NERSC services (Spin, Superfacility API, Perlmutter). Code changes affecting deployment or data access should be tested against NERSC services when possible.
 
 ## Making Changes
 

From e91344815a0888c77ab114f8f070d63fd6293a99 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 24 Feb 2026 18:47:03 +0000
Subject: [PATCH 4/8] Remove MLflow item from ML Training section since it is not yet set up in main

Co-authored-by: EZoni <59625522+EZoni@users.noreply.github.com>
---
 .github/copilot-instructions.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 1c9b43f2..a95f5153 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -93,7 +93,6 @@ The only CI workflow is **CodeQL Advanced** (`.github/workflows/codeql.yml`), wh
 - Uses PyTorch, BoTorch, and GPyTorch for model training.
 - CUDA is auto-detected for GPU acceleration.
 - Models are serialized and stored in MongoDB.
-- MLflow is used for experiment tracking.
 
 ### Data Storage
 

From d531ef034710ab7bdcfc6b93f76fbbda7c72e701 Mon Sep 17 00:00:00 2001
From: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
Date: Wed, 25 Mar 2026 13:20:08 -0700
Subject: [PATCH 5/8] Apply suggestions from code review

Co-authored-by: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
---
 .github/copilot-instructions.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index a95f5153..9f1f480c 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -84,7 +84,8 @@ The only CI workflow is **CodeQL Advanced** (`.github/workflows/codeql.yml`), wh
 - Built on [Trame](https://kitware.github.io/trame/) — a Python framework for interactive web applications.
 - Uses the **manager pattern**: each feature area has a dedicated `*_manager.py` class that handles its UI components and business logic.
 - `state_manager.py` manages the global Trame server, state, and controller.
-- Data flows through MongoDB (PyMongo) for experiment/simulation data and ML models.
+- Data flows through MongoDB (PyMongo) for experiment and simulation data.
+- Data flows through MLflow for ML models.
 - NERSC Superfacility API integration is in `sfapi_manager.py`.
 
 ### ML Training
@@ -92,11 +93,12 @@ The only CI workflow is **CodeQL Advanced** (`.github/workflows/codeql.yml`), wh
 - `train_model.py` supports three model types: Gaussian Process (GP), Neural Network (NN), and Ensemble.
 - Uses PyTorch, BoTorch, and GPyTorch for model training.
 - CUDA is auto-detected for GPU acceleration.
-- Models are serialized and stored in MongoDB.
+- Models are serialized and stored in an MLflow tracking server.
 
 ### Data Storage
 
-- **MongoDB** is used for all persistent data (experiments, simulation data, ML models).
+- **MongoDB** is used for persistent data from experiments and simulations.
+- **MLflow** is used for persistent data from ML models.
 - Database access requires SSH tunneling to NERSC when running locally.
 - Environment variables: `SF_DB_HOST`, `SF_DB_READONLY_PASSWORD` (dashboard), `SF_DB_ADMIN_PASSWORD` (ML training).
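Reviewer note: the Data Storage hunk above names three environment variables but does not show how they combine into a connection. The following is a minimal sketch of one way a script might read them, not code from the Synapse repository. The function name `build_mongo_uri`, the user names, the port `27017`, and the URI layout are all assumptions made for illustration; only the three variable names come from the patch.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(role: str = "readonly") -> str:
    """Assemble a MongoDB URI from the environment variables named in the patch.

    Hypothetical helper: the user names, the port, and the URI layout are
    illustrative assumptions, not values taken from the Synapse code base.
    """
    # SF_DB_HOST would point at the local end of the SSH tunnel when running locally.
    host = os.environ["SF_DB_HOST"]
    if role == "readonly":
        user = "readonly_user"  # hypothetical user name (dashboard use case)
        password = os.environ["SF_DB_READONLY_PASSWORD"]
    elif role == "admin":
        user = "admin_user"  # hypothetical user name (ML training use case)
        password = os.environ["SF_DB_ADMIN_PASSWORD"]
    else:
        raise ValueError(f"unknown role: {role!r}")
    # Percent-encode the credentials so reserved characters such as '@' or ':'
    # do not break the URI.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:27017/"
```

A string built this way could then be passed to a MongoDB client such as PyMongo's `MongoClient`. Raising `KeyError` on a missing variable is a deliberate choice in this sketch, so that a misconfigured environment fails early rather than at query time.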
From 91ece90ea347586965cfe8872a9a204222061485 Mon Sep 17 00:00:00 2001
From: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
Date: Wed, 25 Mar 2026 13:39:46 -0700
Subject: [PATCH 6/8] Apply suggestions from code review

Co-authored-by: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
---
 .github/copilot-instructions.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 54dcf554..10ec452f 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -22,7 +22,7 @@ synapse/
 │   └── environment-lock.yml
 ├── experiments/                # Experiment configs (cloned from private repos)
 ├── tests/                      # Integration tests (ML pipeline)
-│   ├── test_ml_pipeline.py     # Full train/save/load ML lifecycle test
+│   ├── test_ml_pipeline.py     # Full ML training pipeline test
 │   └── check_model.py          # Model checking utility
 ├── dashboard.Dockerfile        # Docker image for the GUI
 ├── ml.Dockerfile               # Docker image for ML training (CUDA 12.4)
@@ -74,7 +74,7 @@ python publish_container.py --gui --ml
 
 ## Testing
 
-There is no pytest/unittest framework configured, but `tests/test_ml_pipeline.py` exercises the full ML lifecycle (training → upload to MLflow → download → accuracy check). It requires a local MLflow server:
+There is no pytest/unittest framework configured, but `tests/test_ml_pipeline.py` tests the full ML training pipeline (training → upload to MLflow → download from MLflow → check accuracy). It requires a local MLflow server:
 
 ```bash
 # Start a local MLflow server

From 32ea9e99091864ef79e70f47e39219384f0674c8 Mon Sep 17 00:00:00 2001
From: Edoardo Zoni
Date: Thu, 26 Mar 2026 16:12:45 -0700
Subject: [PATCH 7/8] Move and rename instructions

---
 .github/copilot-instructions.md => AGENTS.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename .github/copilot-instructions.md => AGENTS.md (100%)

diff --git a/.github/copilot-instructions.md b/AGENTS.md
similarity index 100%
rename from .github/copilot-instructions.md
rename to AGENTS.md

From 56802978d0f0a60f64548701eea1d6c904060c9d Mon Sep 17 00:00:00 2001
From: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
Date: Thu, 26 Mar 2026 16:13:42 -0700
Subject: [PATCH 8/8] Fix title in instructions

---
 AGENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AGENTS.md b/AGENTS.md
index 10ec452f..33e89417 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,4 +1,4 @@
-# Copilot Coding Agent Instructions for Synapse
+# Coding Agent Instructions for Synapse
 
 ## Project Overview