From 9e7feded98cd85c4d6f6e99ea6643b62126c86aa Mon Sep 17 00:00:00 2001 From: Steffen Date: Thu, 4 Jun 2026 14:31:34 -0400 Subject: [PATCH 1/3] updated skills for claude and codex --- .claude/skills/custom-experiment/SKILL.md | 90 ++++++++++++++ .claude/skills/debug-experiment/SKILL.md | 83 ------------- .claude/skills/explain/SKILL.md | 63 ---------- .claude/skills/explore-examples/SKILL.md | 64 ++++++++++ .claude/skills/install-apeiron/SKILL.md | 93 +++++++++++++++ .claude/skills/integrate-apeiron/SKILL.md | 86 +++++++++++++ .claude/skills/lint-check/SKILL.md | 58 --------- .claude/skills/new-config/SKILL.md | 103 ---------------- .claude/skills/new-detector/SKILL.md | 108 ----------------- .claude/skills/new-harness/SKILL.md | 128 -------------------- .claude/skills/new-updater/SKILL.md | 139 ---------------------- .claude/skills/run-experiment/SKILL.md | 64 ---------- .claude/skills/visualize/SKILL.md | 64 ---------- .codex/skills/custom-experiment/SKILL.md | 130 ++++++++++++++++++++ .codex/skills/explore-examples/SKILL.md | 91 ++++++++++++++ .codex/skills/install-apeiron/SKILL.md | 102 ++++++++++++++++ .codex/skills/integrate-apeiron/SKILL.md | 112 +++++++++++++++++ 17 files changed, 768 insertions(+), 810 deletions(-) create mode 100644 .claude/skills/custom-experiment/SKILL.md delete mode 100644 .claude/skills/debug-experiment/SKILL.md delete mode 100644 .claude/skills/explain/SKILL.md create mode 100644 .claude/skills/explore-examples/SKILL.md create mode 100644 .claude/skills/install-apeiron/SKILL.md create mode 100644 .claude/skills/integrate-apeiron/SKILL.md delete mode 100644 .claude/skills/lint-check/SKILL.md delete mode 100644 .claude/skills/new-config/SKILL.md delete mode 100644 .claude/skills/new-detector/SKILL.md delete mode 100644 .claude/skills/new-harness/SKILL.md delete mode 100644 .claude/skills/new-updater/SKILL.md delete mode 100644 .claude/skills/run-experiment/SKILL.md delete mode 100644 .claude/skills/visualize/SKILL.md create mode 100644 
.codex/skills/custom-experiment/SKILL.md create mode 100644 .codex/skills/explore-examples/SKILL.md create mode 100644 .codex/skills/install-apeiron/SKILL.md create mode 100644 .codex/skills/integrate-apeiron/SKILL.md diff --git a/.claude/skills/custom-experiment/SKILL.md b/.claude/skills/custom-experiment/SKILL.md new file mode 100644 index 0000000..39bb6ea --- /dev/null +++ b/.claude/skills/custom-experiment/SKILL.md @@ -0,0 +1,90 @@ +--- +name: custom-experiment +description: | + Run an apeiron experiment on the user's OWN dataset and model end-to-end. + Use when the user wants to bring their own data + architecture (beyond the + shipped MNIST/CIFAR examples), scaffold a custom model harness, write a config + for it, smoke-test it, and run the full experiment. Self-contained: it creates + the harness, data utilities, and TOML, registers them in the example factory, + and runs. For trying the bundled examples instead, use explore-examples; for + adding apeiron to a separate project's training loop, use integrate-apeiron. +argument-hint: " [config_output_path]" +user-invocable: true +allowed-tools: + - Bash + - Read + - Write + - Edit + - Glob + - Grep +--- + +Scaffold and run an apeiron experiment on the user's own data and model. + +## Arguments +- `$1`: Short name for the dataset/harness (lowercase, e.g. `fashionmnist`, `mytabular`). Used for the `examples/$1/` dir and the `data.name` factory key. +- `$2`: Optional output path for the TOML config. Defaults to `examples/$1/$1.toml`. + +## Procedure + +### 1. Gather the specifics from the user +Ask only for what isn't already provided: +- Dataset source and how to load it (torchvision, HuggingFace, local files, custom `Dataset`). +- Model architecture (CNN, MLP, ViT, …), input shape, number of classes/outputs. +- Type of drift to simulate on the stream (e.g. affine transforms for images, feature noise for tabular). apeiron's examples simulate drift inside `update_data_stream()`. +- Pretrained weights? 
Path if so (optional — harness should tolerate their absence). +- Which drift detector and CL updater to start with (default `ADWINDetector` + `base`). + +### 2. Read the current patterns (don't hardcode signatures — they rot) +Mirror the live source rather than assuming method names: +```bash +cat src/apeiron/model/torch_model_harness.py # the ABC + abstract methods to implement +cat examples/mnist/model.py # canonical harness +cat examples/mnist/utils.py # data-loading + drift-sim pattern +cat examples/utils.py # get_example() factory to extend +grep -nA6 "class .*Cfg" src/apeiron/config/configuration.py # config fields +``` +Implement exactly the `@abstractmethod`s the ABC declares (currently includes `get_optmizer` — note that spelling — `update_data_stream`, `get_stream_dataloader`, `get_hist_dataloaders`, `get_train_dataloaders`, `get_criterion`). Set `self.eval_metrics` with at least an `accuracy` entry from `apeiron.evaluation.metrics`. + +### 3. Scaffold the files +- `examples/$1/__init__.py` — empty. +- `examples/$1/model.py` — `BaseModelHarness` subclass calling `super().__init__(cfg=cfg, model=)`, implementing every abstract method, applying cumulative drift in `update_data_stream()`, and returning `(None, None)` from `get_hist_dataloaders()` on the first task. +- `examples/$1/utils.py` — dataset loaders, a deterministic drift transform, a `TransformedView` wrapper, and a `make_loader(...)` factory (follow the MNIST utils structure). +- Config at `$2` (default `examples/$1/$1.toml`) with `[model]`, `[data]` (`name = "$1"`), `[train]`, `[drift_detection]`, optional `[continual_learning]`, `[visualization]`. Read an existing config for the exact key set. + +### 4. Register in the factory (in-repo) +Add a branch to `get_example()` in `examples/utils.py`: +```python +elif cfg.data.name == "$1": + from examples.$1.model import + return (cfg=cfg) +``` + +### 5. 
Validate +```bash +python -c "import tomllib; tomllib.load(open('$2','rb')); print('TOML OK')" +poetry run python -c "from examples.utils import get_example; print('factory OK')" +``` +If `pretrained_path` is set, confirm the file exists; warn if missing (run will train from scratch). + +### 6. Smoke-test before the full run +Run a tiny, fast pass to catch wiring errors cheaply, then **confirm with the user** before the real run: +```bash +poetry run python -m src.main --config $2 \ + --set train.max_iter=2 \ + --set drift_detection.max_stream_updates=2 \ + --set drift_detection.detection_interval=1 \ + --set device=cpu \ + --set logging.backend=none +``` +If it fails, read the traceback, fix the harness/config, and re-run the smoke test. Do not proceed until it completes cleanly. + +### 7. Full run and report +```bash +poetry run python -m src.main --config $2 +``` +Report drift events, final accuracy, and the output CSV path (the config's `visualization.input`). Note the package emits this CSV for inspection; it does not ship a built-in dashboard renderer. + +## Notes +- Registration here uses the in-repo factory pattern. To instead drive apeiron from your *own* project without editing this repo, use the integrate-apeiron skill. +- The repo also has older `new-harness` / `new-config` skills covering pieces of this; they are stale (pre-`src/apeiron/` layout) and slated for refresh — prefer this skill. diff --git a/.claude/skills/debug-experiment/SKILL.md b/.claude/skills/debug-experiment/SKILL.md deleted file mode 100644 index 6db06ab..0000000 --- a/.claude/skills/debug-experiment/SKILL.md +++ /dev/null @@ -1,83 +0,0 @@ ---- -name: debug-experiment -description: | - Debug a failed or misbehaving BaseSim experiment. Use when the user reports - errors, unexpected behavior, no drift being detected, poor accuracy, OOM - errors, or other issues with an experiment run. 
-argument-hint: "" -user-invocable: true -allowed-tools: - - Bash - - Read - - Glob - - Grep ---- - -Debug a failed or misbehaving BaseSim continual learning experiment. - -## Arguments -- `$ARGUMENTS`: Either a config file path, an error message, or a description of the problem. - -## Diagnostic Checklist - -### 1. Environment Issues -- Check Poetry environment: `poetry env info` -- Check Python version (requires >=3.13): `python --version` -- Verify dependencies installed: `poetry check` -- Install if needed: `poetry install` - -### 2. Config Parsing Errors -- Validate TOML syntax: `python -c "import tomllib; tomllib.load(open('', 'rb'))"` -- Required sections: `[model]`, `[data]`, `[train]`, `[drift_detection]` -- Valid `update_mode` values: `base`, `jvp_reg`, `ewc_online`, `kfac_online`, `none` -- Valid `detector_name` values: `ADWINDetector`, `KSWINDetector`, `PageHinkleyDetector`, `ModelPerformanceDetector`, `ModelEvalDetector`, `EnsembleDetector` - -### 3. No Drift Detected -- Check `detection_interval` is > 0 (0 disables detection) -- Check `max_stream_updates` is sufficient (default 20) -- Detector sensitivity tuning: - - **ADWIN**: Lower `adwin_delta` for more sensitivity (default 0.002) - - **KSWIN**: Lower `kswin_alpha` (default 0.005), increase `kswin_window_size` (default 100) - - **PageHinkley**: Lower `ph_threshold` (default 50), lower `ph_delta` (default 0.005) -- Check `metric_index` matches the intended metric (0=first eval metric, typically accuracy) -- Check `aggregation` method: "mean", "median", or "last" - -### 4. Too Many False Drift Detections -- Increase detector thresholds (higher delta/alpha/threshold values) -- Increase `detection_interval` to aggregate more batches before checking -- Switch `aggregation` to "mean" for a smoother signal - -### 5. 
CUDA / Device Errors -- Check CUDA: `python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"` -- Try `device = "cpu"` in config as fallback -- For OOM: reduce `batch_size`, reduce `grad_accumulation_steps` -- Check nvidia-smi: `nvidia-smi` - -### 6. Model Loading Errors -- Check `pretrained_path` exists: `ls -la ` -- Known pretrained paths: - - MNIST: `examples/mnist/mnist.pth` - - CIFAR ViT: `examples/cifar/cifar10_vit.pth` - - CIFAR VGG: `examples/cifar/cifar10_vgg11.pth` -- For state dict mismatch, check if `_orig_mod.` prefix stripping is needed (see cifar/imagenet model.py) - -### 7. Poor CL Performance / Catastrophic Forgetting -- `update_mode = "base"` has NO forgetting prevention -- switch to `jvp_reg`, `ewc_online`, or `kfac_online` -- JVP: increase `jvp_lambda` for stronger regularization (default 0.001, MNIST example uses 10) -- EWC: increase `ewc_lambda` for stronger weight consolidation (default 1000.0) -- KFAC: increase `kfac_lambda` (default 0.01) -- Check `max_iter` is sufficient for convergence (default 600) - -### 8. WandB Issues -- Login: `wandb login` -- Disable entirely: set env var `WANDB_MODE=disabled` -- Check connectivity: `wandb status` - -## Procedure - -1. Read the config file if a path is provided. -2. If an error message is given, identify the category from the checklist above. -3. Read relevant source files to trace the error origin. -4. Run diagnostic commands to verify the environment state. -5. Suggest specific, actionable fixes with code or config references. -6. If multiple issues exist, prioritize by severity (environment > config > tuning). diff --git a/.claude/skills/explain/SKILL.md b/.claude/skills/explain/SKILL.md deleted file mode 100644 index 0c8e31d..0000000 --- a/.claude/skills/explain/SKILL.md +++ /dev/null @@ -1,63 +0,0 @@ ---- -name: explain -description: | - Explain parts of the BaseSim framework architecture, code flow, or concepts. 
- Use when the user asks how the framework works, what a module does, how drift - detection or continual learning is implemented, or wants a codebase overview. -argument-hint: "[topic: architecture|drift|updaters|config|monitoring|harness|profiling|logging|pipeline]" -user-invocable: true -context: fork -agent: Explore -allowed-tools: - - Read - - Glob - - Grep - - Bash ---- - -Explain the BaseSim (SIM: Self Improving Model) framework architecture and internals. - -## Arguments -- `$ARGUMENTS`: Optional topic to focus on. If empty, provide a high-level architecture overview. - -## Available Topics - -- **architecture** -- Full system overview, data flow, component interactions -- **drift** -- Drift detection algorithms, DriftSignal, LearningRegime, detector lifecycle -- **updaters** -- CL update strategies, BaseUpdater hook lifecycle, EWC/JVP/KFAC internals -- **config** -- TOML configuration system, dataclass hierarchy, override mechanisms -- **monitoring** -- ContinuousMonitor loop, batch processing, drift checking, stream extension -- **harness** -- BaseModelHarness ABC, how to implement, data stream pattern -- **profiling** -- FLOPSProfiler, what is measured, limitations -- **logging** -- Logger stages, WandB integration, CSV output, console verbosity -- **pipeline** -- End-to-end flow from `main.py` through monitoring, drift, training, and back - -## Key Source File Locations - -- Entry point: `src/main.py` -- Config system: `src/config/configuration.py` -- Model harness ABC: `src/model/torch_model_harness.py` -- Continuous monitor: `src/driver/continuous_monitor.py` -- Continuous trainer: `src/training/continuous_trainer.py` -- Updater base + implementations: `src/training/updater/` -- Drift detector base + implementations: `src/drift_detection/detectors/` -- Drift detector loader: `src/drift_detection/load_drift_detector.py` -- Example harnesses: `examples/mnist/model.py`, `examples/cifar/model.py`, `examples/imagenet/model.py` -- Example factory: 
`examples/utils.py` -- Logger: `src/logger/` -- Profiler: `src/profilers/` -- Visualization: `src/visualization/` - -## Procedure - -1. If a topic is specified in `$ARGUMENTS`, read the relevant source files and provide a detailed explanation. -2. If no topic is specified, provide the full architecture overview covering all components. -3. Include in your explanation: - - What each component does and why - - How components interact (caller/callee relationships) - - Key classes, their responsibilities, and their abstract interfaces - - Data flow through the system - - Extension points for customization -4. Reference specific file paths and line numbers when explaining internals. -5. For algorithmic explanations (drift detection, EWC, JVP, KFAC), explain the underlying concept and how it maps to the code. -6. Keep explanations grounded in the actual source code -- read files before explaining them. diff --git a/.claude/skills/explore-examples/SKILL.md b/.claude/skills/explore-examples/SKILL.md new file mode 100644 index 0000000..5189e62 --- /dev/null +++ b/.claude/skills/explore-examples/SKILL.md @@ -0,0 +1,64 @@ +--- +name: explore-examples +description: | + Run a bundled apeiron example experiment to explore the framework's + capabilities. Use when the user wants to try the software, run a default/demo + experiment, see drift detection and continual learning in action, or pick from + the shipped MNIST/CIFAR configs. Presents a menu of available example configs, + runs the chosen one, and reports where the metrics CSV landed. For running the + user's OWN data/model/config, use the custom-experiment skill instead. +argument-hint: "[config_path]" +user-invocable: true +allowed-tools: + - Bash + - Read + - Glob + - Grep +--- + +Run a bundled apeiron example so the user can see the framework working end-to-end. + +## Arguments +- `$1`: Optional path to a specific bundled config. If given, skip the menu and run it directly (still apply steps 3–5). 
If omitted, present the menu (step 1). + +## Procedure + +### 1. Build the menu dynamically (do not hardcode the list — it rots) +Discover the shipped configs and summarize each from its own contents: +```bash +find examples -name "*.toml" -type f | sort +``` +For each config, read the key fields to describe it (`data.name`, `model.name`, `drift_detection.detector_name`, `continual_learning.update_mode`). Present a numbered menu like: +`1) examples/mnist/mnist.toml — MNIST, ADWIN detector, base updater` +Then ask the user which to run. + +### 2. Default to MNIST; flag missing pretrained weights for others +- **MNIST is the guaranteed hands-off path** — `examples/mnist/mnist.pth` ships with the repo. Recommend it for a first run. +- For any non-MNIST choice (e.g. CIFAR), check the config's `pretrained_path` before running: + ```bash + ls -la 2>/dev/null || echo "MISSING" + ``` + If the weight file is missing, tell the user plainly: this example needs weights that don't ship with the repo, so the run will train from scratch (slow) or fail to load. Let them decide whether to continue or switch to MNIST. + +### 3. Ask which metrics-logging backend to use (per run) +The config default is `wandb`. Before running, ask the user to choose, and pass it as an override so no edits are needed: +- **none** — `--set logging.backend=none` (no account/network; best for a quick local look) +- **wandb** — `--set logging.backend=wandb` (run `wandb login` first if not authenticated) +- **mlflow** — `--set logging.backend=mlflow` (local tracking by default) + +### 4. Show the config and run it +- Briefly summarize the chosen config (dataset, model, detector, updater, device, batch size) so the user can confirm. +- Run from the project root: + ```bash + poetry run python -m src.main --config --set logging.backend= + ``` +- This is a real training/monitoring run and may take a while. Stream output; do not silently background it. + +### 5. 
Report results +- Summarize from the run output: whether drift was detected and how many times, final accuracy, and the output CSV path (the config's `visualization.input`). +- The package emits this CSV for inspection; it does not ship a built-in dashboard renderer, so point the user at the CSV for further plotting. + +## Notes +- Quick first run, copy-paste safe: `poetry run python -m src.main --config examples/mnist/mnist.toml --set logging.backend=none` +- Useful overrides to demonstrate capabilities: `--set drift_detection.detector_name=PageHinkleyDetector`, `--set continual_learning.update_mode=ewc_online`, `--set device=cpu`. +- If `poetry` isn't set up yet, point the user at the install/dev-setup step first. diff --git a/.claude/skills/install-apeiron/SKILL.md b/.claude/skills/install-apeiron/SKILL.md new file mode 100644 index 0000000..0ba3e01 --- /dev/null +++ b/.claude/skills/install-apeiron/SKILL.md @@ -0,0 +1,93 @@ +--- +name: install-apeiron +description: | + Install the apeiron continual-learning package as a dependency into an + existing Python project so the user can `import apeiron`. Use when the user + wants to add apeiron to their own project or training framework, set it up as + a path/git dependency, or get `from apeiron import ...` working in another + codebase. Handles Poetry presence, Python 3.13 verification, and automatic + GPU-vs-CPU PyTorch selection. SKIP for developing inside THIS repo itself — + that is just `poetry install`. +argument-hint: "[target_project_dir] [--git ]" +user-invocable: true +allowed-tools: + - Bash + - Read + - Edit + - Write + - Glob + - Grep +--- + +Install apeiron as a dependency in the user's own Python project, hands-off. + +## Arguments +- `$1`: Target project directory (the project that will depend on apeiron). Defaults to the current working directory. +- `--git `: Optional. Install apeiron from this git URL instead of a local path. If omitted, prefer a local path dependency (see step 2). 
+ +Do not assume any value not given — read it from the repo or ask. + +## Goal +After this skill runs, the following must succeed from inside the target project's environment: +```bash +python -c "from apeiron import BaseModelHarness, ContinuousMonitor, build_config; print('apeiron OK')" +``` + +## Procedure + +### 1. Resolve the target project and its package manager +- Target dir = `$1` or the current working directory. Confirm it contains a `pyproject.toml` (Poetry/PEP 621) or `requirements.txt`/`setup.py` (pip). If none, ask the user how they manage dependencies. +- Detect the manager: Poetry if `[tool.poetry]` or `poetry.lock` is present; otherwise pip/uv. Poetry is the primary path below; a pip fallback is in step 6. + +### 2. Resolve the apeiron source (do not hardcode versions or paths) +- If `--git ` was given, use it as a git dependency. +- Else look for a local apeiron checkout. The current repo IS apeiron when its `pyproject.toml` has `name = "apeiron"`. Confirm with: + ```bash + grep -m1 'name = "apeiron"' pyproject.toml && pwd + ``` + Use that absolute path as a **path (develop) dependency**. If the current repo is not apeiron and no `--git` was given, ask the user for the apeiron path or git URL. + +### 3. Verify Python (guide, don't auto-manage interpreters) +- Read apeiron's required range dynamically rather than assuming it: + ```bash + grep 'requires-python' + ``` +- Check the interpreter the target project will use (`python --version`, or `poetry env info --python`). If it is outside the range, stop and give the user exact instructions (e.g. install the matching CPython via pyenv/uv and point Poetry at it with `poetry env use `). Do not silently install or switch interpreters. + +### 4. Ensure Poetry is available (auto-install if missing) +- `command -v poetry` — if missing, install it: `pipx install poetry` (preferred) or `pip install --user poetry`, then re-check `poetry --version`. + +### 5. 
Detect compute backend and select the PyTorch wheel +- Probe for an NVIDIA GPU: + ```bash + nvidia-smi -L 2>/dev/null && echo "GPU_PRESENT" || echo "NO_GPU" + ``` +- **GPU present:** do nothing special — the default CUDA-enabled torch wheels resolve normally. Report that CUDA wheels will be used. +- **No GPU:** pin torch to the CPU-only index so the install is smaller and portable. For a Poetry target, add an explicit source and route torch to it before adding apeiron: + ```toml + [[tool.poetry.source]] + name = "pytorch-cpu" + url = "https://download.pytorch.org/whl/cpu" + priority = "explicit" + + [tool.poetry.dependencies] + torch = { source = "pytorch-cpu" } + ``` + Then `poetry lock`. Report that CPU-only wheels will be used. + +### 6. Add the dependency +From the target project directory: +- **Poetry, local path:** `poetry add --editable ` +- **Poetry, git:** `poetry add "git+"` +- **pip/uv fallback, local path:** `pip install -e ` (for the no-GPU case, first run `pip install torch --index-url https://download.pytorch.org/whl/cpu`) +- **pip/uv fallback, git:** `pip install "apeiron @ git+"` + +### 7. Verify and report +- Run the import check from the **Goal** section inside the target environment (`poetry run python -c ...` for Poetry). +- On success, report: the apeiron source used (path/git), Python version, compute backend chosen (CUDA/CPU), and the manager the dependency was added to. +- Suggest next steps: the explore-examples skill to try a bundled example, or the integrate-apeiron skill if they are wiring apeiron into an existing training loop. + +## Troubleshooting +- **`ModuleNotFoundError: apeiron`** after install — the editable/path link didn't register; re-run the add in the *target* project dir, not the apeiron repo. +- **torch pulls CUDA wheels on a CPU box** — the explicit `pytorch-cpu` source from step 5 was not applied before `poetry lock`; re-lock after adding it. 
+- **Python version conflict** — apeiron pins a narrow CPython range (see step 3); the target project must use a matching interpreter. Point Poetry at it with `poetry env use`. diff --git a/.claude/skills/integrate-apeiron/SKILL.md b/.claude/skills/integrate-apeiron/SKILL.md new file mode 100644 index 0000000..bb40a63 --- /dev/null +++ b/.claude/skills/integrate-apeiron/SKILL.md @@ -0,0 +1,86 @@ +--- +name: integrate-apeiron +description: | + Add apeiron's continual-learning / drift-detection capabilities to a user's + EXISTING training framework. Use when the user already has their own training + loop (vanilla PyTorch, Lightning, HF Trainer, etc.) and wants to bolt on drift + detection and/or CL adaptation rather than adopt apeiron's runner. Inspects the + user's repo, recommends the lightest viable integration path, writes the + adapter glue into their repo, and smoke-tests it. Assumes apeiron is importable + (`import apeiron`) — if not, run install-apeiron first. For a self-contained + apeiron run on custom data, use custom-experiment instead. +argument-hint: "[target_project_dir]" +user-invocable: true +allowed-tools: + - Bash + - Read + - Write + - Edit + - Glob + - Grep +--- + +Integrate apeiron into the user's existing training framework, lightest path first. + +## Arguments +- `$1`: The user's project directory. Defaults to the current working directory. + +## Background: what apeiron exposes (verify against source, don't assume) +- **Drift detectors are standalone** — `detector.update(metric_value: float) -> DriftSignal`; the signal carries `drift_detected`, `regime` (`LearningRegime`), and `drift_score`. This is the lowest-coupling entry point. Build one via `from apeiron.drift_detection import ADWINDetector` (or the others). +- **ContinuousMonitor drives the full loop** but requires a `BaseModelHarness` wrapping the model + data stream, plus a `Config` and a detector. Mirror `src/main.py` for exact construction. 
+- **CL updaters (EWC/JVP/KFAC) are harness-coupled** — they take a `modelHarness`, so using them implies the harness/monitor route. + +Read these before writing any glue so it matches the current API: +```bash +cat src/main.py # full wiring reference +cat src/apeiron/drift_detection/detectors/base.py # DriftSignal / LearningRegime fields +cat src/apeiron/drift_detection/load_drift_detector.py # building a detector from config +grep -nA12 "class ContinuousMonitor" src/apeiron/driver/continuous_monitor.py +``` + +## Procedure + +### 1. Confirm apeiron is importable +```bash +python -c "import apeiron; print('apeiron', apeiron.__file__)" 2>&1 +``` +If this fails, stop and direct the user to the `install-apeiron` skill, then resume. + +### 2. Discover the user's framework (detect at runtime) +In `$1`, find the training loop and the evaluation signal: +- Detect the stack: `grep -rlE "pytorch_lightning|lightning|transformers|Trainer|accelerate" $1` and look for a manual loop (`loss.backward()`, `optimizer.step()`). +- Locate where a scalar quality metric per step/epoch is available (val accuracy, loss) — this is what a detector consumes. +- Locate the model object and the data iterator (needed only if the full path is chosen). +Summarize what you found before proposing anything. + +### 3. Recommend the lightest path that meets the need, and confirm +Based on what the user wants out of apeiron: +- **Just detect drift / trigger their own retrain** → *detectors-only* (no harness). Lowest coupling — recommend this unless they need apeiron's CL math. +- **Want apeiron's CL regularizers (EWC/JVP/KFAC) or the full monitor→adapt loop** → *harness + ContinuousMonitor*. +Present the recommendation with its trade-offs and get the user's pick before writing code. + +### 4a. Detectors-only adapter (lightest) +Write a small module into the user's repo (e.g. `/apeiron_drift.py`) that: +- Constructs a detector once (choice of ADWIN/KSWIN/PageHinkley). 
+- Exposes a hook called from their existing eval step: `signal = detector.update(metric); if signal.drift_detected: `. +- Leaves the decision of what to do on drift (log / retrain / reload) to a user-provided callback. +Wire the hook into their loop with a minimal, clearly-marked edit. + +### 4b. Harness + monitor adapter (full) +When CL adaptation is wanted: +- Write a `BaseModelHarness` subclass in their repo that wraps their existing model and data loaders, implementing the abstract methods (read `src/apeiron/model/torch_model_harness.py` and `examples/mnist/model.py` for the current set — includes `get_optmizer`, `update_data_stream`, `get_stream_dataloader`, `get_hist_dataloaders`, `get_train_dataloaders`, `get_criterion`). +- Build a `Config` (via `build_config` from a small TOML, or constructed directly) selecting the detector and `continual_learning.update_mode`. +- Construct and run `ContinuousMonitor` exactly as `src/main.py` does. + +### 5. Smoke-test the integration +Prove the wiring with a tiny run before handing back: +- Detectors-only: a short script feeding a handful of synthetic metric values through the hook, asserting a `DriftSignal` comes back and the drift callback fires on an obvious shift. +- Full path: run their loop (or the monitor) for a couple of iterations with minimal settings (small `max_iter`, few stream updates, `device=cpu`, `logging.backend=none`). +Fix and re-run until it completes cleanly. + +### 6. Report +Summarize: detected stack, chosen path, files added/edited (with the exact insertion points), how to invoke it, and what happens on drift. Note any assumptions the user should revisit (e.g. which metric drives detection, detector sensitivity params). + +## Notes +- Keep edits to the user's training loop minimal and clearly commented so they remain easy to revert. +- Do not assume detector/monitor/harness signatures — they are read from source in this skill precisely so the glue doesn't rot. 
diff --git a/.claude/skills/lint-check/SKILL.md b/.claude/skills/lint-check/SKILL.md deleted file mode 100644 index 99a7ca3..0000000 --- a/.claude/skills/lint-check/SKILL.md +++ /dev/null @@ -1,58 +0,0 @@ ---- -name: lint-check -description: | - Run lint, format, and type checks on the BaseSim codebase. Use when the user - wants to verify code quality before committing, or after making changes. - Runs ruff linter, ruff formatter, mypy type checker, and pytest. -argument-hint: "[fix]" -user-invocable: true -allowed-tools: - - Bash - - Read ---- - -Run code quality checks on the BaseSim Framework. - -## Arguments -- `$0`: (Optional) Pass "fix" to auto-fix lint and format issues where possible. - -## Checks to Run - -### 1. Ruff Linter -```bash -poetry run ruff check . -``` -If `$0` is "fix": -```bash -poetry run ruff check --fix . -``` - -### 2. Ruff Formatter -```bash -poetry run ruff format --check . -``` -If `$0` is "fix": -```bash -poetry run ruff format . -``` - -### 3. Mypy Type Checker -```bash -poetry run mypy . -``` - -### 4. Pytest -```bash -poetry run pytest -q -``` - -## Procedure - -1. Run all four checks sequentially, capturing output from each. -2. Report a summary: - - Number of lint issues found (and auto-fixed if "fix" mode) - - Number of format issues - - Number of type errors - - Number of test failures -3. If "fix" was specified, re-run the check variants after fixing to confirm resolution. -4. Report any remaining issues that require manual attention. diff --git a/.claude/skills/new-config/SKILL.md b/.claude/skills/new-config/SKILL.md deleted file mode 100644 index d027800..0000000 --- a/.claude/skills/new-config/SKILL.md +++ /dev/null @@ -1,103 +0,0 @@ ---- -name: new-config -description: | - Generate a new TOML config file for a BaseSim experiment. Use when the user - wants to create a configuration for a specific dataset, detector, updater, - and training parameter combination. Can also modify an existing config. 
-argument-hint: " [base_config_to_copy_from]" -user-invocable: true -allowed-tools: - - Bash - - Read - - Write - - Glob ---- - -Generate a new TOML configuration file for a BaseSim experiment. - -## Arguments -- `$0`: Output path for the new config file (e.g., `examples/mnist/mnist_ewc.toml`) -- `$1`: (Optional) Existing config to use as a starting template - -## Reference: Configuration Dataclasses -!`grep -A 5 "class.*Cfg" src/config/configuration.py` - -## Available Options - -### Datasets (data.name) -- `mnist` -- MNIST handwritten digits (auto-downloads). Model: name="dummy", pretrained_path="examples/mnist/mnist.pth" -- `cifar10` -- CIFAR-10 images (auto-downloads). Model: name="vit16b" or "vgg11", pretrained_path="examples/cifar/cifar10_vit.pth" or "examples/cifar/cifar10_vgg11.pth" -- `imagenet` -- ImageNet (requires local data). Model: name="vit16b", pretrained_path="examples/imagenet/imagenet_vit.pth" - -### Drift Detectors (drift_detection.detector_name) -| Detector | Best For | Key Params | -|---|---|---| -| `ADWINDetector` | Gradual + abrupt changes | adwin_delta=0.002, adwin_minor_threshold=0.3, adwin_moderate_threshold=0.6 | -| `KSWINDetector` | Distribution changes | kswin_alpha=0.005, kswin_window_size=100, kswin_stat_size=30 | -| `PageHinkleyDetector` | Abrupt mean changes | ph_min_instances=30, ph_delta=0.005, ph_threshold=50, ph_alpha=0.9999 | -| `ModelPerformanceDetector` | Batch-level analysis | (evidently defaults) | -| `ModelEvalDetector` | Direct eval comparison | metric_index | - -### CL Updaters (continual_learning.update_mode) -| Mode | Strategy | Key Params | -|---|---|---| -| `base` | Vanilla SGD | (none) | -| `jvp_reg` | JVP regularization | jvp_lambda (default 0.001), jvp_deltax_norm (default 1) | -| `ewc_online` | Elastic Weight Consolidation | ewc_lambda (default 1000.0), ewc_ema_decay (default 0.95) | -| `kfac_online` | KFAC approximation | kfac_lambda (default 0.01), kfac_ema_decay (default 0.95) | -| `none` | No CL updates | 
(none) | - -## Minimal Template -```toml -seed = 1337 -device = "auto" -multi_gpu = false - -[model] -name = "" -pretrained_path = "" - -[data] -name = "" -path = "" - -[train] -batch_size = 64 -num_workers = 4 -init_lr = 0.001 -max_iter = 600 -grad_accumulation_steps = 1 - -[continual_learning] -update_mode = "base" - -[drift_detection] -detector_name = "ADWINDetector" -detection_interval = 10 -aggregation = "mean" -metric_index = 0 -reset_after_learning = false -max_stream_updates = 20 - -[visualization] -baseline = 90.0 -input = "output/experiment.csv" -output = "output/experiment_dashboard.png" -``` - -## Procedure - -1. If `$1` is provided, read it as the base template. Otherwise use the minimal template above. -2. Ask the user what they want to configure (or apply values they already specified): - - Dataset and model - - Drift detector and its hyperparameters - - CL update strategy and its hyperparameters - - Training parameters (batch size, learning rate, max iterations) -3. Fill in model.name and model.pretrained_path based on dataset choice. -4. Add the appropriate detector-specific hyperparameters for the chosen detector. -5. Set visualization input/output paths based on the experiment name. -6. Write the final TOML to `$0`. -7. Validate the config is parseable: - ```bash - python -c "import tomllib; tomllib.load(open('$0', 'rb')); print('Config is valid TOML')" - ``` diff --git a/.claude/skills/new-detector/SKILL.md b/.claude/skills/new-detector/SKILL.md deleted file mode 100644 index 0f97176..0000000 --- a/.claude/skills/new-detector/SKILL.md +++ /dev/null @@ -1,108 +0,0 @@ ---- -name: new-detector -description: | - Create a new drift detector for the BaseSim framework. Use when the user - wants to implement a custom drift detection algorithm beyond the built-in - ADWIN, KSWIN, PageHinkley, ModelPerformance, ModelEval, and Ensemble detectors. 
-argument-hint: "" -user-invocable: true -allowed-tools: - - Bash - - Read - - Write - - Glob - - Grep ---- - -Scaffold a new drift detector for the BaseSim framework. - -## Arguments -- `$0`: Class name for the new detector (e.g., "CUSUMDetector", "DDMDetector", "HDDMDetector") - -## Reference: Base Class and Existing Implementations - -### Base class interface (BaseDriftDetector, DriftSignal, LearningRegime) -!`cat src/drift_detection/detectors/base.py` - -### Statistical detector implementations (ADWIN, KSWIN, PageHinkley) -!`cat src/drift_detection/detectors/statistical_detectors.py` - -### Detector factory/loader -!`cat src/drift_detection/load_drift_detector.py` - -### Module exports -!`cat src/drift_detection/__init__.py` - -### Config dataclass for detector parameters -!`grep -A 30 "class DriftDetectionCfg" src/config/configuration.py` - -## Required Interface - -Every detector must subclass `BaseDriftDetector` and implement: - -```python -from drift_detection.detectors.base import BaseDriftDetector, DriftSignal, LearningRegime - -class $0(BaseDriftDetector): - def __init__(self, , name: str = ""): - super().__init__(name) - self._is_initialized = True - # Store hyperparams, initialize internal state - - def update(self, value: float, **kwargs) -> DriftSignal: - """Process new metric value, return drift signal. - - Must return DriftSignal with: - - regime: LearningRegime (STABLE, CONTINUAL_LEARNING, FINE_TUNING, RETRAIN) - - drift_detected: bool - - drift_score: float (0-1, higher = more drift) - - confidence: Optional[float] (0-1) - - metadata: Optional[dict] (extra info) - """ - - def reset(self) -> None: - """Reset to initial state (called after CL if reset_after_learning=true).""" -``` - -## Files to Create/Modify - -### 1. Create detector implementation -Either add to `src/drift_detection/detectors/statistical_detectors.py` if it's a simple statistical detector, or create a new file `src/drift_detection/detectors/.py` for complex detectors. - -### 2. 
Update `src/drift_detection/load_drift_detector.py` -Add a new `elif detector_name == "$0":` branch in the `load_drift_detector()` factory function. The branch should: -- Extract relevant hyperparameters from `cfg.drift_detection` -- Construct and return an instance of the new detector - -### 3. Update `src/drift_detection/__init__.py` -Add the new detector class to the imports and `__all__` list. - -### 4. Update `src/config/configuration.py` -Add any new hyperparameters to `DriftDetectionCfg` with sensible defaults. Follow the naming convention of existing params (prefix with detector abbreviation, e.g., `adwin_`, `kswin_`, `ph_`). - -## Procedure - -1. Ask the user about the detection algorithm they want to implement: - - What statistical test or method does it use? - - What are its hyperparameters? - - When should it signal CONTINUAL_LEARNING vs FINE_TUNING vs RETRAIN? -2. Read the existing detector implementations for pattern reference. -3. Create the detector class following the established patterns: - - Constructor stores hyperparams and initializes state - - `update()` processes values incrementally and returns `DriftSignal` - - `reset()` clears state completely -4. Wire it into the loader factory in `load_drift_detector.py`. -5. Add config parameters to `DriftDetectionCfg`. -6. Update `__init__.py` exports. -7. 
Verify imports work: - ```bash - cd /home/user/BaseSim_Framework && poetry run python -c "from drift_detection import $0; print('Import OK:', $0)" - ``` - -## Design Guidelines -- Detectors should be **stateful** and process one value at a time via `update()` -- The `update()` method must return a `DriftSignal` every call (even when no drift) -- Use `LearningRegime.STABLE` for no-drift signals -- `drift_score` should be normalized to 0-1 where possible -- Support `reset()` for the `reset_after_learning` config option -- Keep external dependencies minimal (prefer `river` for streaming algorithms) diff --git a/.claude/skills/new-harness/SKILL.md b/.claude/skills/new-harness/SKILL.md deleted file mode 100644 index 1a4b9c9..0000000 --- a/.claude/skills/new-harness/SKILL.md +++ /dev/null @@ -1,128 +0,0 @@ ---- -name: new-harness -description: | - Create a new model harness for integrating a custom model and dataset into - the BaseSim framework. Use when the user wants to add support for a new - dataset (beyond MNIST, CIFAR-10, ImageNet) or a new model architecture. - Generates the harness class, data utilities, TOML config, and registers - it in the example factory. -argument-hint: " [dataset_name]" -user-invocable: true -allowed-tools: - - Bash - - Read - - Write - - Glob - - Grep ---- - -Scaffold a new model harness for a custom dataset/model in the BaseSim framework. 
- -## Arguments -- `$0`: Name for the new example directory (e.g., "fashionmnist", "svhn", "custom_tabular") -- `$1`: (Optional) Dataset identifier for `data.name` in config, defaults to `$0` - -## Reference: Existing Implementations - -Read these files before generating code to ensure consistency with current patterns: - -### Base class interface -!`cat src/model/torch_model_harness.py` - -### Canonical example (MNIST harness) -!`cat examples/mnist/model.py` - -### Data utilities pattern -!`cat examples/mnist/utils.py` - -### Factory that registers harnesses -!`cat examples/utils.py` - -## Files to Create - -### 1. `examples/$0/__init__.py` -Empty init file. - -### 2. `examples/$0/model.py` -Subclass of `BaseModelHarness` implementing all abstract methods: - -```python -class Harness(BaseModelHarness): - def __init__(self, cfg: Config): - model = () - super().__init__(cfg=cfg, model=model) - self.eval_metrics = {"accuracy": accuracy} # from evaluation.metrics - # Load pretrained weights if available - # Initialize task_counter, data state, aug_history - - def get_optmizer(self) -> Optimizer: - return torch.optim.Adam(self.model.parameters(), lr=self.cfg.train.init_lr) - - def update_data_stream(self) -> None: - # Increment task_counter - # Apply drift simulation (e.g., affine transforms for images) - # Rebuild data loaders with new transforms - # Track augmentation history for replay - - def get_stream_dataloader(self) -> DataLoader: - # Return data_loader for current data - - def get_train_dataloaders(self) -> Tuple[DataLoader, DataLoader]: - # Return (train_loader, val_loader) for current data - # Call _dispose_current_loaders() first if loaders exist - - def get_hist_dataloaders(self) -> Tuple[Optional[DataLoader], Optional[DataLoader]]: - # Return historical data loaders for CL replay - # Return (None, None) when task_counter == 1 - - def get_criterion(self) -> CriterionFn: - return nn.CrossEntropyLoss() -``` - -Key patterns from existing harnesses: -- 
`eval_metrics` must be `Dict[str, Callable[[Tensor, Tensor], scalar]]` -- Use `_dispose_current_loaders()` helper for memory cleanup before rebuilding loaders -- `update_data_stream()` increments `task_counter` and applies cumulative augmentation drift -- `get_hist_dataloaders()` returns `(None, None)` when `task_counter == 1` (no history yet) - -### 3. `examples/$0/utils.py` -Dataset loading utilities following the MNIST pattern: -- `get__train()` / `get__val()` -- load raw dataset -- `FixedAffine` -- Custom transform for deterministic drift simulation -- `TransformedView` -- Dataset wrapper applying transforms -- `sample_aug(seed)` -- Sample random augmentation parameters deterministically -- `make_loader(dataset, batch_size, num_workers, shuffle)` -- DataLoader factory - -### 4. `examples/$0/.toml` -TOML config following the project convention. Set appropriate: -- `[model]` name and pretrained_path -- `[data]` name matching the factory branch -- `[train]` reasonable defaults for the dataset -- `[drift_detection]` with ADWINDetector defaults -- `[visualization]` with output paths - -### 5. Update `examples/utils.py` -Add an `elif cfg.data.name == "":` branch that imports and returns the new harness. - -## Procedure - -1. Create the `examples/$0/` directory. -2. Generate `__init__.py` (empty). -3. Generate `model.py` with the harness class. Ask the user about: - - Model architecture (CNN, ViT, MLP, etc.) - - Dataset source (torchvision, custom, HuggingFace, etc.) - - Number of classes, input dimensions - - Type of drift simulation appropriate for the domain -4. Generate `utils.py` with data loading utilities. -5. Generate the TOML config file. -6. Update `examples/utils.py` factory with the new branch. -7. 
Verify the import chain works: - ```bash - cd /home/user/BaseSim_Framework && poetry run python -c "from examples.utils import get_example; print('Factory imports OK')" - ``` - -## Important Notes -- The model's `__init__` receives `cfg: Config` and must call `super().__init__(cfg=cfg, model=)` -- `eval_metrics` must include at minimum an `accuracy` metric from `evaluation.metrics` -- For non-image datasets, adapt the `FixedAffine` pattern to domain-appropriate transforms (e.g., feature noise for tabular, token perturbation for text) -- Pretrained weights are optional -- the harness should handle missing weight files gracefully diff --git a/.claude/skills/new-updater/SKILL.md b/.claude/skills/new-updater/SKILL.md deleted file mode 100644 index 4d2ea56..0000000 --- a/.claude/skills/new-updater/SKILL.md +++ /dev/null @@ -1,139 +0,0 @@ ---- -name: new-updater -description: | - Create a new continual learning update strategy for the BaseSim framework. - Use when the user wants to add a new regularization method beyond the - built-in base, JVP, EWC, and KFAC updaters. -argument-hint: " [update_mode_name]" -user-invocable: true -allowed-tools: - - Bash - - Read - - Write - - Glob - - Grep ---- - -Scaffold a new continual learning updater for the BaseSim framework. - -## Arguments -- `$0`: Class name for the new updater (e.g., "SIUpdater", "MASUpdater", "PackNetUpdater", "AGEMUpdater") -- `$1`: (Optional) update_mode string for config lookup. Defaults to lowercase `$0` without "Updater" suffix. 
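The default for `$1` described above (lowercase `$0` without the "Updater" suffix) can be sketched as a small helper. This is only an illustration: `default_update_mode` is a hypothetical name and is not part of the framework.

```python
def default_update_mode(class_name: str) -> str:
    """Hypothetical helper: lowercase the class name and drop a trailing 'Updater'."""
    suffix = "Updater"
    # Strip the suffix only when it is actually present at the end of the name.
    stem = class_name[: -len(suffix)] if class_name.endswith(suffix) else class_name
    return stem.lower()


if __name__ == "__main__":
    for name in ("SIUpdater", "MASUpdater", "PackNetUpdater", "AGEMUpdater"):
        print(name, "->", default_update_mode(name))
```

Under this rule, a class named `SIUpdater` would be looked up with `update_mode = "si"` unless `$1` overrides it.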
- -## Reference: Base Class and Existing Implementations - -### Base updater interface with all hook methods -!`cat src/training/updater/base.py` - -### EWC implementation (most complete example of regularization-based updater) -!`cat src/training/updater/ewc.py` - -### JVP regularization implementation -!`cat src/training/updater/jvp_reg.py` - -### Updater factory -!`cat src/training/updater/create_updater.py` - -### CL config parameters -!`grep -A 20 "class ContinualLearningCfg" src/config/configuration.py` - -### How the trainer calls updater hooks -!`cat src/training/continuous_trainer.py` - -## Required Interface - -Every updater subclasses `BaseUpdater`. The available hooks are called in this order per training iteration: - -``` -cl_preprocessing() # Once, before CL loop starts - for each optimizer step: - update_pre_fwd_bwd() # Before gradient computation - for each accumulation step: - fwd_bwd(batch, hist_batch) -> loss # Forward + backward - update_post_fwd_bwd() -> reg_loss # After backward, before optimizer.step() - optimizer.step() - update_post_optimizer_call() # After optimizer step -cl_postprocessing() # Once, after CL loop ends -``` - -```python -from training.updater.base import BaseUpdater -from config.configuration import Config -from model.torch_model_harness import BaseModelHarness - -class $0(BaseUpdater): - def __init__(self, cfg: Config, modelHarness: BaseModelHarness) -> None: - super().__init__(cfg, modelHarness) - # self.criterion and self.model are set by BaseUpdater - # Initialize regularization-specific state here - - def cl_preprocessing(self) -> None: - """Save reference parameters, compute importance weights, etc.""" - - def fwd_bwd(self, batch, hist_batch=None) -> float: - """Forward + backward. Override for custom loss computation. - Return the loss scalar. Must call loss.backward().""" - - def update_post_fwd_bwd(self) -> float: - """Apply gradient penalties/modifications after backward. 
- Return regularization loss value (for logging).""" - - def update_post_optimizer_call(self) -> None: - """Update running statistics after parameter update.""" - - def cl_postprocessing(self) -> None: - """Commit estimates, update EMA buffers, etc.""" -``` - -## Common CL Strategy Patterns - -### Regularization-based (EWC, SI, MAS) -- `cl_preprocessing()`: Snapshot reference parameters -- `update_post_fwd_bwd()`: Add penalty term based on parameter importance -- `cl_postprocessing()`: Update importance estimates with EMA - -### Gradient-based (A-GEM, OGD) -- `fwd_bwd()`: Custom forward/backward with gradient projection -- `update_post_fwd_bwd()`: Project gradients onto feasible region - -### Replay-based (with updater hooks) -- `fwd_bwd()`: Use `hist_batch` for experience replay -- Already partially supported by `jvp_reg` pattern - -## Files to Create/Modify - -### 1. Create `src/training/updater/.py` -The new updater implementation file. - -### 2. Update `src/training/updater/create_updater.py` -Add a new branch: -```python -if cfg.continual_learning.update_mode == "": - from training.updater. import $0 - return $0(cfg, modelHarness) -``` - -### 3. Update `src/config/configuration.py` -Add new hyperparameters to `ContinualLearningCfg` with sensible defaults. Follow the naming pattern: `_` (e.g., `si_lambda`, `si_epsilon`). - -## Procedure - -1. Discuss the CL algorithm with the user: - - What type of regularization or constraint does it apply? - - What state needs to be maintained between tasks? - - What are its hyperparameters? -2. Read the existing updater implementations for structural reference. -3. Create the updater file following the EWC pattern structure. -4. Wire into the factory in `create_updater.py`. -5. Add config parameters to `ContinualLearningCfg`. -6. 
Verify imports and factory work: - ```bash - cd /home/user/BaseSim_Framework && poetry run python -c "from training.updater.create_updater import create_updater; print('Factory imports OK')" - ``` - -## Design Guidelines -- All hooks decorated with `@torch.no_grad()` except `fwd_bwd()` (which needs gradients) -- Regularization losses from `update_post_fwd_bwd()` are logged separately from generation loss -- Use `self.model.parameters()` or `self.model.named_parameters()` for parameter access -- Divide loss by `self.cfg.train.grad_accumulation_steps` in `fwd_bwd()` (see base implementation) -- Keep device management consistent -- use `self.cfg.device` for tensor placement diff --git a/.claude/skills/run-experiment/SKILL.md b/.claude/skills/run-experiment/SKILL.md deleted file mode 100644 index 7d824ce..0000000 --- a/.claude/skills/run-experiment/SKILL.md +++ /dev/null @@ -1,64 +0,0 @@ ---- -name: run-experiment -description: | - Run a BaseSim continual learning experiment. Use when the user wants to - execute a training/monitoring run with a TOML config file. Supports - overriding config values. Examples: run MNIST experiment, run CIFAR with - EWC updater, run experiment with PageHinkley detector. -argument-hint: " [--set key=val ...]" -user-invocable: true -allowed-tools: - - Bash - - Read - - Glob - - Grep ---- - -Run a BaseSim continual learning experiment. - -## Arguments -- `$0`: Path to a TOML config file (e.g., `examples/mnist/mnist.toml`). If not provided, list available configs and ask. -- Remaining arguments: Optional `--set key=val` overrides passed through to the runner. 
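To make the `--set key=val` convention concrete, here is a minimal sketch of how dotted keys could map onto a nested config dictionary. The runner's real parsing is not shown in this skill, so `apply_overrides`, `parse_value`, and the type-coercion rules below are assumptions for illustration only.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Hypothetical coercion: booleans, then int, then float, else keep the string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply 'section.key=value' items to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=val, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        # Walk (and create, if missing) each intermediate table.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


if __name__ == "__main__":
    cfg = {"train": {"batch_size": 64}}
    print(apply_overrides(cfg, ["train.batch_size=128", "device=cpu"]))
```

Top-level keys such as `device` have no dot and land directly on the root table, while `train.batch_size` descends into `[train]`.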
- -## Available Configs -- `examples/mnist/mnist.toml` -- MNIST with ADWIN detector -- `examples/mnist/mnist-generic.toml` -- MNIST generic -- `examples/cifar/cifar10_vit.toml` -- CIFAR-10 with Vision Transformer -- `examples/cifar/cifar10_vgg11.toml` -- CIFAR-10 with VGG11 - -## Common Overrides -- Change detector: `--set drift_detection.detector_name=KSWINDetector` -- Change CL updater: `--set continual_learning.update_mode=ewc_online` -- Change batch size: `--set train.batch_size=128` -- Change learning rate: `--set train.init_lr=0.0001` -- Set max CL iterations: `--set train.max_iter=300` -- Force CPU: `--set device=cpu` -- Change max stream updates: `--set drift_detection.max_stream_updates=50` - -## Procedure - -1. **Validate config exists.** If `$0` is empty or not a valid file path, list available configs: - ```bash - find examples/ -name "*.toml" -type f - ``` - Then ask the user which config to use. - -2. **Show the config** so the user can confirm settings before running: - Read the TOML file and display a summary of key settings (dataset, detector, updater, batch size, device). - -3. **Check Poetry environment** is ready: - ```bash - poetry env info --path 2>/dev/null || echo "Poetry env not found. Run: poetry install" - ``` - -4. **Run the experiment**: - ```bash - poetry run python -m src.main --config $ARGUMENTS - ``` - The `$ARGUMENTS` variable includes the config path and any `--set` overrides. - -5. 
**Report results** after completion: - - Whether drift was detected and how many times - - Final accuracy metrics (from terminal output) - - Path to the output CSV file (from the config's `visualization.input` field) - - Suggest running `/visualize` to generate a dashboard from the results diff --git a/.claude/skills/visualize/SKILL.md b/.claude/skills/visualize/SKILL.md deleted file mode 100644 index fb80ba6..0000000 --- a/.claude/skills/visualize/SKILL.md +++ /dev/null @@ -1,64 +0,0 @@ ---- -name: visualize -description: | - Visualize experiment results from a BaseSim run. Use when the user wants to - see metrics dashboards, accuracy curves, drift detection events, or loss - plots from a completed experiment. Generates PNG dashboards from CSV metrics. -argument-hint: "" -user-invocable: true -allowed-tools: - - Bash - - Read - - Glob ---- - -Visualize results from a BaseSim continual learning experiment. - -## Arguments -- `$0`: Path to a TOML config file that contains a `[visualization]` section specifying the input CSV and output PNG paths. - -## What Gets Generated -The visualization module creates a 6-panel dashboard PNG showing: -1. **Test Accuracy** over time with baseline threshold -2. **Historical Test Accuracy** (forgetting measure) -3. **Loss Metrics** (generation loss, forgetting loss) -4. **Computational Performance** (FLOPs per operation) -5. **Throughput** (TFLOP/s) -6. **Execution Time** analysis - -Drift events are marked as red vertical lines on all panels. - -## Procedure - -1. **Validate config.** Read the TOML file and extract the `[visualization]` section: - - `input`: path to CSV metrics file (generated during experiment) - - `output`: path for the output dashboard PNG - - `baseline`: accuracy threshold for the dashboard - -2. **Check input CSV exists:** - ```bash - ls -la - ``` - If not found, inform the user they need to run an experiment first. Suggest: - ``` - /run-experiment - ``` - -3. 
**Preview the CSV data** to confirm it has valid content: - ```bash - head -20 - wc -l - ``` - -4. **Run the visualizer:** - ```bash - poetry run python -m src.visualize --config $0 - ``` - -5. **Report results:** - - Confirm the dashboard PNG was generated at the configured output path - - Summarize key metrics from the CSV: - - Number of drift events - - Accuracy range (min, max, final) - - Number of CL training events triggered - - Total evaluation steps diff --git a/.codex/skills/custom-experiment/SKILL.md b/.codex/skills/custom-experiment/SKILL.md new file mode 100644 index 0000000..e460a8f --- /dev/null +++ b/.codex/skills/custom-experiment/SKILL.md @@ -0,0 +1,130 @@ +--- +name: custom-experiment +description: Scaffold and run an apeiron experiment for the user's own dataset and model. Use when the user wants to bring custom data or architecture beyond shipped examples, create a model harness, write a config, register it in the example factory, smoke-test it, and run the full experiment. For bundled demos, use explore-examples. For integrating apeiron into an existing external training loop, use integrate-apeiron. +metadata: + short-description: Build a custom apeiron experiment +--- + +# Custom Experiment + +Scaffold and run an apeiron experiment on the user's own data and model. + +## Inputs + +- Short name: lowercase identifier such as `fashionmnist` or `mytabular`. Use it for `examples//` and `data.name`. +- Optional config output path: default to `examples//.toml`. + +Ask only for missing details that cannot be inferred: + +- dataset source and loading method +- model architecture, input shape, and number of classes or outputs +- drift simulation to apply to the stream +- optional pretrained weights path +- starting drift detector and continual-learning updater, defaulting to `ADWINDetector` and `base` + +## Procedure + +### 1. 
Read Current Patterns + +Mirror live source instead of assuming signatures: + +```bash +cat src/apeiron/model/torch_model_harness.py +cat examples/mnist/model.py +cat examples/mnist/utils.py +cat examples/utils.py +grep -nA6 "class .*Cfg" src/apeiron/config/configuration.py +``` + +Implement exactly the abstract methods declared by the current harness ABC. Preserve known current spelling such as `get_optmizer` if the source still declares it that way. + +Set `self.eval_metrics` with at least an `accuracy` entry from `apeiron.evaluation.metrics` when the task is classification. + +### 2. Scaffold Files + +Create: + +- `examples//__init__.py` +- `examples//model.py` +- `examples//utils.py` +- the config TOML at the requested output path or `examples//.toml` + +`model.py` should: + +- define a `BaseModelHarness` subclass +- call `super().__init__(cfg=cfg, model=)` +- implement every abstract method from the current ABC +- apply cumulative drift in `update_data_stream()` +- return `(None, None)` from `get_hist_dataloaders()` for the first task when no history exists + +`utils.py` should: + +- load the dataset +- include deterministic drift transforms +- provide a lightweight transformed-view wrapper +- expose a `make_loader(...)` helper following the MNIST example pattern + +The TOML config should follow existing examples for the exact key set and include: + +- `[model]` +- `[data]` with `name = ""` +- `[train]` +- `[drift_detection]` +- `[continual_learning]` when needed +- `[visualization]` + +### 3. Register In The Example Factory + +Add a branch to `get_example()` in `examples/utils.py`: + +```python +elif cfg.data.name == "": + from examples..model import + return (cfg=cfg) +``` + +Match the surrounding factory style exactly. + +### 4. 
Validate + +Run: + +```bash +python -c "import tomllib; tomllib.load(open('', 'rb')); print('TOML OK')" +poetry run python -c "from examples.utils import get_example; print('factory OK')" +``` + +If `pretrained_path` is configured, confirm the file exists. Warn if it is missing and make the harness tolerate training from scratch when possible. + +### 5. Smoke-Test + +Run a small CPU-only smoke test before any full run: + +```bash +poetry run python -m src.main --config \ + --set train.max_iter=2 \ + --set drift_detection.max_stream_updates=2 \ + --set drift_detection.detection_interval=1 \ + --set device=cpu \ + --set logging.backend=none +``` + +If it fails, read the traceback, fix the harness or config, and re-run the smoke test until it completes. + +Confirm with the user before starting a full experiment run. + +### 6. Full Run And Report + +Run: + +```bash +poetry run python -m src.main --config +``` + +Report drift events, final accuracy or metric, and the output CSV path from `visualization.input`. + +## Notes + +- This skill uses the in-repo example factory pattern. +- To wire apeiron into an existing project without adopting the example runner, use `integrate-apeiron`. +- Older piecewise Claude skills such as `new-harness` or `new-config` may be stale against the current `src/apeiron/` layout. diff --git a/.codex/skills/explore-examples/SKILL.md b/.codex/skills/explore-examples/SKILL.md new file mode 100644 index 0000000..0137e24 --- /dev/null +++ b/.codex/skills/explore-examples/SKILL.md @@ -0,0 +1,91 @@ +--- +name: explore-examples +description: Run a bundled apeiron example experiment to explore the framework. Use when the user wants to try apeiron, run a default or demo experiment, see drift detection and continual learning behavior, or choose from shipped MNIST/CIFAR configs. Presents available example configs, runs the chosen one, and reports the metrics output. For the user's own data and model, use custom-experiment instead. 
+metadata: + short-description: Run bundled apeiron examples +--- + +# Explore Examples + +Run one of apeiron's bundled examples end to end. + +## Inputs + +- Optional config path: if the user provides one, run that bundled config. +- If no config path is provided, discover the available configs and let the user choose. + +## Procedure + +### 1. Discover Configs + +Build the menu dynamically from the repo: + +```bash +find examples -name "*.toml" -type f | sort +``` + +For each config, read enough TOML fields to summarize it: + +- `data.name` +- `model.name` +- `drift_detection.detector_name` +- `continual_learning.update_mode` + +Present a numbered menu and ask the user which config to run. Recommend MNIST for a first run when no preference is given. + +### 2. Check Pretrained Weights + +- MNIST is the expected low-friction path when `examples/mnist/mnist.pth` exists. +- For non-MNIST configs, read any configured `pretrained_path`. +- If the referenced file is missing, tell the user plainly that the run may train from scratch or fail to load weights, then ask whether to continue or switch configs. + +### 3. Choose Logging Backend + +Before running, ask which metrics backend to use and pass it as an override instead of editing the config: + +- `none`: no account or network, best for local smoke runs. +- `wandb`: requires an authenticated Weights & Biases session. +- `mlflow`: uses MLflow tracking. + +Default to `none` when the user asks for a quick local run. + +### 4. Summarize And Run + +Summarize the selected config: dataset, model, detector, updater, device, and batch size. + +Run from the project root: + +```bash +poetry run python -m src.main --config --set logging.backend= +``` + +This is a real training and monitoring run. Stream output and do not silently background it. + +### 5. 
Report Results + +Summarize from the run output: + +- whether drift was detected +- number of drift events when available +- final accuracy or final reported metric +- output CSV path from `visualization.input` + +The package emits a CSV for inspection; it does not ship a built-in dashboard renderer. + +## Useful Commands + +Quick local first run: + +```bash +poetry run python -m src.main --config examples/mnist/mnist.toml --set logging.backend=none +``` + +Useful overrides: + +```bash +--set drift_detection.detector_name=PageHinkleyDetector +--set continual_learning.update_mode=ewc_online +--set device=cpu +``` + +If Poetry is not set up, complete the repo's development install before running examples. diff --git a/.codex/skills/install-apeiron/SKILL.md b/.codex/skills/install-apeiron/SKILL.md new file mode 100644 index 0000000..12ced61 --- /dev/null +++ b/.codex/skills/install-apeiron/SKILL.md @@ -0,0 +1,102 @@ +--- +name: install-apeiron +description: Install the apeiron continual-learning package into an existing Python project so `import apeiron` works. Use when the user wants to add apeiron to another project or training framework, set it up as a path or git dependency, or fix imports in an external codebase. Handles package-manager detection, Python compatibility checks, and CPU vs CUDA PyTorch selection. Do not use for developing inside the apeiron repo itself; that path is just the repo's normal development install. +metadata: + short-description: Install apeiron into another Python project +--- + +# Install Apeiron + +Install apeiron as a dependency in the user's own Python project. + +## Inputs + +- Target project directory: use the path the user gives, otherwise the current working directory. +- Optional git URL: when the user asks for a git dependency, use that URL instead of a local path dependency. + +Do not assume missing values. Read them from the repo when possible, and ask only when the source or target cannot be discovered safely. 
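The input rules above (prefer an explicit git URL, fall back to a local path, and ask only when neither can be discovered) can be sketched as follows. The function name `resolve_apeiron_source` and the choice to treat a directory containing `pyproject.toml` as a usable checkout are assumptions made for this sketch; it does not check the package name inside that file.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_apeiron_source(
    git_url: Optional[str], local_checkout: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return ('git', url) or ('path', dir); None means the user must be asked."""
    if git_url:
        # An explicit git URL takes precedence over any local path.
        return ("git", git_url)
    if local_checkout and (Path(local_checkout) / "pyproject.toml").is_file():
        return ("path", str(Path(local_checkout).resolve()))
    # Nothing could be discovered safely, so do not guess.
    return None


if __name__ == "__main__":
    print(resolve_apeiron_source(None, "."))
```

Returning `None` rather than a guessed default keeps the "do not assume missing values" rule explicit at the call site.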
+ +## Success Criteria + +From inside the target project's environment, this command must pass: + +```bash +python -c "from apeiron import BaseModelHarness, ContinuousMonitor, build_config; print('apeiron OK')" +``` + +## Procedure + +### 1. Resolve The Target + +- Confirm the target directory contains `pyproject.toml`, `requirements.txt`, or `setup.py`. +- Detect the manager: + - Poetry when `[tool.poetry]` or `poetry.lock` is present. + - Otherwise use the existing pip or uv workflow. +- If no dependency-management files are present, ask the user how dependencies are managed. + +### 2. Resolve The Apeiron Source + +- If the user provided a git URL, use it as the dependency source. +- Otherwise prefer a local checkout. +- The current repo is apeiron when its `pyproject.toml` identifies the package as `apeiron`. Confirm this from the file before using the current repo path. +- If no local checkout can be found and no git URL was provided, ask for the apeiron path or git URL. + +### 3. Check Python Compatibility + +- Read apeiron's Python requirement from its `pyproject.toml`; do not hardcode it. +- Check the target project's interpreter with `python --version` or `poetry env info --python`. +- If the interpreter is outside apeiron's required range, stop and give exact remediation steps, such as installing a matching CPython and pointing Poetry at it with `poetry env use `. +- Do not silently install or switch interpreters. + +### 4. Ensure Poetry When Needed + +- For Poetry targets, check `command -v poetry`. +- If Poetry is missing and installing it is necessary, request permission before running package-install commands such as `pipx install poetry` or `pip install --user poetry`. +- Re-check `poetry --version` before continuing. + +### 5. Select The PyTorch Backend + +- Probe for an NVIDIA GPU with `nvidia-smi -L` when available. +- If a GPU is present, use the default torch resolution and report that CUDA-capable wheels will be used. 
+- If no GPU is present, prefer CPU-only PyTorch wheels. +- For Poetry targets, add or preserve an explicit PyTorch CPU source before locking: + +```toml +[[tool.poetry.source]] +name = "pytorch-cpu" +url = "https://download.pytorch.org/whl/cpu" +priority = "explicit" + +[tool.poetry.dependencies] +torch = { source = "pytorch-cpu" } +``` + +Then run `poetry lock`. + +### 6. Add The Dependency + +From the target project directory: + +- Poetry local path: `poetry add --editable ` +- Poetry git URL: `poetry add "git+"` +- pip or uv local path: `pip install -e ` +- pip or uv git URL: `pip install "apeiron @ git+"` + +For a CPU-only pip install, install torch from `https://download.pytorch.org/whl/cpu` before installing apeiron. + +### 7. Verify And Report + +- Run the import check from the success criteria inside the target environment. +- For Poetry targets, use `poetry run python -c ...`. +- Report: + - apeiron source used, path or git URL + - target Python version + - package manager used + - compute backend selected, CPU or CUDA +- Suggest `explore-examples` for a bundled demo or `integrate-apeiron` for wiring apeiron into an existing training loop. + +## Troubleshooting + +- `ModuleNotFoundError: apeiron`: re-run the dependency add from the target project directory. +- CUDA wheels on a CPU machine: make sure the CPU-only torch source was added before locking or installing. +- Python version conflict: point the target environment at a compatible interpreter instead of changing apeiron's requirement. diff --git a/.codex/skills/integrate-apeiron/SKILL.md b/.codex/skills/integrate-apeiron/SKILL.md new file mode 100644 index 0000000..8c6ccdf --- /dev/null +++ b/.codex/skills/integrate-apeiron/SKILL.md @@ -0,0 +1,112 @@ +--- +name: integrate-apeiron +description: Add apeiron drift-detection or continual-learning behavior to an existing training framework. 
Use when the user already has a training loop, such as vanilla PyTorch, Lightning, Hugging Face Trainer, or Accelerate, and wants to integrate apeiron rather than adopt apeiron's runner. Inspects the target repo, recommends the lightest viable integration path, writes adapter glue, and smoke-tests it. If `import apeiron` fails, use install-apeiron first. +metadata: + short-description: Integrate apeiron into a training loop +--- + +# Integrate Apeiron + +Integrate apeiron into an existing training framework with the least coupling that meets the user's goal. + +## Input + +- Target project directory: use the path the user gives, otherwise the current working directory. + +## Background + +Verify APIs against source before writing glue: + +```bash +cat src/main.py +cat src/apeiron/drift_detection/detectors/base.py +cat src/apeiron/drift_detection/load_drift_detector.py +grep -nA12 "class ContinuousMonitor" src/apeiron/driver/continuous_monitor.py +``` + +Current conceptual paths: + +- Drift detectors are standalone: `detector.update(metric_value: float) -> DriftSignal`. +- `DriftSignal` carries fields such as `drift_detected`, `regime`, and `drift_score`; confirm exact fields in source. +- `ContinuousMonitor` drives the full apeiron loop and requires a `BaseModelHarness`, config, and detector. +- CL updaters such as EWC, JVP, and KFAC are harness-coupled, so using them implies the harness and monitor path. + +## Procedure + +### 1. Confirm Apeiron Is Importable + +From the target project environment, run: + +```bash +python -c "import apeiron; print('apeiron', apeiron.__file__)" +``` + +If this fails, stop and use or recommend `install-apeiron` before continuing. + +### 2. 
Discover The Framework + +In the target project, locate: + +- framework indicators such as `pytorch_lightning`, `lightning`, `transformers`, `Trainer`, or `accelerate` +- manual PyTorch loops with `loss.backward()` and `optimizer.step()` +- the scalar quality metric available per step or epoch, such as validation accuracy or loss +- the model object and data iterator when the full monitor path may be needed + +Summarize what you found before proposing changes. + +### 3. Choose The Integration Path + +Recommend the lightest path that satisfies the user's goal: + +- Detectors-only adapter: best when the user wants drift detection or wants to trigger their own retraining. +- Harness plus `ContinuousMonitor`: required when the user wants apeiron CL regularizers or the full monitor-to-adapt loop. + +Confirm the chosen path with the user before editing their training loop. + +### 4. Detectors-Only Adapter + +For the light path, add a small module such as `/apeiron_drift.py` that: + +- constructs one detector, such as ADWIN, KSWIN, or PageHinkley +- exposes a hook called from the existing eval step +- calls `signal = detector.update(metric)` +- calls a user-provided callback when `signal.drift_detected` +- leaves the drift response, such as log, retrain, or reload, to the user's callback + +Wire the hook into the loop with a minimal, clearly marked edit. + +### 5. Harness And Monitor Adapter + +For the full path: + +- Write a `BaseModelHarness` subclass wrapping the existing model and data loaders. +- Read `src/apeiron/model/torch_model_harness.py` and `examples/mnist/model.py` for the current abstract methods. +- Preserve current method names exactly, including `get_optmizer` if that is what the ABC declares. +- Build a config with `build_config` from a small TOML or construct the current config object directly. +- Construct and run `ContinuousMonitor` using `src/main.py` as the wiring reference. + +### 6. 
Smoke-Test + +Prove the wiring before handing back: + +- Detectors-only: run a short script feeding synthetic metric values through the hook and assert that a `DriftSignal` is returned. Use an obvious shift when trying to exercise the callback. +- Full path: run only a tiny CPU-bound loop or monitor pass with minimal iterations, no network logging, and small stream settings. + +Read failures, fix the glue, and re-run until the smoke test passes. + +### 7. Report + +Report: + +- detected framework +- chosen integration path +- files added or edited +- exact insertion point in the loop +- how to run the smoke test +- what happens when drift is detected +- assumptions the user should revisit, especially the detection metric and detector sensitivity + +## Notes + +- Keep edits to the user's loop minimal and easy to revert. +- Do not hardcode detector, monitor, or harness signatures; read them from source first. From 60ecec0fc2360a84019fa7d68a20ee5416985b5b Mon Sep 17 00:00:00 2001 From: Steffen Date: Thu, 4 Jun 2026 14:34:47 -0400 Subject: [PATCH 2/3] update config files to remove stale settings --- CLAUDE.md | 41 +++++++++++++++-------------- examples/cifar/cifar10_vgg11.toml | 2 -- examples/cifar/cifar10_vit.toml | 4 +-- examples/mnist/mnist.toml | 2 -- poetry.lock | 4 +-- src/apeiron/config/configuration.py | 4 +-- tests/test_config.py | 2 +- 7 files changed, 26 insertions(+), 33 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 9f3eed2..e9bcb1c 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -9,11 +9,6 @@ A PyTorch continuous learning framework for real-time concept drift detection an poetry run python -m src.main --config ``` -### Visualizing results -```bash -poetry run python -m src.visualize --config -``` - ### Running tests ```bash poetry run pytest @@ -30,19 +25,21 @@ poetry run mypy . ### Entry Points - `src/main.py` -- Main experiment runner. Builds config, loads model harness, runs ContinuousMonitor. -- `src/visualize.py` -- Visualization entry point. 
Reads CSV metrics and generates dashboard PNGs. + +The installable package lives under `src/apeiron/` (imported as `apeiron`; see `pyproject.toml` `packages = [{ include = "apeiron", from = "src" }]`). ### Core Pipeline -1. **Config** (`src/config/configuration.py`): TOML-based config parsed into frozen dataclasses (`Config`, `ModelCfg`, `DataCfg`, `TrainCfg`, `ContinualLearningCfg`, `DriftDetectionCfg`, `VisualizationCfg`). Supports `--set key=val` CLI overrides and `APP_` env var overrides. -2. **Model Harness** (`src/model/torch_model_harness.py`): Abstract `BaseModelHarness` providing `get_stream_dataloader()`, `get_train_dataloaders()`, `get_hist_dataloaders()`, `update_data_stream()`, `get_criterion()`, `get_optmizer()`, and `eval_metrics` dict. -3. **Driver** (`src/driver/continuous_monitor.py`): `ContinuousMonitor` orchestrates the monitoring loop -- evaluates batches, checks drift at intervals, dispatches CL training on drift. -4. **Drift Detection** (`src/drift_detection/`): `BaseDriftDetector` ABC with `update(value) -> DriftSignal`. Implementations: ADWINDetector, KSWINDetector, PageHinkleyDetector, ModelPerformanceDetector, ModelEvalDetector, EnsembleDetector. -5. **Training** (`src/training/continuous_trainer.py`): `ContinuousTrainer` runs outer/inner CL loops with gradient accumulation. -6. **Updaters** (`src/training/updater/`): `BaseUpdater` with hooks `cl_preprocessing()`, `fwd_bwd()`, `update_pre_fwd_bwd()`, `update_post_fwd_bwd()`, `update_post_optimizer_call()`, `cl_postprocessing()`. Implementations: base (vanilla), jvp_reg (JVP regularization), ewc_online (EWC), kfac_online (KFAC), none (no-op). -7. **Evaluation** (`src/evaluation/metrics.py`): `accuracy()` and `accuracy_topk()`. -8. **Logger** (`src/logger/`): Singleton `Logger` combining WandB metrics and console output. Stages: eval, drift, cl. -9. **Profilers** (`src/profilers/`): `FLOPSProfiler` using PyTorch FlopCounterMode. -10. 
**Visualization** (`src/visualization/metrics.py`): Dashboard generation from CSV metrics. +1. **Config** (`src/apeiron/config/configuration.py`): TOML-based config parsed into frozen dataclasses (`Config`, `ModelCfg`, `DataCfg`, `TrainCfg`, `ContinualLearningCfg`, `DriftDetectionCfg`, `LoggingCfg`, `VisualizationCfg`). Supports `--set key=val` CLI overrides and `APP_` env var overrides. +2. **Model Harness** (`src/apeiron/model/torch_model_harness.py`): Abstract `BaseModelHarness` providing `get_stream_dataloader()`, `get_train_dataloaders()`, `get_hist_dataloaders()`, `update_data_stream()`, `get_criterion()`, `get_optmizer()`, and `eval_metrics` dict. +3. **Driver** (`src/apeiron/driver/continuous_monitor.py`): `ContinuousMonitor` orchestrates the monitoring loop -- evaluates batches, checks drift at intervals, dispatches CL training on drift. +4. **Drift Detection** (`src/apeiron/drift_detection/`): `BaseDriftDetector` ABC with `update(value) -> DriftSignal`. Implementations: ADWINDetector, KSWINDetector, PageHinkleyDetector, ModelPerformanceDetector, ModelEvalDetector, EnsembleDetector. +5. **Training** (`src/apeiron/training/continuous_trainer.py`): `ContinuousTrainer` runs outer/inner CL loops with gradient accumulation. +6. **Updaters** (`src/apeiron/training/updater/`): `BaseUpdater` with hooks `cl_preprocessing()`, `fwd_bwd()`, `update_pre_fwd_bwd()`, `update_post_fwd_bwd()`, `update_post_optimizer_call()`, `cl_postprocessing()`. Implementations: base (vanilla), jvp_reg (JVP regularization), ewc_online (EWC), kfac_online (KFAC), none (no-op). +7. **Evaluation** (`src/apeiron/evaluation/metrics.py`): `accuracy()` and `accuracy_topk()`. +8. **Logger** (`src/apeiron/logger/`): `Logger` with pluggable metrics backends -- `WandBLogger` and `MLFlowLogger` (configured via `[logging] backend = "wandb"|"mlflow"|"none"`), plus console output. Stages: eval, drift, cl. +9. 
**Profilers** (`src/apeiron/profilers/`): `FLOPSProfiler` (`count_flops.py`) using PyTorch FlopCounterMode. + +Note: `[visualization]` config (`VisualizationCfg`) is parsed but there is no bundled dashboard/renderer in the current package; runs emit a CSV at `visualization.input` for external plotting. ### Example Harnesses - `examples/mnist/model.py`: `MNIST_CNN` -- CNN on MNIST with affine drift simulation. @@ -52,18 +49,22 @@ poetry run mypy . ### Configuration Format (TOML) Required sections: `[model]` (name, pretrained_path), `[data]` (name, path), `[train]` (batch_size, num_workers, init_lr), `[drift_detection]` (detector_name, detection_interval, etc). -Optional sections: `[continual_learning]` (update_mode, lambda params), `[visualization]` (baseline, input, output). +Optional sections: `[continual_learning]` (update_mode, lambda params), `[logging]` (backend = "wandb"|"mlflow"|"none", experiment_name, mlflow_tracking_uri), `[visualization]` (baseline, input, output -- parsed but not rendered by the package). Top-level keys: `seed`, `device` ("auto"|"cpu"|"cuda"|"mps"), `multi_gpu`. 
### Available Drift Detectors -| Detector | Algorithm | Key Params | +The `detector_name` config value must be one of the strings the loader accepts +(`src/apeiron/drift_detection/load_drift_detector.py`): + +| `detector_name` | Algorithm | Key Params | |---|---|---| | `ADWINDetector` | Adaptive windowing (river) | adwin_delta, adwin_minor_threshold, adwin_moderate_threshold | | `KSWINDetector` | KS-test windowing (river) | kswin_alpha, kswin_window_size, kswin_stat_size | | `PageHinkleyDetector` | Page-Hinkley test (river) | ph_min_instances, ph_delta, ph_threshold, ph_alpha | | `ModelPerformanceDetector` | evidently batch analysis | (uses evidently defaults) | -| `ModelEvalDetector` | Direct eval comparison | metric_index | -| `EnsembleDetector` | Multi-detector voting | voting strategy | +| `EvalDetector` | Direct eval comparison (`ModelEvalDetector`) | metric_index | + +Note: `EnsembleDetector` is recognized by the loader but raises `NotImplementedError` (sub-detector configuration is not wired up yet) -- do not use it. 
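
For intuition about what a streaming detector such as `PageHinkleyDetector` computes on each `update`, here is a simplified one-sided Page-Hinkley test in plain Python. It is a sketch, not apeiron's or river's implementation: `PageHinkleySketch` and `first_drift_index` are hypothetical names, the fields only borrow the `ph_*` parameter names from the table, the default values are placeholders, and it flags upward shifts only (so feed it a loss, or a negated accuracy).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkleySketch:
    """Simplified one-sided Page-Hinkley test (illustrative, hypothetical class)."""

    min_instances: int = 30   # samples to observe before drift may be reported
    delta: float = 0.005      # tolerated deviation from the running mean
    threshold: float = 50.0   # alarm level for the cumulative statistic
    alpha: float = 0.9999     # forgetting factor applied to the cumulative sum

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, value: float) -> bool:
        """Consume one metric value; return True when an upward shift is flagged."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum = self.alpha * self.cum + (value - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)
        if self.n < self.min_instances:
            return False
        if self.cum - self.cum_min > self.threshold:
            self.reset()  # start fresh after reporting drift
            return True
        return False


def first_drift_index(
    values: Iterable[float], detector: PageHinkleySketch
) -> Optional[int]:
    """Return the 0-based index of the first flagged value, or None."""
    for i, value in enumerate(values):
        if detector.update(value):
            return i
    return None


if __name__ == "__main__":
    # Synthetic loss stream: stable at 0.1, then an obvious jump to 1.0.
    stream = [0.1] * 100 + [1.0] * 100
    print(first_drift_index(stream, PageHinkleySketch(threshold=5.0)))
```

A lower `threshold` makes the test more sensitive. Real metrics are noisier than this step change, so the values used here are not tuning advice.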
### Available CL Update Modes | Mode | Strategy | Key Params | diff --git a/examples/cifar/cifar10_vgg11.toml b/examples/cifar/cifar10_vgg11.toml index 741878c..394a58b 100644 --- a/examples/cifar/cifar10_vgg11.toml +++ b/examples/cifar/cifar10_vgg11.toml @@ -46,6 +46,4 @@ adwin_minor_threshold = 0.3 adwin_moderate_threshold = 0.6 [visualization] -baseline = 90.0 input = "output/cifar.csv" -output = "output/cifar10_vgg11_dashboard.png" diff --git a/examples/cifar/cifar10_vit.toml b/examples/cifar/cifar10_vit.toml index 54edf69..4d7cc69 100644 --- a/examples/cifar/cifar10_vit.toml +++ b/examples/cifar/cifar10_vit.toml @@ -47,6 +47,4 @@ adwin_minor_threshold = 0.3 adwin_moderate_threshold = 0.6 [visualization] -baseline = 90.0 -input = "output/cifar.csv" -output = "output/cifar10_vit_dashboard.png" \ No newline at end of file +input = "output/cifar.csv" \ No newline at end of file diff --git a/examples/mnist/mnist.toml b/examples/mnist/mnist.toml index ea06a6b..017fbe3 100644 --- a/examples/mnist/mnist.toml +++ b/examples/mnist/mnist.toml @@ -56,6 +56,4 @@ experiment_name = "mnist-continual-learning" # Optional: project/experiment nam # mlflow_tracking_uri = "http://localhost:5000" # Optional: MLflow tracking server [visualization] -baseline = 90.0 input = "output/mnist.csv" -output = "output/mnist_dashboard.png" diff --git a/poetry.lock b/poetry.lock index 1b68275..f89f9e5 100644 --- a/poetry.lock +++ b/poetry.lock @@ -1,4 +1,4 @@ -# This file is automatically @generated by Poetry 2.4.1 and should not be changed by hand. +# This file is automatically @generated by Poetry 2.2.1 and should not be changed by hand. 
[[package]] name = "aiohappyeyeballs" @@ -6154,4 +6154,4 @@ type = ["pytest-mypy"] [metadata] lock-version = "2.1" python-versions = ">=3.13,<3.14" -content-hash = "324bc17fb7a08126b6063959ece15ea6e7d0677dd78bc7e25eabc2f78562ed3d" +content-hash = "98a4b09d08bfbd332266d02866b7ffdf266b0ec01736902c169cc80cca046fcb" diff --git a/src/apeiron/config/configuration.py b/src/apeiron/config/configuration.py index c86c7ed..df55747 100644 --- a/src/apeiron/config/configuration.py +++ b/src/apeiron/config/configuration.py @@ -156,9 +156,7 @@ class DriftDetectionCfg: @dataclass(frozen=True) class VisualizationCfg: - baseline: float = 95.0 # baseline accuracy threshold for drift detection - input: str = "output/cl_only.csv" # input CSV file path - output: str = "output/drift_dashboard.png" # output dashboard image path + input: str = "output/cl_only.csv" # CSV path where run metrics are written @dataclass(frozen=True) diff --git a/tests/test_config.py b/tests/test_config.py index 58dbac0..7c90b0b 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -278,7 +278,7 @@ def test_drift_detection_defaults(self): def test_visualization_cfg(self): viz = VisualizationCfg() - assert viz.baseline == 95.0 + assert viz.input == "output/cl_only.csv" # --------------------------------------------------------------------------- From 260e12a83f7dab77a6e4c1a8aed15ad993f74015 Mon Sep 17 00:00:00 2001 From: Steffen Date: Thu, 4 Jun 2026 14:41:38 -0400 Subject: [PATCH 3/3] fix readme --- README.md | 45 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 39 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index a2faef3..0484579 100644 --- a/README.md +++ b/README.md @@ -104,6 +104,43 @@ poetry run python -m src.main \ --set train.max_iter=200 ``` +## Agent Skills (Claude Code & Codex) + +This repo ships task-oriented **agent skills** that walk an AI coding agent +through the common Apeiron workflows. 
The same four skills are maintained for +both tools: + +- **Claude Code** — `.claude/skills//SKILL.md` +- **Codex** — `.codex/skills//SKILL.md` + +| Skill | What it does | +|---|---| +| `install-apeiron` | Add Apeiron as a dependency to **another** project (path/git), verify `import apeiron`, pick CPU vs CUDA PyTorch. | +| `explore-examples` | Run a bundled example (MNIST/CIFAR) to see drift detection + CL in action; picks a config and reports the metrics CSV. | +| `custom-experiment` | Scaffold a harness, data utils, and TOML for **your own** dataset/model, register it in the example factory, smoke-test, and run. | +| `integrate-apeiron` | Add Apeiron's drift detection / CL to an **existing** training loop; inspects your repo and writes the lightest adapter that fits. | + +### Using them + +**Claude Code** — the skills are exposed as slash commands. Type `/` and the +skill name, e.g.: + +``` +/explore-examples +/install-apeiron ../my-project +``` + +You can also just describe the task in plain language ("add apeiron to my +training loop") and the matching skill triggers from its description. + +**Codex** — the equivalent skills live under `.codex/skills/`. Invoke a skill by +name or describe the task; Codex selects the skill whose description matches your +request. The skills are tool-agnostic in intent — only the file format differs +between the two trees. + +> Keep the two trees in sync: a change to a workflow should be reflected in both +> `.claude/skills//SKILL.md` and `.codex/skills//SKILL.md`. + ## Documentation Detailed docs are in `docs/`: @@ -121,13 +158,8 @@ poetry run ruff check . poetry run mypy . ``` -## Deployment - -Platform-specific deployment guides: - -- [NERSC Perlmutter](./src/apeiron/deployment/perlmutter/README.md) -## What `main.py` Does +### What `main.py` Does - Builds the `DummyCNN_MNIST` model defined in `src/model/DummyCNN_MNIST.py`, a cross-entropy loss, and an Adam optimizer. 
- Loads the MNIST training split, stacks the tensors, and iterates over 10 tasks (digits 0–9). Each task applies random rotation and translation to encourage continual adaptation. - Maintains replay buffers (`memory_image`, `memory_label`, etc.) so past samples remain available for rehearsal while training new tasks. @@ -148,3 +180,4 @@ Training logs report the task id, training/test accuracy, and replay-memory accu Platform-specific deployment guides: - [OLCF Frontier](./src/apeiron/deployment/frontier/README.md) +- [NERSC Perlmutter](./src/apeiron/deployment/perlmutter/README.md) \ No newline at end of file