Merged
90 changes: 90 additions & 0 deletions .claude/skills/custom-experiment/SKILL.md
@@ -0,0 +1,90 @@
---
name: custom-experiment
description: |
  Run an apeiron experiment on the user's OWN dataset and model end-to-end.
  Use when the user wants to bring their own data + architecture (beyond the
  shipped MNIST/CIFAR examples), scaffold a custom model harness, write a config
  for it, smoke-test it, and run the full experiment. Self-contained: it creates
  the harness, data utilities, and TOML, registers them in the example factory,
  and runs. For trying the bundled examples instead, use explore-examples; for
  adding apeiron to a separate project's training loop, use integrate-apeiron.
argument-hint: "<short_name> [config_output_path]"
user-invocable: true
allowed-tools:
- Bash
- Read
- Write
- Edit
- Glob
- Grep
---

Scaffold and run an apeiron experiment on the user's own data and model.

## Arguments
- `$1`: Short name for the dataset/harness (lowercase, e.g. `fashionmnist`, `mytabular`). Used for the `examples/$1/` dir and the `data.name` factory key.
- `$2`: Optional output path for the TOML config. Defaults to `examples/$1/$1.toml`.

## Procedure

### 1. Gather the specifics from the user
Ask only for what isn't already provided:
- Dataset source and how to load it (torchvision, HuggingFace, local files, custom `Dataset`).
- Model architecture (CNN, MLP, ViT, …), input shape, number of classes/outputs.
- Type of drift to simulate on the stream (e.g. affine transforms for images, feature noise for tabular). apeiron's examples simulate drift inside `update_data_stream()`.
- Pretrained weights? Path if so (optional — harness should tolerate their absence).
- Which drift detector and CL updater to start with (default `ADWINDetector` + `base`).

### 2. Read the current patterns (don't hardcode signatures — they rot)
Mirror the live source rather than assuming method names:
```bash
cat src/apeiron/model/torch_model_harness.py # the ABC + abstract methods to implement
cat examples/mnist/model.py # canonical harness
cat examples/mnist/utils.py # data-loading + drift-sim pattern
cat examples/utils.py # get_example() factory to extend
grep -nA6 "class .*Cfg" src/apeiron/config/configuration.py # config fields
```
Implement exactly the `@abstractmethod`s the ABC declares. These currently include `get_optmizer` (the misspelling is the actual method name), `update_data_stream`, `get_stream_dataloader`, `get_hist_dataloaders`, `get_train_dataloaders`, and `get_criterion`. Set `self.eval_metrics` with at least an `accuracy` entry from `apeiron.evaluation.metrics`.
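
One way to avoid hardcoding the method list is to ask Python which abstract methods a class still owes. The sketch below uses a hypothetical stand-in ABC (`StandInHarness` and `PartialHarness` are illustrative names, not apeiron classes) and relies only on the standard `__abstractmethods__` attribute that `abc` maintains.

```python
from abc import ABC, abstractmethod


class StandInHarness(ABC):
    """Hypothetical stand-in for the real harness ABC; not apeiron's code."""

    @abstractmethod
    def get_criterion(self): ...

    @abstractmethod
    def update_data_stream(self): ...

    def helper(self):
        # Concrete method, so it is never reported as missing.
        return None


class PartialHarness(StandInHarness):
    """Implements only one of the two abstract methods."""

    def get_criterion(self):
        return "loss"


def missing_abstract_methods(cls) -> list:
    """Return the abstract method names that cls still has to implement."""
    return sorted(getattr(cls, "__abstractmethods__", frozenset()))


print(missing_abstract_methods(StandInHarness))
print(missing_abstract_methods(PartialHarness))
```

Calling the same helper on the real ABC and on the new harness subclass should work the same way, assuming both are importable in the project environment; an empty list for the subclass means every abstract method is covered.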

### 3. Scaffold the files
- `examples/$1/__init__.py` — empty.
- `examples/$1/model.py` — `BaseModelHarness` subclass calling `super().__init__(cfg=cfg, model=<nn.Module>)`, implementing every abstract method, applying cumulative drift in `update_data_stream()`, and returning `(None, None)` from `get_hist_dataloaders()` on the first task.
- `examples/$1/utils.py` — dataset loaders, a deterministic drift transform, a `TransformedView` wrapper, and a `make_loader(...)` factory (follow the MNIST utils structure).
- Config at `$2` (default `examples/$1/$1.toml`) with `[model]`, `[data]` (`name = "$1"`), `[train]`, `[drift_detection]`, optional `[continual_learning]`, `[visualization]`. Read an existing config for the exact key set.
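
The drift pieces listed above can be sketched without any framework code. This is a minimal illustration under stated assumptions: the `TransformedView` here is a guess at the general shape (the MNIST utils remain the reference), and `make_noise_drift` and `DriftingStream` are hypothetical names. It uses plain Python sequences in place of a torch `Dataset`, and feature noise as the drift type.

```python
import random
from typing import Callable, Sequence


class TransformedView:
    """Read-only view that applies a transform on access (hypothetical sketch)."""

    def __init__(self, base: Sequence, transform: Callable):
        self.base = base
        self.transform = transform

    def __len__(self) -> int:
        return len(self.base)

    def __getitem__(self, idx: int):
        features, label = self.base[idx]
        return self.transform(features, idx), label


def make_noise_drift(severity: float, seed: int = 0) -> Callable:
    """Build a deterministic feature-noise transform for a given severity."""

    def transform(features: Sequence, idx: int) -> list:
        # Seed per (seed, idx) so the same sample always gets the same noise.
        rng = random.Random(seed * 1_000_003 + idx)
        return [x + rng.gauss(0.0, severity) for x in features]

    return transform


class DriftingStream:
    """Tracks cumulative drift, as update_data_stream() would in a harness."""

    def __init__(self, base: Sequence, step: float = 0.1):
        self.base = base
        self.step = step
        self.severity = 0.0

    def update_data_stream(self) -> TransformedView:
        # Each call adds to the severity instead of resetting it.
        self.severity += self.step
        return TransformedView(self.base, make_noise_drift(self.severity))


data = [([0.0, 1.0], 0), ([2.0, 3.0], 1)]
stream = DriftingStream(data, step=0.5)
first = stream.update_data_stream()
second = stream.update_data_stream()
print(len(first), stream.severity)
```

Seeding by sample index keeps the transform reproducible across epochs, and storing the severity on the stream object is what makes the drift cumulative rather than per-call.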

### 4. Register in the factory (in-repo)
Add a branch to `get_example()` in `examples/utils.py`:
```python
elif cfg.data.name == "$1":
    from examples.$1.model import <HarnessClass>
    return <HarnessClass>(cfg=cfg)
```

### 5. Validate
```bash
poetry run python -c "import tomllib; tomllib.load(open('$2','rb')); print('TOML OK')"
poetry run python -c "from examples.utils import get_example; print('factory OK')"
```
If `pretrained_path` is set, confirm the file exists; warn if missing (run will train from scratch).

### 6. Smoke-test before the full run
Run a tiny, fast pass to catch wiring errors cheaply, then **confirm with the user** before the real run:
```bash
poetry run python -m src.main --config $2 \
  --set train.max_iter=2 \
  --set drift_detection.max_stream_updates=2 \
  --set drift_detection.detection_interval=1 \
  --set device=cpu \
  --set logging.backend=none
```
If it fails, read the traceback, fix the harness/config, and re-run the smoke test. Do not proceed until it completes cleanly.
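
If the smoke test and the full run are launched from a script, building both commands from one helper keeps them in sync. This is a sketch, not part of the repo: `build_command` and `SMOKE_OVERRIDES` are hypothetical names, the config path is an example, and the only flags used are the `--config` and `--set` options shown above.

```python
import shlex
from typing import Optional

# Overrides copied from the smoke-test command above.
SMOKE_OVERRIDES = {
    "train.max_iter": 2,
    "drift_detection.max_stream_updates": 2,
    "drift_detection.detection_interval": 1,
    "device": "cpu",
    "logging.backend": "none",
}


def build_command(config_path: str, overrides: Optional[dict] = None) -> list:
    """Build the argv list for a run, adding one --set flag per override."""
    argv = ["poetry", "run", "python", "-m", "src.main", "--config", config_path]
    for key, value in (overrides or {}).items():
        argv += ["--set", f"{key}={value}"]
    return argv


smoke = build_command("examples/mytabular/mytabular.toml", SMOKE_OVERRIDES)
full = build_command("examples/mytabular/mytabular.toml")
print(shlex.join(smoke))
print(shlex.join(full))
```

Either list can be passed to `subprocess.run(argv, check=True)`, which raises on a non-zero exit code so a failed smoke test stops the script before the full run.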

### 7. Full run and report
```bash
poetry run python -m src.main --config $2
```
Report drift events, final accuracy, and the output CSV path (the config's `visualization.input`). The package emits this CSV for inspection but does not ship a built-in dashboard renderer.

## Notes
- Registration here uses the in-repo factory pattern. To instead drive apeiron from your *own* project without editing this repo, use the integrate-apeiron skill.
- The repo also has older `new-harness` / `new-config` skills covering pieces of this; they are stale (pre-`src/apeiron/` layout) and slated for refresh — prefer this skill.
83 changes: 0 additions & 83 deletions .claude/skills/debug-experiment/SKILL.md

This file was deleted.

63 changes: 0 additions & 63 deletions .claude/skills/explain/SKILL.md

This file was deleted.

64 changes: 64 additions & 0 deletions .claude/skills/explore-examples/SKILL.md
@@ -0,0 +1,64 @@
---
name: explore-examples
description: |
  Run a bundled apeiron example experiment to explore the framework's
  capabilities. Use when the user wants to try the software, run a default/demo
  experiment, see drift detection and continual learning in action, or pick from
  the shipped MNIST/CIFAR configs. Presents a menu of available example configs,
  runs the chosen one, and reports where the metrics CSV landed. For running the
  user's OWN data/model/config, use the custom-experiment skill instead.
argument-hint: "[config_path]"
user-invocable: true
allowed-tools:
- Bash
- Read
- Glob
- Grep
---

Run a bundled apeiron example so the user can see the framework working end-to-end.

## Arguments
- `$1`: Optional path to a specific bundled config. If given, skip the menu and run it directly (still apply steps 3–5). If omitted, present the menu (step 1).

## Procedure

### 1. Build the menu dynamically (do not hardcode the list — it rots)
Discover the shipped configs and summarize each from its own contents:
```bash
find examples -name "*.toml" -type f | sort
```
For each config, read the key fields to describe it (`data.name`, `model.name`, `drift_detection.detector_name`, `continual_learning.update_mode`). Present a numbered menu like:
`1) examples/mnist/mnist.toml — MNIST, ADWIN detector, base updater`
Then ask the user which to run.

### 2. Default to MNIST; flag missing pretrained weights for others
- **MNIST is the guaranteed hands-off path** — `examples/mnist/mnist.pth` ships with the repo. Recommend it for a first run.
- For any non-MNIST choice (e.g. CIFAR), check the config's `pretrained_path` before running:
  ```bash
  ls -la <pretrained_path> 2>/dev/null || echo "MISSING"
  ```
  If the weight file is missing, tell the user plainly: this example needs weights that don't ship with the repo, so the run will train from scratch (slow) or fail to load. Let them decide whether to continue or switch to MNIST.

### 3. Ask which metrics-logging backend to use (per run)
The config default is `wandb`. Before running, ask the user to choose, and pass it as an override so no edits are needed:
- **none** — `--set logging.backend=none` (no account/network; best for a quick local look)
- **wandb** — `--set logging.backend=wandb` (run `wandb login` first if not authenticated)
- **mlflow** — `--set logging.backend=mlflow` (local tracking by default)

### 4. Show the config and run it
- Briefly summarize the chosen config (dataset, model, detector, updater, device, batch size) so the user can confirm.
- Run from the project root:
  ```bash
  poetry run python -m src.main --config <config_path> --set logging.backend=<choice>
  ```
- This is a real training/monitoring run and may take a while. Stream output; do not silently background it.

### 5. Report results
- Summarize from the run output: whether drift was detected and how many times, final accuracy, and the output CSV path (the config's `visualization.input`).
- The package emits this CSV for inspection; it does not ship a built-in dashboard renderer, so point the user at the CSV for further plotting.
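
For a quick summary of the emitted CSV, something like the following may help. It is a sketch under loud assumptions: the column names `drift_detected` and `accuracy` are hypothetical and should be replaced with whatever header the real CSV contains, and the inline sample values are made up for illustration.

```python
import csv
import io


def summarize(rows, drift_col="drift_detected", acc_col="accuracy"):
    """Count drift events and return the last accuracy value seen."""
    drift_events = 0
    final_accuracy = None
    for row in rows:
        # Treat "1" or "true" (any case) as a drift event.
        if str(row.get(drift_col, "")).strip().lower() in {"1", "true"}:
            drift_events += 1
        if row.get(acc_col) not in (None, ""):
            final_accuracy = float(row[acc_col])
    return drift_events, final_accuracy


# Inline sample standing in for the real CSV; columns and values are made up.
sample = io.StringIO(
    "step,accuracy,drift_detected\n"
    "1,0.91,False\n"
    "2,0.74,True\n"
    "3,0.88,False\n"
)
events, accuracy = summarize(csv.DictReader(sample))
print(f"drift events: {events}, final accuracy: {accuracy}")
```

For a real run, open the path from the config's `visualization.input` and pass `csv.DictReader(file)` to the same function, adjusting the two column arguments to match the file.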

## Notes
- Quick first run, copy-paste safe: `poetry run python -m src.main --config examples/mnist/mnist.toml --set logging.backend=none`
- Useful overrides to demonstrate capabilities: `--set drift_detection.detector_name=PageHinkleyDetector`, `--set continual_learning.update_mode=ewc_online`, `--set device=cpu`.
- If `poetry` isn't set up yet, point the user at the install/dev-setup step first.